May 12, 2014
On 05/11/2014 08:18 PM, Rainer Schuetze wrote:
>
> 1. Use a scheme that takes a snapshot of the heap, stack and registers
> at the moment of collection and do the actual collection in another
> thread/process while the application can continue to run. This is the
> way Leandro Lucarella's concurrent GC works
> (http://dconf.org/2013/talks/lucarella.html), but it relies on "fork"
> that doesn't exist on every OS/architecture. A manual copy of the memory
> won't scale to very large memory, though it might be compressed to
> possible pointers. Worst case it will need twice as much memory as the
> current heap.

There is a problem with this scheme: copy-on-write is extremely expensive when a mutation happens. That's one page fault (context switch) + copying a whole page + mapping the new page. It's much worse with huge pages (2 MB page size).
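
To make the snapshot idea in the quote concrete, here is a minimal sketch of the semantics that "fork" provides. It is not Leandro Lucarella's implementation; the function name `snapshot_demo` and the small list standing in for the heap are assumptions made for illustration. The child plays the collector and keeps seeing the state from the moment of the fork, while the parent keeps mutating. Every page the parent writes to afterwards is what triggers the page fault and copy discussed above.

```python
import json
import os


def snapshot_demo():
    """Fork a child that keeps the pre-fork view while the parent mutates."""
    data = [1, 2, 3]              # hypothetical stand-in for the application heap
    go_r, go_w = os.pipe()        # parent -> child: "mutation is done"
    out_r, out_w = os.pipe()      # child -> parent: the child's view of data

    pid = os.fork()               # POSIX only, as the quoted post points out
    if pid == 0:
        # Child: plays the collector working on the snapshot.
        try:
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)      # wait until the parent has mutated its copy
            os.write(out_w, json.dumps(data).encode())
        finally:
            os._exit(0)           # never fall back into the caller's code

    # Parent: plays the application that continues to run.
    os.close(go_r)
    os.close(out_w)
    data.append(4)                # the kernel copies the touched pages on write
    os.write(go_w, b"x")
    os.close(go_w)
    with os.fdopen(out_r, "rb") as f:
        child_view = json.loads(f.read().decode())
    os.waitpid(pid, 0)
    return child_view, data


if __name__ == "__main__":
    print(snapshot_demo())
```

The child reports the list as it was at fork time even though it reads it after the parent's append, which is the property a snapshot collector relies on.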
May 12, 2014
On Monday, 12 May 2014 at 19:13:50 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 12 May 2014 at 18:07:51 UTC, Kapps wrote:
>> Depending on how tunable the GC is, I feel like it should be
>> possible to get away with a GC even for soft real-time programs
>> like games.
>
> Even if you manage to make it work for game clients you also should make it work for low latency game servers, as code sharing is an important advantage.
>
> What a game/world server requires differs a lot, but highly dynamic and flexible worlds have to keep the physics to a single node (or tight cluster) for a region. That means you want to have as many players as possible tied to that node.
>
> In essence you want performance, low latency, reliability, and little overhead in an evolutionary context (it has to support heavy modification over time).
>
> My gut feeling is that a runtime satisfying one game design will not satisfy another one as long as one insists on one global GC. In essence, it will never really work well. IMO, the same goes for ARC since RC does not perform well with multi-threading even when you use near optimal patterns and strategies. If ARC is only to be used where speed does not matter then you might as well use shared_ptr.

.NET allows configuring the garbage collector by specifying
workstation (concurrent, background [allows generation 0/1
collections while a generation 2 collection is in progress], one
primary heap and a large object heap) or server (not certain if
concurrent/background, but multiple heaps that get handled in
parallel during collections). In situations where you have many
processes running at once, you can also disable concurrent
collection to reduce context switching overhead. In reality, most
people leave the default concurrent collector, which is what I'd
hope the default for D would be, but if it were sufficiently
tunable, something like vibe.d could decide to go with something
more similar to what .NET uses for servers (which ASP.NET uses by
default).

I haven't been able to find good concrete numbers online, but the
few sources I've found say that a generation 0/1 collection tends
to take from under 1 ms up to 2-3 ms and is not run concurrently
because it's so short. This is quite sufficient for most
projects, but perhaps could be tweaked a bit more for certain
aspects like gaming, possibly even enabling concurrent collection
for generation 0/1, but I'm not sure if this works well or is
feasible. Still, the important thing is to get a good general one
to use first, like the default one .NET uses for workstation
applications.
May 12, 2014
On Monday, 12 May 2014 at 22:27:06 UTC, Kapps wrote:
> because it's so short. This is quite sufficient for most
> projects, but perhaps could be tweaked a bit more for certain
> aspects like gaming, possibly even enabling concurrent collection
> for generation 0/1, but I'm not sure if this works well or is
> feasible. Still, the important thing is to get a good general one
> to use first, like the default one .NET uses for workstation
> applications.

I agree that getting a good (100% precise) GC is an important first step. I am not so sure about generational GC when you have a window on a world map that you move around, which is roughly FIFO (first in, first out).

But to get good speed I think you are better off having multiple pools that can be released with no collection when a network-connection drops (if you have one conceptual pool per connection), and optimized allocators that give you pre-initialized objects etc.
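
A minimal sketch of the "one conceptual pool per connection" idea, assuming a simple bump allocator over a preallocated buffer. The names `Arena` and `ConnectionPools` are hypothetical and not taken from any existing library or from D's runtime. Releasing a pool is a single pointer reset, with no tracing or collection involved.

```python
class Arena:
    """Bump allocator over one preallocated buffer; released all at once."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._top = 0                     # offset of the next free byte

    def alloc(self, size):
        """Hand out the next `size` bytes as a writable view."""
        if self._top + size > len(self._buf):
            raise MemoryError("arena exhausted")
        view = memoryview(self._buf)[self._top:self._top + size]
        self._top += size
        return view

    def used(self):
        return self._top

    def release(self):
        """Free everything in O(1); old views must not be used afterwards."""
        self._top = 0


class ConnectionPools:
    """One arena per connection, dropped when the connection goes away."""

    def __init__(self, capacity_per_connection):
        self._capacity = capacity_per_connection
        self._arenas = {}

    def open(self, conn_id):
        arena = Arena(self._capacity)
        self._arenas[conn_id] = arena
        return arena

    def drop(self, conn_id):
        self._arenas.pop(conn_id).release()


if __name__ == "__main__":
    pools = ConnectionPools(1024)
    arena = pools.open("conn-1")
    packet = arena.alloc(64)              # per-connection scratch memory
    packet[0] = 0xFF
    pools.drop("conn-1")                  # connection dropped: pool is gone
```

The trade-off is the usual one for region allocation: nothing allocated from a connection's pool may outlive the connection.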

In the ideal world all of this is transparent once you have specified your memory model (in detail), so you only have to issue a "new PlayerConnection" in the main logic of your program and can tweak the memory handling elsewhere. That is not the D way, from what I can tell from the forum posts so far, because "new" is going to stay tied to one global GC heap. So you have to write utility functions… which makes programs less legible.
May 12, 2014
On 5/12/2014 2:32 PM, Steven Schveighoffer wrote:
>> It's still forbidden. Andrei wrote a template that will verify this at
>> runtime, but I don't recall its name.
>
> Can you cite the spec where it says it's forbidden? Forgotten templates are not
> a convincing argument.
>
> Regardless, Java can use a moving GC, and allows self references. The idea that
> self references prevent a moving GC is simply false. If you think about it a
> bit, you will understand why.


I see this is not specified in the documentation. Not sure what happened here, but I'll have to think about it.
May 13, 2014
On 5/12/2014 2:28 PM, Xavier Bigand wrote:
> All compile time things of D are marvelous.
> This, along with the compile time and the language being less error-prone, makes me want D.
> I am not sure I need safety so much. It's nice but not mandatory for any of my
> projects. The only one which has to be safe is DQuick.

Safety becomes a big concern when you're developing code as part of a team.

May 13, 2014

On 12.05.2014 13:53, "Marc Schütz" <schuetzm@gmx.net> wrote:
>
> I'm surprised that you didn't include:
>
> 3. Thread-local GC, isolated zones (restricting where references to
> objects of a particular heap can be placed), exempting certain threads
> from GC completely, ...

This comes up from time to time, but to me it is very blurry how this can work in reality.

Considering how "shared" is supposed to be used to be useful (do some locking, then cast away "shared"), there is no guarantee by the language that any object is actually thread-local (no references from other threads). Immutable data (e.g. strings) is shared by design.
May 13, 2014

On 13.05.2014 00:15, Martin Nowak wrote:
> On 05/11/2014 08:18 PM, Rainer Schuetze wrote:
>>
>> 1. Use a scheme that takes a snapshot of the heap, stack and registers
>> at the moment of collection and do the actual collection in another
>> thread/process while the application can continue to run. This is the
>> way Leandro Lucarella's concurrent GC works
>> (http://dconf.org/2013/talks/lucarella.html), but it relies on "fork"
>> that doesn't exist on every OS/architecture. A manual copy of the memory
>> won't scale to very large memory, though it might be compressed to
>> possible pointers. Worst case it will need twice as much memory as the
>> current heap.
>
> There is a problem with this scheme, copy-on-write is extremely
> expensive when a mutation happens. That's one page fault (context
> switch) + copying a whole page + mapping the new page.

I agree that this might be critical, but it is a one time cost per page. It seems unrealistic to do this with user mode exceptions, but the OS should have this optimized pretty well.

> It's much worse
> with huge pages (2MB page size).

How common are huge pages nowadays?
May 13, 2014
On Tuesday, 13 May 2014 at 06:12:46 UTC, Rainer Schuetze wrote:
>
>
> On 13.05.2014 00:15, Martin Nowak wrote:

>> There is a problem with this scheme, copy-on-write is extremely
>> expensive when a mutation happens. That's one page fault (context
>> switch) + copying a whole page + mapping the new page.
>
> I agree that this might be critical, but it is a one time cost per page. It seems unrealistic to do this with user mode exceptions, but the OS should have this optimized pretty well.

As I pointed out, this won't help dynamic games that can easily touch 50,000 pages per frame if you use a single global allocator. 2000 cycles * 50K pages = 100M cycles = frame drop. What's worse, if you are low on memory you will start to swap to disk (or compress pages).
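
Working the figures through as a rough sketch: 50,000 touched pages at about 2,000 cycles per copy-on-write fault is 10^8 cycles per frame. The 3 GHz clock and the 60 fps target below are assumptions added for illustration, as is the function name; only the page count and the per-fault cost come from the post.

```python
def cow_cost_per_frame(pages_touched, cycles_per_fault, clock_hz, fps):
    """Return (total cycles, cost in ms, frame budget in ms)."""
    total_cycles = pages_touched * cycles_per_fault
    cost_ms = total_cycles / clock_hz * 1000.0
    budget_ms = 1000.0 / fps
    return total_cycles, cost_ms, budget_ms


# 50,000 pages and 2,000 cycles are from the post; 3 GHz and 60 fps are assumed.
total, cost_ms, budget_ms = cow_cost_per_frame(50_000, 2_000, 3.0e9, 60)
print(f"{total:,} cycles = {cost_ms:.1f} ms against a {budget_ms:.1f} ms budget")
```

Under those assumptions the faults alone cost about 33 ms, roughly twice the 16.7 ms available per frame at 60 fps.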

So that means you have to optimize for collections, use dedicated allocators that keep dynamic data on the same pages etc... Basically you get the disadvantage of manual memory management and in the worst case the memory requirements of a copying GC without the benefits...
May 13, 2014
On Tuesday, 13 May 2014 at 06:06:40 UTC, Rainer Schuetze wrote:
>
>
> On 12.05.2014 13:53, "Marc Schütz" <schuetzm@gmx.net> wrote:
>>
>> I'm surprised that you didn't include:
>>
>> 3. Thread-local GC, isolated zones (restricting where references to
>> objects of a particular heap can be placed), exempting certain threads
>> from GC completely, ...
>
> This comes up from time to time, but to me it is very blurry how this can work in reality.
>
> Considering how "shared" is supposed to be used to be useful (do some locking, then cast away "shared") there is no guarantee by the language that any object is actually thread local (no references from other threads). Working with immutable (e.g. strings) is shared by design.

Yes, but only a part of the data is shared. I suspect the majority of the data in typical programs will be thread-local. If you use a message passing model, you can improve that even further (though it requires a way to move an object to another thread's heap). This way, you can - in the best case - avoid the shared heap completely.
May 13, 2014
On Tuesday, 13 May 2014 at 06:06:40 UTC, Rainer Schuetze wrote:
>
> This comes up from time to time, but to me it is very blurry how this can work in reality.
>
The paper I linked on Friday [0] presents a collector like this.  Are there concerns I've missed that make that not applicable?

> Considering how "shared" is supposed to be used to be useful (do some locking, then cast away "shared") there is no guarantee by the language that any object is actually thread local (no references from other threads). Working with immutable (e.g. strings) is shared by design.

I'm not seeing much in the documentation, but from what I can tell (per the FAQ), shared in D just guarantees it's on the global heap?

-Wyatt

[0] https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf