Thread overview
Changes in the D2 design to help the GC?
Jul 15, 2009
bearophile
Jul 16, 2009
KennyTM~
Jul 16, 2009
Lutger
Jul 16, 2009
Stewart Gordon
Jul 17, 2009
bearophile
Jul 21, 2009
Iivari Mokelainen
July 15, 2009
In Java the GC is able to collect garbage very quickly, so Java programmers allocate many small objects quite often.
In functional-style languages like Scala, Clojure, F#, etc., most data is immutable, so again the GC is under a lot of pressure, allocating and freeing many small structures all the time.

D2 syntax allows both styles of programming (you can program in D almost as in Java, if you want), but if you follow either of those two styles you will see that the current D GC is much less efficient and leads to lower performance than Java/F#. (Scoped classes are not enough.)
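To make the allocation pattern concrete, here is a minimal sketch of the kind of code meant above. The names (`Node`, `buildList`, `incremented`, `sum`) are invented for illustration and are not from any library. Every list operation allocates one small GC object per element instead of mutating in place:

```d
// Hypothetical example: a singly linked list handled in a functional
// style, so each operation allocates fresh small objects on the GC heap.
final class Node
{
    int value;
    Node next;

    this(int value, Node next)
    {
        this.value = value;
        this.next = next;
    }
}

// Builds the list n-1, n-2, ..., 0 with one GC allocation per element.
Node buildList(int n)
{
    Node head = null;
    foreach (i; 0 .. n)
        head = new Node(i, head);
    return head;
}

// Returns a new list with every value incremented. The input is left
// untouched, so each call allocates one new node per element.
Node incremented(Node list)
{
    return list is null ? null : new Node(list.value + 1, incremented(list.next));
}

// Adds up all values in the list without allocating.
long sum(Node list)
{
    long total = 0;
    for (Node p = list; p !is null; p = p.next)
        total += p.value;
    return total;
}

unittest
{
    Node list = buildList(1000);
    assert(sum(list) == 499_500);              // 0 + 1 + ... + 999
    assert(sum(incremented(list)) == 500_500); // each of the 1000 values + 1
}
```

It can be tried with something like `dmd -unittest -main -run file.d`. Calling `incremented` in a loop on a long list is a quick way to put this sort of pressure on the collector.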

I am not a GC expert yet, but I'm certain there are ways to improve the current situation. Besides improving the GC itself, there may be ways to modify the current design of D2 a bit to make a more efficient GC possible. Do you have ideas?

Some time ago I suggested splitting D pointers into two types: GC-managed ones, and ones that point into the C heap, which the GC never touches. The type system can ensure they never get mixed by mistake. Now I think (just an idea) the GC-managed pointers can themselves be split into two types: the ones fully managed by a moving GC (see below), and the ones managed by a conservative GC, whose memory is pinned, so the GC doesn't move it around. The type system will ensure these three groups don't mix unless the programmer is really determined to mix them :-)
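A rough library-level approximation of the first split, just to show what "never get mixed by mistake" could look like. This is only a sketch: `CPtr`, `GcPtr`, `cAlloc`, `cFree` and `gcAlloc` are hypothetical names, and a real design would live in the type system rather than in wrapper structs:

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical wrapper for memory owned by the C heap.
struct CPtr(T)
{
    T* raw;
}

// Hypothetical wrapper for memory owned by the D garbage collector.
struct GcPtr(T)
{
    T* raw;
}

// Allocates room for one T on the C heap (contents uninitialized).
CPtr!T cAlloc(T)()
{
    return CPtr!T(cast(T*) malloc(T.sizeof));
}

// Only accepts C-heap pointers, so a GC pointer cannot be freed by mistake.
void cFree(T)(CPtr!T p)
{
    free(p.raw);
}

// Allocates one default-initialized T on the GC heap.
GcPtr!T gcAlloc(T)()
{
    return GcPtr!T(new T);
}

unittest
{
    auto c = cAlloc!int();
    auto g = gcAlloc!int();
    assert(c.raw !is null);
    *c.raw = 1;
    *g.raw = 2;

    // The two kinds are distinct types, so mixing them does not compile.
    static assert(!__traits(compiles, c = g));
    static assert(!__traits(compiles, cFree(g)));

    cFree(c);
}
```

A programmer who is "really determined" can still write `GcPtr!int(c.raw)` by hand, which matches the escape hatch described above.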

A simple idea of mine to improve the GC (without changing the D2 language yet) is to split the D GC into two parts. The first is a moving collector that acts like a Java-style GC. It is especially useful in SafeD code and would become the one used for OOP/functional-style code, so it is probably the GC that most of the code in most D programs would use. The second part acts conservatively, like the current GC, which is safer. It manages "pinned" blocks of memory that can't be moved; such memory is usually the kind managed in lower-level D modules, by user-written collections, etc. The performance of this second part will be lower (like the current GC), but most data will not be managed by it anyway.

When you use LDC, the slow GC is one of the few parts of the D language that still have low performance (the other two are that D currently isn't able to inline closures and virtual methods. Such things too will eventually need to be addressed if D wants to become high-performance. I can leave that topic to other posts/threads).

Bye,
bearophile
July 16, 2009
bearophile wrote:
> Some time ago I suggested splitting D pointers into two types: GC-managed ones, and ones that point into the C heap, which the GC never touches. The type system can ensure they never get mixed by mistake. Now I think (just an idea) the GC-managed pointers can themselves be split into two types: the ones fully managed by a moving GC (see below), and the ones managed by a conservative GC, whose memory is pinned, so the GC doesn't move it around. The type system will ensure these three groups don't mix unless the programmer is really determined to mix them :-)
> 

No way, 3 kinds of constness is confusing enough.

July 16, 2009
I'm worried too about this, but haven't a clue as to what is needed to overcome the performance gap. I don't think extending the type system in a major way for some extra performance is worth it. Still, there may be some ways to make less drastic adjustments so that a (more) precise GC can be built. Or to put it another way: to not make a high performance GC for D impossible in the future.
July 16, 2009
bearophile wrote:
<snip>
> Some time ago I suggested splitting D pointers into two types: GC-managed ones, and ones that point into the C heap, which the GC never touches. The type system can ensure they never get mixed by mistake.

I can imagine this making interfacing external APIs a pain in the rear end....

> Now I think (just an idea) the GC-managed pointers can themselves
> be split into two types: the ones fully managed by a moving GC
> (see below), and the ones managed by a conservative GC, whose
> memory is pinned, so the GC doesn't move it around. The type system
> will ensure these three groups don't mix unless the programmer is
> really determined to mix them :-)
<snip>

I'm not sure that having two separate, independent GCs will work.  But having two GC heaps along these lines might.

One way I can see is having an "immovable" type modifier in line with const and invariant.  Anything that isn't allocated as immovable, the GC may move around if it's clever enough.  But an immovable reference could just as well be implicitly convertible to a non-immovable reference - the GC'll know which heap it points into.

Immovable might be useful for interfacing external APIs.  We could also spec that only immovable pointer/reference types may be used in a union.
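As a rough sketch of the one-way conversion described above, a wrapper struct can stand in for the proposed modifier. Everything here is an assumption made for illustration: `Immovable`, `asPlain`, `allocImmovable` and `readThrough` are invented names, and `GC.BlkAttr.NO_MOVE` from `core.memory` is used only as the nearest existing way to ask the runtime not to move a block. It handles plain data types only:

```d
import core.memory : GC;

// Hypothetical stand-in for an "immovable"-qualified pointer.
struct Immovable(T)
{
    private T* ptr;

    // Read-only view as an ordinary pointer. `alias this` makes the
    // conversion Immovable!T -> T* implicit, but not the reverse.
    @property T* asPlain() { return ptr; }
    alias asPlain this;
}

// Allocates one T in a GC block flagged NO_MOVE and copies `value` into it.
Immovable!T allocImmovable(T)(T value)
{
    auto p = cast(T*) GC.malloc(T.sizeof, GC.BlkAttr.NO_MOVE);
    *p = value;
    return Immovable!T(p);
}

// Ordinary code that knows nothing about immovable pointers.
int readThrough(int* p)
{
    return *p;
}

unittest
{
    auto pinned = allocImmovable(42);
    assert(readThrough(pinned) == 42); // implicit Immovable!int -> int*

    static assert(is(Immovable!int : int*));  // immovable converts to plain
    static assert(!is(int* : Immovable!int)); // plain does not convert back
}
```

The getter-only `alias this` is a deliberate choice in this sketch: aliasing the field directly would also let `pinned = somePlainPointer` compile, which defeats the point.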

BTW even D1 needs some work in the area of moving GC:
http://d.puremagic.com/issues/show_bug.cgi?id=679

Stewart.
July 17, 2009
Can D steal the future GC of Mono?
http://mono-project.com/Compacting_GC
http://www.go-mono.com/meeting06/mono-sgen.pdf
It manages pinned objects too, but it will be tuned for a small number of them, while in D they are probably a bit more common.

Bye,
bearophile
July 21, 2009
bearophile wrote:
> Some time ago I suggested splitting D pointers into two types: GC-managed ones, and ones that point into the C heap, which the GC never touches. The type system can ensure they never get mixed by mistake. Now I think (just an idea) the GC-managed pointers can themselves be split into two types: the ones fully managed by a moving GC (see below), and the ones managed by a conservative GC, whose memory is pinned, so the GC doesn't move it around. The type system will ensure these three groups don't mix unless the programmer is really determined to mix them :-)
> 
> A simple idea of mine to improve the GC (without changing the D2 language yet) is to split the D GC into two parts. The first is a moving collector that acts like a Java-style GC. It is especially useful in SafeD code and would become the one used for OOP/functional-style code, so it is probably the GC that most of the code in most D programs would use. The second part acts conservatively, like the current GC, which is safer. It manages "pinned" blocks of memory that can't be moved; such memory is usually the kind managed in lower-level D modules, by user-written collections, etc. The performance of this second part will be lower (like the current GC), but most data will not be managed by it anyway.

C# has a 'fixed' keyword, which ensures that the variable in the fixed scope won't be moved by the GC. Such variables can be native pointers used for interop'ing with the OS or for working fast with arrays (generating bitmaps in memory).

But I don't think that is a viable solution - it's too big. Two GCs? No-no.
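For comparison, a scope-based pin in the spirit of C#'s 'fixed' can be sketched in D on top of `core.memory`. This is only an illustration under assumptions: `Pinned` and `isPinned` are invented names, and with a non-moving collector the NO_MOVE flag is just a hint that a future moving GC would have to honor:

```d
import core.memory : GC;

// Hypothetical RAII helper: flags a GC block as NO_MOVE for the lifetime
// of the struct and clears the flag again when the scope ends.
// Simplification: it assumes the block was not already flagged and that
// pins on the same block are not nested.
struct Pinned
{
    private void* base;

    @disable this(this); // a copy would clear the flag too early

    this(void* p)
    {
        base = GC.addrOf(p); // start of the GC block containing p, or null
        if (base !is null)
            GC.setAttr(base, GC.BlkAttr.NO_MOVE);
    }

    ~this()
    {
        if (base !is null)
            GC.clrAttr(base, GC.BlkAttr.NO_MOVE);
    }
}

// Reports whether the GC block containing p currently carries NO_MOVE.
bool isPinned(void* p)
{
    auto base = GC.addrOf(p);
    return base !is null && (GC.getAttr(base) & GC.BlkAttr.NO_MOVE) != 0;
}

unittest
{
    auto pixels = new ubyte[](64 * 64); // e.g. a bitmap built in memory
    {
        auto pin = Pinned(pixels.ptr);
        assert(isPinned(pixels.ptr));
        // ... pixels.ptr could be handed to native code here ...
    }
    assert(!isPinned(pixels.ptr)); // flag cleared at scope exit
}
```

This keeps a single GC and a single heap; only the blocks that are actually handed to native code get pinned, and only for as long as the scope lasts.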