Thread overview
IDEA: invariant-related gc hints
Jul 24, 2008
Russell Lewis
Jul 24, 2008
BCS
Jul 24, 2008
bearophile
Jul 25, 2008
Marianne Gagnon
July 24, 2008
Say that you create a new variable on the heap.  You fill it up with data, then cast it to invariant because you know that the contents will never change again.

That object might become garbage, of course, but you know that nothing it points to can possibly become garbage until the root object (the one that you cast to invariant) becomes garbage.

It seems to me that this information would be useful to the GC.  Just brainstorming here...but certainly, we could say, "until object X is garbage collected, objects A,B,C,D should all get marked 'live' in every GC sweep."

Moreover, the GC knows that all accessible data is also invariant, at least until the root reference goes away, so none of those objects need to be re-swept during a mark-and-sweep pass.

This sort of knowledge might really improve performance on programs that had a large set of invariant data.

Other ideas?
July 24, 2008
Reply to Russell,

> Say that you create a new variable on the heap.  You fill it up with
> data, then cast it to invariant because you know that the contents
> will never change again.
> 
> That object might become garbage, of course, but you know that nothing
> it points to can possibly become garbage until the root object (the
> one that you cast to invariant) becomes garbage.
> 
> It seems to me that this information would be useful to the GC.  Just
> brainstorming here...but certainly, we could say, "until object X is
> garbage collected, objects A,B,C,D should all get marked 'live' in
> every GC sweep."
> 
> Moreover, the GC knows that all accessible data is also invariant, at
> least until the root reference goes away, so none of those objects
> need to be re-swept during a mark-and-sweep pass.
> 
> This sort of knowledge might really improve performance on programs
> that had a large set of invariant data.
> 
> Other ideas?
> 

the easiest way to effect this might be to put all the allocations in the same memory section. If you know you wont drop anything, a sequential allocator (all memory allocations will be adjacent) might do the trick

OTOH if the root becomes garbage but the rets is not you get a rather large block of garbage hanging around.


July 24, 2008
Russell Lewis:
> Other ideas?

When Java was "born" for applets it was really slow, but years later it's has become almost as fast as normally compiled languages, for many purposes fast enough. One of the things that has allowed it to become this fast (beside lot of money, I presume) is a good enough GC (maybe not perfect yet, and not good for everything, but good enough for many purposes).

My knowledge of the detailed internals of a modern GC is limited still (it seems the more I learn the less I seem to know, this newsgroup is good for humility), but I have seen that they are quite refined and complex, and smart enough. From various benchmarks I have seen that the Phobos GC uses 2-12 times less RAM than the Java GC (often 5-8 times less), but I think the HotSpot GC often leads to faster performance of the code compared to the Phobos GC, sometimes much faster.

People have told that beside being (possibly) less refined and complex than the HotSpot GC, the Phobos one is designed for a lower level language, that allows unions and raw pointers too, etc. Such limitations of Java instead of making the Java code slower may instead allow the GC to work at a higher level, allowing faster code overall (here I am talking about heavy-OOP kind of code, of course).

So can't a way be invented to tell the D compiler that some parts of D code are restricted in Java-like capabilities (no C-style pointer management), so the GC can work on them (if they contain OOP) at that higher level with hopefully higher performance on very-OOP code (this may requires a quite more complex GC)? I don't know if this can be done, because you can't make such parts of code transitively safe. This is another possible advantage of the "safe D" Walter was talking about.

(Sorry for any silly thing I may have said).

Bye,
bearophile
July 25, 2008
> So can't a way be invented to tell the D compiler that some parts of D code are restricted in Java-like capabilities (no C-style pointer management), so the GC can work on them (if they contain OOP) at that higher level with hopefully higher performance on very-OOP code (this may requires a quite more complex GC)? I don't know if this can be done, because you can't make such parts of code transitively safe. This is another possible advantage of the "safe D" Walter was talking about.
> 

I think that's a little stretched,  enabling sections with less features is not how i imagine a clean and well-designed language
July 25, 2008
"bearophile" <bearophileHUGS@lycos.com> wrote in message news:g6b4h3$1748$1@digitalmars.com...

> So can't a way be invented to tell the D compiler that some parts of D code are restricted in Java-like capabilities (no C-style pointer management), so the GC can work on them (if they contain OOP) at that higher level with hopefully higher performance on very-OOP code (this may requires a quite more complex GC)? I don't know if this can be done, because you can't make such parts of code transitively safe. This is another possible advantage of the "safe D" Walter was talking about.

Hmmmm.  If only....... oh!  wait, that's right.

http://www.digitalmars.com/d/2.0/safed.html

"When you enter SafeD, you leave your pointers, unchecked casts and unions at the door. Memory management is provided to you courtesy of Garbage Collection. Class objects are passed around using opaque handles. Arrays and strings are bound-checked (it's possible to turn the checks off with a compiler switch, but then you're no longer in SafeD). You may still write code that will throw a runtime exception (e.g., array out-of-bounds error, or uninitialized-class-object error), but you won't be able to overwrite memory that hasn't been allocated to you or that has already been recycled."

The only issue then is slicing.  Slicing still allows you to point into the middle of memory blocks, which slows down the GC.  Unless, of course, a slice were to have three pointers -- one to the beginning of the memory block, one to the beginning of the slice, and one to the end of the slice. Ahhh, now it's all making sense why W wants to separate slices from normal arrays ;)