Jump to page: 1 27  
Page
Thread overview
GC, the simple solution
Jun 04, 2006
Frank Benoit
Jun 04, 2006
Daniel Keep
Jun 04, 2006
Frank Benoit
Jun 05, 2006
Daniel Keep
Jun 05, 2006
Lars Ivar Igesund
Jun 05, 2006
Larry Evans
Jun 08, 2006
Daniel Keep
Jun 04, 2006
Lionello Lunesu
Jun 04, 2006
Frank Benoit
Jun 04, 2006
Lionello Lunesu
Jun 04, 2006
Frank Benoit
Jun 04, 2006
Lars Ivar Igesund
Jun 05, 2006
Frank Benoit
Jun 05, 2006
Lars Ivar Igesund
Jun 05, 2006
Sean Kelly
Jun 04, 2006
Walter Bright
Jun 05, 2006
Frank Benoit
Jun 05, 2006
Daniel Keep
Jun 05, 2006
Lionello Lunesu
Jun 05, 2006
Daniel Keep
Jun 05, 2006
Frank Benoit
Jun 05, 2006
Walter Bright
Jun 05, 2006
Frank Benoit
Jun 05, 2006
Walter Bright
Jun 07, 2006
Kevin Bealer
Jun 07, 2006
Bruno Medeiros
Jun 07, 2006
Sean Kelly
Jun 10, 2006
Bruno Medeiros
Jun 10, 2006
Sean Kelly
Jun 12, 2006
Bruno Medeiros
Jun 14, 2006
Sean Kelly
Jun 14, 2006
Daniel Keep
Jun 14, 2006
Bruno Medeiros
Jun 14, 2006
Sean Kelly
Jun 14, 2006
Lionello Lunesu
Jun 14, 2006
Sean Kelly
Jun 15, 2006
Lionello Lunesu
Jun 15, 2006
Sean Kelly
Jun 16, 2006
Lionello Lunesu
Jun 16, 2006
Sean Kelly
Jun 16, 2006
Don Clugston
Jun 16, 2006
Sean Kelly
Jun 17, 2006
Bruno Medeiros
Jun 17, 2006
Sean Kelly
Jun 05, 2006
Sean Kelly
Jun 07, 2006
Jeff Parsons
Jun 08, 2006
Sean Kelly
Jun 09, 2006
Larry Evans
Jun 09, 2006
Walter Bright
Jun 09, 2006
Sean Kelly
Jun 09, 2006
Walter Bright
Jun 09, 2006
Derek Parnell
Jun 09, 2006
Sean Kelly
Jun 09, 2006
Walter Bright
Jun 09, 2006
Sean Kelly
Jun 09, 2006
Walter Bright
Jun 09, 2006
David Gileadi
Jun 09, 2006
John Reimer
Jun 09, 2006
Dave
Jun 12, 2006
Chad J
Jun 04, 2006
renox
June 04, 2006
Another garbage collector thread :)

The best garbage collector I can think about, is a reference counting one. To implement this:

* Every object needs a reference counter field.
* Code using references, need to have code for incrementing/decrementing
the counters.
* Destructors are called immediatly if the counter goes zero.

What you get is a precise, incremental, realtime garbage collector.
* all logic behaviour is deterministic
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
* implement games and realtime apps without worry about lags
* implement secure application without worry about GC attack by valid
refence data values.
* consume only the needed memory
* no more need for the auto object

June 04, 2006
"GC, the simple solution"

No such thing.

Summary:

* Ref counting by default: good grief no!
* Ref counting as an option: good grief yes!
* Is there a perfect solution for memory management: don't be ridiculous.

Frank Benoit wrote:
> Another garbage collector thread :)
> 
> The best garbage collector I can think about, is a reference counting one. To implement this:
> 
> * Every object needs a reference counter field.
> * Code using references, need to have code for incrementing/decrementing
> the counters.
> * Destructors are called immediatly if the counter goes zero.
> 
> What you get is a precise, incremental, realtime garbage collector.
> * all logic behaviour is deterministic
> * Member objects are still valid, while a destructor is called. => no
> more dispose/close patterns
> * implement games and realtime apps without worry about lags
> * implement secure application without worry about GC attack by valid
> refence data values.
> * consume only the needed memory
> * no more need for the auto object

What you also get that you might not want:

* because references are on the object, you can't have array slices
* it's also incompatible with pointers: if the pointer is to a member of
an object, how on earth do you update the refcount then?
* slower in the immediate sense (incref & decref calls, plus it'll mean
less code in L1 and L2 CPU cache, which can kill performance in some
architectures)
* requires all objects to have destructors to decref child objects
(implicit or otherwise)
* without auto, makes it impossible to ensure an object's destruction
when you leave scope: you can leak references
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be valid

Those last two are the main problem with reference counting, along with the others and slightly added complexity in the compiler.

Python, however, does not suffer the cycle problem.  Know why?

Because they have a *mark & sweep* garbage collector to clean up the cycles.

Seriously, I don't want to see the current GC replaced by a reference counting scheme.  The current system is just fine for the majority of applications.  Where it falls down are real-time systems which can't allow for the non-deterministic GC pauses, and where you want to keep your memory profile to a minimum.  Anything that has pauses of activity can just run a background GC thread to keep the memory footprint down.

To this end, I think implementing an incremental or concurrent GC should be the next port of call for optimising D's memory management.

As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually.  However, I do think that having built-in *support* for Reference counting would be fantastic.  The less that stuff like this is left in the hands of the programmer, the better.

What I think we all need to keep in mind is that NO memory management scheme is good for all cases.  There will always be some weird corner case which doesn't work well with our scheme of choice, and that we don't give a rat's about, but is VERY important to someone else.

So far D does automatic GC (the best default, IMHO [1]), manual management, and RAII.  Throw optional ref counting in to the mix, and I think we'll have pretty much 99% of cases covered.

	-- Daniel

[1] The best default since it's unobtrusive, and *safe*: unless you're doing strange things, it's very hard to leave dangling references with a GC.  This allows people to get on with writing the damn program, and worry about managing memory later (if ever).

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
June 04, 2006
> * Destructors are called immediatly if the counter goes zero.

This is not such a good thing! You don't really know who releases the last reference, so you don't really know when the object is deleted. If the object is being referenced from multiple threads, you don't know in what thread the destructor will run.

> * all logic behaviour is deterministic

See above.

> * Member objects are still valid, while a destructor is called. => no more dispose/close patterns

When an object is being destructed, the references it has to other objects are being released. So in the destructor of object B, object A might have released about half its references.

> * consume only the needed memory

What about circular references? They'll never get released, no clean-up.

L.


June 04, 2006
In nearly every other GC implementation you will need also a read or write barrier. With reference counting you have a write barriere.

Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent, non-generational, conservative Phobos GC implementation is an exception.

But a write barrier alone doesn't make ref counting ridiculous. Well you are right, it is not the solution for all and everything, but I think a big step in the right direction. And mark&sweep GC should be the option.

> * because references are on the object, you can't have array slices

Don't understand that point.

> * it's also incompatible with pointers: if the pointer is to a member of an object, how on earth do you update the refcount then?

In the actual implememtation of the gc, every object's memory range is registered by the gc. With that you can get the objects refcounter address.

> * slower in the immediate sense (incref & decref calls, plus it'll mean less code in L1 and L2 CPU cache, which can kill performance in some architectures)

Performance has two sides. Throughput is one thing, worst latency the
other. We can think about techniques to optimize unecessary
increments/decrements.
BTW all incremental GCs will also need read and/or write barriers, which
will have the same perfomance issues.

> * requires all objects to have destructors to decref child objects
> (implicit or otherwise)

Yes, all objects having child objects. But if you think, that destructor calls are a performance issue, you will get problems with the mark&sweep also.

> * without auto, makes it impossible to ensure an object's destruction when you leave scope: you can leak references

How can that happen, if we have them native? Well ok, e.g. if you store
the reference to a static variable, then it is not freed. But this is
really a programmers bug. But I think that is something an advanced look
at it will find solutions. First we have to decide to go for ref counting :)

> * inability to free cycles
> * inability to even *run* destructors where cycles are involved, because
> there is no order in which they can logically be executed and still be valid

Right, here a manual ref=null is needed. And perhaps a mark&sweep algorithm to detect such cycles in the debug version, called on request. So if you know you program, you can avoid cycles, or you can break them before releasing the last reference.

Doing a realtime GC seems to be not possible. The existing "realtime GC" seems to me, incremental ones, which stop the world for a short limited time. And that time goes into your worst latency. So if I want to do things with a latency less 100us, there is no chance to implement a GC. Ref counting whould be a solution, but this can only work if the whole program does not depend on "Stop-the-world".


> As it stands, you can implement reference counting yourself: you just need to write an allocator, and call the incref and decref functions manually.  However, I do think that having built-in *support* for Reference counting would be fantastic.  The less that stuff like this is left in the hands of the programmer, the better.
No, i can't.
* The dereference operator cannot be overwritten.
* Existing classes, native arrays etc. does not work like this.
I do not need a few classes managed with ref counting. I want to avoid
the "Stop the world". And phobos needs it.

Where is the proof that ref counting is so slow? In mark&sweep you have to run over the whole memory. If the memory is low, you have to do that often. This is really slow than.
June 04, 2006
Lionello Lunesu schrieb:
>> * Destructors are called immediatly if the counter goes zero.
> 
> This is not such a good thing! You don't really know who releases the last reference, so you don't really know when the object is deleted. If the object is being referenced from multiple threads, you don't know in what thread the destructor will run.
> 

Where is the disadvantage? If you release concurrent in different
threads you need some kind of syncronization. Than you know it. But this
is no issue of ref counting.
If you release a reference in the phobos GC, the object is collected
"perhaps" the next time. If there are no valid integer values,
"pointing"to your object. And after collecting, the sequence of
destructor calls is not defined.

>> * all logic behaviour is deterministic
> 
> See above.
See above.

> 
>> * Member objects are still valid, while a destructor is called. => no more dispose/close patterns
> 
> When an object is being destructed, the references it has to other objects are being released. So in the destructor of object B, object A might have released about half its references.
> 
If object B Dtor is called, its member variables are valid, and valid is the object A. The B.~this() can do something with A. After the Dtor is finished, the reference to A is nulled, and A is perhaps also deleted. Seams perfect to me.



>> * consume only the needed memory
> 
> What about circular references? They'll never get released, no clean-up.

Yes, they need a solution. Perhaps a fall back mark&sweep collector, to check for cicles.

> 
> L.
> 
> 
June 04, 2006
"Frank Benoit" <keinfarbton@nospam.xyz> wrote in message news:e5uojm$18ji$1@digitaldaemon.com...
> Lionello Lunesu schrieb:
>>> * Destructors are called immediatly if the counter goes zero.
>>
>> This is not such a good thing! You don't really know who releases the
>> last
>> reference, so you don't really know when the object is deleted. If the
>> object is being referenced from multiple threads, you don't know in what
>> thread the destructor will run.
>>
>
> Where is the disadvantage? If you release concurrent in different
> threads you need some kind of syncronization. Than you know it. But this
> is no issue of ref counting.
> If you release a reference in the phobos GC, the object is collected
> "perhaps" the next time. If there are no valid integer values,
> "pointing"to your object. And after collecting, the sequence of
> destructor calls is not defined.

There are many resources that must be released by the thread by which they were acquired. There's not much synchronisation can do. If the last reference just happens to be released in a thread other than the one that created the object, then the resource will be released in the wrong thread.

As I've understood from Boehm's paper, one advantages of a GC is to fix this issue: a GC collect will call finalizers for the objects that were created in that thread. It's like keeping an object from being destructed until the thread that created the object calls malloc (which is were the GC does its thing).

Anyway, Boehm's paper is a good read. He (she?) also addresses the problems with reference counting.

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

>>> * Member objects are still valid, while a destructor is called. => no more dispose/close patterns
>>
>> When an object is being destructed, the references it has to other
>> objects
>> are being released. So in the destructor of object B, object A might have
>> released about half its references.
>
> If object B Dtor is called, its member variables are valid, and valid is the object A. The B.~this() can do something with A. After the Dtor is finished, the reference to A is nulled, and A is perhaps also deleted. Seams perfect to me.

I meant my example like this: object A has a reference to object B. During the destruction of A, the references it has to other objects are released, causing some objects being destructed, like B. Now B might (indirectly) use data from A, but A's in the middle of destruction. What I mean to tell with all this is that you still have to be carefull with your destructors. Especially when using a strong-pointer template. You can't always tell what state other objects are in in a destructor. They might be half way during destruction, having released some, but not all, of their references.

L.


June 04, 2006
> As I've understood from Boehm's paper, one advantages of a GC is to fix this issue: a GC collect will call finalizers for the objects that were created in that thread. It's like keeping an object from being destructed until the thread that created the object calls malloc

This is not true for the phobos GC. The GC runs in the thread which
calls new or gc.fullCollect.
But the Dtors are not called while other threads are running, because
the GC pauses all other threads. This is called the "Stop-the-world".


> I meant my example like this: object A has a reference to object B. During the destruction of A, the references it has to other objects are released, causing some objects being destructed, like B. Now B might (indirectly) use data from A, but A's in the middle of destruction. What I mean to tell with all this is that you still have to be carefull with your destructors. Especially when using a strong-pointer template. You can't always tell what state other objects are in in a destructor. They might be half way during destruction, having released some, but not all, of their references.
> 

Object A cannot be destroyed, if it is reachable directly or indirectly. That is, the ref counting is for. So if A.~this() is called, nobody else has a reference to A.
June 04, 2006
Frank Benoit wrote:

> Another garbage collector thread :)
> 
> The best garbage collector I can think about, is a reference counting one. To implement this:
> 
> * Every object needs a reference counter field.
> * Code using references, need to have code for incrementing/decrementing
> the counters.
> * Destructors are called immediatly if the counter goes zero.
> 
> What you get is a precise, incremental, realtime garbage collector.
> * all logic behaviour is deterministic
> * Member objects are still valid, while a destructor is called. => no
> more dispose/close patterns
> * implement games and realtime apps without worry about lags
> * implement secure application without worry about GC attack by valid
> refence data values.
> * consume only the needed memory
> * no more need for the auto object

Not entirely true in the general case, Frank ;)

There are always techniques to get around problems with an algorithm, depending on what you want to achieve, but in general, a reference counted GC is said to have these _negative_ properties: (paraphrased from Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Jones and Lins)

* The cost of removing the last pointer to an object is unbounded
* Even if the general latency is good, the total overhead of adjusting
references is significantly greater than that of a tracing GC
* It has a substantial space overhead, making it less useful for smaller
heaps
* Cyclic references (already mentioned)

I think these makes RC less suited for the general case, whereas it might fit very well in a tightly controlled system where low latency is required (which I'm well aware that you need ;).

IMO, the best general GC (the default) would be one which can handle as many
cases as possible, and which is manipulable in those (preferably known)
cases which it don't handle out of the box. I think that one need to
combine features from several GC techniques (possibly also RC) to achieve
this.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
June 04, 2006
Frank Benoit wrote:
> The best garbage collector I can think about, is a reference counting
> one.

Reference counting has its advantages, but some severe disadvantages:

1) cyclical data structures won't get free'd

2) every pointer copy requires an increment and a corresponding decrement - including when simply passing a reference to a function

3) in a multithreaded app, the incs and decs must be synchronized

4) exception handlers (finally's) must be inserted to handle all the decs so there are no leaks. Contrary to assertions otherwise, there is no such thing as "zero overhead exceptions."

5) in order to support slicing and interior pointers, as well as supporting reference counting on arbitrary allocations of non-object data, a separate "wrapper" object must be allocated for each allocation to be ref counted. This essentially doubles the number of allocations needed.

6) The wrapper object will mean that all pointers will need to be double-dereferenced to access the data.

7) Fixing the compiler to hide all this stuff from the programmer will make it difficult to interface cleanly with C.

8) Ref counting can fragment the heap thereby consuming more memory just like the gc can, though the gc typically will consume more memory overall.

9) Ref counting does not eliminated latency problems, it just reduces them.

The proposed C++ shared_ptr<>, which implements ref counting, suffers from all these faults. I haven't seen a heads up benchmark of shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<> turned out to be a significant loser in terms of both performance and memory consumption.

That said, I'd like for D to optionally support some form of ref counting, as rc is better for managing scarce resources like file handles.
June 04, 2006
Frank Benoit wrote:
> Another garbage collector thread :)
> 
> The best garbage collector I can think about, is a reference counting
> one.

Uh, what's your definition of "best"?
The simplest? Then maybe.

If you take a look at Java, it doesn't use a reference counting GC and there is a good reason for this: AFAIK modern GC are far better performance wise than reference counting GC.
One GC that I remember (no I'm not an expert) is Mark Copy GC, see "MarkCopy:Fast copying GC with less space overhead".

The only thing that reference counting GCs have for them is that this scheme is simple, more or less robust but "best", best for what?

I doubt very much that the best GC used for say batch computations is the best for GUI clients..

Regards,
RenoX
« First   ‹ Prev
1 2 3 4 5 6 7