View mode: basic / threaded / horizontal-split · Log in · Help
June 04, 2006
GC, the simple solution
Another garbage collector thread :)

The best garbage collector I can think about, is a reference counting
one. To implement this:

* Every object needs a reference counter field.
* Code using references, need to have code for incrementing/decrementing
the counters.
* Destructors are called immediatly if the counter goes zero.

What you get is a precise, incremental, realtime garbage collector.
* all logic behaviour is deterministic
* Member objects are still valid, while a destructor is called. => no
more dispose/close patterns
* implement games and realtime apps without worry about lags
* implement secure application without worry about GC attack by valid
refence data values.
* consume only the needed memory
* no more need for the auto object
June 04, 2006
Re: GC, the simple solution
"GC, the simple solution"

No such thing.

Summary:

* Ref counting by default: good grief no!
* Ref counting as an option: good grief yes!
* Is there a perfect solution for memory management: don't be ridiculous.

Frank Benoit wrote:
> Another garbage collector thread :)
> 
> The best garbage collector I can think about, is a reference counting
> one. To implement this:
> 
> * Every object needs a reference counter field.
> * Code using references, need to have code for incrementing/decrementing
> the counters.
> * Destructors are called immediatly if the counter goes zero.
> 
> What you get is a precise, incremental, realtime garbage collector.
> * all logic behaviour is deterministic
> * Member objects are still valid, while a destructor is called. => no
> more dispose/close patterns
> * implement games and realtime apps without worry about lags
> * implement secure application without worry about GC attack by valid
> refence data values.
> * consume only the needed memory
> * no more need for the auto object

What you also get that you might not want:

* because references are on the object, you can't have array slices
* it's also incompatible with pointers: if the pointer is to a member of
an object, how on earth do you update the refcount then?
* slower in the immediate sense (incref & decref calls, plus it'll mean
less code in L1 and L2 CPU cache, which can kill performance in some
architectures)
* requires all objects to have destructors to decref child objects
(implicit or otherwise)
* without auto, makes it impossible to ensure an object's destruction
when you leave scope: you can leak references
* inability to free cycles
* inability to even *run* destructors where cycles are involved, because
there is no order in which they can logically be executed and still be valid

Those last two are the main problem with reference counting, along with
the others and slightly added complexity in the compiler.

Python, however, does not suffer the cycle problem.  Know why?

Because they have a *mark & sweep* garbage collector to clean up the cycles.

Seriously, I don't want to see the current GC replaced by a reference
counting scheme.  The current system is just fine for the majority of
applications.  Where it falls down are real-time systems which can't
allow for the non-deterministic GC pauses, and where you want to keep
your memory profile to a minimum.  Anything that has pauses of activity
can just run a background GC thread to keep the memory footprint down.

To this end, I think implementing an incremental or concurrent GC should
be the next port of call for optimising D's memory management.

As it stands, you can implement reference counting yourself: you just
need to write an allocator, and call the incref and decref functions
manually.  However, I do think that having built-in *support* for
Reference counting would be fantastic.  The less that stuff like this is
left in the hands of the programmer, the better.

What I think we all need to keep in mind is that NO memory management
scheme is good for all cases.  There will always be some weird corner
case which doesn't work well with our scheme of choice, and that we
don't give a rat's about, but is VERY important to someone else.

So far D does automatic GC (the best default, IMHO [1]), manual
management, and RAII.  Throw optional ref counting in to the mix, and I
think we'll have pretty much 99% of cases covered.

	-- Daniel

[1] The best default since it's unobtrusive, and *safe*: unless you're
doing strange things, it's very hard to leave dangling references with a
GC.  This allows people to get on with writing the damn program, and
worry about managing memory later (if ever).

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even
make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
June 04, 2006
Re: GC, the simple solution
> * Destructors are called immediatly if the counter goes zero.

This is not such a good thing! You don't really know who releases the last 
reference, so you don't really know when the object is deleted. If the 
object is being referenced from multiple threads, you don't know in what 
thread the destructor will run.

> * all logic behaviour is deterministic

See above.

> * Member objects are still valid, while a destructor is called. => no
> more dispose/close patterns

When an object is being destructed, the references it has to other objects 
are being released. So in the destructor of object B, object A might have 
released about half its references.

> * consume only the needed memory

What about circular references? They'll never get released, no clean-up.

L.
June 04, 2006
Re: GC, the simple solution
In nearly every other GC implementation you will need also a read or
write barrier. With reference counting you have a write barriere.

Yes, the unprecise, non-incremental, "Stop-the-world", non-concurrent,
non-generational, conservative Phobos GC implementation is an exception.

But a write barrier alone doesn't make ref counting ridiculous. Well you
are right, it is not the solution for all and everything, but I think a
big step in the right direction. And mark&sweep GC should be the option.

> * because references are on the object, you can't have array slices

Don't understand that point.

> * it's also incompatible with pointers: if the pointer is to a member of
> an object, how on earth do you update the refcount then?

In the actual implememtation of the gc, every object's memory range is
registered by the gc. With that you can get the objects refcounter address.

> * slower in the immediate sense (incref & decref calls, plus it'll mean
> less code in L1 and L2 CPU cache, which can kill performance in some
> architectures)

Performance has two sides. Throughput is one thing, worst latency the
other. We can think about techniques to optimize unecessary
increments/decrements.
BTW all incremental GCs will also need read and/or write barriers, which
will have the same perfomance issues.

> * requires all objects to have destructors to decref child objects
> (implicit or otherwise)

Yes, all objects having child objects. But if you think, that destructor
calls are a performance issue, you will get problems with the mark&sweep
also.

> * without auto, makes it impossible to ensure an object's destruction
> when you leave scope: you can leak references

How can that happen, if we have them native? Well ok, e.g. if you store
the reference to a static variable, then it is not freed. But this is
really a programmers bug. But I think that is something an advanced look
at it will find solutions. First we have to decide to go for ref counting :)

> * inability to free cycles
> * inability to even *run* destructors where cycles are involved, because
> there is no order in which they can logically be executed and still be valid

Right, here a manual ref=null is needed. And perhaps a mark&sweep
algorithm to detect such cycles in the debug version, called on request.
So if you know you program, you can avoid cycles, or you can break them
before releasing the last reference.

Doing a realtime GC seems to be not possible. The existing "realtime GC"
seems to me, incremental ones, which stop the world for a short limited
time. And that time goes into your worst latency. So if I want to do
things with a latency less 100us, there is no chance to implement a GC.
Ref counting whould be a solution, but this can only work if the whole
program does not depend on "Stop-the-world".


> As it stands, you can implement reference counting yourself: you just
> need to write an allocator, and call the incref and decref functions
> manually.  However, I do think that having built-in *support* for
> Reference counting would be fantastic.  The less that stuff like this is
> left in the hands of the programmer, the better.
No, i can't.
* The dereference operator cannot be overwritten.
* Existing classes, native arrays etc. does not work like this.
I do not need a few classes managed with ref counting. I want to avoid
the "Stop the world". And phobos needs it.

Where is the proof that ref counting is so slow? In mark&sweep you have
to run over the whole memory. If the memory is low, you have to do that
often. This is really slow than.
June 04, 2006
Re: GC, the simple solution
Lionello Lunesu schrieb:
>> * Destructors are called immediatly if the counter goes zero.
> 
> This is not such a good thing! You don't really know who releases the last 
> reference, so you don't really know when the object is deleted. If the 
> object is being referenced from multiple threads, you don't know in what 
> thread the destructor will run.
> 

Where is the disadvantage? If you release concurrent in different
threads you need some kind of syncronization. Than you know it. But this
is no issue of ref counting.
If you release a reference in the phobos GC, the object is collected
"perhaps" the next time. If there are no valid integer values,
"pointing"to your object. And after collecting, the sequence of
destructor calls is not defined.

>> * all logic behaviour is deterministic
> 
> See above.
See above.

> 
>> * Member objects are still valid, while a destructor is called. => no
>> more dispose/close patterns
> 
> When an object is being destructed, the references it has to other objects 
> are being released. So in the destructor of object B, object A might have 
> released about half its references.
> 
If object B Dtor is called, its member variables are valid, and valid is
the object A. The B.~this() can do something with A. After the Dtor is
finished, the reference to A is nulled, and A is perhaps also deleted.
Seams perfect to me.



>> * consume only the needed memory
> 
> What about circular references? They'll never get released, no clean-up.

Yes, they need a solution. Perhaps a fall back mark&sweep collector, to
check for cicles.

> 
> L. 
> 
>
June 04, 2006
Re: GC, the simple solution
"Frank Benoit" <keinfarbton@nospam.xyz> wrote in message 
news:e5uojm$18ji$1@digitaldaemon.com...
> Lionello Lunesu schrieb:
>>> * Destructors are called immediatly if the counter goes zero.
>>
>> This is not such a good thing! You don't really know who releases the 
>> last
>> reference, so you don't really know when the object is deleted. If the
>> object is being referenced from multiple threads, you don't know in what
>> thread the destructor will run.
>>
>
> Where is the disadvantage? If you release concurrent in different
> threads you need some kind of syncronization. Than you know it. But this
> is no issue of ref counting.
> If you release a reference in the phobos GC, the object is collected
> "perhaps" the next time. If there are no valid integer values,
> "pointing"to your object. And after collecting, the sequence of
> destructor calls is not defined.

There are many resources that must be released by the thread by which they 
were acquired. There's not much synchronisation can do. If the last 
reference just happens to be released in a thread other than the one that 
created the object, then the resource will be released in the wrong thread.

As I've understood from Boehm's paper, one advantages of a GC is to fix this 
issue: a GC collect will call finalizers for the objects that were created 
in that thread. It's like keeping an object from being destructed until the 
thread that created the object calls malloc (which is were the GC does its 
thing).

Anyway, Boehm's paper is a good read. He (she?) also addresses the problems 
with reference counting.

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

>>> * Member objects are still valid, while a destructor is called. => no
>>> more dispose/close patterns
>>
>> When an object is being destructed, the references it has to other 
>> objects
>> are being released. So in the destructor of object B, object A might have
>> released about half its references.
>
> If object B Dtor is called, its member variables are valid, and valid is
> the object A. The B.~this() can do something with A. After the Dtor is
> finished, the reference to A is nulled, and A is perhaps also deleted.
> Seams perfect to me.

I meant my example like this: object A has a reference to object B. During 
the destruction of A, the references it has to other objects are released, 
causing some objects being destructed, like B. Now B might (indirectly) use 
data from A, but A's in the middle of destruction. What I mean to tell with 
all this is that you still have to be carefull with your destructors. 
Especially when using a strong-pointer template. You can't always tell what 
state other objects are in in a destructor. They might be half way during 
destruction, having released some, but not all, of their references.

L.
June 04, 2006
Re: GC, the simple solution
> As I've understood from Boehm's paper, one advantages of a GC is to fix this 
> issue: a GC collect will call finalizers for the objects that were created 
> in that thread. It's like keeping an object from being destructed until the 
> thread that created the object calls malloc 

This is not true for the phobos GC. The GC runs in the thread which
calls new or gc.fullCollect.
But the Dtors are not called while other threads are running, because
the GC pauses all other threads. This is called the "Stop-the-world".


> I meant my example like this: object A has a reference to object B. During 
> the destruction of A, the references it has to other objects are released, 
> causing some objects being destructed, like B. Now B might (indirectly) use 
> data from A, but A's in the middle of destruction. What I mean to tell with 
> all this is that you still have to be carefull with your destructors. 
> Especially when using a strong-pointer template. You can't always tell what 
> state other objects are in in a destructor. They might be half way during 
> destruction, having released some, but not all, of their references.
> 

Object A cannot be destroyed, if it is reachable directly or indirectly.
That is, the ref counting is for. So if A.~this() is called, nobody else
has a reference to A.
June 04, 2006
Re: GC, the simple solution
Frank Benoit wrote:

> Another garbage collector thread :)
> 
> The best garbage collector I can think about, is a reference counting
> one. To implement this:
> 
> * Every object needs a reference counter field.
> * Code using references, need to have code for incrementing/decrementing
> the counters.
> * Destructors are called immediatly if the counter goes zero.
> 
> What you get is a precise, incremental, realtime garbage collector.
> * all logic behaviour is deterministic
> * Member objects are still valid, while a destructor is called. => no
> more dispose/close patterns
> * implement games and realtime apps without worry about lags
> * implement secure application without worry about GC attack by valid
> refence data values.
> * consume only the needed memory
> * no more need for the auto object

Not entirely true in the general case, Frank ;)

There are always techniques to get around problems with an algorithm,
depending on what you want to achieve, but in general, a reference counted
GC is said to have these _negative_ properties: (paraphrased from Garbage
Collection: Algorithms for Automatic Dynamic Memory Management by Jones and
Lins)

* The cost of removing the last pointer to an object is unbounded
* Even if the general latency is good, the total overhead of adjusting
references is significantly greater than that of a tracing GC
* It has a substantial space overhead, making it less useful for smaller
heaps
* Cyclic references (already mentioned)

I think these makes RC less suited for the general case, whereas it might
fit very well in a tightly controlled system where low latency is required
(which I'm well aware that you need ;).

IMO, the best general GC (the default) would be one which can handle as many
cases as possible, and which is manipulable in those (preferably known)
cases which it don't handle out of the box. I think that one need to
combine features from several GC techniques (possibly also RC) to achieve
this. 

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
June 04, 2006
Re: GC, the simple solution
Frank Benoit wrote:
> The best garbage collector I can think about, is a reference counting
> one.

Reference counting has its advantages, but some severe disadvantages:

1) cyclical data structures won't get free'd

2) every pointer copy requires an increment and a corresponding 
decrement - including when simply passing a reference to a function

3) in a multithreaded app, the incs and decs must be synchronized

4) exception handlers (finally's) must be inserted to handle all the 
decs so there are no leaks. Contrary to assertions otherwise, there is 
no such thing as "zero overhead exceptions."

5) in order to support slicing and interior pointers, as well as 
supporting reference counting on arbitrary allocations of non-object 
data, a separate "wrapper" object must be allocated for each allocation 
to be ref counted. This essentially doubles the number of allocations 
needed.

6) The wrapper object will mean that all pointers will need to be 
double-dereferenced to access the data.

7) Fixing the compiler to hide all this stuff from the programmer will 
make it difficult to interface cleanly with C.

8) Ref counting can fragment the heap thereby consuming more memory just 
like the gc can, though the gc typically will consume more memory overall.

9) Ref counting does not eliminated latency problems, it just reduces them.

The proposed C++ shared_ptr<>, which implements ref counting, suffers 
from all these faults. I haven't seen a heads up benchmark of 
shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<> 
turned out to be a significant loser in terms of both performance and 
memory consumption.

That said, I'd like for D to optionally support some form of ref 
counting, as rc is better for managing scarce resources like file handles.
June 04, 2006
Re: GC, the simple solution
Frank Benoit wrote:
> Another garbage collector thread :)
> 
> The best garbage collector I can think about, is a reference counting
> one.

Uh, what's your definition of "best"?
The simplest? Then maybe.

If you take a look at Java, it doesn't use a reference counting GC and 
there is a good reason for this: AFAIK modern GC are far better 
performance wise than reference counting GC.
One GC that I remember (no I'm not an expert) is Mark Copy GC, see 
"MarkCopy:Fast copying GC with less space overhead".

The only thing that reference counting GCs have for them is that this 
scheme is simple, more or less robust but "best", best for what?

I doubt very much that the best GC used for say batch computations is 
the best for GUI clients..

Regards,
RenoX
« First   ‹ Prev
1 2 3 4 5
Top | Discussion index | About this forum | D home