Mixing GC and non-GC in D. (AKA, "don't touch GC-references from DTOR, preferably don't use DTOR at all")

December 12, 2010
[D-runtime] Mixing GC and non-GC in D. (AKA, "don't touch GC-references from DTOR, preferably don't use DTOR at all")
Posted by Ulrik Mikaelsson
Hi,

DISCLAIMER: I'm developing for D1/Tango. It is possible these issues are already resolved for druntime/D2. If so, I've failed to find any information about it, so please do tell.

Recently, I've been trying to optimize my application by swapping out some resource allocation (file-descriptors for one) to reference-counted allocation instead of GC. I've hit some problems.

Problem
=======

Basically, the core of all my problems is something expressed in
http://bartoszmilewski.wordpress.com/2009/08/19/the-anatomy-of-reference-counting/
as "An object's destructor must not access any garbage-collected
objects embedded in it.".

This becomes a real problem for various allocation schemes, be it hierarchic allocation, reference counting, or a bunch of other custom resource schemes. _It renders the destructor in D mostly useless for anything but mere C-binding cleanup._
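
To make the restriction concrete, here is a minimal sketch in D2 syntax (an assumption on my part, the same idea applies to D1/Tango). All names (Buffer, Connection, demoDestroy) are hypothetical and only for illustration. The destructor may safely touch the plain `fd` field, but not the GC-allocated `pending` member, since that one may already be finalized or freed when the GC runs the destructor.

```d
// Hypothetical names throughout; this sketches the hazard, not a fix.
class Buffer
{
    ubyte[] data;                 // GC-allocated array
    this(size_t n) { data = new ubyte[n]; }
}

class Connection
{
    int fd;                       // plain value stored inline in the object
    Buffer pending;               // GC reference: may be gone during collection
    static int closed;            // illustration only: counts simulated closes

    this(int fd)
    {
        this.fd = fd;
        pending = new Buffer(64);
    }

    ~this()
    {
        // Safe: touches only data stored inline in this object.
        if (fd >= 0)
        {
            ++closed;             // a real program would close(fd) here
            fd = -1;
        }
        // Unsafe when run by the GC: `pending` may already be freed.
        // flush(pending.data);
    }
}

// Deterministic destruction, where the destructor's safe part runs at once.
int demoDestroy()
{
    Connection.closed = 0;
    auto c = new Connection(3);
    destroy(c);                   // runs ~this() now, not at collection time
    return Connection.closed;
}
```

Note that the commented-out line would be fine under deterministic destruction, and that is exactly the annoying part: the same destructor is valid or invalid depending on who triggers it.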

Consequence
===========
For the reference-counted example, the only working solution is to
have the counted object malloced instead of GC-allocated. One could
argue that "correct" programs with reference counting should do the
memory management completely explicitly anyway, and yes, that's largely
true. The struct dtor of D2 makes the C++ "smartptr" construct
possible, making refcount use mostly natural and automatic anyway.
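
For reference, a minimal sketch of what such a "smartptr" could look like in D2, assuming a malloced payload and a struct with postblit and destructor. The names (Payload, Handle, closedCount, demo) are made up for this example, and this is not how any particular library does it.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical payload: lives on the C heap, so the GC never finalizes it.
struct Payload
{
    int fd;        // stand-in for a file descriptor
    size_t refs;   // number of live handles
}

int closedCount;   // illustration only: counts simulated close() calls

struct Handle
{
    private Payload* p;

    this(int fd)
    {
        p = cast(Payload*) malloc(Payload.sizeof);
        assert(p !is null, "malloc failed");
        p.fd = fd;
        p.refs = 1;
    }

    // Postblit: runs after a copy, so the new copy takes its own reference.
    this(this)
    {
        if (p !is null) ++p.refs;
    }

    ~this()
    {
        if (p is null) return;
        if (--p.refs == 0)
        {
            ++closedCount;   // a real program would close(p.fd) here
            free(p);
        }
        p = null;
    }

    size_t refs() const { return p is null ? 0 : p.refs; }
}

// Returns the number of simulated closes after all handles go out of scope.
size_t demo()
{
    closedCount = 0;
    {
        auto a = Handle(3);
        {
            auto b = a;              // copy: count goes to 2
            assert(a.refs == 2);
        }                            // b destroyed: count back to 1
        assert(a.refs == 1);
        assert(closedCount == 0);
    }                                // a destroyed: payload released
    return closedCount;
}
```

The payload holds only plain values here, which is what keeps it safe.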

However, it also means that the refcounted object itself can never use GC-allocated structures, such as almost ANYTHING from the stdlib! In effect, as soon as you leave the GC behind, you leave over half of all the useful things in D behind.

This is a real bummer. What first attracted me to D, and what I believe is still one of its key strengths, is the possibility of hybrid GC/other memory schemes. It allows the developer to write up something quick-n-dirty, and then improve it in the places where it's actually needed, such as for open files, GUI context handles, or other expensive/limited resources.

As another indication that this really is a problem: in Tango, this has led to the introduction of an additional destructor-type method, "dispose", which AFAICT does what the destructor should have done, but is only invoked for deterministic destruction by "delete" or scope guards. IMO, that can only lead to a world of pain and misunderstandings, having two different "destructors" run depending on WHY the object was destroyed.

Proposed Solution
=================
Back to the core problem: "An object's destructor must not access any
garbage-collected objects embedded in it.".

As far as I can tell (but I'm no GC expert), this is a direct effect
of the current implementation of the GC, more specifically the loop
starting at http://www.dsource.org/projects/druntime/browser/trunk/src/gc/gcx.d#L2492.
In this loop, all non-marked objects get their finalizers run, and
immediately after, they get freed. If I understand the code right,
this freeing is what's actually causing the problems: if the order
in the loop doesn't match the order of references in the freed
objects (which it can't, especially for circular references), the loop
might destroy a memory area before calling the next finalizer, which
then attempts to use the just-freed reference.

Wouldn't it instead be possible to split this loop into two separate loops, the first calling all finalizers, letting them run with all objects still intact, and then afterwards run a second pass, actually destroying the objects? AFAICT, this would resolve the entire problem, allowing for much more mixed-use of allocation strategies as well as making the destructor much more useful.
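
To illustrate the idea, here is a toy model in D2 syntax. It is NOT the druntime code; Cell, sweepSinglePass, sweepTwoPass and cycle are names I made up, and a "finalizer" is reduced to checking whether the referenced peer has already been freed. It only shows the ordering argument, under the assumption that finalizers do nothing but read their peers.

```d
// Toy model of a heap cell; not the druntime data structures.
struct Cell
{
    bool marked;     // reachable after the mark phase
    bool freed;      // memory already released
    int  peer = -1;  // index of another cell this one references, or -1
}

// Current scheme as I understand it: finalize and free in the same loop.
// Returns how many finalizers would have touched already-freed memory.
int sweepSinglePass(Cell[] heap)
{
    int badAccesses = 0;
    foreach (ref c; heap)
    {
        if (c.marked) continue;
        // "Finalizer": looks at the peer it references.
        if (c.peer >= 0 && heap[c.peer].freed) ++badAccesses;
        c.freed = true;   // freed immediately, before later finalizers run
    }
    return badAccesses;
}

// Proposed scheme: run every finalizer first, free in a second pass.
int sweepTwoPass(Cell[] heap)
{
    int badAccesses = 0;
    foreach (ref c; heap)
    {
        if (!c.marked && c.peer >= 0 && heap[c.peer].freed) ++badAccesses;
    }
    foreach (ref c; heap)
    {
        if (!c.marked) c.freed = true;
    }
    return badAccesses;
}

// Two unreachable cells referencing each other (a circular reference).
Cell[] cycle()
{
    return [Cell(false, false, 1), Cell(false, false, 0)];
}
```

With the circular pair from cycle(), the single-pass sweep lets the second finalizer see a freed peer, while the two-pass sweep does not.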

Ideas, opinions? Perhaps this has been discussed before?

Regards
/ Ulrik