Precise garbage collection

Hi,

I want to make precise garbage collection, as presented at the D conference, ready for inclusion into druntime. I have recently updated the branch at https://github.com/rainers/druntime/tree/gcx_precise to merge with Leandro's changes to the GC module layout.

Before creating a pull request, I'd like to hear opinions on whether this should be included, whether other choices would be better, and where it should be improved.

Precise garbage collection must be able to work with different memory areas, namely the heap, global/thread-local data in the binary, and the stack/registers of each thread. I think we do not have a feasible solution for the last of these (stack and registers), so let's focus on the first two.

1. Heap

In a nutshell, the implementation uses the RTInfo template to generate a bitmap of booleans that indicate whether the corresponding field in a struct/class might be a pointer or not. For built-in types, this information is predefined in object.d and the typeinfo.ti_* modules.
When memory is allocated, the TypeInfo for the allocated type is passed to the garbage collector, which copies the bitmap into memory alongside the pages allocated by the pool (another GCBits member with one bit per word). During a collection, scanning the heap can then look up this information to detect false pointers and avoid keeping garbage alive.
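To make the mechanism concrete, here is a minimal Python model of it, not druntime code: the `Pool` class, its `is_pointer` list (standing in for the extra GCBits member) and the word-granular layout are assumptions made for illustration. A struct `S { int* p; size_t n; }` gets the bitmap `[True, False]`, so a precise scan only treats the first word as a potential pointer, even if the second happens to hold an address-like integer.

```python
class Pool:
    """Toy model of a GC pool: memory is a list of machine words."""

    def __init__(self, nwords):
        self.words = [0] * nwords            # simulated pool memory
        self.is_pointer = [False] * nwords   # one bit per word, kept alongside the pages

    def allocate(self, offset, type_bitmap):
        # Copy the type's RTInfo-style bitmap into the pool bits at allocation time.
        for i, bit in enumerate(type_bitmap):
            self.is_pointer[offset + i] = bit

    def scan(self, offset, nwords):
        # Precise scan: only words flagged as pointers are candidate references.
        return [self.words[offset + i] for i in range(nwords)
                if self.is_pointer[offset + i]]

    def scan_conservative(self, offset, nwords):
        # For comparison: every word is treated as a possible pointer.
        return self.words[offset:offset + nwords]


# struct S { int* p; size_t n; }  ->  bitmap [True, False]
pool = Pool(4)
pool.allocate(0, [True, False])
pool.words[0] = 0x1000   # a real pointer
pool.words[1] = 0x2000   # an integer that merely looks like an address
print(pool.scan(0, 2))               # [4096]
print(pool.scan_conservative(0, 2))  # [4096, 8192]
```

The conservative scan would keep whatever lives at 0x2000 alive; the precise scan does not.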

There are a number of issues that should be discussed:

a. The compiler sometimes does not generate the RTInfo for a struct, but instead writes 0 or 1 into the respective m_RTInfo field, depending on whether the struct contains references. (As far as I can tell, this happens if only the backend needs the TypeInfo, e.g. when generating an array concatenation call.) That's why there are rtinfoNoPointers/rtinfoHasPointers enums, which are also used by the TypeInfos for the built-in types.

I consider this behaviour a bug that should be fixed, as it also disallows other usages of the RTInfo.

There is also an issue with RTInfo not being created for associative array types, but there is an "easy" workaround.
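The following sketch (Python, with `RTINFO_NO_POINTERS`/`RTINFO_HAS_POINTERS` as stand-ins for the druntime enums and a tuple of booleans as a stand-in for generated RTInfo) shows how a GC has to interpret the m_RTInfo field as described in a., and why the 1 placeholder costs precision: without a real bitmap, every word of the struct must be scanned conservatively.

```python
RTINFO_NO_POINTERS = 0   # stand-in for rtinfoNoPointers
RTINFO_HAS_POINTERS = 1  # stand-in for rtinfoHasPointers


def pointer_bitmap(rtinfo, nwords):
    """Return a per-word pointer bitmap for a type occupying nwords words."""
    if rtinfo == RTINFO_NO_POINTERS:
        return [False] * nwords   # block can be allocated NOSCAN
    if rtinfo == RTINFO_HAS_POINTERS:
        return [True] * nwords    # no layout known: fall back to conservative
    return list(rtinfo)           # a real generated bitmap


# struct T { void* p; size_t a, b, c; } described by a real bitmap ...
print(pointer_bitmap((True, False, False, False), 4))  # [True, False, False, False]
# ... versus the placeholder the backend emits
print(pointer_bitmap(RTINFO_HAS_POINTERS, 4))          # [True, True, True, True]
```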

b. There are already other applications of the RTInfo template, so there should probably be some way to combine multiple "generators". My idea is to let RTInfo!T point to an immutable(RTInfoData) struct that can have a separate member for each generator. It would be nice to change the return type of TypeInfo.rtInfo() from void* to immutable(RTInfoData)* to avoid casting.
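A minimal Python sketch of the idea, assuming two generators: the GC's pointer bitmap and a second, purely hypothetical generator that records field names. The frozen dataclass plays the role of immutable(RTInfoData); each generator fills its own member, so adding a generator does not disturb the others.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen ~ immutable(RTInfoData)
class RTInfoData:
    pointer_bitmap: Tuple[bool, ...]  # filled by the precise-GC generator
    field_names: Tuple[str, ...]      # filled by a hypothetical second generator


def gc_generator(fields):
    return tuple(is_ptr for _, is_ptr in fields)


def names_generator(fields):
    return tuple(name for name, _ in fields)


def make_rtinfo(fields):
    """fields: list of (name, may_be_pointer) pairs describing a type."""
    # RTInfo!T would run every generator once and point at the combined record.
    return RTInfoData(pointer_bitmap=gc_generator(fields),
                      field_names=names_generator(fields))


info = make_rtinfo([("next", True), ("value", False)])
print(info.pointer_bitmap)  # (True, False)
```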

c. The GC interface has to be extended to pass type information to the GC. I guess simply passing the respective TypeInfo pointer is the obvious and correct choice.
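Roughly, the extended entry point could look like the Python model below. The names `TypeInfo`, `ToyGC` and the word-based `gc_malloc(nwords, ti=None)` signature are assumptions for illustration, not the druntime API; the point is that an optional TypeInfo parameter keeps existing callers working, since a missing TypeInfo simply means conservative scanning.

```python
WORD = 8  # assume a 64-bit target


class TypeInfo:
    """Stand-in holding only what the GC needs: a per-word pointer bitmap."""

    def __init__(self, name, bitmap):
        self.name = name
        self.bitmap = tuple(bitmap)


class ToyGC:
    def __init__(self):
        self.blocks = {}         # address -> per-word pointer bitmap
        self.next_addr = 0x1000

    def gc_malloc(self, nwords, ti=None):
        if ti is None:
            # Caller passed no type information: scan conservatively, as today.
            bitmap = (True,) * nwords
        else:
            assert nwords >= len(ti.bitmap), "block too small for type"
            # Words past the end of the type are unused and hold no pointers.
            bitmap = ti.bitmap + (False,) * (nwords - len(ti.bitmap))
        addr = self.next_addr
        self.next_addr += nwords * WORD
        self.blocks[addr] = bitmap
        return addr


gc = ToyGC()
a = gc.gc_malloc(2)                                # legacy call
b = gc.gc_malloc(3, TypeInfo("S", [True, False]))  # typed call
print(gc.blocks[b])  # (True, False, False)
```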

d. The alternative to keeping a pointer bitmap alongside the pool memory is to store the TypeInfo pointer instead. Its major advantage is that it adds minimal performance overhead to the allocation.

d1. This needs more memory for small allocations, but less for larger ones. If the pointer is stored in the same memory as the allocation itself, it should go at the end to avoid alignment issues (at the beginning it always adds 16 bytes). This would mostly reuse memory that is unused anyway because allocations are rounded up to a power of 2. The worst case is an allocation whose size is equal to or just below a power of 2: the extra pointer pushes it into the next bin, doubling the memory it uses. We could mitigate that effect by allowing other allocation sizes as well, or by not reserving space for the TypeInfo pointer if the block is allocated NOSCAN.
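The trade-off can be checked with a little arithmetic. The sketch below assumes a 64-bit target (8-byte words and pointers) and small-object bins of 16 to 2048 bytes in powers of 2; these constants are assumptions, not taken from the branch. It compares the bin chosen with and without a trailing TypeInfo pointer, and the size of a one-bit-per-word bitmap.

```python
WORD = 8     # assume a 64-bit target
TI_PTR = 8   # size of the TypeInfo pointer stored at the end of the block
BINS = [16, 32, 64, 128, 256, 512, 1024, 2048]  # assumed small-object bins


def bin_size(nbytes):
    """Smallest bin that can hold nbytes (small allocations only)."""
    for b in BINS:
        if nbytes <= b:
            return b
    raise ValueError("large allocation, not handled in this sketch")


def block_with_typeinfo(nbytes, noscan=False):
    # NOSCAN blocks need no type information, so no space is reserved.
    return bin_size(nbytes if noscan else nbytes + TI_PTR)


def bitmap_bytes(block_bytes):
    # One bit per word of the block.
    return block_bytes / WORD / 8


print(block_with_typeinfo(24))  # 32: the pointer fits into the padding
print(block_with_typeinfo(32))  # 64: worst case, the block size doubles
print(bitmap_bytes(32))         # 0.5 bytes of bitmap vs 8 bytes of pointer
print(bitmap_bytes(1 << 20))    # 16384.0 bytes of bitmap vs 8 bytes of pointer
```

For large blocks the single pointer is clearly cheaper; for small ones the bitmap wins unless the pointer fits into the rounding slack.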

d2. Both dynamic and associative arrays currently allocate memory chunks and use them in a "non-standard" way that cannot be described by a simple TypeInfo. For example, dynamic arrays keep track of the allocated size of the array by placing it at the very end of allocations < 4k, but at the beginning for allocations >= 4k, moving the data to an offset of 16 in the latter case. Associative arrays combine hash-list-node, key and value into a single allocation, sometimes even with the value type unavailable.
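The layout dependence for dynamic arrays can be sketched as follows (Python; `PAGESIZE`, the one-word length field and the function names are simplifying assumptions). The same element type ends up at a different offset depending on the block size, and the stored length must never be treated as a pointer, which a single static TypeInfo cannot express.

```python
WORD = 8           # assume a 64-bit target
PAGESIZE = 4096
LARGE_OFFSET = 16  # data offset for large array blocks, as described above


def array_layout(block_size):
    """Return (data_offset, length_offset) in bytes for an array block.

    The stored length is modelled as one word here for simplicity.
    """
    if block_size < PAGESIZE:
        return 0, block_size - WORD   # length stored at the very end
    return LARGE_OFFSET, 0            # length first, data shifted by 16


def array_block_bitmap(block_size, elem_bitmap, length):
    """Per-word pointer bitmap for `length` elements placed in the block."""
    bits = [False] * (block_size // WORD)  # padding and length: no pointers
    start = array_layout(block_size)[0] // WORD
    for i in range(length):
        for j, is_ptr in enumerate(elem_bitmap):
            bits[start + i * len(elem_bitmap) + j] = is_ptr
    return bits


print(array_layout(64))                                # (0, 56)
print(array_block_bitmap(4096, [True, False], 2)[:6])  # [False, False, True, False, True, False]
```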

My implementation solves these issues by "emplacing" the appropriate type information at the given address, assuming pointers if no type information is available (using the new gc_emplace function).
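A possible shape for such a function, modelled in Python: `PoolBits` and the `gc_emplace(addr, nbytes, type_bitmap=None)` signature are assumptions for illustration, not the signature used in the branch. The usage lays out an associative array node (hypothetically: next pointer, hash, key, value) whose value type is unavailable, so its words fall back to "may be a pointer".

```python
WORD = 8  # assume a 64-bit target


class PoolBits:
    """Per-word pointer bits for a pool starting at address `base`."""

    def __init__(self, base, nwords):
        self.base = base
        self.bits = [False] * nwords

    def gc_emplace(self, addr, nbytes, type_bitmap=None):
        # Record the layout of an object placed at addr inside an existing
        # block; without type information every word is assumed a pointer.
        first = (addr - self.base) // WORD
        for i in range(nbytes // WORD):
            if type_bitmap is None:
                self.bits[first + i] = True
            else:
                # Repeat the bitmap so arrays of the type are covered as well.
                self.bits[first + i] = type_bitmap[i % len(type_bitmap)]


pool = PoolBits(base=0x1000, nwords=6)
node = 0x1000
pool.gc_emplace(node, 16, [True, False])  # next pointer, hash
pool.gc_emplace(node + 16, 8, [False])    # key: size_t
pool.gc_emplace(node + 24, 24)            # value: type unknown
print(pool.bits)  # [True, False, False, True, True, True]
```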

If only a TypeInfo pointer is available, I'm not sure how this can be solved without _allocating_ a new TypeInfo. My best guess is that it could be done if the GC could call a generic array scan function.
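One way such a generic array scan could work, sketched in Python under the assumption that a block records only the element TypeInfo, the element count and the data offset (all names are hypothetical): the scanner repeats the element's pointer bitmap for each element, so no TypeInfo for the whole array has to be allocated.

```python
WORD = 8  # assume a 64-bit target


class TypeInfo:
    """Stand-in element type: a per-word pointer bitmap."""

    def __init__(self, bitmap):
        self.bitmap = tuple(bitmap)


def scan_array(words, data_offset, elem_ti, count):
    """Yield the pointer-typed words of `count` elements of `elem_ti`
    starting at byte offset `data_offset` within a block of words."""
    n = len(elem_ti.bitmap)
    start = data_offset // WORD
    for i in range(count):
        for j, is_ptr in enumerate(elem_ti.bitmap):
            if is_ptr:
                yield words[start + i * n + j]


# Large array block: length and padding in the first 16 bytes, then
# two elements of struct E { void* p; int x; } (16 bytes each)
block = [2, 0, 0xA0, 7, 0xB0, 9]
print(list(scan_array(block, 16, TypeInfo([True, False]), 2)))  # [160, 176]
```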

d3. This leads to the idea of generating a scanning function for each type instead of the pointer bitmap. Depending on how sparse the pointers are, this can be shorter or can create a lot of code bloat. As a compromise, a generic version might use the pointer bitmap, but it could be overloaded to implement arrays or even unions. On the downside, it makes using std.emplace much more complicated if you want precise scanning.
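The compromise could look like this Python sketch: each type maps to a scan function, most of them generated from the pointer bitmap, while a type such as a tagged union supplies its own. The `SCANNERS` table and the union layout (a tag word followed by a payload that is a pointer only when the tag is 0) are hypothetical.

```python
def bitmap_scanner(bitmap):
    """Default scan function generated from a type's pointer bitmap."""
    def scan(words):
        return [w for w, is_ptr in zip(words, bitmap) if is_ptr]
    return scan


def tagged_union_scan(words):
    """Hand-written scanner for a hypothetical tagged union:
    words[0] is the tag, words[1] is a pointer only if the tag is 0."""
    tag, payload = words[0], words[1]
    return [payload] if tag == 0 else []


SCANNERS = {
    "Node": bitmap_scanner((True, False)),  # struct Node { Node* next; size_t value; }
    "Tagged": tagged_union_scan,
}

print(SCANNERS["Node"]([0x10, 5]))    # [16]
print(SCANNERS["Tagged"]([1, 0x20]))  # []
```

A bitmap alone would have to mark the union payload as a possible pointer; the custom function can rule it out when the tag says it holds an integer.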

e. Currently, there is only one TypeInfo_Class for both a class instance and a class reference, and which one is meant is inferred from context (if used as a "root" it is usually an instance; when following TypeInfo.next it is assumed to be a reference). This no longer works reliably when combined with "emplace". I think we need to add TypeInfo_Reference to describe a field that is a pointer to a class instance. To be honest, I have no idea how much code might break by adding this indirection when traversing type information.
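The ambiguity is easiest to see when computing a pointer bitmap recursively. In the Python model below, `ClassInstance` plays the role TypeInfo_Class has today and `Reference` the role of the proposed TypeInfo_Reference (all names are illustrative, and the hidden vtable/monitor words of an instance are omitted). With an explicit reference type, a class-typed field always contributes exactly one pointer word; if `Tree` listed `leaf` directly, the recursion would inline Leaf's two non-pointer words and miss the pointer entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    """A one-word non-pointer field such as size_t."""


@dataclass(frozen=True)
class ClassInstance:
    """Layout of a class instance (the role of TypeInfo_Class)."""
    name: str
    fields: tuple


@dataclass(frozen=True)
class Reference:
    """A field referring to an instance (the proposed TypeInfo_Reference)."""
    target: ClassInstance


def pointer_bitmap(ti):
    if isinstance(ti, Scalar):
        return [False]
    if isinstance(ti, Reference):
        return [True]  # a class reference is always exactly one pointer
    bits = []          # instance: concatenate the field layouts
    for f in ti.fields:
        bits += pointer_bitmap(f)
    return bits


leaf = ClassInstance("Leaf", (Scalar(), Scalar()))
tree = ClassInstance("Tree", (Reference(leaf), Scalar()))
print(pointer_bitmap(tree))             # [True, False]
print(pointer_bitmap(Reference(tree)))  # [True]
```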

f. Currently, the precise GC can be versioned in/out, but I think making it a runtime option is preferable. A problem here is that the user has no way to change the default configuration before the GC is initialized.
I could imagine always starting with the precise GC, with the user able to opt out at any time. The GC could then release any additional memory it has allocated for precise scanning. If the user opts back in, the data structures are rebuilt, treating everything allocated so far as void[].
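The opt-out/opt-in behaviour could be modelled as in the Python sketch below; `ToggleGC` and `set_precise` are hypothetical names. Opting out drops the pointer bits, and opting back in rebuilds them with every existing block treated as void[] (all words may be pointers), while new allocations are precise again.

```python
class ToggleGC:
    def __init__(self):
        self.precise = True  # start with precise scanning enabled
        self.blocks = {}     # address -> size in words
        self.bits = {}       # address -> pointer bitmap (only while precise)

    def allocate(self, addr, nwords, bitmap=None):
        self.blocks[addr] = nwords
        if self.precise:
            self.bits[addr] = list(bitmap) if bitmap is not None else [True] * nwords

    def set_precise(self, enable):
        if enable and not self.precise:
            # Opting back in: assume everything allocated so far is void[].
            self.bits = {a: [True] * n for a, n in self.blocks.items()}
        elif not enable and self.precise:
            self.bits = {}   # opting out: release the extra memory
        self.precise = enable

    def scan_bits(self, addr):
        if not self.precise:
            return [True] * self.blocks[addr]  # conservative scanning
        return self.bits[addr]


gc = ToggleGC()
gc.allocate(0x100, 2, [True, False])
gc.set_precise(False)
gc.set_precise(True)
gc.allocate(0x200, 2, [False, True])
print(gc.scan_bits(0x100), gc.scan_bits(0x200))  # [True, True] [False, True]
```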

Another option might be to make it a link-time option, but that would mean the "standard" GC could not be part of the runtime library if it is to remain exchangeable.

This text is already too long, so I'll postpone discussing the global/TLS data sections until later.

Summing up the questions raised:

1. Assuming RTInfo generation is fixed, should we go for adding an indirection through RTInfoData to allow multiple "generators"?

2. Are you OK with extending the gc_* functions with TypeInfo parameters where appropriate?

3. Do you think keeping the pointer bitmap alongside the pool memory and adding gc_emplace is OK? Or should we investigate other alternatives first?

4. Would you object to adding TypeInfo_Reference?

5. How would you prefer to enable/disable precise garbage collection: versioning, linking, or a runtime option?

Best,
Rainer
_______________________________________________
D-runtime mailing list
D-runtime@puremagic.com
http://lists.puremagic.com/mailman/listinfo/d-runtime
