Jump to page: 1 2 3
Thread overview
[RFC]Proposal for better garbage collection
Feb 22, 2012
Benjamin Thaut
Feb 22, 2012
H. S. Teoh
Feb 22, 2012
Benjamin Thaut
Feb 22, 2012
H. S. Teoh
Feb 23, 2012
deadalnix
Feb 23, 2012
H. S. Teoh
Feb 23, 2012
Martin Nowak
Feb 23, 2012
Jacob Carlborg
Feb 25, 2012
Timon Gehr
Feb 22, 2012
Dmitry Olshansky
Feb 22, 2012
Dmitry Olshansky
Feb 22, 2012
H. S. Teoh
Feb 22, 2012
H. S. Teoh
Feb 23, 2012
Benjamin Thaut
Feb 23, 2012
Robert Jacques
Feb 22, 2012
Sean Kelly
Feb 23, 2012
H. S. Teoh
Feb 23, 2012
Jonathan M Davis
Feb 23, 2012
H. S. Teoh
Feb 23, 2012
Martin Nowak
Feb 23, 2012
Sean Kelly
Feb 23, 2012
deadalnix
Feb 23, 2012
Manu
Feb 23, 2012
H. S. Teoh
Feb 24, 2012
deadalnix
Feb 23, 2012
Artur Skawina
Feb 23, 2012
H. S. Teoh
February 22, 2012
As I'm not satisfied with the current GC D has and don't see the situation improving in the future without significant changes to the compiler I wrote the following document that points out all the possible issues with garbage collection I could think of and possible solutions for them. This document is just a draft and only a proposal, critics and comments are welcome.

Kind Regards
Benjamin Thaut


1) Preface

All modern Languages with fast and efficient GCs have build in support for garbage collection within the compiler / virtual machine. Currently the D language does have as much GC support as C++ has but fully relies on the GC with a lot of language features and the standard library. As a result the GC is slow, non parallel and always needs to stop all threads before collection. This document suggests to build in better support for garbage collection into the language so that creating a fast and efficient GC becomes possible. A list of features that would be required is described in this document.

2) Tracking references on the stack:

The D compiler always needs to emit a full stack frame so that the GC can walk up the stack at any time in the program. The stack frame of every function generated by the D compiler starts which a bitfield (usually the size of a machine register) where each bit indicates that these bytes are a pointer / reference. The bitfield needs to be large enough to cover the whole stack frame of the function.

For example on x86: 11001...
1 = bytes 0 to 4 are a pointer
1 = bytes 4 to 8 are a pointer
00 = bytes 8 to 16 are not a pointer
1 = bytes 16 to 20 are a pointer

The last bit indicates whether the bitfield is continued in the following bytes or not. 1 means continued 0 means finished.
Every scope generated by the D compiler would need additional code at the start and end of the scope. When the scope is entered the bitfield would be patched to represent the new variables inside the scope and when the scope is left the bitfield is patched again to remove the changes that were made on entering the scope.
Every time a function gets called that did not get generated by the D compiler ( C / C++ etc functions) the compiler generates a call into the runtime and passes the current stack pointer and stack base pointer to it.

void _d_externalCallStart(void* stackptr, void* baseptr);

Every time such a function returns the compiler   generates a call into the the runtime too.

void _d_externalCallEnd(void* stackptr, void* baseptr);

Every time a functions that can get called from other languages (extern(C) etc) are executed the end callback is inserted at the start of the functions and the start callback is inserted at the end of the function.
Using these callbacks the GC can mark certain parts of the stack as "non-D" and ignore them when scanning for bit fields and references/pointers. It can just skip parts of the stack that are "non-D" and therefore does not need a full stack frame within these "non-D" sections.
All these features are required so that the GC can precisely scan pointers/references on the stack and change them as necessary.
Remaining issues: The D compiler can freely move around value types on the stack. With such move operations it would be necessary to fix up all the bit fields. I needs to be investigated if this is doable.

3) Tracking references on the heap

For every class / struct a mixin template which is defined inside the runtime gets instantiated. This template can then use introspection to generate the necessary information to allow the GC to scan the pointers within that struct / class precisely.

4) Thread local / global memory

A callback into the runtime needs to happen in the following cases:
- a __gshared variable is assigned
- a reference / pointer is casted to immutable
- a reference / pointer is casted to shared

void _d_castToGlobalMem(void* ptr);

This can be used by the GC to keep thread local pools of memory and move a memory block to a global memory pool as soon as it is needed there.

5) pointer / reference changed callback

Every time a pointer / reference is changed the D compiler emits a call into the runtime and passes the new value of the reference / pointer with it.

void _d_pointerChanged(void *ptr);

This can be used when writing a generational GC to have separate pools for young and old generations. Every time the young generation needs to be collected it can be avoided to scan the old generations pool because it is sufficient to only check the pointers that have changed since the last time the young generation was collected. With the above mentioned callback it is easily possible to track these references.

Remaining issues:
-If the GC interrupts a thread right before any of the above mentioned callbacks happen it will cause a invalid state for the GC and the GC might access invalid pointers. It has to be investigated if this leads to invalid behavior. It can be fixed by not interrupting a thread but pause it the next time it calls any of callbacks, or other functions that can be interrupted by the GC. This in turn could cause a thread to be non pausable because it is stuck in a polling loop. The compiler could identify loops without any GC interruptible function and manually insert one.
-When pointers/references are passed inside processor registers the GC cannot know if these values are actually pointers/references or represent other values. If threads are paused at defined points in the code as mentioned before this issues would be fixed because the state at these points is known and can be handled accordingly.

6) Different interface for the GC

The current interface to the GC would have to change because the "this block of memory might contain a pointer" approach wouldn't work anymore. For example a block of memory and a delegate which iterates over all pointers within the memory block could be used for user allocated memory blocks. There should be a separate allocator function provided by the GC that allocates memory that does not get moved around so it can be used to pass it to non garbage collected code.

7) Compiler Options

Each of the above mentioned groups of features should be exposed as compiler options so that you can turn them on/off depending on which type of GC you use. Default on/off states for these features are set within a config file depending on which type of GC currently ships per default with druntime.

8) Conclusion

Garbage Collection brings a lot of advantages for the programmer using the language but is not free and shouldn't be treated as free. Full support for garbage collection is required to build a fast and efficient GC. This additional support requires additional features within the compiler but should result in a overall better performing language.
February 22, 2012
On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
> As I'm not satisfied with the current GC D has and don't see the situation improving in the future without significant changes to the compiler I wrote the following document that points out all the possible issues with garbage collection I could think of and possible solutions for them. This document is just a draft and only a proposal, critics and comments are welcome.

Have you seen this?

	http://www.llucax.com.ar/proj/dgc/index.html


[...]
> 2) Tracking references on the stack:
> 
> The D compiler always needs to emit a full stack frame so that the GC can walk up the stack at any time in the program. The stack frame of every function generated by the D compiler starts which a bitfield (usually the size of a machine register) where each bit indicates that these bytes are a pointer / reference. The bitfield needs to be large enough to cover the whole stack frame of the function.

This adds a lot of overhead to the runtime stack, esp. if you have deep recursion. It's also not necessarily faster, since the GC now has to parse a bitfield (a variable-length encoded bitfield, no less), instead of just scanning words directly, which can be optimized by CPU-specific microcode depending on the target platform.


[...]
> Every scope generated by the D compiler would need additional code at the start and end of the scope. When the scope is entered the bitfield would be patched to represent the new variables inside the scope and when the scope is left the bitfield is patched again to remove the changes that were made on entering the scope.

This would introduce quite a lot of overhead per scope. It will also lead to strange things like:

	if (x) y();	// faster
	if (x) { y(); }	// slower

which will encourage people to omit {} after if, which makes code more fragile and hard to read.


> Every time a function gets called that did not get generated by the D compiler ( C / C++ etc functions) the compiler generates a call into the runtime and passes the current stack pointer and stack base pointer to it.
> 
> void _d_externalCallStart(void* stackptr, void* baseptr);
> 
> Every time such a function returns the compiler   generates a call into the the runtime too.
> 
> void _d_externalCallEnd(void* stackptr, void* baseptr);
> 
> Every time a functions that can get called from other languages
> (extern(C) etc) are executed the end callback is inserted at the
> start of the functions and the start callback is inserted at the end
> of the function.
> Using these callbacks the GC can mark certain parts of the stack as
> "non-D" and ignore them when scanning for bit fields and
> references/pointers. It can just skip parts of the stack that are
> "non-D" and therefore does not need a full stack frame within these
> "non-D" sections.

This may not be a bad idea, though it does introduce some overhead when you cross language boundaries.


> All these features are required so that the GC can precisely scan pointers/references on the stack and change them as necessary. Remaining issues: The D compiler can freely move around value types on the stack. With such move operations it would be necessary to fix up all the bit fields. I needs to be investigated if this is doable.

This can only make the GC slower, especially if it needs to update variable-length encoded bitfields. Of course, you may be able to offset this by making it possible to do real-time GC, (reduced throughput but less waiting time for collection cycles) but that's a very complex problem.


> 3) Tracking references on the heap
> 
> For every class / struct a mixin template which is defined inside the runtime gets instantiated. This template can then use introspection to generate the necessary information to allow the GC to scan the pointers within that struct / class precisely.

So basically you're proposing a compacting precise-scanning GC instead of the current conservative GC. There are pros and cons in either approach; it'd be nice if you could compare them.


[...]
> 5) pointer / reference changed callback
> 
> Every time a pointer / reference is changed the D compiler emits a call into the runtime and passes the new value of the reference / pointer with it.

This introduces a LOT of overhead, especially in a language like D which manipulates a lot of pointers quite often (esp. if you use slices a lot).


[...]
> -If the GC interrupts a thread right before any of the above mentioned callbacks happen it will cause a invalid state for the GC and the GC might access invalid pointers. It has to be investigated if this leads to invalid behavior. It can be fixed by not interrupting a thread but pause it the next time it calls any of callbacks, or other functions that can be interrupted by the GC.

This adds a lot of intermittent pauses in program execution.

The link I posted at the top has a GC implementation that doesn't introduce *any* of this overhead (the GC runs concurrently with the program), with no pause during a collection cycle (garbage is incrementally collected when allocating new memory).


[...]
> 6) Different interface for the GC
> 
> The current interface to the GC would have to change because the "this block of memory might contain a pointer" approach wouldn't work anymore. For example a block of memory and a delegate which iterates over all pointers within the memory block could be used for user allocated memory blocks. There should be a separate allocator function provided by the GC that allocates memory that does not get moved around so it can be used to pass it to non garbage collected code.

It would be nice if there was a way for GCs to be pluggable, especially in the compiler. Currently, we can only swap GCs that implement the same interface as the existing one, but to switch to a different GC model like you're proposing would require a lot of compiler support.


> 7) Compiler Options
> 
> Each of the above mentioned groups of features should be exposed as compiler options so that you can turn them on/off depending on which type of GC you use. Default on/off states for these features are set within a config file depending on which type of GC currently ships per default with druntime.

This would be nice.


> 8) Conclusion
> 
> Garbage Collection brings a lot of advantages for the programmer using the language but is not free and shouldn't be treated as free.

I don't know, but after reading this:

	http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps

I think there might be a possibility of (almost) free GC.


> Full support for garbage collection is required to build a fast and efficient GC. This additional support requires additional features within the compiler but should result in a overall better performing language.

I agree in principle, although for specific GC proposals, we'd need to evaluate the pros and cons to determine what is improved and what degrades. Unfortunately, GC is a complex problem, and different GCs work better with different apps. I wouldn't be so quick to make claims about the performance of GCs. It depends on what the app does.


T

-- 
EMACS = Extremely Massive And Cumbersome System
February 22, 2012
Am 22.02.2012 20:40, schrieb H. S. Teoh:
> On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
>> As I'm not satisfied with the current GC D has and don't see the
>> situation improving in the future without significant changes to the
>> compiler I wrote the following document that points out all the
>> possible issues with garbage collection I could think of and
>> possible solutions for them. This document is just a draft and only
>> a proposal, critics and comments are welcome.
>
> Have you seen this?
>
> 	http://www.llucax.com.ar/proj/dgc/index.html

Yes I know about dgc it is better but still not on par with for example the GC that is shipped with the .NET 4.0
All I'm saying is that without propper support from the compiler we are not going to get GCs as good as in other modern languages.

>
>
> [...]
>> 2) Tracking references on the stack:
>>
>> The D compiler always needs to emit a full stack frame so that the
>> GC can walk up the stack at any time in the program. The stack frame
>> of every function generated by the D compiler starts which a
>> bitfield (usually the size of a machine register) where each bit
>> indicates that these bytes are a pointer / reference. The bitfield
>> needs to be large enough to cover the whole stack frame of the
>> function.
>
> This adds a lot of overhead to the runtime stack, esp. if you have deep
> recursion. It's also not necessarily faster, since the GC now has to
> parse a bitfield (a variable-length encoded bitfield, no less), instead
> of just scanning words directly, which can be optimized by CPU-specific
> microcode depending on the target platform.
>

If you have a better idea for percise stack scanning I'm open for suggestions.

>
> [...]
>> Every scope generated by the D compiler would need additional code
>> at the start and end of the scope. When the scope is entered the
>> bitfield would be patched to represent the new variables inside the
>> scope and when the scope is left the bitfield is patched again to
>> remove the changes that were made on entering the scope.
>
> This would introduce quite a lot of overhead per scope. It will also
> lead to strange things like:
>
> 	if (x) y();	// faster
> 	if (x) { y(); }	// slower
>
> which will encourage people to omit {} after if, which makes code more
> fragile and hard to read.
>

Scopeds that don't have variables declared inside them don't need the bitfield patching. so that argument is completely pointless. Scopes that contain varaibles that are not pointers or refrences also don't need the patching.

>
>> Every time a function gets called that did not get generated by the
>> D compiler ( C / C++ etc functions) the compiler generates a call
>> into the runtime and passes the current stack pointer and stack base
>> pointer to it.
>>
>> void _d_externalCallStart(void* stackptr, void* baseptr);
>>
>> Every time such a function returns the compiler   generates a call
>> into the the runtime too.
>>
>> void _d_externalCallEnd(void* stackptr, void* baseptr);
>>
>> Every time a functions that can get called from other languages
>> (extern(C) etc) are executed the end callback is inserted at the
>> start of the functions and the start callback is inserted at the end
>> of the function.
>> Using these callbacks the GC can mark certain parts of the stack as
>> "non-D" and ignore them when scanning for bit fields and
>> references/pointers. It can just skip parts of the stack that are
>> "non-D" and therefore does not need a full stack frame within these
>> "non-D" sections.
>
> This may not be a bad idea, though it does introduce some overhead when
> you cross language boundaries.
>
>
>> All these features are required so that the GC can precisely scan
>> pointers/references on the stack and change them as necessary.
>> Remaining issues: The D compiler can freely move around value types
>> on the stack. With such move operations it would be necessary to fix
>> up all the bit fields. I needs to be investigated if this is doable.
>
> This can only make the GC slower, especially if it needs to update
> variable-length encoded bitfields. Of course, you may be able to offset
> this by making it possible to do real-time GC, (reduced throughput but
> less waiting time for collection cycles) but that's a very complex
> problem.
>
>
>> 3) Tracking references on the heap
>>
>> For every class / struct a mixin template which is defined inside
>> the runtime gets instantiated. This template can then use
>> introspection to generate the necessary information to allow the GC
>> to scan the pointers within that struct / class precisely.
>
> So basically you're proposing a compacting precise-scanning GC instead
> of the current conservative GC. There are pros and cons in either
> approach; it'd be nice if you could compare them.
>

I'm proposing a compacting percise-scanning generantional gc that has thread local pools and can scan these thread local pools without stopping the other threads. Also it will be able to collect young generations without the need to scan the old generations.

>
> [...]
>> 5) pointer / reference changed callback
>>
>> Every time a pointer / reference is changed the D compiler emits a
>> call into the runtime and passes the new value of the reference /
>> pointer with it.
>
> This introduces a LOT of overhead, especially in a language like D which
> manipulates a lot of pointers quite often (esp. if you use slices a
> lot).
>

I did not make this up, I know a smalltalk implementation that actually does this and is pretty efficient.

>
> [...]
>> -If the GC interrupts a thread right before any of the above mentioned
>> callbacks happen it will cause a invalid state for the GC and the GC
>> might access invalid pointers. It has to be investigated if this leads
>> to invalid behavior. It can be fixed by not interrupting a thread but
>> pause it the next time it calls any of callbacks, or other functions
>> that can be interrupted by the GC.
>
> This adds a lot of intermittent pauses in program execution.

Why should there be pauses, there is just a additional check in every callback to the gc there already is. When the gc wants to collect he sets the pause flag to true and waits until all required threads paused themselfs.

>
> The link I posted at the top has a GC implementation that doesn't
> introduce *any* of this overhead (the GC runs concurrently with the
> program), with no pause during a collection cycle (garbage is
> incrementally collected when allocating new memory).
>

Any non percise scanning algorithms will not be able to deal with memory fragmentation and will also have uneccessary overhead for scanning regions of memory that don't contain any pointers. Also they can leak memory because some int value has the same value as a pointer and therefore the gc does not free that block of memory.

>
> [...]
>> 6) Different interface for the GC
>>
>> The current interface to the GC would have to change because the
>> "this block of memory might contain a pointer" approach wouldn't
>> work anymore. For example a block of memory and a delegate which
>> iterates over all pointers within the memory block could be used for
>> user allocated memory blocks. There should be a separate allocator
>> function provided by the GC that allocates memory that does not get
>> moved around so it can be used to pass it to non garbage collected
>> code.
>
> It would be nice if there was a way for GCs to be pluggable, especially
> in the compiler. Currently, we can only swap GCs that implement the same
> interface as the existing one, but to switch to a different GC model
> like you're proposing would require a lot of compiler support.
>
>
>> 7) Compiler Options
>>
>> Each of the above mentioned groups of features should be exposed as
>> compiler options so that you can turn them on/off depending on which
>> type of GC you use. Default on/off states for these features are set
>> within a config file depending on which type of GC currently ships
>> per default with druntime.
>
> This would be nice.
>
>
>> 8) Conclusion
>>
>> Garbage Collection brings a lot of advantages for the programmer
>> using the language but is not free and shouldn't be treated as free.
>
> I don't know, but after reading this:
>
> 	http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps
>
> I think there might be a possibility of (almost) free GC.
>

There is no free GC. The only question is which trade offs you want to make. Modern implementations like the .NET 4.0 garbage collector show that all the things mentioned here are possible and are faster then primitve implementations.

>
>> Full support for garbage collection is required to build a fast and
>> efficient GC. This additional support requires additional features
>> within the compiler but should result in a overall better performing
>> language.
>
> I agree in principle, although for specific GC proposals, we'd need to
> evaluate the pros and cons to determine what is improved and what
> degrades. Unfortunately, GC is a complex problem, and different GCs work
> better with different apps. I wouldn't be so quick to make claims about
> the performance of GCs. It depends on what the app does.
>
>
> T
>

Fully agree on this

I want to add that I did not make all this up. Most of the mentioned features here are actually used in a Smalltalk implementation that compiles Smalltalk to C for faster execution.

February 22, 2012
On 22.02.2012 22:56, Benjamin Thaut wrote:
> As I'm not satisfied with the current GC D has and don't see the
> situation improving in the future without significant changes to the
> compiler I wrote the following document that points out all the possible
> issues with garbage collection I could think of and possible solutions
> for them. This document is just a draft and only a proposal, critics and
> comments are welcome.
>
> Kind Regards
> Benjamin Thaut
>
>
> 1) Preface
>
> All modern Languages with fast and efficient GCs have build in support
> for garbage collection within the compiler / virtual machine. Currently
> the D language does have as much GC support as C++ has but fully relies
> on the GC with a lot of language features and the standard library. As a
> result the GC is slow, non parallel and always needs to stop all threads
> before collection. This document suggests to build in better support for
> garbage collection into the language so that creating a fast and
> efficient GC becomes possible. A list of features that would be required
> is described in this document.
>
> 2) Tracking references on the stack:
>
> The D compiler always needs to emit a full stack frame so that the GC
> can walk up the stack at any time in the program.

 I think walking up the stack to collect this info again and again (the stack has a lot of "heavy frames" on the bottom, right?) sounds like a tremendously slow way of getting necessary memory ranges. I'm no expert though.

The stack frame of
> every function generated by the D compiler starts which a bitfield
> (usually the size of a machine register) where each bit indicates that
> these bytes are a pointer / reference. The bitfield needs to be large
> enough to cover the whole stack frame of the function.
>
> For example on x86: 11001...
> 1 = bytes 0 to 4 are a pointer
> 1 = bytes 4 to 8 are a pointer
> 00 = bytes 8 to 16 are not a pointer
> 1 = bytes 16 to 20 are a pointer
>
> The last bit indicates whether the bitfield is continued in the
> following bytes or not. 1 means continued 0 means finished.
> Every scope generated by the D compiler would need additional code at
> the start and end of the scope. When the scope is entered the bitfield
> would be patched to represent the new variables inside the scope and
> when the scope is left the bitfield is patched again to remove the
> changes that were made on entering the scope.

Again I'm no expert, but what happens when GC starts collecting a thread stack amid this patching operation?

> Every time a function gets called that did not get generated by the D
> compiler ( C / C++ etc functions) the compiler generates a call into the
> runtime and passes the current stack pointer and stack base pointer to it.
>
> void _d_externalCallStart(void* stackptr, void* baseptr);
>
> Every time such a function returns the compiler generates a call into
> the the runtime too.
>
> void _d_externalCallEnd(void* stackptr, void* baseptr);
>

Why would you need these? And if you start calling callback on every operation, you may just as well pass direct ranges of memory to GC without stack walk.
+ you can call D function from C one that in turn calls D one, think extern(C) and callbacks.

> Every time a functions that can get called from other languages
> (extern(C) etc) are executed the end callback is inserted at the start
> of the functions and the start callback is inserted at the end of the
> function.
> Using these callbacks the GC can mark certain parts of the stack as
> "non-D" and ignore them when scanning for bit fields and
> references/pointers. It can just skip parts of the stack that are
> "non-D" and therefore does not need a full stack frame within these
> "non-D" sections.
> All these features are required so that the GC can precisely scan
> pointers/references on the stack and change them as necessary.
> Remaining issues: The D compiler can freely move around value types on
> the stack. With such move operations it would be necessary to fix up all
> the bit fields. I needs to be investigated if this is doable.
>
> 3) Tracking references on the heap
>
> For every class / struct a mixin template which is defined inside the
> runtime gets instantiated. This template can then use introspection to
> generate the necessary information to allow the GC to scan the pointers
> within that struct / class precisely.
>
> 4) Thread local / global memory
>
> A callback into the runtime needs to happen in the following cases:
> - a __gshared variable is assigned
> - a reference / pointer is casted to immutable
> - a reference / pointer is casted to shared
>
> void _d_castToGlobalMem(void* ptr);
>
> This can be used by the GC to keep thread local pools of memory and move
> a memory block to a global memory pool as soon as it is needed there.
>
> 5) pointer / reference changed callback
>
> Every time a pointer / reference is changed the D compiler emits a call
> into the runtime and passes the new value of the reference / pointer
> with it.

Bye-bye any speed of p++ ? I mean I'm horrified, and I bet I'm not alone.

>
> void _d_pointerChanged(void *ptr);
>
> This can be used when writing a generational GC to have separate pools
> for young and old generations. Every time the young generation needs to
> be collected it can be avoided to scan the old generations pool because
> it is sufficient to only check the pointers that have changed since the
> last time the young generation was collected. With the above mentioned
> callback it is easily possible to track these references.
>
> Remaining issues:
> -If the GC interrupts a thread right before any of the above mentioned
> callbacks happen it will cause a invalid state for the GC and the GC
> might access invalid pointers. It has to be investigated if this leads
> to invalid behavior. It can be fixed by not interrupting a thread but
> pause it the next time it calls any of callbacks, or other functions
> that can be interrupted by the GC. This in turn could cause a thread to
> be non pausable because it is stuck in a polling loop. The compiler
> could identify loops without any GC interruptible function and manually
> insert one.
> -When pointers/references are passed inside processor registers the GC
> cannot know if these values are actually pointers/references or
> represent other values. If threads are paused at defined points in the
> code as mentioned before this issues would be fixed because the state at
> these points is known and can be handled accordingly.
>
> 6) Different interface for the GC
>
> The current interface to the GC would have to change because the "this
> block of memory might contain a pointer" approach wouldn't work anymore.
> For example a block of memory and a delegate which iterates over all
> pointers within the memory block could be used for user allocated memory
> blocks. There should be a separate allocator function provided by the GC
> that allocates memory that does not get moved around so it can be used
> to pass it to non garbage collected code.
>
> 7) Compiler Options
>
> Each of the above mentioned groups of features should be exposed as
> compiler options so that you can turn them on/off depending on which
> type of GC you use. Default on/off states for these features are set
> within a config file depending on which type of GC currently ships per
> default with druntime.

Combinatorial explosion of sets of options that doesn't necessary allow a particular GC?
>
> 8) Conclusion
>
> Garbage Collection brings a lot of advantages for the programmer using
> the language but is not free and shouldn't be treated as free. Full
> support for garbage collection is required to build a fast and efficient
> GC. This additional support requires additional features within the
> compiler but should result in a overall better performing language.

Well, that was critic ;)

-- 
Dmitry Olshansky
February 22, 2012
On 23.02.2012 0:03, Dmitry Olshansky wrote:
> On 22.02.2012 22:56, Benjamin Thaut wrote:
>> As I'm not satisfied with the current GC D has and don't see the
>> situation improving in the future without significant changes to the
>> compiler I wrote the following document that points out all the possible
>> issues with garbage collection I could think of and possible solutions
>> for them. This document is just a draft and only a proposal, critics and
>> comments are welcome.
>>
>> Kind Regards
>> Benjamin Thaut
>>
BTW check out ideas on GC http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas

February 22, 2012
On Wed, Feb 22, 2012 at 08:53:45PM +0100, Benjamin Thaut wrote:
> Am 22.02.2012 20:40, schrieb H. S. Teoh:
> >On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
> >>As I'm not satisfied with the current GC D has and don't see the situation improving in the future without significant changes to the compiler I wrote the following document that points out all the possible issues with garbage collection I could think of and possible solutions for them. This document is just a draft and only a proposal, critics and comments are welcome.
> >
> >Have you seen this?
> >
> >	http://www.llucax.com.ar/proj/dgc/index.html
> 
> Yes I know about dgc it is better but still not on par with for
> example the GC that is shipped with the .NET 4.0
> All I'm saying is that without propper support from the compiler we
> are not going to get GCs as good as in other modern languages.

I agree. Better compiler support would definitely be beneficial.


[...]
> >>Every scope generated by the D compiler would need additional code at the start and end of the scope. When the scope is entered the bitfield would be patched to represent the new variables inside the scope and when the scope is left the bitfield is patched again to remove the changes that were made on entering the scope.
> >
> >This would introduce quite a lot of overhead per scope. It will also lead to strange things like:
> >
> >	if (x) y();	// faster
> >	if (x) { y(); }	// slower
> >
> >which will encourage people to omit {} after if, which makes code more fragile and hard to read.
> >
> 
> Scopeds that don't have variables declared inside them don't need the bitfield patching. so that argument is completely pointless. Scopes that contain varaibles that are not pointers or refrences also don't need the patching.

That wasn't clear from your description. It makes more sense now.


[...]
> >>5) pointer / reference changed callback
> >>
> >>Every time a pointer / reference is changed the D compiler emits a call into the runtime and passes the new value of the reference / pointer with it.
> >
> >This introduces a LOT of overhead, especially in a language like D which manipulates a lot of pointers quite often (esp. if you use slices a lot).
> >
> 
> I did not make this up, I know a smalltalk implementation that actually does this and is pretty efficient.

OK.


[...]
> >>-If the GC interrupts a thread right before any of the above mentioned callbacks happen it will cause a invalid state for the GC and the GC might access invalid pointers. It has to be investigated if this leads to invalid behavior. It can be fixed by not interrupting a thread but pause it the next time it calls any of callbacks, or other functions that can be interrupted by the GC.
> >
> >This adds a lot of intermittent pauses in program execution.
> 
> Why should there be pauses, there is just a additional check in every callback to the gc there already is. When the gc wants to collect he sets the pause flag to true and waits until all required threads paused themselfs.

Whereas with a scheme like dgc there is no need for threads to pause at all.


> >The link I posted at the top has a GC implementation that doesn't introduce *any* of this overhead (the GC runs concurrently with the program), with no pause during a collection cycle (garbage is incrementally collected when allocating new memory).
> >
> 
> Any non percise scanning algorithms will not be able to deal with memory fragmentation

There are ways to deal with this. Though, granted, they're imperfect.


> and will also have uneccessary overhead for scanning regions of memory that don't contain any pointers.

True. But if the scanning is running in its own thread anyway, and no other thread needs to wait for it, then this doesn't really matter, does it?


> Also they can leak memory because some int value has the same value as a pointer and therefore the gc does not free that block of memory.

Yes, this is definitely a problem. It's not easy to fix this in a language like D, though, without adding some overhead.


[...]
> >>8) Conclusion
> >>
> >>Garbage Collection brings a lot of advantages for the programmer using the language but is not free and shouldn't be treated as free.
> >
> >I don't know, but after reading this:
> >
> >	http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps
> >
> >I think there might be a possibility of (almost) free GC.
> >
> 
> There is no free GC. The only question is which trade offs you want to make. Modern implementations like the .NET 4.0 garbage collector show that all the things mentioned here are possible and are faster then primitve implementations.

True. But then again, D's GC is only a simple implementation. Just because an advanced implementation of a GC beats D's GC doesn't necessarily mean that that particular implementation's GC model is the best.

But I'm not trying to argue against your proposal. I'm just saying we should evaluate different GC models to see which one works best. But for that we need a way to easily plug in different GC models, which requires compiler support.


[...]
> >I agree in principle, although for specific GC proposals, we'd need to evaluate the pros and cons to determine what is improved and what degrades. Unfortunately, GC is a complex problem, and different GCs work better with different apps. I wouldn't be so quick to make claims about the performance of GCs. It depends on what the app does.
> >
> >
> >T
> >
> 
> Fully agree on this
> 
> I want to add that I did not make all this up. Most of the mentioned features here are actually used in a Smalltalk implementation that compiles Smalltalk to C for faster execution.

The thing is, Smalltalk is different enough from D that it's hard to draw conclusions about the performance of the GC when used in D just by looking at its performance in Smalltalk. In Smalltalk, the programmer doesn't have direct access to things like pointers and byte representations of stuff. This allows the compiler to make optimizations that would you couldn't safely do with a D program. It also means programmers may implement the same algorithms differently in D than in Smalltalk. So the memory usage patterns of a D program are quite different from a Smalltalk program, and this will affect how a particular GC behaves when applied to D.

Without actual testing we have no way to know for sure.

But regardless, we need better compiler support. No argument about that. :)


T

-- 
Lottery: tax on the stupid. -- Slashdotter
February 22, 2012
On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote: [...]
> 2) Tracking references on the stack:
> 
> The D compiler always needs to emit a full stack frame so that the GC can walk up the stack at any time in the program. The stack frame of every function generated by the D compiler starts which a bitfield (usually the size of a machine register) where each bit indicates that these bytes are a pointer / reference. The bitfield needs to be large enough to cover the whole stack frame of the function.
[...]

I was thinking about this a bit more, and I had an idea: why bother with storing bitfields on the stack? Any function's local pointer variables are known at compile-time. So store a function ID (probably a pointer) that maps to some static storage where this information is stored. Then we always only need 1 word of extra storage on the stack frame, and the GC can follow the pointer to get the info it needs. A recursively called function won't incur the cost of duplicated copies of bitfields, its ID points to same place. You can even have two different functions share the same ID if they have pointer variables in exactly the same places.

The static storage can then be an array of relative stack offsets to the function's pointer variables, so the GC can easily use this info to find roots. No need to complicate the GC with manipulating bitfields, it's just an int[].

If you want to get fancy, have the compiler reorder local variables so that pointers are clustered together in blocks, then in the static storage you can just encode pointer blocks by offset+length. (Although this may not help much with (pointer,length) pairs on the stack, like slices.) Or the compiler can reorder variables to maximize ID merges.

The same thing can be done for scopes, since their local variables are also all known at compile-time.


T

-- 
Spaghetti code may be tangly, but lasagna code is just cheesy.
February 22, 2012
On Thu, Feb 23, 2012 at 12:03:59AM +0400, Dmitry Olshansky wrote: [...]
> >7) Compiler Options
> >
> >Each of the above mentioned groups of features should be exposed as compiler options so that you can turn them on/off depending on which type of GC you use. Default on/off states for these features are set within a config file depending on which type of GC currently ships per default with druntime.
> 
> Combinatorial explosion of sets of options that doesn't necessary allow a particular GC?

Yeah, this is not a good way to go. Better would be for each GC come with a GC description file, that describes what hooks/info it needs from the compiler.  Then the compiler can read this description file (passed as a *single* compile flag) and do the right thing for that particular GC. It can even automatically link in that particular GC without needing you to specify anything further.

The GC description file can contain info like:

- Where/when to insert calls to GC functions (e.g., start collect cycle,
  start mark cycle)
- Which function to use for allocating memory
- Any additional info required:
  - e.g., map of each function's pointer local variables so that the GC
    knows where the roots are;
  - Whether or not hooks are needed for function entry/exit, and which
    GC function to map them to;
  - Any additional GC-specific info to insert into functions / structs /
    etc..
  - Whether or not pointer reads/writes need to include GC-specific code
    (the description can include code to insert, if needed).
- Path to GC source(s) to be compiled into the program.


T

-- 
If Java had true garbage collection, most programs would delete themselves upon execution. -- Robert Sewell
February 22, 2012
On Feb 22, 2012, at 10:56 AM, Benjamin Thaut wrote:
> 
> 5) pointer / reference changed callback
> 
> Every time a pointer / reference is changed the D compiler emits a call into the runtime and passes the new value of the reference / pointer with it.
> 
> void _d_pointerChanged(void *ptr);

D can call assembler, C routines like memset(), plain old opaque C library code, etc.  What should the D compiler do in light of all the sources of memory changes that it can't monitor?

> 6) Different interface for the GC
> 
> The current interface to the GC would have to change because the "this block of memory might contain a pointer" approach wouldn't work anymore. For example a block of memory and a delegate which iterates over all pointers within the memory block could be used for user allocated memory blocks. There should be a separate allocator function provided by the GC that allocates memory that does not get moved around so it can be used to pass it to non garbage collected code.

I posted a suggested new GC interface do the runtime mailing list 6 or so months ago.  In short, I do think the current interface is lacking.  Also, CDGC does support precise scanning and runs with Druntime.  The big problem there is that CDGC is based on the Tango GC (where Druntime's GC originated) and someone needs to review all the GC changes since the Druntime project was created to see what may need to be merged into CDGC.  I started on this once, but it turned out to be more work than I had time for.
February 23, 2012
On Wed, Feb 22, 2012 at 03:05:38PM -0800, Sean Kelly wrote:
> On Feb 22, 2012, at 10:56 AM, Benjamin Thaut wrote:
> > 
> > 5) pointer / reference changed callback
> > 
> > Every time a pointer / reference is changed the D compiler emits a call into the runtime and passes the new value of the reference / pointer with it.
> > 
> > void _d_pointerChanged(void *ptr);
> 
> D can call assembler, C routines like memset(), plain old opaque C library code, etc.  What should the D compiler do in light of all the sources of memory changes that it can't monitor?
[...]

Yeah, the GC should be capable of dealing with non-@safe code. Otherwise it would just be too limited to be used for large non-trivial D projects.

But still, some benchmarks do appear to be showing signs of a large performance hit on the GC when there happens to be many integers that look like valid pointers. This may be beyond the programmer's control, since it could be the OS that gives the GC an address segment that just happens to span integer values very commonly used throughout the app.


T

-- 
What doesn't kill me makes me stranger.
« First   ‹ Prev
1 2 3