January 10, 2014
On 1/10/14 3:29 AM, Dmitry Olshansky wrote:
> 10-Jan-2014 12:03, Andrei Alexandrescu пишет:
>> * I plan to rely on static introspection for the mark function, i.e:
>>
>> void mark(T)(ref T obj);
>>
>> would mark obj and everything transitively referred to by it. The
>> function would probably take a second parameter that's the heap that obj
>> is sitting in.
>
> This will not work AFAICT. Since nobody can tell what kind of type a
> user-defined program may have and GC is certainly not templated on all
> types.
> GC would have to rely on TypeInfos or rather RTInfo-s that describe the
> layout of Ts in a uniform way (bitmaps to indicate pointers, etc.).
> I think Walter had some ideas on how to represent these to reduce space.

You know I wouldn't write that unless I had an ace up my sleeve. Watch me :o).

>> * I plan to segregate all objects that don't include references and
>> pointers (e.g. int, int[], Tuple!(int, double) etc) into a completely
>> different heap than the "interesting" objects. For the simpler objects
>> there's no need to save detailed type information so they can be stored
>> in a more compact, efficient manner.
>>
>
> Yup. I though GC already did something like that, with BlkAttr.NO_SCAN.

That design will disappear. The attribute won't be settable once initialized, and will never be conservative ("scan all words in this block").

>> * There will be no possibility to change the type of certain objects
>> once allocated. An allocation for an immutable object cannot e.g. make
>> it later a mutable one. (This is an essential issue with the current
>> allocator, I think.)
>
> Would that mean 3 separate heaps, so that addresses given for immutable
> stuff would never be (sometime later) assigned to (thread-local)
> mutable? I might have got this point of yours wrong, but I think there
> is a merit in having separate heap at least for shared stuff.

Yah, separation of shared data allocation is crucial. We'll see.


Andrei
January 10, 2014
On Friday, 10 January 2014 at 08:03:03 UTC, Andrei Alexandrescu
wrote:
> The GC comes up repeatedly in discussions around here. I've been thinking for a while about breaking it down into components, and now it seems the time is ripe.
>
> * The design will foster the small, composable components with required/optional primitives that worked for std.allocator.
>
> * I plan to review and probably discard all of the pointers-to-functions nonsense in the current GC.

This only exists because the runtime is statically linked and
supporting D in a DLL was a requirement.  Grafting together GCs
from multiple runtimes was the easiest way to do this at the
time, but the correct solution is really just to put the runtime
in a DLL and jettison the whole idea of a GC proxy.  In short,
I'm all for it.


> * I plan to rely on static introspection for the mark function, i.e:
>
> void mark(T)(ref T obj);
>
> would mark obj and everything transitively referred to by it. The function would probably take a second parameter that's the heap that obj is sitting in.

This will take some work.  Maybe thread_scanAll can take an alias
to such a function instead of a function pointer as it does
today.  Something along those lines anyway.  In short, the
template part is what's tricky about this approach.


> * I plan to segregate all objects that don't include references and pointers (e.g. int, int[], Tuple!(int, double) etc) into a completely different heap than the "interesting" objects. For the simpler objects there's no need to save detailed type information so they can be stored in a more compact, efficient manner.

Yay!  I don't like how the current GC mixes these and indicates
the presence of pointers via a bit flag.


> * There will be no possibility to change the type of certain objects once allocated. An allocation for an immutable object cannot e.g. make it later a mutable one. (This is an essential issue with the current allocator, I think.)

I think this is reasonable, since it also may affect which pool
the data is allocated in.  I really would love to aim for a
concurrent allocator scheme.  Kinda like HOARD but with a GC
attached.  But this is really only feasible if you make some hard
assertions about type conversion.
January 10, 2014
Oh, I assume this will eventually include your ideas about having
the GC manage mapped files?  That one has lingered on the
"someday" pile for too long.
January 10, 2014
Am 10.01.2014 09:03, schrieb Andrei Alexandrescu:
>
> Destroy.
>
> Andrei

Just one question. Did you read "the garbage collection handbook"?
January 10, 2014
On 1/10/14 10:57 AM, Sean Kelly wrote:
> Oh, I assume this will eventually include your ideas about having
> the GC manage mapped files?  That one has lingered on the
> "someday" pile for too long.

Yah, such integration would make a lot of sense. Thanks for the reminder, I'd forgotten about it.

Andrei
January 10, 2014
On 1/10/14 11:34 AM, Benjamin Thaut wrote:
> Am 10.01.2014 09:03, schrieb Andrei Alexandrescu:
>>
>> Destroy.
>>
>> Andrei
>
> Just one question. Did you read "the garbage collection handbook"?

Yah, the new edition. Unfortunately I didn't find a lot of new stuff or things that would help in a practical implementation.

Andrei
January 10, 2014
Am Fri, 10 Jan 2014 00:03:03 -0800
schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:

> The GC comes up repeatedly in discussions around here. I've been thinking for a while about breaking it down into components, and now it seems the time is ripe.
> 
> The design at http://goo.gl/ZVCJeB seems to be a win. It works well, comprehends all major allocation tropes, someone implemented a subset of it in C++ and measured good results, and a coworker is considering adopting the design for a C++ project as well.
> 
> I've started with the next logical step - higher-level allocation that is aware of the type of the object being allocated, and realized that integrating a notion of tracing is appropriate at that level, and actually quite easy. So I'm thinking of just doing it.
> 
> A few highlights:
> 
> * The design will foster the small, composable components with required/optional primitives that worked for std.allocator.
> 
> * I plan to review and probably discard all of the pointers-to-functions nonsense in the current GC.
> 
> * At this point I won't worry about discovering roots; I assume druntime has the appropriate mechanisms in place. Instead I'll focus on primitives such as "given this root, mark all that transitively follows from it" (conservatively or not).
> 
> * I plan to rely on static introspection for the mark function, i.e:
> 
> void mark(T)(ref T obj);
> 
> would mark obj and everything transitively referred to by it. The function would probably take a second parameter that's the heap that obj is sitting in.
> 
> * I plan to segregate all objects that don't include references and pointers (e.g. int, int[], Tuple!(int, double) etc) into a completely different heap than the "interesting" objects. For the simpler objects there's no need to save detailed type information so they can be stored in a more compact, efficient manner.
> 
> * There will be no possibility to change the type of certain objects once allocated. An allocation for an immutable object cannot e.g. make it later a mutable one. (This is an essential issue with the current allocator, I think.)
> 
> * At this point I'm unclear on how generations can be componentized, but am cautiously optimistic. Will see once I get to it.
> 
> One thing that would be great now would be to make an effort to review and merge the current precise GC work. I'm sure it will be of great help with breaking into components.
> 
> 
> Destroy.
> 
> Andrei

Nothing to destroy yet. But it does feel good that someone is putting the pieces together to come up with the second generation of D garbage collection. With pieces I mean the discussions about making it optional or allowing user defined allocators, how immutable can be used to an advantage, precise scanning and generational GC. Basically there are enough use cases now that one can shape a new GC model around it.

-- 
Marco

January 10, 2014
On Fri, Jan 10, 2014 at 08:22:56AM -0800, Andrei Alexandrescu wrote:
> On 1/10/14 3:29 AM, Dmitry Olshansky wrote:
> >10-Jan-2014 12:03, Andrei Alexandrescu пишет:
> >>* I plan to rely on static introspection for the mark function, i.e:
> >>
> >>void mark(T)(ref T obj);
> >>
> >>would mark obj and everything transitively referred to by it. The function would probably take a second parameter that's the heap that obj is sitting in.
> >
> >This will not work AFAICT. Since nobody can tell what kind of type a user-defined program may have and GC is certainly not templated on all types.  GC would have to rely on TypeInfos or rather RTInfo-s that describe the layout of Ts in a uniform way (bitmaps to indicate pointers, etc.).  I think Walter had some ideas on how to represent these to reduce space.
> 
> You know I wouldn't write that unless I had an ace up my sleeve. Watch me :o).

I don't see why having a templated mark() is a problem. Nothing says you
can't do this:

	module core.gc;
	void mark(T)(ref T obj) {
		size_t objSize = T.sizeof;
		// ... and whatever other type-specific info you need
		// here.

		// N.B.: non-template function call
		__mark_impl(cast(void*)&obj, objSize, /* ... whatever else is needed */);
	}


> >>* I plan to segregate all objects that don't include references and pointers (e.g. int, int[], Tuple!(int, double) etc) into a completely different heap than the "interesting" objects. For the simpler objects there's no need to save detailed type information so they can be stored in a more compact, efficient manner.
> >>
> >
> >Yup. I though GC already did something like that, with BlkAttr.NO_SCAN.
> 
> That design will disappear. The attribute won't be settable once initialized, and will never be conservative ("scan all words in this block").
[...]

So this will be the beginning of the precise GC that we've been talking about for who knows how long now? I like that!


T

-- 
Любишь кататься - люби и саночки возить.
January 10, 2014

On 10.01.2014 09:03, Andrei Alexandrescu wrote:
> The GC comes up repeatedly in discussions around here. I've been
> thinking for a while about breaking it down into components, and now it
> seems the time is ripe.
>
> The design at http://goo.gl/ZVCJeB seems to be a win. It works well,
> comprehends all major allocation tropes, someone implemented a subset of
> it in C++ and measured good results, and a coworker is considering
> adopting the design for a C++ project as well.
>
> I've started with the next logical step - higher-level allocation that
> is aware of the type of the object being allocated, and realized that
> integrating a notion of tracing is appropriate at that level, and
> actually quite easy. So I'm thinking of just doing it.
>
> A few highlights:
>
> * The design will foster the small, composable components with
> required/optional primitives that worked for std.allocator.
>
> * I plan to review and probably discard all of the pointers-to-functions
> nonsense in the current GC.

You mean the proxy functions? Yes, please axe 'em, that only works in very simple single-threaded programs.

>
> * At this point I won't worry about discovering roots; I assume druntime
> has the appropriate mechanisms in place.

It currently does not have precise information, but it is dearly needed, too.

> Instead I'll focus on
> primitives such as "given this root, mark all that transitively follows
> from it" (conservatively or not).
>
> * I plan to rely on static introspection for the mark function, i.e:
>
> void mark(T)(ref T obj);
>
> would mark obj and everything transitively referred to by it. The
> function would probably take a second parameter that's the heap that obj
> is sitting in.

I guess the mark function will be stored somewhere in the TypeInfo. How is this going to work with dynamic and associative array operations if these are not also templated?

>
> * I plan to segregate all objects that don't include references and
> pointers (e.g. int, int[], Tuple!(int, double) etc) into a completely
> different heap than the "interesting" objects. For the simpler objects
> there's no need to save detailed type information so they can be stored
> in a more compact, efficient manner.

So no more std.emplace?

>
> * There will be no possibility to change the type of certain objects
> once allocated. An allocation for an immutable object cannot e.g. make
> it later a mutable one. (This is an essential issue with the current
> allocator, I think.)

I think this might help in having different heaps (especially thread local heaps if you include "shared"), but I'm not sure this works in unsafe D where casting is allowed.

> * At this point I'm unclear on how generations can be componentized, but
> am cautiously optimistic. Will see once I get to it.
>
> One thing that would be great now would be to make an effort to review
> and merge the current precise GC work. I'm sure it will be of great help
> with breaking into components.

As written in the other thread ("how to contribute to GC"), I have just made an attempt to make it more reviewable: https://github.com/rainers/druntime/commits/gcx_precise2

The necessary compiler fixes are here: https://github.com/D-Programming-Language/dmd/pull/2480

>
>
> Destroy.
>
> Andrei
January 10, 2014
On 1/10/14 12:37 PM, Rainer Schuetze wrote:
>> * At this point I won't worry about discovering roots; I assume druntime
>> has the appropriate mechanisms in place.
>
> It currently does not have precise information, but it is dearly needed,
> too.

Yah, that's where I'm counting on your work :o).

>> * I plan to rely on static introspection for the mark function, i.e:
>>
>> void mark(T)(ref T obj);
>>
>> would mark obj and everything transitively referred to by it. The
>> function would probably take a second parameter that's the heap that obj
>> is sitting in.
>
> I guess the mark function will be stored somewhere in the TypeInfo. How
> is this going to work with dynamic and associative array operations if
> these are not also templated?

I need to write some code to explain it all. An figure it all :o).

>> * I plan to segregate all objects that don't include references and
>> pointers (e.g. int, int[], Tuple!(int, double) etc) into a completely
>> different heap than the "interesting" objects. For the simpler objects
>> there's no need to save detailed type information so they can be stored
>> in a more compact, efficient manner.
>
> So no more std.emplace?

std.emplace will continue to work as a way to build an object at a specified address. I suspect that allocating and manipulating objects on the GC heap in particular may have certain restrictions. One possibility to avoid such restrictions is to have a function typify(T)(void* p) which ascribes type T to heap location p.

>> * There will be no possibility to change the type of certain objects
>> once allocated. An allocation for an immutable object cannot e.g. make
>> it later a mutable one. (This is an essential issue with the current
>> allocator, I think.)
>
> I think this might help in having different heaps (especially thread
> local heaps if you include "shared"), but I'm not sure this works in
> unsafe D where casting is allowed.

Type-forging casts have always been somewhat implementation-defined. The way I see it they'll continue to only work for people who really know what they're doing. I'm not too worried about that.

>> * At this point I'm unclear on how generations can be componentized, but
>> am cautiously optimistic. Will see once I get to it.
>>
>> One thing that would be great now would be to make an effort to review
>> and merge the current precise GC work. I'm sure it will be of great help
>> with breaking into components.
>
> As written in the other thread ("how to contribute to GC"), I have just
> made an attempt to make it more reviewable:
> https://github.com/rainers/druntime/commits/gcx_precise2
>
> The necessary compiler fixes are here:
> https://github.com/D-Programming-Language/dmd/pull/2480

Yah, time for you to get some destruction first :o).


Andrei