January 11, 2014
On Fri, Jan 10, 2014 at 08:53:44PM -0800, Andrei Alexandrescu wrote:
> On 1/10/14 8:26 PM, Timon Gehr wrote:
> >On 01/10/2014 09:03 AM, Andrei Alexandrescu wrote:
> >>
> >>* There will be no possibility to change the type of certain objects once allocated. An allocation for an immutable object cannot e.g. make it later a mutable one. (This is an essential issue with the current allocator, I think.)
> >
> >I assume you are aware that there is implicit casting to immutable upon return from a strongly pure function. What about it?
> 
> I don't know. Need to think about it. Maybe it's a wrong design decision (or maybe separate heaps are wrong).
[...]

I dunno, separate heaps, esp. for immutable vs. mutable, seems like a very powerful GC optimization opportunity. Missing it seems like such a pity, esp. given the price we pay for immutability (transitivity, actual immutability guarantees unlike C/C++ const, etc.).


T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...
January 11, 2014

On 11.01.2014 00:58, deadalnix wrote:
> On Friday, 10 January 2014 at 20:37:53 UTC, Rainer Schuetze wrote:
>> I think this might help in having different heaps (especially thread
>> local heaps if you include "shared"), but I'm not sure this works in
>> unsafe D where casting is allowed.
>>
>
> One of the key to a fast GC is segregating the heap in smaller
> parts. This is why generational collectors are so popular in java
> for instance.

> Segregating the heap imply so complications. For instance, it is
> necessary to track reference from one heap to another, typically
> from young to older objects.
>
> In D, the type system provide a natural segregation. It would be
> a great missed opportunity to not take advantage of it. This
> segregation is really nice as pointers go from one to another in
> a single direction, avoiding the need to track references
> changes, and all element of the heap part share some common
> characteristics. These characteristics allow for specialized GC
> algorithms (for instance, a GC can run on the immutable heap
> without the program even knowing about it).
>
> Casting is by essence bypassing the type system. In this case,
> you must be help responsible for what happen. The language cannot
> provide guarantees.

This might work with safe D. I haven't yet figured the consequences of implicit heap segregation by type. What restrictions will this impose on system programming?

As compacting data is also pretty problematic as long as part of the scanning is conservative, we might as well consider storing generation and type information per allocation. This will destroy some optimization possibilities, though.
January 11, 2014

On 10.01.2014 22:42, Andrei Alexandrescu wrote:
> On 1/10/14 12:37 PM, Rainer Schuetze wrote:
>>> * At this point I won't worry about discovering roots; I assume druntime
>>> has the appropriate mechanisms in place.
>>
>> It currently does not have precise information, but it is dearly needed,
>> too.
>
> Yah, that's where I'm counting on your work :o).

I was thinking about using the proposed module info extension to generate data similar to RTInfo by interpreting global/tls data of a module as a struct with type info. I don't know if this is feasable.

>
>>> * I plan to rely on static introspection for the mark function, i.e:
>>>
>>> void mark(T)(ref T obj);
>>>
>>> would mark obj and everything transitively referred to by it. The
>>> function would probably take a second parameter that's the heap that obj
>>> is sitting in.
>>
>> I guess the mark function will be stored somewhere in the TypeInfo. How
>> is this going to work with dynamic and associative array operations if
>> these are not also templated?
>
> I need to write some code to explain it all. An figure it all :o).

Waiting eagerly for the code :-)

>
>>> * I plan to segregate all objects that don't include references and
>>> pointers (e.g. int, int[], Tuple!(int, double) etc) into a completely
>>> different heap than the "interesting" objects. For the simpler objects
>>> there's no need to save detailed type information so they can be stored
>>> in a more compact, efficient manner.
>>
>> So no more std.emplace?
>
> std.emplace will continue to work as a way to build an object at a
> specified address. I suspect that allocating and manipulating objects on
> the GC heap in particular may have certain restrictions. One possibility
> to avoid such restrictions is to have a function typify(T)(void* p)
> which ascribes type T to heap location p.

That sounds similar to my gc_emplace function. The problematic part is how to save that information in the GC.

>>> * At this point I'm unclear on how generations can be componentized, but
>>> am cautiously optimistic. Will see once I get to it.
>>>
>>> One thing that would be great now would be to make an effort to review
>>> and merge the current precise GC work. I'm sure it will be of great help
>>> with breaking into components.
>>
>> As written in the other thread ("how to contribute to GC"), I have just
>> made an attempt to make it more reviewable:
>> https://github.com/rainers/druntime/commits/gcx_precise2
>>
>> The necessary compiler fixes are here:
>> https://github.com/D-Programming-Language/dmd/pull/2480
>
> Yah, time for you to get some destruction first :o).

Ready to get some blows...
January 11, 2014
Am 10.01.2014 20:40, schrieb Andrei Alexandrescu:
> On 1/10/14 11:34 AM, Benjamin Thaut wrote:
>> Am 10.01.2014 09:03, schrieb Andrei Alexandrescu:
>>>
>>> Destroy.
>>>
>>> Andrei
>>
>> Just one question. Did you read "the garbage collection handbook"?
>
> Yah, the new edition. Unfortunately I didn't find a lot of new stuff or
> things that would help in a practical implementation.
>
> Andrei

Very good,

In my opinion it should be required for everyone who wants to participate in D's GC to read that book. So everyone has at least a basic understanding of the problem at Hand.

But if you read the book I don't understand, why you simply declare the hardest problem, percise pointer discovery, as done. To be able to actually implement and test a GC that is not a reimplementation of what we already have one needs percise pointer discovery of _all_ pointers, write barriers and GC halting points. So please enlighten me why you simply decalre this done?

Kind Regards
Benjamin Thaut
January 11, 2014
On 2014-01-11 10:30, Rainer Schuetze wrote:

> I was thinking about using the proposed module info extension to
> generate data similar to RTInfo by interpreting global/tls data of a
> module as a struct with type info. I don't know if this is feasable.

You mean this: https://github.com/D-Programming-Language/dmd/pull/2271 ? I really hope this can be merged sometime.

-- 
/Jacob Carlborg
January 11, 2014
On 1/11/14 3:52 AM, Benjamin Thaut wrote:
> Am 10.01.2014 20:40, schrieb Andrei Alexandrescu:
>> On 1/10/14 11:34 AM, Benjamin Thaut wrote:
>>> Am 10.01.2014 09:03, schrieb Andrei Alexandrescu:
>>>>
>>>> Destroy.
>>>>
>>>> Andrei
>>>
>>> Just one question. Did you read "the garbage collection handbook"?
>>
>> Yah, the new edition. Unfortunately I didn't find a lot of new stuff or
>> things that would help in a practical implementation.
>>
>> Andrei
>
> Very good,
>
> In my opinion it should be required for everyone who wants to
> participate in D's GC to read that book. So everyone has at least a
> basic understanding of the problem at Hand.
>
> But if you read the book I don't understand, why you simply declare the
> hardest problem, percise pointer discovery, as done. To be able to
> actually implement and test a GC that is not a reimplementation of what
> we already have one needs percise pointer discovery of _all_ pointers,
> write barriers and GC halting points. So please enlighten me why you
> simply decalre this done?

I'm not considering done as much as separable as a concern. All I'm saying is I hope to be able to separate the low-level part of discovering roots from the high-level part of marking used memory. BTW the way I see this done is "mostly precise", i.e. there will be at least for a while some words that will be handled conservatively (stack, registers, certain union members).

If there's anything in the GC book that suggest that would be impossible, please do let me know!


Andrei

January 11, 2014
Am 11.01.2014 21:44, schrieb Andrei Alexandrescu:
> On 1/11/14 3:52 AM, Benjamin Thaut wrote:
>> Am 10.01.2014 20:40, schrieb Andrei Alexandrescu:
>>> On 1/10/14 11:34 AM, Benjamin Thaut wrote:
>>>> Am 10.01.2014 09:03, schrieb Andrei Alexandrescu:
>>>>>
>>>>> Destroy.
>>>>>
>>>>> Andrei
>>>>
>>>> Just one question. Did you read "the garbage collection handbook"?
>>>
>>> Yah, the new edition. Unfortunately I didn't find a lot of new stuff or
>>> things that would help in a practical implementation.
>>>
>>> Andrei
>>
>> Very good,
>>
>> In my opinion it should be required for everyone who wants to
>> participate in D's GC to read that book. So everyone has at least a
>> basic understanding of the problem at Hand.
>>
>> But if you read the book I don't understand, why you simply declare the
>> hardest problem, percise pointer discovery, as done. To be able to
>> actually implement and test a GC that is not a reimplementation of what
>> we already have one needs percise pointer discovery of _all_ pointers,
>> write barriers and GC halting points. So please enlighten me why you
>> simply decalre this done?
>
> I'm not considering done as much as separable as a concern. All I'm
> saying is I hope to be able to separate the low-level part of
> discovering roots from the high-level part of marking used memory. BTW
> the way I see this done is "mostly precise", i.e. there will be at least
> for a while some words that will be handled conservatively (stack,
> registers, certain union members).
>
> If there's anything in the GC book that suggest that would be
> impossible, please do let me know!
>
>
> Andrei
>

Well as far as my understanding of GCs goes, you have two options:

1) Impercise pointer discovery => limiting yourself to a mark & sweep
2) Percise pointer disccovery => option to use a semi space GC in combination with a mark & sweep and generations, which allows for superior handling of short lived allocations (which is the biggest problem of D's current GC).

Also without percise pointer discovery you will never be able to move objects from one heap into another. This would be especially a problem for the immutable heap, because you might want to allocate all strings on the thread local heap until you discover that you actually need them to be shared between threads. You might also need to move objects into antoher heap whenever a casts happens or global variable is assigned.

Kind Regards
Benjamin Thaut
January 11, 2014
On 1/11/14 1:20 PM, Benjamin Thaut wrote:
> 1) Impercise pointer discovery => limiting yourself to a mark & sweep
> 2) Percise pointer disccovery => option to use a semi space GC in
> combination with a mark & sweep and generations, which allows for
> superior handling of short lived allocations (which is the biggest
> problem of D's current GC).

The way I see it/hope it turns out, precision is a separate concern from what I'm working on now, which is tracing.

Andrei

P.S. s/percise/precise/g
January 12, 2014
Am 11.01.2014 22:20, schrieb Benjamin Thaut:
> Also without percise pointer discovery you will never be able to move
> objects from one heap into another. This would be especially a problem
> for the immutable heap, because you might want to allocate all strings
> on the thread local heap until you discover that you actually need them
> to be shared between threads.

The language can help a lot there (pure... and scope, if "fixed"), so that the compiler can emit the necessary calls to make this explicit, because it already knows that it's safe without performing a garbage collection cycle.

January 12, 2014
Am 11.01.2014 05:26, schrieb Timon Gehr:
> On 01/10/2014 09:03 AM, Andrei Alexandrescu wrote:
>>
>> * There will be no possibility to change the type of certain objects
>> once allocated. An allocation for an immutable object cannot e.g. make
>> it later a mutable one. (This is an essential issue with the current
>> allocator, I think.)
>
> I assume you are aware that there is implicit casting to immutable upon
> return from a strongly pure function. What about it?

I'd say the easiest way to deal with it is to let the compiler emit a "gc_isolatedToImmutable"-like call in this case which moves the object from one heap to another, assuming that the language guarantees safety.