June 08, 2013

On 06.06.2013 22:27, Benjamin Thaut wrote:
> Am 06.06.2013 08:28, schrieb Rainer Schuetze:
>>
>>
>> On 05.06.2013 16:14, bearophile wrote:
>>> Andrei Alexandrescu:
>>>
>>>> http://www.reddit.com/r/programming/comments/1fpw2r/dconf_2013_day_2_talk_5_a_precise_garbage/
>>>>
>>>>
>>>>
>>>
>>> Is this useful to make the GC precise regarding the stack too?
>>>
>>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5570
>>
>> I was imagining something similar too. The part about multi-threading in
>> the paper isn't too convincing, though. Especially the need for GC safe
>> points in tight loops to block a thread during collection will not get
>> many friends in the D community.
>>
>
> I think you don't really need percise scanning which is thread safe. If
> you have one pool per thread, and you can scan the thread local pools
> percisley within the thread that would be enough. Because you would then
> be able to do generational garbage collection for the thread local
> pools. If you have to scan one of (or the) global pool, percise scanning
> of the stacks is not really needed, the old impercises scanning is
> sufficient, you just have to pin those memory blocks you might think are
> referenced from stack memory.

Wouldn't that mean a write-barrier for every pointer assignment?

>
> But to be able to actually do thread local pools a bit more then write
> barriers would be needed. For each of the following situations a call
> into druntime would be needed.
>
> 1) Creating a shared or immutable object
> 2) Casting to shared or immutable
> 3) Assigning a shared, immutable, __gshared global or static variable

Considering "string" is "immutable(char)[]", would you want to allocate all temporary strings on the global heap? Also, I don't like to have possible expensive operations for casting.

>
> If you have these and you are able to percisley scan the stack for the
> current thread only you will then be able to move all of the referenced
> memory from the thread local pool to the global pool if any of the above
> calls happen.
>
> This would mean that most of the time only thread local garbage
> collection would be neccessary and you won't have to stop other threads
> from doing whatever they are doing. Only in rare cases it should be
> necessary to scan the global pool.

I agree that a thread local pool can give good performance improvements. But as long as you still have a global heap (which you probably cannot eliminate), it's not a simplification to have thread local garbage collections in addition.

The problem to implement it is that shared semantics are still pretty undefined. AFAICT "shared" is only a type modifier that has different conversion rules than non-shared types. There are no runtime guarantees with "shared", even less with the absence of shared, and even if they exist, __gshared and casting are meant to subvert them.
June 08, 2013

On 06.06.2013 21:42, Diggory wrote:
> Interesting talk. It seems there are a few features D could provide
> which would make for some more powerful GCs:
>
> - Hook for when generating code which reads/writes a pointer allowing GC
> to put in read/write barriers where necessary.
>
> - Hook for union assignment so that the GC could emplace the correct
> type bits
>
> - Hook for passing pointers to extern(XXX) functions so that the GC can
> pin the arguments allowing a compacting GC
>

There are also D functions annotated extern(C) to subvert the module system. It might be a bit expensive to pin any pointer passed to such function.

> - Hook for function call/return so that stack type-info can be stored.
> There's actually another way to do this which would have less overhead -
> at compile time the compiler could write the address range of each
> function along with a bitmap for possible pointers in that function to
> some known location. When the GC runs it could unwind the stack and look
> each frame pointer up in the list of functions to find the bit pattern
> to use. When calling a function that does some low-level stuff and might
> prevent the unwinding from working correctly it would be possible to
> switch back to conservative mode until the function returns.

I was considering something like that too (AFAICT exception handling on Win64 works similar), but the problem is to actually properly unwind the stack. If you have called some C function with a callback into your D code, stack unwinding might not be able to find the correct stack frames above the C function, so that you don't scan the stack above that.
June 08, 2013
Am 08.06.2013 09:50, schrieb Rainer Schuetze:
>
>
> On 06.06.2013 22:27, Benjamin Thaut wrote:
>> Am 06.06.2013 08:28, schrieb Rainer Schuetze:
>>>
>>>
>>> On 05.06.2013 16:14, bearophile wrote:
>>>> Andrei Alexandrescu:
>>>>
>>>>> http://www.reddit.com/r/programming/comments/1fpw2r/dconf_2013_day_2_talk_5_a_precise_garbage/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> Is this useful to make the GC precise regarding the stack too?
>>>>
>>>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5570
>>>
>>> I was imagining something similar too. The part about multi-threading in
>>> the paper isn't too convincing, though. Especially the need for GC safe
>>> points in tight loops to block a thread during collection will not get
>>> many friends in the D community.
>>>
>>
>> I think you don't really need percise scanning which is thread safe. If
>> you have one pool per thread, and you can scan the thread local pools
>> percisley within the thread that would be enough. Because you would then
>> be able to do generational garbage collection for the thread local
>> pools. If you have to scan one of (or the) global pool, percise scanning
>> of the stacks is not really needed, the old impercises scanning is
>> sufficient, you just have to pin those memory blocks you might think are
>> referenced from stack memory.
>
> Wouldn't that mean a write-barrier for every pointer assignment?

Not for every pointer assignment, only for pointers that are on the heap. Also most of these write barries could be skipped with really cheap tests most of the time.

>
>>
>> But to be able to actually do thread local pools a bit more then write
>> barriers would be needed. For each of the following situations a call
>> into druntime would be needed.
>>
>> 1) Creating a shared or immutable object
>> 2) Casting to shared or immutable
>> 3) Assigning a shared, immutable, __gshared global or static variable
>
> Considering "string" is "immutable(char)[]", would you want to allocate
> all temporary strings on the global heap? Also, I don't like to have
> possible expensive operations for casting.

Good point, didn't think about that yet. In Order to improve the performance of the GC we have to sacrifice performance in other places of the language in my opinion. Given the fact that the current GC is 3 times slower then manual memory mangement (in my test case). It should be easly possible to take a performance hit here and there but end up beeing faster in the end anyway. The current GC is especially bad with short lived allocations and I don't think this is going to get any better when only using a Mark & Sweep. The D community has to see that to get a state of the art GC some perfomance sacrifces have to be done. If those are not wanted we can go back to manual memory management right away, in my opinion. I recently also had a long talk with Claus Gittinger (created the SmalltalkX VM) about the topic. He also had the same opinion on the topic. He also stated that it is going to be very hard if not impossible to implement a state of the Art GC into a language like D which does not have any restrictions in the language which help with that Task.

>
>>
>> If you have these and you are able to percisley scan the stack for the
>> current thread only you will then be able to move all of the referenced
>> memory from the thread local pool to the global pool if any of the above
>> calls happen.
>>
>> This would mean that most of the time only thread local garbage
>> collection would be neccessary and you won't have to stop other threads
>> from doing whatever they are doing. Only in rare cases it should be
>> necessary to scan the global pool.
>
> I agree that a thread local pool can give good performance improvements.
> But as long as you still have a global heap (which you probably cannot
> eliminate), it's not a simplification to have thread local garbage
> collections in addition.

No one said that it is a simplification. Its always going to get more complicated if you want a better GC.

>
> The problem to implement it is that shared semantics are still pretty
> undefined. AFAICT "shared" is only a type modifier that has different
> conversion rules than non-shared types. There are no runtime guarantees
> with "shared", even less with the absence of shared, and even if they
> exist, __gshared and casting are meant to subvert them.

Then maybe the semantics of shared should be defined with a GC in mind.

June 12, 2013
On 06/05/2013 10:23 AM, Andrei Alexandrescu wrote:
> Reddit: http://www.reddit.com/r/programming/comments/1fpw2r/dconf_2013_day_2_talk_5_a_precise_garbage/
> 
> Hackernews: https://news.ycombinator.com/item?id=5825320
> 
> Twitter: https://twitter.com/D_Programming/status/342269600689430529
> 
> Facebook: https://www.facebook.com/dlang.org/posts/651801198166898
> 
> Youtube: http://youtube.com/watch?v=LQY1m_eT37c
> 
> Please drive discussions on the social channels, they help D a lot.
> 
> 
> Andrei


Loved this talk.

Would struct have an extra field in memory pointing to the
needed type info? If all of this is implemented, will this
mean that an array of structs will not have their data contiguous
in memory?

Thanks for the talk!

--jm



June 12, 2013
On 06/05/2013 10:23 AM, Andrei Alexandrescu wrote:
> Reddit: http://www.reddit.com/r/programming/comments/1fpw2r/dconf_2013_day_2_talk_5_a_precise_garbage/
> 
> Hackernews: https://news.ycombinator.com/item?id=5825320
> 
> Twitter: https://twitter.com/D_Programming/status/342269600689430529
> 
> Facebook: https://www.facebook.com/dlang.org/posts/651801198166898
> 
> Youtube: http://youtube.com/watch?v=LQY1m_eT37c
> 
> Please drive discussions on the social channels, they help D a lot.
> 
> 
> Andrei


Loved this talk.

Would struct have an extra field in memory pointing to the
needed type info? If all of this is implemented, will this
mean that an array of structs will not have their data contiguous
in memory?

Thanks for the talk!

--jm



June 12, 2013

On 12.06.2013 15:38, Juan Manuel Cabo wrote:
> On 06/05/2013 10:23 AM, Andrei Alexandrescu wrote:
>> Reddit: http://www.reddit.com/r/programming/comments/1fpw2r/dconf_2013_day_2_talk_5_a_precise_garbage/
>>
>> Hackernews: https://news.ycombinator.com/item?id=5825320
>>
>> Twitter: https://twitter.com/D_Programming/status/342269600689430529
>>
>> Facebook: https://www.facebook.com/dlang.org/posts/651801198166898
>>
>> Youtube: http://youtube.com/watch?v=LQY1m_eT37c
>>
>> Please drive discussions on the social channels, they help D a lot.
>>
>>
>> Andrei
>
>
> Loved this talk.

Thanks.

>
> Would struct have an extra field in memory pointing to the
> needed type info? If all of this is implemented, will this
> mean that an array of structs will not have their data contiguous
> in memory?

A struct allocated on the heap does not need the type info in memory, it is passed with the call invoked by "new" and it is used to supply the bitmap of pointers that is copied to the appropriate memory.

Due to the implementation details of arrays, it is a bit different for them: The memory layout changes slightly depending on the size of the array, so it is not the allocation that creates the pointer bitmap, but the array functions call "emplace" to fill the bitmap from an array type info. The memory layout of the array itself is unchanged.
June 27, 2013
Andrei Alexandrescu:

> http://www.reddit.com/r/programming/comments/1fpw2r/dconf_2013_day_2_talk_5_a_precise_garbage/

Another thing to keep in account while designing a more precise garbage collection is a possible special casing for Algebraic (and Variant, and more generally for some standardized kind of tagged union):

http://d.puremagic.com/issues/show_bug.cgi?id=5057

In an Algebraic there is run-time information for the GC to decide if inside it there are pointers to follow or not. It's mostly a matter of letting the GC recognize and use such information.

Bye,
bearophile
July 01, 2013

On 27.06.2013 19:33, bearophile wrote:
> Andrei Alexandrescu:
>
>> http://www.reddit.com/r/programming/comments/1fpw2r/dconf_2013_day_2_talk_5_a_precise_garbage/
>>
>
> Another thing to keep in account while designing a more precise garbage
> collection is a possible special casing for Algebraic (and Variant, and
> more generally for some standardized kind of tagged union):
>
> http://d.puremagic.com/issues/show_bug.cgi?id=5057
>
> In an Algebraic there is run-time information for the GC to decide if
> inside it there are pointers to follow or not. It's mostly a matter of
> letting the GC recognize and use such information.

In the proposed implementation, the gc_emplace function can be used to pass this information to the GC. This would need to be called whenever the location of pointers changes, so it's not high-performance.


July 01, 2013
Rainer Schuetze:

> In the proposed implementation, the gc_emplace function can be used to pass this information to the GC. This would need to be called whenever the location of pointers changes, so it's not high-performance.

Thank you for the answer.

Let me see if I understand what you are saying. If I have an usage of Algebraic like this, at line 4 x contains no GC-managed pointers, while at line 6 it contains a pointer to the array data:


import std.variant: Algebraic;
void main() {
    alias T = Algebraic!(ulong, int[]);
    T x = 1UL;  // line 4
    auto items = [1, 2];
    x = items; // line 6
    x = 2UL;   // line 7
    x = items; // line 8
}


You say that every time you change the _type_ of the contents of x you have to call gc_emplace.

But isn't the D GC called only when an allocation occurs? So what's the point of calling gc_emplace (at lines 7 and 8) if no garbage collection happens? Isn't it possible for the GC to use (and update) this information lazily, only when a collection occurs?

(Extra note: Algebraic is meant for functional-style programming. In such kind of programming data is mostly immutable. This means you don't change the contents and the type of an algebraic variable once it is assigned. So usually you call gc_emplace only once for one of such variables).

Bye,
bearophile
July 02, 2013

On 01.07.2013 12:30, bearophile wrote:
> Rainer Schuetze:
>
>> In the proposed implementation, the gc_emplace function can be used to
>> pass this information to the GC. This would need to be called whenever
>> the location of pointers changes, so it's not high-performance.
>
> Thank you for the answer.
>
> Let me see if I understand what you are saying. If I have an usage of
> Algebraic like this, at line 4 x contains no GC-managed pointers, while
> at line 6 it contains a pointer to the array data:
>
>
> import std.variant: Algebraic;
> void main() {
>      alias T = Algebraic!(ulong, int[]);
>      T x = 1UL;  // line 4
>      auto items = [1, 2];
>      x = items; // line 6
>      x = 2UL;   // line 7
>      x = items; // line 8
> }
>
>
> You say that every time you change the _type_ of the contents of x you
> have to call gc_emplace.

Yes, that would be Algebraic's responsibility. An optimization is to only do it if the respective pointer info changes, but that would tie it a bit more to the GC implementation.

In your example, x is a stack variable, so as long as there is no precise scanning of the stack, that actually generates a lot of useless interaction with the GC.

>
> But isn't the D GC called only when an allocation occurs? So what's the
> point of calling gc_emplace (at lines 7 and 8) if no garbage collection
> happens? Isn't it possible for the GC to use (and update) this
> information lazily, only when a collection occurs?

I was thinking about allowing some callback to do the scanning, but that has some serious drawbacks:

- it is not easily composable if structs like Algebraic are fields of a class/struct. You would not want to call a function for every field.

- having to call a (virtual) scanning function for every allocated object is probably also a performance killer, though I haven't tried it yet.

- it does not allow very unpredictable usages of memory using std.emplace.

Doing just the pointer bitmap update lazily is an interesting idea, though I'm not sure if it is actually less work to cache this info than to update the bitmap immediately.

>
> (Extra note: Algebraic is meant for functional-style programming. In
> such kind of programming data is mostly immutable. This means you don't
> change the contents and the type of an algebraic variable once it is
> assigned. So usually you call gc_emplace only once for one of such
> variables).
>
> Bye,
> bearophile
1 2
Next ›   Last »