October 17, 2012
Shared keyword and the GC?
Hello,
in the discussion thread of the recent blog post which 
summarized how GC works(*), the topic of thread-local GC was 
discussed further. I pointed out that global variables in D are 
thread-local by default, but I was told that the types don't tell 
which global variables are thread-local and which are shared, so 
the GC cannot use this information. Is that true?
It seems like a wasted opportunity.

BR,
renoX


*:
http://xtzgzorex.wordpress.com/2012/10/11/demystifying-garbage-collectors/
October 17, 2012
Re: Shared keyword and the GC?
On 17-10-2012 10:29, renoX wrote:
> Hello,
> in the discussion thread of the recent blog post which summarized how
> GC works(*), the topic of thread-local GC was discussed further. I
> pointed out that global variables in D are thread-local by default,
> but I was told that the types don't tell which global variables are
> thread-local and which are shared, so the GC cannot use this
> information. Is that true?
> It seems like a wasted opportunity.

Let's step back for a bit and think about what we want to achieve with 
thread-local garbage collection. The idea is that we look only at a 
single thread's heap (and stack/registers, of course) when doing a 
collection. This means that we can -- theoretically -- stop only one 
thread at a time and only when it needs to be stopped. This is clearly a 
huge win in scalability and raw speed. With a scheme like this, it might 
even be possible to get away with a simple mark-sweep or copying GC per 
thread instead of a complicated generational GC, mainly due to the 
paradigms the isolation model induces.
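A minimal sketch of the per-thread collection described above, assuming full isolation (no object ever points into another thread's heap). `Obj`, `ThreadHeap` and `collect` are hypothetical names for illustration, and Python stands in for D; the point is only that marking starts from one thread's roots and never leaves that thread's heap, so no other thread has to be stopped.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so objects can live in sets
class Obj:
    """A heap object; `refs` models the pointers stored in it."""
    name: str
    refs: list = field(default_factory=list)


@dataclass
class ThreadHeap:
    """Hypothetical model of one thread's private heap plus its roots (stack, registers, TLS)."""
    objects: set = field(default_factory=set)
    roots: list = field(default_factory=list)

    def alloc(self, name):
        obj = Obj(name)
        self.objects.add(obj)
        return obj

    def collect(self):
        """Mark-sweep over this heap only; returns the names of freed objects."""
        marked, stack = set(), list(self.roots)
        while stack:
            obj = stack.pop()
            # Under full isolation every reachable object is in this heap.
            if obj in marked or obj not in self.objects:
                continue
            marked.add(obj)
            stack.extend(obj.refs)
        freed = self.objects - marked
        self.objects = marked
        return {o.name for o in freed}


# Usage: "garbage" is unreachable from this thread's roots and is swept.
t1 = ThreadHeap()
a, b = t1.alloc("a"), t1.alloc("b")
t1.alloc("garbage")
a.refs.append(b)
t1.roots.append(a)
freed = t1.collect()
print(sorted(freed))  # ['garbage']
```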

Rust, as it is today, can do this. Tasks (or threads if you will - 
though they aren't the same thing) are completely isolated. Types that 
can potentially contain pointers into a task's heap cannot be sent to 
other tasks at all. Rust also does not have global variables.

So, let's look at D:

1. We have global variables.
2. Only std.concurrency enforces isolation at a type system level; it's 
not built into the language, so the GC cannot make assumptions.
3. The shared qualifier effectively allows pointers from one thread's 
heap into another's.

It's important to keep in mind that in order for thread-local GC (as 
defined above) to be possible at all, *under no circumstances whatsoever 
may there be a pointer in one thread's heap into another thread's heap, 
ever*. If this happens and you apply the above GC strategy (stop one 
thread at a time and scan only that thread's heap), the collector never 
sees the foreign pointer and can free an object that another thread 
still references. You're effectively dealing with something very 
similar to the lost object problem in concurrent GC.
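To make the hazard concrete, here is a small hypothetical model (Python, names invented for illustration): `cross_heap_pointers` checks the isolation invariant, and a thread-local collection of the second heap frees an object that the first thread's heap still points to, because only the second thread's roots are scanned.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    name: str
    refs: list = field(default_factory=list)


@dataclass
class ThreadHeap:
    objects: set = field(default_factory=set)
    roots: list = field(default_factory=list)

    def alloc(self, name):
        obj = Obj(name)
        self.objects.add(obj)
        return obj

    def collect(self):
        """Thread-local mark-sweep: trace from this thread's roots through its own heap only."""
        marked, stack = set(), list(self.roots)
        while stack:
            obj = stack.pop()
            if obj in marked or obj not in self.objects:
                continue
            marked.add(obj)
            stack.extend(obj.refs)
        freed = self.objects - marked
        self.objects = marked
        return {o.name for o in freed}


def cross_heap_pointers(heaps):
    """Return (source, target) name pairs that break the isolation invariant."""
    home = {obj: i for i, heap in enumerate(heaps) for obj in heap.objects}
    return [(obj.name, ref.name) for obj in home for ref in obj.refs
            if ref in home and home[ref] != home[obj]]


t1, t2 = ThreadHeap(), ThreadHeap()
holder = t1.alloc("holder")
t1.roots.append(holder)
victim = t2.alloc("victim")   # thread 2 has no root pointing at it...
holder.refs.append(victim)    # ...but thread 1's heap does

violations = cross_heap_pointers([t1, t2])
freed = t2.collect()          # scans only thread 2, so "victim" is swept
print(violations, freed)      # [('holder', 'victim')] {'victim'}
```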

To clarify with regard to the shared qualifier: It does absolutely 
nothing. It's useless. All it does is slap a pretty "I can be shared 
arbitrarily across threads" label on a type. Even if you have this 
knowledge in the GC, it's not going to help you, because you *still* 
have to deal with the problem that arbitrary pointers can be floating 
around in arbitrary threads.

(And don't even get me started on the lack of clear semantics (and even 
the few semi-agreed-upon but flawed semantics) for shared.)

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
October 17, 2012
Re: Shared keyword and the GC?
On Wednesday, 17 October 2012 at 08:55:55 UTC, Alex Rønne 
Petersen wrote:
> To clarify with regard to the shared qualifier: It does 
> absolutely nothing. It's useless. All it does is slap a pretty 
> "I can be shared arbitrarily across threads" label on a type. 
> Even if you have this knowledge in the GC, it's not going to 
> help you, because you *still* have to deal with the problem 
> that arbitrary pointers can be floating around in arbitrary 
> threads.
>
> (And don't even get me started on the lack of clear semantics 
> (and even the few semi-agreed-upon but flawed semantics) for 
> shared.)

Introduce the "noshared" keyword.
October 17, 2012
Re: Shared keyword and the GC?
On 17-10-2012 11:50, sclytrack wrote:
> Introduce the "noshared" keyword.

Not a practical solution.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
October 17, 2012
Re: Shared keyword and the GC?
On 2012-10-17 10:55, Alex Rønne Petersen wrote:

> Let's step back for a bit and think about what we want to achieve with
> thread-local garbage collection. The idea is that we look only at a
> single thread's heap (and stack/registers, of course) when doing a
> collection. This means that we can -- theoretically -- stop only one
> thread at a time and only when it needs to be stopped. This is clearly a
> huge win in scalability and raw speed. With a scheme like this, it might
> even be possible to get away with a simple mark-sweep or copying GC per
> thread instead of a complicated generational GC, mainly due to the
> paradigms the isolation model induces.

All TLS data is handled by a per-thread collector that runs only in its 
own thread, as you describe above. Any non-TLS data is handled the same 
way the GC currently handles it.

This is how the (now deprecated) Apple GC used by Objective-C works.
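A rough sketch of that split, assuming that objects in the global (non-TLS) heap never point into a thread-local heap; `Runtime`, `collect_local` and `collect_global` are hypothetical names, and Python stands in for D. The local collector treats pointers into the global heap as leaves, neither tracing through them nor freeing them, so it only needs its own thread. The global pass still has to see every thread's roots and heaps, as the current stop-the-world GC does.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    name: str
    refs: list = field(default_factory=list)


def mark(roots, region):
    """Mark objects reachable from roots, tracing only through `region`."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked or obj not in region:
            continue  # pointers leaving the region are treated as leaves
        marked.add(obj)
        stack.extend(obj.refs)
    return marked


@dataclass
class Runtime:
    global_heap: set = field(default_factory=set)
    local_heaps: dict = field(default_factory=dict)  # thread id -> set of objects
    roots: dict = field(default_factory=dict)        # thread id -> list of roots

    def collect_local(self, tid):
        """Collect one thread's TLS heap; no other thread or heap is touched."""
        heap = self.local_heaps[tid]
        live = mark(self.roots[tid], heap)
        self.local_heaps[tid] = live
        return {o.name for o in heap - live}

    def collect_global(self):
        """Stop-the-world pass for non-TLS data: trace through every heap."""
        everything = set(self.global_heap).union(*self.local_heaps.values())
        all_roots = [r for rs in self.roots.values() for r in rs]
        live = mark(all_roots, everything)
        freed = self.global_heap - live
        self.global_heap &= live
        return {o.name for o in freed}


rt = Runtime()
rt.local_heaps[1], rt.roots[1] = set(), []
shared_obj = Obj("shared")
rt.global_heap.add(shared_obj)
local_obj = Obj("local", [shared_obj])
temp = Obj("temp")
rt.local_heaps[1] |= {local_obj, temp}
rt.roots[1].append(local_obj)

local_freed = rt.collect_local(1)    # frees "temp"; "shared" is never examined
rt.roots[1].clear()
local_freed2 = rt.collect_local(1)   # frees "local"
global_freed = rt.collect_global()   # now "shared" is unreachable too
```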

-- 
/Jacob Carlborg
October 17, 2012
Re: Shared keyword and the GC?
Why not definitively adopt the following (already proposed) memory 
scheme? (Some practices currently considered valid do not respect it.)

Thread-local heap (one per thread) -> shared heap -> immutable heap

This model has multiple benefits:
 - A TL heap can be processed by interacting with only one thread.
 - The immutable heap can be collected 100% concurrently if we allow some 
floating garbage.
 - The shared heap is the only problem, but as its size stays small, the 
problem stays small.
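One reading of the arrows, stated here as an assumption: a heap may hold pointers only into itself or into heaps to its right, and a thread-local heap may be pointed into only from the same thread's TL heap. A hypothetical checker for that rule (Python, invented names) shows why the listed benefits would follow: nothing outside a thread's TL heap can point into it, and the immutable heap only points into itself. Collecting the shared heap still needs every thread's roots and TL heap, which is the remaining problem the post mentions.

```python
from enum import IntEnum


class Heap(IntEnum):
    """Heaps ordered left to right as in the scheme above."""
    THREAD_LOCAL = 0
    SHARED = 1
    IMMUTABLE = 2


def pointer_allowed(src, dst):
    """src and dst are (Heap, thread_id); thread_id only matters for THREAD_LOCAL."""
    (src_heap, src_tid), (dst_heap, dst_tid) = src, dst
    if dst_heap == Heap.THREAD_LOCAL:
        # Only the owning thread's own TL heap may point into a TL heap, so
        # collecting it needs to interact with that one thread only.
        return src_heap == Heap.THREAD_LOCAL and src_tid == dst_tid
    # Otherwise pointers may only stay in place or go rightwards, so the
    # immutable heap only ever points into itself.
    return dst_heap >= src_heap


tl1, tl2 = (Heap.THREAD_LOCAL, 1), (Heap.THREAD_LOCAL, 2)
shared, imm = (Heap.SHARED, None), (Heap.IMMUTABLE, None)
print(pointer_allowed(tl1, shared), pointer_allowed(shared, tl1))  # True False
```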
October 18, 2012
Re: Shared keyword and the GC?
On Oct 17, 2012, at 1:55 AM, Alex Rønne Petersen <alex@lycus.org> wrote:
> 
> So, let's look at D:
> 
> 1. We have global variables.
> 2. Only std.concurrency enforces isolation at a type system level; it's not built into the language, so the GC cannot make assumptions.
> 3. The shared qualifier effectively allows pointers from one thread's heap into another's.

Well, the problem is more that a variable can be cast to shared after instantiation, so to allow thread-local collections we'd have to make cast(shared) set a flag on the memory block to indicate that it's shared, and vice-versa for unshared.  Then when a thread terminates, all blocks not flagged as shared would be finalized, leaving the shared blocks alone.  Then any pool from the terminated thread containing a shared block would have to be merged into the global heap instead of released to the OS.

I think we need to head in this direction anyway, because we need to make sure that thread-local data is finalized by its owner thread.  A block's owner would be whoever allocated the block or, if it was cast to shared and back to unshared, whichever thread most recently cast it back to unshared.  Tracking the owner of a block gives us the shared state implicitly, making thread-local collections possible.  Who wants to work on this? :-)
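A minimal sketch of that bookkeeping, assuming hypothetical hooks that `cast(shared)` and a cast back to unshared would call; the hook names and the Python stand-in are invented for illustration, not an existing runtime API. At thread exit, the thread's own unshared blocks are finalized, and any pool that still holds a surviving block is merged into the global heap instead of being released. The sketch also keeps pools holding blocks that another thread has since claimed by casting them back to unshared, since releasing those would be just as wrong.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Block:
    name: str
    owner: int            # allocating thread, or the last thread to cast it back to unshared
    shared: bool = False


@dataclass
class Pool:
    blocks: list = field(default_factory=list)


def on_cast_shared(block):
    """Hypothetical hook for cast(shared): flag the block as shared."""
    block.shared = True


def on_cast_unshared(block, tid):
    """Hypothetical hook for casting shared away: the casting thread becomes the owner."""
    block.shared = False
    block.owner = tid


def on_thread_exit(tid, pools, global_pools):
    """Finalize the exiting thread's own unshared blocks; returns their names."""
    finalized = []
    for pool in pools.pop(tid, []):
        dead = [b for b in pool.blocks if b.owner == tid and not b.shared]
        for b in dead:
            finalized.append(b.name)   # stand-in for running the finalizer
            pool.blocks.remove(b)
        if pool.blocks:
            # Something survived (shared, or owned by another thread):
            # merge the pool into the global heap instead of releasing it.
            global_pools.append(pool)
        # An empty pool would be released to the OS here.
    return finalized


# Usage: thread 1 allocates "a" and "b", then shares "b" before exiting.
pools = {1: [Pool([Block("a", 1), Block("b", 1)])]}
global_pools = []
on_cast_shared(pools[1][0].blocks[1])
finalized = on_thread_exit(1, pools, global_pools)
print(finalized)                                         # ['a']
print([b.name for p in global_pools for b in p.blocks])  # ['b']
```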
October 18, 2012
Re: Shared keyword and the GC?
On 2012-10-18 20:26, Sean Kelly wrote:

> Well, the problem is more that a variable can be cast to shared after instantiation, so to allow thread-local collections we'd have to make cast(shared) set a flag on the memory block to indicate that it's shared, and vice-versa for unshared.  Then when a thread terminates, all blocks not flagged as shared would be finalized, leaving the shared blocks alone.  Then any pool from the terminated thread containing a shared block would have to be merged into the global heap instead of released to the OS.

Or move the shared data to the global heap when it's cast. I don't know 
if that's best, but this way all data in a given pool will be truly thread-local.

-- 
/Jacob Carlborg
October 18, 2012
Re: Shared keyword and the GC?
On Oct 18, 2012, at 11:48 AM, Jacob Carlborg <doob@me.com> wrote:

> Or move the shared data to the global heap when it's cast. I don't know if that's best, but this way all data in a given pool will be truly thread-local.

And back down to a local pool when shared is cast away.  Assuming the block is even movable.  I agree that this would be the most efficient use of memory, but I don't know that it's feasible.
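For comparison, a sketch of the move-on-cast alternative, including the move back down when shared is cast away. The `Heaps` class and its methods are hypothetical, and blocks are plain names here: the sketch deliberately skips the hard part raised above, namely updating every pointer to a moved block. A block cannot be moved safely if, for example, a conservatively scanned stack slot may point to it and cannot be rewritten.

```python
from dataclasses import dataclass, field


@dataclass
class Heaps:
    local: dict = field(default_factory=dict)  # thread id -> set of block names
    global_heap: set = field(default_factory=set)

    def alloc(self, tid, block):
        self.local.setdefault(tid, set()).add(block)

    def cast_shared(self, tid, block):
        """Move the block up so local pools only ever hold thread-local data."""
        self.local[tid].remove(block)
        self.global_heap.add(block)

    def cast_unshared(self, tid, block):
        """Move it back down into the casting thread's pool."""
        self.global_heap.remove(block)
        self.local.setdefault(tid, set()).add(block)

    def thread_exit(self, tid):
        """Everything left is thread-local, so the whole pool can go; returns it."""
        return self.local.pop(tid, set())


# Usage: "b" is shared before thread 1 exits, so only "a" is released with it.
heaps = Heaps()
heaps.alloc(1, "a")
heaps.alloc(1, "b")
heaps.cast_shared(1, "b")
released = heaps.thread_exit(1)
print(sorted(released), sorted(heaps.global_heap))  # ['a'] ['b']
```

Compared with the flag-based version, thread exit no longer has to inspect individual blocks, at the price of moving (and re-pointing) a block on every cast.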
October 18, 2012
Re: Shared keyword and the GC?
On 2012-10-18 20:54, Sean Kelly wrote:

> And back down to a local pool when shared is cast away.  Assuming the block is even movable.  I agree that this would be the most efficient use of memory, but I don't know that it's feasible.

You said the thread-local heap would be merged with the global heap on 
thread termination. How is that different?

Alternatively, it could stay in the global heap. I mean, not many variables 
should be "shared" and even fewer should be cast back and forth.

-- 
/Jacob Carlborg