October 18, 2012
On Oct 18, 2012, at 12:22 PM, Jacob Carlborg <doob@me.com> wrote:

> On 2012-10-18 20:54, Sean Kelly wrote:
> 
>> And back down to a local pool when shared is cast away.  Assuming the block is even movable.  I agree that this would be the most efficient use of memory, but I don't know that it's feasible.
> 
> You said the thread local heap would be merged with the global on thread termination. How is that different?
> 
> Alternatively, it could stay in the global heap. I mean, not many variables should be "shared" and even fewer should be cast back and forth.

It's different in that a variable's address never actually changes.  When a thread completes it hands all of its pools to the shared allocator, and then per-thread allocators request free pools from the shared allocator before going to the OS.  This is basically how the HOARD allocator works.
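
A minimal sketch of the pool handoff described above, under stated assumptions: `Pool`, `SharedAllocator`, and `ThreadAllocator` are hypothetical names invented for illustration, not druntime types, and a plain GC array stands in for an OS request. The point it tries to show is that pools change owner without any block changing address.

```d
// Hypothetical pool descriptor; a real GC pool would also carry bitmaps, bins, etc.
struct Pool
{
    void[] memory;   // the pool never moves, so addresses into it stay valid
}

// Hypothetical process-wide allocator holding pools that no thread currently owns.
final class SharedAllocator
{
    private Pool*[] freePools;

    // Called when a thread terminates: it donates every pool it owned.
    void adopt(Pool*[] pools)
    {
        synchronized (this)
        {
            freePools ~= pools;
        }
    }

    // Returns a recycled pool, or null if the caller has to go to the OS.
    Pool* tryTake()
    {
        synchronized (this)
        {
            if (freePools.length == 0)
                return null;
            auto p = freePools[$ - 1];
            freePools = freePools[0 .. $ - 1];
            return p;
        }
    }
}

// Hypothetical per-thread allocator.
struct ThreadAllocator
{
    SharedAllocator central;
    Pool*[] pools;

    // Ask the shared allocator first; only then fall back to fresh memory.
    Pool* acquirePool(size_t size)
    {
        auto p = central.tryTake();
        if (p is null)
        {
            p = new Pool;
            p.memory = new ubyte[size];   // stand-in for an OS request
        }
        pools ~= p;
        return p;
    }

    // Thread exit: hand everything over without moving any block.
    void release()
    {
        central.adopt(pools);
        pools = null;
    }
}
```

In this sketch a second `ThreadAllocator` that calls `acquirePool` after the first one has called `release` gets the very same `Pool*` back, so any pointer into that pool is still valid.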
October 19, 2012
On 2012-10-18 18:26:08 +0000, Sean Kelly <sean@invisibleduck.org> said:

> Well, the problem is more that a variable can be cast to shared after
> instantiation, so to allow thread-local collections we'd have to make
> cast(shared) set a flag on the memory block to indicate that it's
> shared, and vice-versa for unshared.  Then when a thread terminates, all
> blocks not flagged as shared would be finalized, leaving the shared
> blocks alone.  Then any pool from the terminated thread containing a
> shared block would have to be merged into the global heap instead of
> released to the OS.
> 
> I think we need to head in this direction anyway, because we need to
> make sure that thread-local data is finalized by its owner thread.  A
> block's owner would be whoever allocated the block or, if cast to shared
> and back to unshared, whichever thread most recently cast the block back
> to unshared.  Tracking the owner of a block gives us the shared state
> implicitly, making thread-local collections possible.  Who wants to work
> on this? :-)

All this is nice, but what is the owner thread for immutable data? Because immutable is always implicitly shared, all your strings and everything else that is immutable is thus "shared" and must be tracked by the global heap's collector and can never be handled by a thread-local collector. Even if most immutable data never leaves the thread it was allocated in, there's no way you can know.
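
To illustrate the point about immutable data, here is a small sketch using `std.traits.hasUnsharedAliasing` and `std.concurrency`. The names `worker` and `lengthViaThread` are made up for this example. It only shows that a `string` allocated in one thread can be handed to another with no cast, which is why a collector cannot tell from the type alone whether a given immutable block ever left its thread.

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;
import std.traits : hasUnsharedAliasing;

// string is immutable(char)[]: the type system treats it as safe to share.
static assert(!hasUnsharedAliasing!string);

// A mutable slice counts as thread-local and may not cross threads as-is.
static assert(hasUnsharedAliasing!(char[]));

// Hypothetical worker: receives the immutable data and reports its length.
void worker(Tid owner, string text)
{
    send(owner, text.length);
}

// Hypothetical helper: passes a string to another thread without any cast.
size_t lengthViaThread(string s)
{
    spawn(&worker, thisTid, s);
    return receiveOnly!size_t();
}

void main()
{
    string s = "hello".idup;   // allocated on the GC heap by this thread
    assert(lengthViaThread(s) == 5);
}
```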

I don't think per-thread GCs will work very well without support for immutable data, and for that you need to have a distinction between immutable and shared immutable (just like you have with mutable data). I complained about this almost three years ago when the semantics of shared were being defined, but it got nowhere. Quoting Walter at the time:

> As for a shared gc vs thread local gc, I just see an awful lot of strange irreproducible bugs when someone passes data from one to the other. I doubt it's worth it, unless it can be done with compiler guarantees, which seem doubtful.

I think you'll have a hard time convincing Walter it is worth changing the behaviour of type modifiers at this point.

Reference:
<http://lists.puremagic.com/pipermail/dmd-concurrency/2010-January/000132.html>
<http://lists.puremagic.com/pipermail/dmd-concurrency/2010-January/000146.html>

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

October 19, 2012
On 17-10-2012 13:51, Jacob Carlborg wrote:
> On 2012-10-17 10:55, Alex Rønne Petersen wrote:
>
>> Let's step back for a bit and think about what we want to achieve with
>> thread-local garbage collection. The idea is that we look only at a
>> single thread's heap (and stack/registers, of course) when doing a
>> collection. This means that we can -- theoretically -- stop only one
>> thread at a time and only when it needs to be stopped. This is clearly a
>> huge win in scalability and raw speed. With a scheme like this, it might
>> even be possible to get away with a simple mark-sweep or copying GC per
>> thread instead of a complicated generational GC, mainly due to the
>> paradigms the isolation model induces.
>>
>> Rust, as it is today, can do this. Tasks (or threads if you will -
>> though they aren't the same thing) are completely isolated. Types that
>> can potentially contain pointers into a task's heap cannot be sent to
>> other tasks at all. Rust also does not have global variables.
>>
>> So, let's look at D:
>>
>> 1. We have global variables.
>> 2. Only std.concurrency enforces isolation at a type system level; it's
>> not built into the language, so the GC cannot make assumptions.
>> 3. The shared qualifier effectively allows pointers from one thread's
>> heap into another's.
>>
>> It's important to keep in mind that in order for thread-local GC (as
>> defined above) to be possible at all, *under no circumstances whatsoever
>> must there be a pointer in one thread's heap into another thread's heap,
>> ever*. If this happens and you apply the above GC strategy (stop one
>> thread at a time and scan only that thread's heap), you're effectively
>> dealing with something very similar to the lost object problem on
>> concurrent GC.
>>
>> To clarify with regards to the shared qualifier: It does absolutely
>> nothing. It's useless. All it does is slap a pretty "I can be shared
>> arbitrarily across threads" label on a type. Even if you have this
>> knowledge in the GC, it's not going to help you, because you *still*
>> have to deal with the problem that arbitrary pointers can be floating
>> around in arbitrary threads.
>>
>> (And don't even get me started on the lack of clear semantics (and even
>> the few semi-agreed-upon but flawed semantics) for shared.)
>
> All TLS data is handled by collectors running in their one single
> thread, as you describe above. Any non-TLS data is handled the same way
> as the GC currently works.
>
> This is how the, now deprecated, Apple GC used by Objective-C works.
>

How does it deal with the problem where a pointer in TLS points to global data, or worse yet, a pointer in the global heap points to TLS?

I'm pretty sure it can't without doing a full pass over the entire heap, which seems to me like it defeats the purpose.

But I may just be missing out on some restriction (type system or whatever) Objective-C has that makes it feasible.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
October 19, 2012
On 17-10-2012 16:26, deadalnix wrote:
> Why not definitively adopt the following (already proposed) memory
> scheme (some practices that are currently considered valid do not
> respect it):
>
> Thread-local heap (one per thread) -> shared heap -> immutable heap
>
> This model has multiple benefits:
>   - A TL heap can be processed by interacting with only one thread.
>   - The immutable heap can be collected 100% concurrently if we allow some
> floating garbage.
>   - The shared heap is the only problem, but as its size stays small, the
> problem stays small.

Can you elaborate? I'm not sure I understand.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
October 19, 2012
On 18-10-2012 20:26, Sean Kelly wrote:
> On Oct 17, 2012, at 1:55 AM, Alex Rønne Petersen <alex@lycus.org> wrote:
>>
>> So, let's look at D:
>>
>> 1. We have global variables.
>> 2. Only std.concurrency enforces isolation at a type system level; it's not built into the language, so the GC cannot make assumptions.
>> 3. The shared qualifier effectively allows pointers from one thread's heap into another's.
>
> Well, the problem is more that a variable can be cast to shared after instantiation, so to allow thread-local collections we'd have to make cast(shared) set a flag on the memory block to indicate that it's shared, and vice-versa for unshared.  Then when a thread terminates, all blocks not flagged as shared would be finalized, leaving the shared blocks alone.  Then any pool from the terminated thread containing a shared block would have to be merged into the global heap instead of released to the OS.
>
> I think we need to head in this direction anyway, because we need to make sure that thread-local data is finalized by its owner thread.  A block's owner would be whoever allocated the block or, if cast to shared and back to unshared, whichever thread most recently cast the block back to unshared.  Tracking the owner of a block gives us the shared state implicitly, making thread-local collections possible.  Who wants to work on this? :-)
>

I'm not really sure how this solves the problem of having pointers from a thread-local heap into the global heap and vice versa. Can you elaborate on that?

The problem is that even if you know whether a piece of memory is flagged shared, you cannot know if some arbitrary number of threads happen to have pointers to it and can thus mutate anything inside it while a thread-local collection is in progress.
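
For reference, the owner-tracking idea quoted above might look roughly like the sketch below. Everything here is hypothetical: druntime keeps no such per-block owner field, and `BlockInfo`, `markShared`, `markLocal`, and `finalizedAtExitOf` are names invented for illustration of what a `cast(shared)` hook could record.

```d
import core.thread : Thread;

// Hypothetical per-block metadata.
struct BlockInfo
{
    Thread owner;   // null means "shared, owned by no single thread"

    bool isShared() const { return owner is null; }
}

// What a hypothetical cast(shared) hook would do: drop the owner.
void markShared(ref BlockInfo b) { b.owner = null; }

// What a hypothetical cast back to unshared would do:
// the thread performing the cast becomes the owner.
void markLocal(ref BlockInfo b) { b.owner = Thread.getThis(); }

// At thread exit, only blocks still owned by that thread would be finalized.
bool finalizedAtExitOf(const ref BlockInfo b, const Thread t)
{
    return b.owner is t;
}
```

Note what the sketch does not record: which other threads currently hold a pointer to the block. The owner field answers "who finalizes this", not "who can reach this", and the latter is the question raised in this reply.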

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
October 19, 2012
On 2012-10-18 22:29, Sean Kelly wrote:

> It's different in that a variable's address never actually changes.  When a thread completes it hands all of its pools to the shared allocator, and then per-thread allocators request free pools from the shared allocator before going to the OS.  This is basically how the HOARD allocator works.

Ah, now I see.

-- 
/Jacob Carlborg
October 19, 2012
On 2012-10-19 03:06, Michel Fortin wrote:

> All this is nice, but what is the owner thread for immutable data?
> Because immutable is always implicitly shared, all your strings and
> everything else that is immutable is thus "shared" and must be tracked
> by the global heap's collector and can never be handled by a
> thread-local collector. Even if most immutable data never leaves the
> thread it was allocated in, there's no way you can know.
>
> I don't think per-thread GCs will work very well without support for
> immutable data, and for that you need to have a distinction between
> immutable and shared immutable (just like you have with mutable data). I
> complained about this almost three years ago when the semantics of
> shared were being defined, but it got nowhere. Quoting Walter at the time:

Would it make any difference if immutable data were collected by a different collector than the shared or thread-local data?

In that case I guess the collector wouldn't try to distinguish between shared and non-shared immutable data.

-- 
/Jacob Carlborg
October 19, 2012
On 2012-10-19 08:48, Alex Rønne Petersen wrote:

> How does it deal with the problem where a pointer in TLS points to
> global data, or worse yet, a pointer in the global heap points to TLS?
>
> I'm pretty sure it can't without doing a full pass over the entire heap,
> which seems to me like it defeats the purpose.
>
> But I may just be missing out on some restriction (type system or
> whatever) Objective-C has that makes it feasible.

I'm not sure how this is handled. But the GC is only used for the Objective-C allocations, i.e. [NSObject alloc] and not for C allocations, i.e. "malloc".

-- 
/Jacob Carlborg
October 19, 2012
> How does it deal with the problem where a pointer in TLS points to global data,

You would need to run a stop-the-world collection for the shared heap. But it would be interesting to have blocks that have no shared pointers in them.


> or worse yet, a pointer in the global heap points to TLS?
>

Could you give an example?

> I'm pretty sure it can't without doing a full pass over the entire heap, which seems to me like it defeats the purpose.

Yeah.

>
> But I may just be missing out on some restriction (type system or whatever) Objective-C has that makes it feasible.


October 19, 2012
On 19-10-2012 11:07, sclytrack wrote:
>
>> How does it deal with the problem where a pointer in TLS points to
>> global data,
>
> Need to run stop-the-world for shared heap. But it would be interesting
> to have blocks that have no shared pointers in them.

The problem with D is that we have a (more or less) stable language that we can't make major changes to at this point.

>
>
>> or worse yet, a pointer in the global heap points to TLS?
>>
>
> Could you give an example?

I don't know Objective-C, so in D:

import core.memory : GC;

void* p; // in TLS

void main()
{
    p = GC.malloc(1024); // a pointer to the global heap is now in TLS
}

Or the more complicated case (for any arbitrary graph of objects):

Object p; // in TLS

class C
{
    Object o;

    this(Object o)
    {
        this.o = o;
    }
}

void main()
{
    p = new C(new Object); // the graph can be arbitrarily complex and any part of it can be allocated with the GC, malloc, or any other mechanism
}
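
The opposite direction, a process-wide global that points at memory allocated by one particular thread, can be sketched as follows. This is only an illustration: `Node`, `published`, and `publishFromWorker` are made-up names, and "the worker's heap" is an assumption about how a thread-local GC would place the allocation, since the current GC has a single heap.

```d
import core.thread : Thread;

class Node { int value; }

// __gshared opts out of TLS: one instance for the whole process.
__gshared Node published;

// Allocates in a worker thread and lets the reference escape to the global.
int publishFromWorker()
{
    auto t = new Thread({
        auto n = new Node;   // would live in the worker's heap under a thread-local GC
        n.value = 42;
        published = n;       // no cast involved, so no cast hook could flag the block
    });
    t.start();
    t.join();
    return published.value;  // the calling thread now reads the worker's allocation
}

void main()
{
    assert(publishFromWorker() == 42);
}
```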

>
>> I'm pretty sure it can't without doing a full pass over the entire
>> heap, which seems to me like it defeats the purpose.
>
> Yeah.

Thread-local GC is all about improving scalability by only stopping threads that need to be stopped. If you can't even do that, then any effort towards thread-local GC is quite pointless IMO.

>
>>
>> But I may just be missing out on some restriction (type system or
>> whatever) Objective-C has that makes it feasible.
>
>


-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org