April 10, 2018
On Monday, 9 April 2018 at 19:50:16 UTC, H. S. Teoh wrote:
> On Mon, Apr 09, 2018 at 07:43:00PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
>> On Monday, 9 April 2018 at 18:27:26 UTC, Per Nordlöw wrote:
> [...]
>> > Which kinds of GC's would be of interest?
>> 
>> I believe we can get away with parallel mark-sweep + snapshot-based concurrency. It has some limitations but in D land with GC not being the single source of memory it should work fine.
>> 
>> > Which attempts have been made already?
>> 
>> I still think that a mostly precise Immix-style GC would also work; it won’t be a 1:1 porting job though. Many things to figure out.
>
> Last I remembered, you were working on a GC prototype for D?

Still there, but my spare time is super limited lately, the other project preempted that for the moment.

> Any news on that, or have you basically given it up?

Might try to hack to the finish line in one good night, it was pretty close to complete. Debugging would be fun though ;)

Will likely try to complete it at DConf hackathon, I’d be glad should anyone want to help.

>
> T


April 10, 2018
On Tuesday, 10 April 2018 at 03:59:33 UTC, Ikeran wrote:
> On Monday, 9 April 2018 at 19:43:00 UTC, Dmitry Olshansky wrote:
>> None of the even close to advanced GCs are pluggable
>
> Eclipse OMR contains a pluggable GC, and it's used in OpenJ9,

Or rather Eclipse OMR is a toolkit for runtimes/VMs and GC plugs into that. I encourage you to try it to implement D-like semantics with this run-time and you’ll see just how pluggable it is.

> which claims to be an enterprise-grade JVM.

I once used OpenJ9, which was IBM J9 I think, right before open sourcing. It was about 2x slower than Hotspot; I didn’t dig too deep into the precise reason. The fact that it was on Power8 was especially surprising, I thought IBM would take advantage of their own hardware.

April 10, 2018
On Tuesday, 10 April 2018 at 05:26:28 UTC, Dmitry Olshansky wrote:
> On Monday, 9 April 2018 at 19:50:16 UTC, H. S. Teoh wrote:
>> Last I remembered, you were working on a GC prototype for D?
>
> Still there, but my spare time is super limited lately, the other project preempted that for the moment.
>
>> Any news on that, or have you basically given it up?
>
> Might try to hack to the finish line in one good night, it was pretty close to complete. Debugging would be fun though ;)

I was thinking about messing with the GC in my free time just yesterday... how hard would it be:

Add a BlkAttr.THREAD_LOCAL, and set it from the runtime if the type or its members are not shared or __gshared.

Then we could store BlkAttr.THREAD_LOCAL memory in different pages (per thread) without having to lock a mutex. (If we need to get a new page from the global pool, we lock a mutex for that.)

If that's possible, we could also Just(TM) scan the current thread's stack and mark/sweep only those pages (without a stop-the-world).

And when a thread ends we could give the pages to the global pool without a mark/sweep.

The idea is that it works like it does currently unless something is invisible to other threads. Or am I missing something obvious? (quite likely)
April 10, 2018
On Tuesday, 10 April 2018 at 06:10:10 UTC, David Bennett wrote:
> I was thinking about messing with the GC in my free time just yesterday... how hard would it be:
>
> [snip]
>
> The idea is that it works like it does currently unless something is invisible to other threads. Or am I missing something obvious? (quite likely)

Forgot to mention that a non-thread-local mark/sweep would still scan all thread stacks and pages like it does currently, as a thread local could hold a pointer to the global data (i.e. a copy of __gshared, void*).

The only way I can think of to break this idea is using cast() or sending something to a C function that then goes and adds pointers in global data to the thread local stuff...
April 10, 2018
On Tuesday, 10 April 2018 at 06:10:10 UTC, David Bennett wrote:
> On Tuesday, 10 April 2018 at 05:26:28 UTC, Dmitry Olshansky wrote:
>> On Monday, 9 April 2018 at 19:50:16 UTC, H. S. Teoh wrote:
>>> Last I remembered, you were working on a GC prototype for D?
>>
>> Still there, but my spare time is super limited lately, the other project preempted that for the moment.
>>
>>> Any news on that, or have you basically given it up?
>>
>> Might try to hack to the finish line in one good night, it was pretty close to complete. Debugging would be fun though ;)
>
> I was thinking about messing with the GC in my free time just yesterday... how hard would it be:
>
> Add a BlkAttr.THREAD_LOCAL, and set it from the runtime if the type or its members are not shared or __gshared.
>
> Then we could store BlkAttr.THREAD_LOCAL memory in different pages (per thread) without having to lock a mutex. (If we need to get a new page from the global pool, we lock a mutex for that.)

You forgot immutable, and thread-local data is often cast to immutable, sometimes by the compiler.
See assumeUnique and its ilk in Phobos.

Same with shared - it’s still often the case that you allocate thread-local then cast to shared.

Lastly - thanks to the 0-typesafety of delegates it’s trivial to share a single GC-backed stack with multiple threads. So what you deemed thread-local might be used in another thread, transitively so.
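
A minimal sketch of the delegate point, assuming druntime's core.thread.Thread (the name runOnOtherThread is made up for illustration). The captured variable ends up in a GC-allocated closure that looks thread-local to the type system, yet a second thread writes to it:

```d
import core.thread : Thread;

// Hypothetical helper, for illustration only.
int runOnOtherThread()
{
    int counter = 0;               // captured below, so it lives in a GC-allocated closure
    void bump() { counter += 1; }  // nested function; &bump is a void delegate()

    auto t = new Thread(&bump);    // nothing here is typed as shared
    t.start();
    t.join();                      // wait for the other thread before reading
    return counter;
}
```

Nothing in the signature of bump tells the runtime that the closure is touched by another thread, which is why a per-thread pool chosen from type information alone would likely misclassify it.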

D is thread-local except when it’s not.

>
> If that's possible, we could also Just(TM) scan the current thread's stack and mark/sweep only those pages (without a stop-the-world).
>

That is indeed something we should have at some point. It needs cooperation from the language, such as explicit functions for shared<->local conversions that the run-time is aware of.

> And when a thread ends we could give the pages to the global pool without a mark/sweep.
>
> The idea is that it works like it does currently unless something is invisible to other threads. Or am I missing something obvious? (quite likely)

Indeed, there are ugly details: while a per-thread GC would work in principle, in general it will crash and burn on most non-trivial programs.



April 10, 2018
On Tuesday, April 10, 2018 06:10:10 David Bennett via Digitalmars-d wrote:
> On Tuesday, 10 April 2018 at 05:26:28 UTC, Dmitry Olshansky wrote:
> > On Monday, 9 April 2018 at 19:50:16 UTC, H. S. Teoh wrote:
> >> Last I remembered, you were working on a GC prototype for D?
> >
> > Still there, but my spare time is super limited lately, the other project preempted that for the moment.
> >
> >> Any news on that, or have you basically given it up?
> >
> > Might try to hack to the finish line in one good night, it was pretty close to complete. Debugging would be fun though ;)
>
> I was thinking about messing with the GC in my free time just yesterday... how hard would it be:
>
> Add a BlkAttr.THREAD_LOCAL, and set it from the runtime if the type or its members are not shared or __gshared.
>
> Then we could store BlkAttr.THREAD_LOCAL memory in different
> pages (per thread) without having to lock a mutex. (If we need
> to get a new page from the global pool, we lock a mutex for that.)
>
> If that's possible, we could also Just(TM) scan the current thread's
> stack and mark/sweep only those pages (without a stop-the-world).
>
> And when a thread ends we could give the pages to the global pool without a mark/sweep.
>
> The idea is that it works like it does currently unless something is invisible to other threads. Or am I missing something obvious? (quite likely)

As it stands, it's impossible to have thread-local memory pools. It's quite legal to construct an object as shared or thread-local and cast it to the other. In fact, it's _highly_ likely that that's how any shared object of any complexity is going to be constructed. Similarly, it's extremely common to allocate an object as mutable and then cast it to immutable (either using assumeUnique or by using a pure function where the compiler does the cast implicitly for you if it can guarantee that the return value is unique), and immutable objects are implicitly shared. At minimum, there would have to be runtime hooks to do something like move an object between pools when it is cast to shared or immutable (or back) in order to ensure that an object was in the right pool, but if that requires copying the object rather than just moving the memory block, then it can't be done, because every pointer or reference pointing to that object would have to be rewritten (which isn't supported by the language).
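
To make the construction pattern concrete, here is a rough sketch (Config and makeSharedConfig are made-up names, not from any library). The object is allocated and initialised as thread-local, and the cast changes only its type, not the memory block it lives in:

```d
class Config
{
    int retries;
}

// Hypothetical factory: build as thread-local, then publish as shared.
shared(Config) makeSharedConfig()
{
    auto c = new Config;     // allocated and initialised as thread-local
    c.retries = 3;
    return cast(shared) c;   // same memory block; only the static type changes
}
```

A GC that had placed c in a per-thread pool at allocation time would now be holding an object that other threads can legitimately reach.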

Also, it would be a disaster for shared, because the typical way to use shared is to protect the shared object with a mutex, cast away shared so that it can be operated on as thread-local within that section of code, and then before the mutex is released, all thread-local references then need to be gone. e.g.

synchronized(mutex)
{
    auto threadLocal = cast(MyType)mySharedObject;

    // do something with threadLocal...

    // threadLocal leaves scope and is gone without being cast back
}

// all references to the shared object should now be shared

You really _don't_ want the shared object to move between pools
because of that cast (since it would hurt performance), and in such a
situation, you don't usually cast back to shared. Rather, you have a shared
reference, cast it to get a thread-local reference, and then let the
thread-local reference leave scope. So, the same object temporarily has both
a thread-local and a shared reference to it, and if it were moved to the
thread-local pool with the cast, it would never be moved back when the
thread-local references left scope and the mutex was released.

Having synchronized classes as described in TDPL would make the above code cleaner in the cases where a synchronized class would work, but the basic concept is the same. It would still be doing a cast underneath the hood, and it would still have the same problems. It just wouldn't involve explicit casting. shared's design inherently requires casting away shared, so it just plain isn't going to play well with anything that doesn't play well with such casts - such as having thread-local heaps.

Also, IIRC, at one point, Daniel Murphy explained to me some problem with classes with regards to the virtual table or the TypeInfo that inherently wouldn't work with trying to move it between threads. Unfortunately, I don't remember the details now, but I do remember that there's _something_ there that wouldn't work with thread-local heaps. And if anyone were to seriously try it, I expect that he could probably come up with the reasons again.

Regardless, I think that it's clear that in order to do anything with thread-local pools, we'd have to lock down the type system even further to disallow casts to or from shared or immutable, and that would really be a big problem given the inherent restrictions on those types and how shared is intended to be used. So, while it's a common idea as to how the GC could be improved, and it would be great if we could do it, I think that it goes right along with all of the other ideas that require stuff like read and write barriers everywhere and thus will never be in D's GC.

- Jonathan M Davis

April 10, 2018
On Tuesday, 10 April 2018 at 06:43:28 UTC, Dmitry Olshansky wrote:
> On Tuesday, 10 April 2018 at 06:10:10 UTC, David Bennett wrote:
>> I was thinking about messing with the GC in my free time just yesterday... how hard would it be:
>>
>> [snip]
>
> You forgot immutable, and thread-local data is often cast to immutable, sometimes by the compiler.
> See assumeUnique and its ilk in Phobos.
>
> Same with shared - it’s still often the case that you allocate thread-local then cast to shared.

People cast from thread local to shared? ...okay that's no fun...  :(

I can understand the other way, that's why I was leaning on the conservative side and putting more stuff in the global pools.

>
> Lastly - thanks to the 0-typesafety of delegates it’s trivial to share a single GC-backed stack with multiple threads. So what you deemed thread-local might be used in another thread, transitively so.

Oh that's a good point I didn't think of!

>
> D is thread-local except when it’s not.
>
>>
>> If that's possible, we could also Just(TM) scan the current thread's stack and mark/sweep only those pages (without a stop-the-world).
>>
>
> That is indeed something we should have at some point. It needs cooperation from the language, such as explicit functions for shared<->local conversions that the run-time is aware of.
>

So the language could (in theory) inject a __move_to_global(ref local, ref global) when casting to shared and the GC would need to update all the references in the local pages to point to the new global address?

>> And when a thread ends we could give the pages to the global pool without a mark/sweep.
>>
>> The idea is that it works like it does currently unless something is invisible to other threads. Or am I missing something obvious? (quite likely)
>
> Indeed, there are ugly details: while a per-thread GC would work in principle, in general it will crash and burn on most non-trivial programs.

Okay, thanks for the points. They were very clear, so I assume you have spent a lot more brain power on this than I have.
April 10, 2018
On Tuesday, 10 April 2018 at 06:47:53 UTC, Jonathan M Davis wrote:
> As it stands, it's impossible to have thread-local memory pools. It's quite legal to construct an object as shared or thread-local and cast it to the other. In fact, it's _highly_ likely that that's how any shared object of any complexity is going to be constructed. Similarly, it's extremely common to allocate an object as mutable and then cast it to immutable (either using assumeUnique or by using a pure function where the compiler does the cast implicitly for you if it can guarantee that the return value is unique), and immutable objects are implicitly shared.
>

(Honest question:) Do people really cast from local to shared/immutable and expect it to work?
(Whenever I cast something more complex than a size_t I almost expect it to blow up... or break sometime in the future.)

That said, I can understand building a shared object from parts of local data... though I try to keep my thread barriers as thin as possible myself. (Meaning I tend to copy stuff to the shared and have as few shared's as possible.)

>
> At minimum, there would have to be runtime hooks to do something like move an object between pools when it is cast to shared or immutable (or back) in order to ensure that an object was in the right pool, but if that requires copying the object rather than just moving the memory block, then it can't be done, because every pointer or reference pointing to that object would have to be rewritten (which isn't supported by the language).
>

A hook for local to cast(shared) could work... but would require a DIP I guess. I was hoping to make a more incremental improvement to the GC.

>
> Also, it would be a disaster for shared, because the typical way to use shared is to protect the shared object with a mutex, cast away shared so that it can be operated on as thread-local within that section of code, and then before the mutex is released, all thread-local references then need to be gone. e.g.
>
>
> synchronized(mutex)
> {
>     auto threadLocal = cast(MyType)mySharedObject;
>
>     // do something with threadLocal...
>
>     // threadLocal leaves scope and is gone without being cast back
> }
>
> // all references to the shared object should now be shared
>

Yeah, that's why I was still scanning all thread stacks and pages when marking global data.
So a shared -> local cast is a no-op, but the other way needs thought.

>
> You really _don't_ want the shared object to move between pools
> because of that cast (since it would hurt performance), and in such a
> situation, you don't usually cast back to shared. Rather, you have a shared
> reference, cast it to get a thread-local reference, and then let the
> thread-local reference leave scope. So, the same object temporarily has both
> a thread-local and a shared reference to it, and if it were moved to the
> thread-local pool with the cast, it would never be moved back when the
> thread-local references left scope and the mutex was released.
>
> Having synchronized classes as described in TDPL would make the above code cleaner in the cases where a synchronized class would work, but the basic concept is the same. It would still be doing a cast underneath the hood, and it would still have the same problems. It just wouldn't involve explicit casting. shared's design inherently requires casting away shared, so it just plain isn't going to play well with anything that doesn't play well with such casts - such as having thread-local heaps.
>

I would think a shared class would never be marked as a THREAD_LOCAL as it has a shared member.

>
> Also, IIRC, at one point, Daniel Murphy explained to me some problem with classes with regards to the virtual table or the TypeInfo that inherently wouldn't work with trying to move it between threads. Unfortunately, I don't remember the details now, but I do remember that there's _something_ there that wouldn't work with thread-local heaps. And if anyone were to seriously try it, I expect that he could probably come up with the reasons again.
>
> Regardless, I think that it's clear that in order to do anything with thread-local pools, we'd have to lock down the type system even further to disallow casts to or from shared or immutable, and that would really be a big problem given the inherent restrictions on those types and how shared is intended to be used. So, while it's a common idea as to how the GC could be improved, and it would be great if we could do it, I think that it goes right along with all of the other ideas that require stuff like read and write barriers everywhere and thus will never be in D's GC.
>
> - Jonathan M Davis

Yeah I thought it would have issues, thanks for your feedback!

I'll see if I can come up with a better idea that doesn't break as much stuff.
April 10, 2018
On Tuesday, April 10, 2018 07:55:00 David Bennett via Digitalmars-d wrote:
> On Tuesday, 10 April 2018 at 06:47:53 UTC, Jonathan M Davis wrote:
> > As it stands, it's impossible to have thread-local memory pools. It's quite legal to construct an object as shared or thread-local and cast it to the other. In fact, it's _highly_ likely that that's how any shared object of any complexity is going to be constructed. Similarly, it's extremely common to allocate an object as mutable and then cast it to immutable (either using assumeUnique or by using a pure function where the compiler does the cast implicitly for you if it can guarantee that the return value is unique), and immutable objects are implicitly shared.
>
> (Honest question:) Do people really cast from local to
> shared/immutable and expect it to work?
> (Whenever I cast something more complex than a size_t I almost
> expect it to blow up... or break sometime in the future.)

Yes. They expect it to work, and as the language is currently designed, it works perfectly well. In fact, it's even built into the language. e.g.

    int[] foo() pure
    {
        return [1, 2, 3, 4];
    }

    void main()
    {
        immutable arr = foo();
    }

compiles thanks to the fact that the compiler can guarantee from the signature of foo that its return value is unique. We also have std.exception.assumeUnique (which is just a cast to immutable) as a way to document that you're guaranteeing that a reference to an object is unique and therefore can be safely cast to immutable.
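
For the assumeUnique case, a small sketch along the same lines (buildSquares is a made-up name). The buffer is filled as ordinary mutable, thread-local data and then handed out as immutable without being copied:

```d
import std.exception : assumeUnique;

// Hypothetical example: fill a mutable buffer, then publish it as immutable.
immutable(int)[] buildSquares()
{
    auto buf = new int[](4);          // mutable, thread-local allocation
    foreach (i, ref x; buf)
        x = cast(int)(i * i);
    return assumeUnique(buf);         // same block, now typed immutable; buf is set to null
}
```

The programmer, not the compiler, is vouching for uniqueness here, so no runtime hook fires unless one were added to the cast itself.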

> That said, I can understand building a shared object from parts of local data... though I try to keep my thread barriers as thin as possible myself. (Meaning I tend to copy stuff to the shared and have as few shared's as possible.)

Because of how restrictive shared and immutable are, you frequently have to build them from thread-local, mutable data. And while it's preferable to have as little in your program be shared as possible and to favor solutions such as doing message passing with std.concurrency, there are situations where you pretty much need to have complex shared objects. And since D is a systems language, we're a lot more restricted in the assumptions that we can make in comparison to a language such as Java or C#.

- Jonathan M Davis

April 10, 2018
On Tuesday, 10 April 2018 at 08:10:32 UTC, Jonathan M Davis wrote:
>
> Yes. They expect it to work, and as the language is currently designed, it works perfectly well. In fact, it's even built into the language. e.g.
>
>     int[] foo() pure
>     {
>         return [1, 2, 3, 4];
>     }
>
>     void main()
>     {
>         immutable arr = foo();
>     }
>
> compiles thanks to the fact that the compiler can guarantee from the signature of foo that its return value is unique.
>

Oh is that run at runtime? I thought D was just smart and did it using CTFE.

>
> We also have std.exception.assumeUnique (which is just a cast to immutable) as a way to document that you're guaranteeing that a reference to an object is unique and therefore can be safely cast to immutable.
>

Can't say I've used std.exception.assumeUnique, but I guess other people have a use for it as it exists.

Would be nice if you could inject type checking information at compile time without affecting the storage class. But that's a bit OT now.

>
> Because of how restrictive shared and immutable are, you frequently have to build them from thread-local, mutable data. And while it's preferable to have as little in your program be shared as possible and to favor solutions such as doing message passing with std.concurrency, there are situations where you pretty much need to have complex shared objects. And since D is a systems language, we're a lot more restricted in the assumptions that we can make in comparison to a language such as Java or C#.
>

Yeah, I agree that any solution should keep in mind that D is a systems language and should allow you to do stuff when you need to.

Oh, I just had a much simpler idea that shouldn't have any issues, I'll see if that makes the GC faster to allocate. (everything else is the same)