April 10, 2018
On Tuesday, April 10, 2018 08:37:47 David Bennett via Digitalmars-d wrote:
> On Tuesday, 10 April 2018 at 08:10:32 UTC, Jonathan M Davis wrote:
> > Yes. They expect it to work, and as the language is currently designed, it works perfectly well. In fact, it's even built into the language. e.g.
> >
> >     int[] foo() pure
> >     {
> >         return [1, 2, 3, 4];
> >     }
> >
> >     void main()
> >     {
> >         immutable arr = foo();
> >     }
> >
> > compiles thanks to the fact that the compiler can guarantee from the signature of foo that its return value is unique.
>
> Oh is that run at runtime? I thought D was just smart and did it using CTFE.

CTFE only ever happens when it must happen. The compiler never does it as an optimization. So, if you did

enum arr = foo();

or

static arr = foo();

then it would use CTFE, because an enum's value must be known at compile time, and if a static variable is directly initialized instead of initialized via a static constructor, its value must be known at compile time. But if you're initializing a variable whose value does not need to be known at compile time, then no CTFE occurs.
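
A minimal sketch of that rule, assuming nothing beyond the built-in `__ctfe` variable (true only while a function is being interpreted at compile time). The helper name `ranAtCompileTime` is made up for illustration:

```d
// Hypothetical helper: reports whether this call is being evaluated by CTFE.
bool ranAtCompileTime() pure
{
    return __ctfe;
}

void main()
{
    // An enum's value must be known at compile time, so this forces CTFE.
    enum viaEnum = ranAtCompileTime();

    // A directly initialized static variable also forces CTFE.
    static bool viaStatic = ranAtCompileTime();

    // An ordinary local does not need a compile-time value: no CTFE.
    bool viaLocal = ranAtCompileTime();

    assert(viaEnum);
    assert(viaStatic);
    assert(!viaLocal);
}
```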

It would be a serious rabbit hole for the compiler to attempt CTFE when it wasn't told to, particularly since it can't look at a function and know whether it's going to work with CTFE or not. It has to actually call it with a specific set of arguments to find out (and depending on what the function does, it might even work with CTFE with some arguments and not with others - e.g. if a particular branch of an if statement works with CTFE while another does an operation that doesn't work with CTFE).
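
To illustrate the argument-dependent case, here is a small sketch. The function `pick` is hypothetical; `rand` from `core.stdc.stdlib` is used only because it is an extern(C) declaration with no D source, which the interpreter cannot run:

```d
import core.stdc.stdlib : rand;

// Hypothetical function: one branch is CTFE-able, the other is not.
int pick(bool useLibc)
{
    if (useLibc)
        return rand(); // no source available to the CTFE interpreter
    return 42;
}

// Works: only the CTFE-friendly branch is taken for this argument.
enum fine = pick(false);

// Rejected if uncommented: this argument reaches the rand() call.
// enum broken = pick(true);

void main()
{
    static assert(fine == 42);

    // At run time both branches are callable.
    assert(pick(false) == 42);
    const r = pick(true);
    assert(r >= 0);
}
```

The compiler only finds out which case it is in by actually interpreting the call, which is why it doesn't try unless the context requires a compile-time value.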

- Jonathan M Davis

April 10, 2018
On 4/10/18 4:37 AM, David Bennett wrote:
> On Tuesday, 10 April 2018 at 08:10:32 UTC, Jonathan M Davis wrote:
>>
>> Yes. They expect it to work, and as the language is currently designed, it works perfectly well. In fact, it's even built into the language. e.g.
>>
>>     int[] foo() pure
>>     {
>>         return [1, 2, 3, 4];
>>     }
>>
>>     void main()
>>     {
>>         immutable arr = foo();
>>     }
>>
>> compiles thanks to the fact that the compiler can guarantee from the signature of foo that its return value is unique.
>>
> 
> Oh is that run at runtime? I thought D was just smart and did it using CTFE.

Well, D could be smart enough and call a runtime function that says it's moving data from thread-local to shared (or vice versa).

> 
>>
>> We also have std.exception.assumeUnique (which is just a cast to immutable) as a way to document that you're guaranteeing that a reference to an object is unique and therefore can be safely cast to immutable.
>>
> 
> Can't say I've used std.exception.assumeUnique, but I guess other people have a use for it as it exists.
> 
> Would be nice if you could inject type checking information at compile time without affecting the storage class. But that's a bit OT now.

assumeUnique is a library function, it could be instrumented to do the right thing.

I think it's possible to do this in D, but you need language support.
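
A rough sketch of what "instrumented" could look like at the library level. `notifyPublished` and `trackedAssumeUnique` are hypothetical names, not anything in druntime or Phobos; only `std.exception.assumeUnique` is real, and here the hook just counts calls:

```d
import std.exception : assumeUnique;

// Hypothetical bookkeeping standing in for a real GC hook.
size_t publishedBlocks;

// Hypothetical hook: a real one would tell the GC that this block
// is about to become visible to other threads.
void notifyPublished(const(void)* ptr, size_t length)
{
    ++publishedBlocks;
}

// Hypothetical wrapper: call the hook, then do what assumeUnique does.
immutable(T)[] trackedAssumeUnique(T)(ref T[] array)
{
    notifyPublished(array.ptr, array.length * T.sizeof);
    return assumeUnique(array); // casts to immutable and nulls the source
}

void main()
{
    int[] buf = new int[](4);
    foreach (i, ref x; buf)
        x = cast(int)(i * 2);

    immutable(int)[] frozen = trackedAssumeUnique(buf);

    assert(buf is null);
    assert(frozen == [0, 2, 4, 6]);
    assert(publishedBlocks == 1);
}
```

This only covers code that goes through the wrapper; a plain cast or the implicit conversion from a pure function's return value would bypass it, which is where the language support comes in.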

-Steve
April 10, 2018
On 2018-04-10 08:47, Jonathan M Davis wrote:

> Regardless, I think that it's clear that in order to do anything with
> thread-local pools, we'd have to lock down the type system even further to
> disallow casts to or from shared or immutable, and that would really be a
> big problem given the inherent restrictions on those types and how shared is
> intended to be used.

Apple's GC for Objective-C (before it had ARC) was using thread-local pools. I wonder how they managed to do that in a language that doesn't have a type system that differentiates between TLS and shared memory.

-- 
/Jacob Carlborg
April 11, 2018
On Tuesday, 10 April 2018 at 18:31:28 UTC, Jacob Carlborg wrote:
> On 2018-04-10 08:47, Jonathan M Davis wrote:
>
>> Regardless, I think that it's clear that in order to do anything with
>> thread-local pools, we'd have to lock down the type system even further to
>> disallow casts to or from shared or immutable, and that would really be a
>> big problem given the inherent restrictions on those types and how shared is
>> intended to be used.
>
> Apple's GC for Objective-C (before it had ARC) was using thread-local pools. I wonder how they managed to do that in a language that doesn't have a type system that differentiates between TLS and shared memory.

They were doing it quite badly.

One of the points that always gets lost when discussing the merits of ARC over GC in Objective-C is that Apple never managed to make the GC work without issues, given its underlying C semantics.

So naturally having the compiler do what developers were already doing by hand with Framework derived classes was a safer way than ensuring Objective-C's GC would never crash.

Apple used to have a GC caveats document, which has long since been taken down from their site.

This is one of the few surviving ones:

https://developer.apple.com/library/content/releasenotes/Cocoa/RN-ObjectiveC/#//apple_ref/doc/uid/TP40004309-CH1-DontLinkElementID_1
April 11, 2018
On Tuesday, 10 April 2018 at 07:22:14 UTC, David Bennett wrote:
> On Tuesday, 10 April 2018 at 06:43:28 UTC, Dmitry Olshansky wrote:
>> On Tuesday, 10 April 2018 at 06:10:10 UTC, David Bennett wrote:
>>> I was thinking about messing with the GC in my free time just yesterday... how hard would it be:
>>>
>>> [snip]
>>
>> You lost immutable, and that thread-local data is often cast to immutable, sometimes by the compiler.
>> See assumeUnique and its ilk in Phobos.
>>
>> Same with shared - it’s still often the case that you allocate thread-local then cast to shared.
>
> People cast from thread-local to shared? ...okay, that's no fun...  :(
>
> I can understand the other way; that's why I was leaning on the conservative side and putting more stuff in the global pools.

Well you might want to build something as thread-local and then publish as shared.


>> That is indeed something we should at some point have. Needs cooperation from the language such as explicit functions for shared<->local conversions that run-time is aware of.
>>
>
> So the language could (in theory) inject a __move_to_global(ref local, ref global) when casting to shared and the GC would need to update all the references in the local pages to point to the new global address?

I think it could be __to_shared(ptr, length) to let GC know that block should be added to global set of sorts. That will foobar the GC design quite a bit but to have per thread GCs I’d take that risk.
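
Something along these lines, as a sketch only. Names with a leading double underscore are reserved, so the hook is called `toShared` here; it, `globalBlocks`, and `publish` are all hypothetical, and nothing like them exists in druntime today:

```d
// Hypothetical "global set": block address -> block length in bytes.
__gshared size_t[const(void)*] globalBlocks;

// Hypothetical hook the compiler would call on cast(shared).
void toShared(const(void)* ptr, size_t length)
{
    synchronized // any thread may publish, so guard the global set
    {
        globalBlocks[ptr] = length;
    }
}

// Hypothetical stand-in for what an instrumented cast would do.
shared(T)[] publish(T)(T[] local)
{
    toShared(local.ptr, local.length * T.sizeof);
    return cast(shared(T)[]) local;
}

void main()
{
    int[] local = [1, 2, 3];
    shared(int)[] published = publish(local);

    assert(published.length == 3);
    assert(globalBlocks[cast(const(void)*) local.ptr] == 3 * int.sizeof);
}
```

Note that this registers only the outermost block; anything reachable through it would need the same treatment.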

But then keeping in mind transitive nature of shared.... Maybe not ;)

Maybe it should work the other way around - keep all in global pool, and have per-thread ref-sets of some form. Tricky anyway.


April 12, 2018
On Tuesday, 10 April 2018 at 06:47:53 UTC, Jonathan M Davis wrote:
> As it stands, it's impossible to have thread-local memory pools. It's quite legal to construct an object as shared or thread-local and cast it to the other. In fact, it's _highly_ likely that that's how any shared object of any complexity is going to be constructed. Similarly, it's extremely common to allocate an object as mutable and then cast it to immutable (either using assumeUnique or by using a pure function where the compiler does the cast implicitly for you if it can guarantee that the return value is unique), and immutable objects are implicitly shared. At minimum, there would have to be runtime hooks to do something like move an object between pools when it is cast to shared or immutable (or back) in order to ensure that an object was in the right pool, but if that requires copying the object rather than just moving the memory block, then it can't be done, because every pointer or reference pointing to that object would have to be rewritten (which isn't supported by the language).

It's a bit easier than that. When you cast something to shared or immutable, or allocate it as shared or immutable, you pin the object on the local heap. When the thread-local collector runs, it won't collect that object, since another thread might know about it. Then, when you run the global collector, it will determine which shared objects are still reachable and unpin things as appropriate.
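
A toy model of that scheme, to make the two collection phases concrete. Blocks are just integer IDs and "reachability" is a list of root IDs handed in by the caller; `LocalHeap` and its members are invented for this sketch and say nothing about how the real GC is structured:

```d
// Toy thread-local heap: blocks are identified by integer IDs.
struct LocalHeap
{
    bool[size_t] live;   // blocks currently allocated on this heap
    bool[size_t] pinned; // blocks that were cast to shared/immutable

    void pin(size_t id) { pinned[id] = true; }

    // Thread-local collection: free what is neither locally reachable nor pinned.
    void collectLocal(const size_t[] localRoots)
    {
        bool[size_t] keep;
        foreach (id; localRoots)
            keep[id] = true;
        foreach (id; live.keys) // .keys is a copy, so removing is safe
            if (id !in keep && id !in pinned)
                live.remove(id);
    }

    // Global collection: unpin blocks no shared root refers to any more.
    void unpinUnreachable(const size_t[] sharedRoots)
    {
        bool[size_t] reachable;
        foreach (id; sharedRoots)
            reachable[id] = true;
        foreach (id; pinned.keys)
            if (id !in reachable)
                pinned.remove(id);
    }
}

void main()
{
    LocalHeap heap;
    foreach (id; [1, 2, 3])
        heap.live[id] = true;

    size_t[] localRoots = [1];
    heap.pin(2); // block 2 was cast to shared

    heap.collectLocal(localRoots);
    assert(1 in heap.live);
    assert(2 in heap.live);  // survives only because it is pinned
    assert(3 !in heap.live);

    heap.unpinUnreachable(null); // global pass: nothing shared refers to 2
    heap.collectLocal(localRoots);
    assert(2 !in heap.live);
}
```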

That unpinning process requires a way to look up the owning thread for a piece of memory, which can be done in logarithmic time relative to the number of contiguous segments of address space.
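
The lookup can be sketched as a binary search over address ranges sorted by start address. `Segment` and `ownerOf` are hypothetical names, and thread IDs are plain ints for simplicity:

```d
// Hypothetical record: a contiguous address range owned by one thread.
struct Segment
{
    size_t start;    // inclusive
    size_t end;      // exclusive
    int ownerThread;
}

// Binary search over segments sorted by start address, non-overlapping.
// Returns the owning thread, or -1 if the address is in no segment.
int ownerOf(const Segment[] sorted, size_t addr)
{
    size_t lo = 0, hi = sorted.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (addr < sorted[mid].start)
            hi = mid;
        else if (addr >= sorted[mid].end)
            lo = mid + 1;
        else
            return sorted[mid].ownerThread;
    }
    return -1;
}

void main()
{
    Segment[] segs = [
        Segment(0x1000, 0x2000, 1),
        Segment(0x3000, 0x4000, 2),
    ];

    assert(ownerOf(segs, 0x1800) == 1);
    assert(ownerOf(segs, 0x3000) == 2);
    assert(ownerOf(segs, 0x2800) == -1); // gap between segments
}
```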

Casting away from shared would not call any runtime functions; even if it were guaranteed that the cast were done on the allocating thread, it's likely that there exists another reference to the item in another thread.

This would discourage the use of immutable, since it wouldn't benefit from thread-local heaps.
April 12, 2018
On Wednesday, 11 April 2018 at 19:38:59 UTC, Dmitry Olshansky wrote:
> On Tuesday, 10 April 2018 at 07:22:14 UTC, David Bennett wrote:
>> People cast from thread-local to shared? ...okay, that's no fun...  :(
>>
>> I can understand the other way; that's why I was leaning on the conservative side and putting more stuff in the global pools.
>
> Well you might want to build something as thread-local and then publish as shared.

Yeah, I can see that if you're trying to share types like classes, shared would get in the way quite quickly.

> I think it could be __to_shared(ptr, length) to let GC know that block should be added to global set of sorts. That will foobar the GC design quite a bit but to have per thread GCs I’d take that risk.

Yeah I had this idea also, the runtime gets a hook on cast(shared) and the GC then just sets a flag and that part of memory will never be freed inside a thread-local mark/sweep. No move needed.

> But then keeping in mind transitive nature of shared.... Maybe not ;)

Yeah, shared is quite locked down, so there should be fewer ways people could foil my plans.

It's __gshared that I'm worried about now, i.e. if you had a class (stored in the global pool) and then assigned a local class to one of its members. When a thread-local mark/sweep happened, it wouldn't see the ref in the global pool and the member might get removed...

---
class A{}

class B{
    __gshared A a;
    this(A a){
        this.a=a;
    }
}

void main()
{
    A a = new A();
    B b = new B(a);
}
---

Currently my idea of storing classes with __gshared members would put B in the global pool, but there's no cast, so A would not be hooked with __to_shared(). I guess the compiler could in theory inject the same __to_shared() in this case also, but it would be a lot harder and would probably be a mess, as there's no cast to hook.

So maybe with __gshared it should be in the thread-local pool but marked as global... but you might be able to mix shared and __gshared in a way that wouldn't work.

> Maybe it should work the other way around - keep all in global pool, and have per-thread ref-sets of some form. Tricky anyway.

Would be worth some thought, I'll keep it in mind.

For now, I'm seeing if I can just make it so each thread has its own Bin list. This way the thread-local stuff is generally packed closer together and there's a higher chance of having a whole free page after a global mark/sweep.

Is there a good benchmark for the GC I can run to see if I'm actually improving things?
April 13, 2018
On 10.04.2018 10:56, Jonathan M Davis wrote:
> CTFE only ever happens when it must happen. The compiler never does it as an
> optimization.

The frontend doesn't. The backend might.
April 13, 2018
On Friday, April 13, 2018 22:36:31 Timon Gehr via Digitalmars-d wrote:
> On 10.04.2018 10:56, Jonathan M Davis wrote:
> > CTFE only ever happens when it must happen. The compiler never does it as an optimization.
>
> The frontend doesn't. The backend might.

The optimizer may do constant folding or inline the code so far that it just gives the result, but it doesn't do actual CTFE. That's all in the frontend.

- Jonathan M Davis

April 14, 2018
On 13.04.2018 23:40, Jonathan M Davis wrote:
> On Friday, April 13, 2018 22:36:31 Timon Gehr via Digitalmars-d wrote:
>> On 10.04.2018 10:56, Jonathan M Davis wrote:
>>> CTFE only ever happens when it must happen. The compiler never does it
>>> as an optimization.
>>
>> The frontend doesn't. The backend might.
> 
> The optimizer may do constant folding or inline the code so far that it just
> gives the result, but it doesn't do actual CTFE. That's all in the frontend.
> 
> - Jonathan M Davis
> 

CTFE just stands for "compile-time function evaluation". Claiming that the compiler never does this as an optimization is a bit misleading, but fine.