January 02, 2015
On Friday, 2 January 2015 at 23:26:57 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
> On Friday, January 02, 2015 19:47:50 John Colvin via Digitalmars-d-learn wrote:
>> On Friday, 2 January 2015 at 13:14:14 UTC, Jonathan M Davis via
>> Digitalmars-d-learn wrote:
>> > Objects in D default to being thread-local. __gshared and
>> > shared both make
>> > it so that they're not thread-local. __gshared does it without
>> > actually
>> > changing the type, making it easier to use but also dangerous
>> > to use,
>> > because it makes it easy to violate the compiler's guarantees,
>> > because it'll
>> > treat it like a thread-local variable with regards to
>> > optimizations and
>> > whatnot.
>>
>> I'm pretty sure that's not true. __gshared corresponds to C-style
>> globals, which are *not* assumed to be thread-local (see below).
>
> No, the type system will treat __gshared like a thread-local variable. It
> gets put in shared memory like a C global would be, but __gshared isn't
> actually part of the type, so the compiler has no way of knowing that it's
> anything other than a thread-local variable - which is precisely why it's so
> dangerous to use it instead of shared. For instance,
>
> __gshared int* foo;
>
> void main()
> {
>     foo = new int;
>     int* bar = foo;
> }
>
> will compile just fine, whereas if you used shared, it wouldn't.
>
> - Jonathan M Davis

I understand that. As far as optimisations and codegen go, that is not the same as being able to assume that something is thread-local, far from it.

The rule (in C(++) at least) is that all data is assumed to be visible and mutable from multiple other threads unless proved otherwise. However, provided that you do not write a data race, the compiler will provide full sequential consistency. If you do write a race, all bets are off.

Are you telling me that D does not obey the C(++) memory model? That would be a fatal hole in our C(++) interoperability.

AFAIK, the only data in D that the compiler is allowed to assume to be thread-local is data that it can prove is thread-local. The trivial case is TLS, which is thread-local by definition.
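For example, a minimal sketch (the variable name and thread body are mine) of what TLS-by-default means in D: each thread gets its own copy of a plain module-level variable, so a write in one thread is invisible in another.

```d
import core.thread;

int counter; // plain module-level variable: thread-local by default in D

void main()
{
    counter = 42;
    int seen = -1;
    auto t = new Thread({
        // this thread reads its own, freshly zero-initialized copy
        seen = counter;
    });
    t.start();
    t.join();
    assert(seen == 0);     // the new thread never saw main's 42
    assert(counter == 42); // main's copy is untouched
}
```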
January 02, 2015
On Friday, 2 January 2015 at 23:10:46 UTC, John Colvin wrote:
> What significant optimisations does SC-DRF actually prevent?

By "SC-DRF" I assume you mean the Java memory model. AFAIK SC-DRF just means that if you synchronize correctly (manually) then you will get sequential consistency (a restriction on the compiler).

Getting rid of the restrictions on the compiler and eliding programmer-provided synchronization allows for more optimizations on loads, writes, reordering, synchronization/refcounting...?
January 03, 2015
On Friday, January 02, 2015 23:51:04 John Colvin via Digitalmars-d-learn wrote:
> AFAIK, the only data in D that the compiler is allowed to assume to be thread-local is data that it can prove is thread-local. The trivial case is TLS, which is thread-local by definition.

In D, if a type is not marked as shared, then it is by definition thread-local, and the compiler is free to assume that it's thread-local. If it's not actually thread-local (e.g. because you cast away shared), then it's up to you to ensure that no other threads access that data while it's being referred to via a reference or pointer that's typed as thread-local. The only exception is immutable, in which case it's implicitly shared, because it can never change, and there's no need to worry about multiple threads accessing the data at the same time.

- Jonathan M Davis
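A small sketch of that last point (the function and variable names are mine), using std.concurrency: immutable data may cross threads freely, where mutable, unshared data would be rejected at compile time.

```d
import std.concurrency;

// The worker receives an immutable slice; immutable data may freely
// cross threads because it can never change.
void worker()
{
    auto msg = receiveOnly!(immutable(int)[]);
    assert(msg == [1, 2, 3]);
    ownerTid.send(true); // tell the owner we are done
}

void main()
{
    auto tid = spawn(&worker);
    immutable(int)[] data = [1, 2, 3];
    tid.send(data);        // compiles: immutable is implicitly shared
    // tid.send(new int);  // would not compile: mutable, unshared indirection
    assert(receiveOnly!bool);
}
```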

January 03, 2015
On Friday, 2 January 2015 at 23:51:05 UTC, John Colvin wrote:
> The rule (in C(++) at least) is that all data is assumed to be visible and mutable from multiple other threads unless proved otherwise. However, provided that you do not write a data race, the compiler will provide full sequential consistency. If you do write a race, all bets are off.

The memory is visible and mutable, but that's pretty much the only guarantee you get. Without synchronization, there's no guarantee a write made by thread A will ever be seen by thread B, and vice versa.

Analogously in D, if a thread modifies a __gshared variable, there are no guarantees another thread will ever see that modification. The variable isn't thread-local, but the compiler is free to treat it almost as if it were.

These relaxed guarantees allow the compiler to keep variables in registers, and re-order memory writes. These optimizations are crucial to performance.
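To sketch the contrast (names are mine; core.atomic is druntime's atomics module): typing the variable `shared` and going through atomic operations restores the visibility guarantee that a plain __gshared variable lacks.

```d
import core.atomic;
import core.thread;

shared int flag; // typed shared: accessed only through atomic operations

void main()
{
    auto t = new Thread({
        atomicStore(flag, 1); // the store is guaranteed to become visible
    });
    t.start();
    // Spin until the writer's store is visible. With a plain __gshared
    // int and no atomics, this loop could legally spin forever, because
    // the compiler may keep the variable cached in a register.
    while (atomicLoad(flag) == 0)
        Thread.yield();
    t.join();
    assert(atomicLoad(flag) == 1);
}
```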
January 03, 2015
On Friday, 2 January 2015 at 23:56:44 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 2 January 2015 at 23:10:46 UTC, John Colvin wrote:
>> What significant optimisations does SC-DRF actually prevent?
>
> By "SC-DRF" I assume you mean the Java memory model.

The Java, C11 and C++11 memory model.

> AFAIK SC-DRF just means that if you synchronize correctly (manually) then you will get sequential consistency (a restriction on the compiler).

That sounds like a correct description of it to me, yes.

> Getting rid of the restrictions on the compiler and eliding programmer-provided synchronization allows for more optimizations on loads, writes, reordering, synchronization/refcounting...?

Yes, I was hoping that perhaps you knew more specifics. AFAIK, when not restricted by any kind of barriers, SC-DRF does not have a particularly significant cost.
January 03, 2015
On Saturday, 3 January 2015 at 00:48:23 UTC, Peter Alexander wrote:
> On Friday, 2 January 2015 at 23:51:05 UTC, John Colvin wrote:
>> The rule (in C(++) at least) is that all data is assumed to be visible and mutable from multiple other threads unless proved otherwise. However, provided that you do not write a data race, the compiler will provide full sequential consistency. If you do write a race, all bets are off.
>
> The memory is visible and mutable, but that's pretty much the only guarantee you get. Without synchronization, there's no guarantee a write made by thread A will ever be seen by thread B, and vice versa.
>
> Analogously in D, if a thread modifies a __gshared variable, there are no guarantees another thread will ever see that modification. The variable isn't thread-local, but the compiler is free to treat it almost as if it were.
>
> These relaxed guarantees allow the compiler to keep variables in registers, and re-order memory writes. These optimizations are crucial to performance.

That is exactly how I understood the situation to be, yes.

2 questions I have to absolutely nail this down for good:

Does D assume that local variables are truly thread-local? I.e. can the compiler perform optimisations on local references that would break SC-DRF without first proving that it does not? Another way of putting it is: can a D compiler perform optimisations that are normally illegal in modern C(++) and Java?

If the answer to the above is yes, does the same apply to explicit use of __gshared variables?
January 03, 2015
On Saturday, 3 January 2015 at 10:13:52 UTC, John Colvin wrote:
> The Java, C11 and C++11 memory model.

Well...

http://en.cppreference.com/w/cpp/atomic/memory_order

> Yes, I was hoping that perhaps you knew more specifics. AFAIK, when not restricted by any kind of barriers, SC-DRF does not have a particularly significant cost.

I think that even with lock-free data structures such as Intel Threading Building Blocks, you would still gain from using a non-synchronizing API where possible. In real code you have several layers of function calls, so doing this by hand will complicate the code.

If you can propagate knowledge down the call chain about locality and the semantics of object interfaces or clusters of objects, then you can relax restrictions on the optimizer and get better performance on real-world code. This would be a natural direction for D, since it is already encouraging templated code.

The alternative is to hand code this where it matters, but that is inconvenient...
January 03, 2015
On Saturday, 3 January 2015 at 00:12:35 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
> In D, if a type is not marked as shared, then it is by definition
> thread-local, and the compiler is free to assume that it's thread-local.

I find this to be rather vague. If the compiler exploits this to the maximum, wouldn't that lead to lots of bugs?

January 03, 2015
On Sat, 03 Jan 2015 12:14:54 +0000
via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:

> On Saturday, 3 January 2015 at 00:12:35 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
> > In D, if a type is not marked as shared, then it is by
> > definition
> > thread-local, and the compiler is free to assume that it's
> > thread-local.
> 
> I find this to be rather vague. If the compiler exploits this to the maximum, wouldn't that lead to lots of bugs?

why should it? thread locals are... well, local to each thread. you can't access a local of a different thread without resorting to low-level assembly and OS-dependent tricks.


January 03, 2015
On Saturday, 3 January 2015 at 12:12:47 UTC, Ola Fosheim Grøstad wrote:
> On Saturday, 3 January 2015 at 10:13:52 UTC, John Colvin wrote:
>> The Java, C11 and C++11 memory model.
>
> Well...
>
> http://en.cppreference.com/w/cpp/atomic/memory_order

Ok, with the exception of relaxed atomics.

>
>> Yes, I was hoping that perhaps you knew more specifics. AFAIK, when not restricted by any kind of barriers, SC-DRF does not have a particularly significant cost.
>
> I think that even with lock-free data structures such as Intel Threading Building Blocks, you would still gain from using a non-synchronizing API where possible. In real code you have several layers of function calls, so doing this by hand will complicate the code.

That isn't what I mean. I was talking about the restrictions that the memory model puts on optimising _all_ code, except where memory is provably unshared. Things like never creating a write where one would not have occurred in a sequentially consistent execution of the original source.
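To make that last restriction concrete, here is a sketch (the function is my own hypothetical example) of one transformation SC-DRF forbids: inventing a write that a sequentially consistent execution of the source would never perform.

```d
// The source writes x only when cond is true.
void update(ref int x, bool cond)
{
    if (cond)
        x = 1;
}

// A branchless "optimization" like
//
//     int tmp = x;
//     x = cond ? 1 : tmp; // unconditional store
//
// is illegal if x may be shared: the unconditional store races with a
// concurrent reader even when cond is false, a case where the original
// program wrote nothing at all.

void main()
{
    int x = 0;
    update(x, false);
    assert(x == 0); // no write happened
    update(x, true);
    assert(x == 1);
}
```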