October 15, 2019
On Saturday, 12 October 2019 at 20:52:45 UTC, Jonathan M Davis wrote:
> In the case of shared, in general, it's not thread-safe to read or write to such a variable without either using atomics or some other form of thread synchronization, which is currently beyond the ability of the compiler to make guarantees about and will likely always be, except maybe in fairly restricted circumstances.

A shared variable may not need any synchronization at all, depending on the algorithm it is used in.

There is a class of optimized algorithms that act like gathering operations. You mostly find them on GPUs because they map quite naturally to that hardware architecture, but they can also be implemented on CPUs with multiple threads. The core idea in every case is that you generate a set of output values in parallel in such a way that each value in the output is generated by at most one of the running threads. So there is no need to synchronize memory writes when the underlying hardware architecture provides sufficient cache coherency guarantees. All the threads share the same input, which obviously must not be modified while the computation runs.
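
A minimal sketch of that pattern in D, assuming std.parallelism (the function name and the per-element computation are placeholders): each iteration writes only its own output element, so the writes never race.

import std.parallelism : parallel;

// Gather-style kernel: every output element has exactly one writer,
// so the writes need no synchronization between worker threads.
void gather(const(double)[] input, double[] output)
{
    foreach (i, ref slot; parallel(output))
    {
        // Read-only access to the common input; the only write goes
        // to output[i], which no other iteration touches.
        slot = input[i] * 2.0; // placeholder per-element computation
    }
}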

A compiler cannot possibly be smart enough to prove that synchronization is not required for these kinds of algorithms. And any form of enforced synchronization (explicit or implicit) would significantly affect performance. So how would you express to the compiler the mathematical properties that make this safe without synchronization? Would the whole implementation have to be marked as @trusted?
October 15, 2019
On Tuesday, 15 October 2019 at 10:25:54 UTC, Gregor Mückl wrote:
> generated by at most one of the running threads. So there is no need to synchronize memory writes when the underlying hardware architecture provides sufficient cache coherency guarantees.

So what you're basically saying is that a low-level language should be careful about assuming a particular hardware model and leave more to intrinsics and libraries. I think that is reasonable, because hardware does change and concurrent programming strategies change.

So D has to figure out whether it is a low-level language for many architectures or a specific x86-centric one limited to common contemporary programming patterns.

My perception is that D does not need to lock down «shared» but can rather improve on these:

1. A set of basic low-level building blocks + intrinsics (which D mostly has).

2. Metaprogramming features to build libraries that support contemporary patterns (which D mostly has).

3. Language features that support clean static analysis tooling (perhaps lacking).


But given the history of D, it will probably go with what is demanded based on contemporary x86 patterns.

October 16, 2019
On Tuesday, 15 October 2019 at 11:28:24 UTC, Ola Fosheim Grøstad wrote:
> On Tuesday, 15 October 2019 at 10:25:54 UTC, Gregor Mückl wrote:
>> generated by at most one of the running threads. So there is no need to synchronize memory writes when the underlying hardware architecture provides sufficient cache coherency guarantees.
>
> So what you're basically saying is that a low-level language should be careful about assuming a particular hardware model and leave more to intrinsics and libraries. I think that is reasonable, because hardware does change and concurrent programming strategies change.

I'm not sure if this is quite what I'm saying. I guess I'm fine with the compiler telling me that a piece of code tries to access shared data without any annotation that this is what is desired. In my opinion, it then needs to be up to the developer to deal with this. It is *not* OK for the compiler to go and secretly insert synchronization mechanisms (memory barriers, atomic operations, etc.) behind the developer's back. If such an automatic mechanism must exist, it also must be invoked by the developer explicitly ("Computer, generate automatic synchronization!"). And there must be a way to tell the compiler that the developer is a responsible adult who knows what he/she is doing and that the code is OK for reasons unknown to the compiler.
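
In today's D, the closest thing to that escape hatch is a cast plus @trusted. A hedged sketch (the names are illustrative, and the safety argument is the one-writer-per-element invariant from the gather example above, not anything the compiler checks):

shared double[] results;

// Safe by construction only because each index is written by at most
// one thread; @trusted is the developer's explicit "I know what I'm
// doing" for a property the compiler cannot verify.
void writeSlot(size_t i, double value) @trusted
{
    auto raw = cast(double[]) results; // cast away shared
    raw[i] = value;
}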
October 16, 2019
On Wednesday, October 16, 2019 12:43:30 PM MDT Gregor Mückl via Digitalmars-d wrote:
> On Tuesday, 15 October 2019 at 11:28:24 UTC, Ola Fosheim Grøstad wrote:
> > On Tuesday, 15 October 2019 at 10:25:54 UTC, Gregor Mückl wrote:
> >> generated by at most one of the running threads. So there is no need to synchronize memory writes when the underlying hardware architecture provides sufficient cache coherency guarantees.
> >
> > So what you're basically saying is that a low-level language should be careful about assuming a particular hardware model and leave more to intrinsics and libraries. I think that is reasonable, because hardware does change and concurrent programming strategies change.
>
> I'm not sure if this is quite what I'm saying. I guess I'm fine with the compiler telling me that a piece of code tries to access shared data without any annotation that this is what is desired. In my opinion, it then needs to be up to the developer to deal with this. It is *not* OK for the compiler to go and secretly insert synchronization mechanisms (memory barriers, atomic operations, etc.) behind the developer's back. If such an automatic mechanism must exist, it also must be invoked by the developer explicitly ("Computer, generate automatic synchronization!"). And there must be a way to tell the compiler that the developer is a responsible adult who knows what he/she is doing and that the code is OK for reasons unknown to the compiler.

The DIP is not clear on the matter and needs to be updated, but Walter has made it clear in his comments that there is no plan to even insert core.atomic calls for you. Reading from and writing to shared data is going to be illegal, requiring that the programmer either explicitly use core.atomic or cast away shared (rendering that part of the code @system, thereby segregating the code that the programmer has to verify for thread safety).
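
In code, the two permitted routes would look something like this (a hedged sketch; atomicLoad and atomicOp are the actual core.atomic primitives, everything else is illustrative):

import core.atomic : atomicLoad, atomicOp;

shared int counter;

void viaAtomics()
{
    // Option 1: explicit core.atomic calls.
    int snapshot = atomicLoad(counter); // atomic read
    atomicOp!"+="(counter, 1);          // atomic read-modify-write
}

void viaCast() @system
{
    // Option 2: cast away shared; this code is @system, and the
    // programmer must guarantee no other thread touches counter here.
    auto p = cast(int*) &counter;
    *p += 1;
}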

It is conceivable that someone will come up with a feature that will allow the compiler to implicitly remove shared under some set of circumstances, because it's able to see that only one thread can access that data at that point (e.g. TDPL's synchronized classes would be able to do this to the outer layer of shared for the member variables of that class), but that would just be an improvement on top of what Walter is proposing.

And honestly, I doubt that we'll ever see much along those lines, because making such compiler guarantees is very difficult with D's type system (e.g. TDPL's synchronized classes lock the type down quite a lot and yet are still only able to remove a single layer of shared, making them borderline useless, and AFAIK, no one has yet proposed anything that could do better). The type system would likely need a concept of thread ownership to safely reason about much with regards to shared, and even if the compiler _were_ able to implicitly remove shared under some set of circumstances, there's no way that it's going to understand all of the various threading mechanisms that get used in code.

So, at best, you'd be able to use a particular feature to have a piece of code implicitly remove shared, because the compiler is able to do it for that particular idiom. There's no question that the programmer is going to have to cast away shared in many cases in order to actually operate on shared data.
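
For reference, the TDPL idiom under discussion looks like this; the automatic removal of the outer layer of shared that TDPL describes was never fully implemented, so treat this as a sketch of the idea rather than of current compiler behavior:

// Every member function of a synchronized class runs under the
// object's hidden lock, so only one thread is inside at a time; TDPL
// argues the compiler could therefore strip the *outer* shared from
// the members (anything behind further indirections stays shared).
synchronized class SyncCounter
{
    private int count;

    void increment()
    {
        ++count;
    }
}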

And once the programmer casts away shared, that code is then @system, requiring the programmer to vet it and certify that it's actually @safe by using @trusted. So, all of the code that operates on shared data should then fall into one of three camps:

1. It uses core.atomic.

2. It casts away shared and is thus @system.

3. It involves shared member functions which then do either #1 or #2 internally.
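
Camp 3 would look roughly like this hedged sketch, where a shared member function does #1 (core.atomic) internally so that callers get a thread-safe interface:

import core.atomic : atomicLoad, atomicOp;

struct Counter
{
    private int count;

    // Callers never touch count directly; the shared methods hide
    // the atomics (camp 1) behind a thread-safe interface.
    void increment() shared
    {
        atomicOp!"+="(count, 1);
    }

    int get() shared
    {
        return atomicLoad(count);
    }
}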

Andrei has talked in the past about inserting magic to make shared "just work" without casting, but I think that it's pretty clear at this point that that isn't going to work. The best we could do would be to allow certain operations that are guaranteed to be atomic on shared primitive types, and that would be a questionable choice for a variety of reasons. Either way, it's not currently the plan to do so.

- Jonathan M Davis




October 17, 2019
On Tuesday, 1 October 2019 at 10:40:52 UTC, Mike Parker wrote:
> This is the feedback thread for the first round of Community Review for DIP 1024, "Shared Atomics":
>
> https://github.com/dlang/DIPs/blob/0b892dd99aba74b9631572ad3a53000f5975b7c2/DIPs/DIP1024.md
>
Apologies to everyone who participated in the review. I didn't give the discussion my full attention, else I would have called off the review early so that the document could be corrected.

Generally, that's what the policy should be when such confusing ambiguities slip through the revision process. So if it does happen again in the future, I ask that whoever catches it please ping me (email or Slack is the best bet) in case I miss it.

Walter is going to revise the DIP and we will do another round of community review soon.

