November 14, 2012
On 11/14/12 7:11 AM, Alex Rønne Petersen wrote:
> On 14-11-2012 15:32, Andrei Alexandrescu wrote:
>> On 11/14/12 4:23 AM, David Nadlinger wrote:
>>> On Wednesday, 14 November 2012 at 00:04:56 UTC, deadalnix wrote:
>>>> That is what java's volatile do. It have several uses cases, including
>>>> valid double check locking (It has to be noted that this idiom is used
>>>> incorrectly in druntime ATM, which proves both its usefullness and
>>>> that it require language support) and disruptor which I wanted to
>>>> implement for message passing in D but couldn't because of lack of
>>>> support at the time.
>>>
>>> What stops you from using core.atomic.{atomicLoad, atomicStore}? I don't
>>> know whether there might be a weird spec loophole which could
>>> theoretically lead to them being undefined behavior, but I'm sure that
>>> they are guaranteed to produce the right code on all relevant compilers.
>>> You can even specify the memory order semantics if you know what you are
>>> doing (although this used to trigger a template resolution bug in the
>>> frontend, no idea if it works now).
>>>
>>> David
>>
>> This is a simplification of what should be going on. The
>> core.atomic.{atomicLoad, atomicStore} functions must be intrinsics so
>> the compiler generate sequentially consistent code with them (i.e. not
>> perform certain reorderings). Then there are loads and stores with
>> weaker consistency semantics (acquire, release, acquire/release, and
>> consume).
>>
>> Andrei
>
> They already work as they should:
>
> * DMD: They use inline asm, so they're guaranteed to not be reordered.
> Calls aren't reordered with DMD either, so even if the former wasn't the
> case, it'd still work.
> * GDC: They map directly to the GCC __sync_* builtins, which have the
> semantics you describe (with full sequential consistency).
> * LDC: They map to LLVM's load/store instructions with the atomic flag
> set and with the given atomic consistency, which have the semantics you
> describe.
>
> I don't think there's anything that actually needs to be fixed there.

The language definition should be made clear so that future optimizations of existing implementations, and future implementations themselves, don't push things over the limit.

Andrei


November 14, 2012
On 11/14/12 7:14 AM, Jacob Carlborg wrote:
> On 2012-11-14 15:33, Andrei Alexandrescu wrote:
>
>> Actually this hypothesis is false.
>
> That we should remove it or that it's not working/nobody understands
> what it should do? If it's the latter then this thread is the evidence
> that my hypothesis is true.

The hypothesis that atomic primitives can be implemented as a library.

Andrei
November 14, 2012
On 11/14/12 7:16 AM, Jacob Carlborg wrote:
> On 2012-11-14 15:22, Andrei Alexandrescu wrote:
>
>> It's not an advantage, it's a necessity.
>
> Walter seems to indicate that there is no technical reason for "shared"
> to be part of the language.

Walter is a self-confessed dilettante in threading. To be frank, I hope he asks more and answers less in this thread.

> I don't know how these memory barriers work,
> that's why I'm asking. Does it need to be in the language or not?

Memory ordering must be built into the language and understood by the compiler.


Andrei
November 14, 2012
On 11/14/12 8:59 AM, David Nadlinger wrote:
> On Wednesday, 14 November 2012 at 14:32:34 UTC, Andrei Alexandrescu wrote:
>> On 11/14/12 4:23 AM, David Nadlinger wrote:
>>> On Wednesday, 14 November 2012 at 00:04:56 UTC, deadalnix wrote:
>>>> That is what java's volatile do. It have several uses cases, including
>>>> valid double check locking (It has to be noted that this idiom is used
>>>> incorrectly in druntime ATM, which proves both its usefullness and
>>>> that it require language support) and disruptor which I wanted to
>>>> implement for message passing in D but couldn't because of lack of
>>>> support at the time.
>>>
>>> What stops you from using core.atomic.{atomicLoad, atomicStore}? I don't
>>> know whether there might be a weird spec loophole which could
>>> theoretically lead to them being undefined behavior, but I'm sure that
>>> they are guaranteed to produce the right code on all relevant compilers.
>>> You can even specify the memory order semantics if you know what you are
>>> doing (although this used to trigger a template resolution bug in the
>>> frontend, no idea if it works now).
>>>
>>> David
>>
>> This is a simplification of what should be going on. The
>> core.atomic.{atomicLoad, atomicStore} functions must be intrinsics so
>> the compiler generate sequentially consistent code with them (i.e. not
>> perform certain reorderings). Then there are loads and stores with
>> weaker consistency semantics (acquire, release, acquire/release, and
>> consume).
>
> Sorry, I don't quite see where I simplified things.

First, there are more kinds of atomic loads and stores. Then, the fact that the calls are not supposed to be reordered must be a guarantee of the language, not a speculation about an implementation. We can't argue that a feature works just because it so happens that an implementation works a specific way.

> Yes, in the
> implementation of atomicLoad/atomicStore, one would probably use
> compiler intrinsics, as done in LDC's druntime, or inline assembly, as
> done for DMD.
>
> But an optimizer will never move instructions across opaque function
> calls, because they could have arbitrary side effects.

Nowhere in the language definition is it explained what an opaque function call is, or which optimizations can and cannot be performed in the presence of one.

> So, either we are
> fine by definition,

s/definition/happenstance/

> or if the compiler inlines the
> atomicLoad/atomicStore calls (which is actually possible in LDC), then
> its optimizer will detect the presence of inline assembly resp. the
> load/store intrinsics, and take care of not reordering the instructions
> in an invalid way.
>
> I don't see how this makes my answer to deadalnix (that »volatile« is
> not necessary to implement sequentially consistent loads/stores) any
> less valid.

Using load/store everywhere would make volatile unneeded (and, for us, shared). But the advantage of a qualifier is that you qualify the type/value once, and then you don't need to remember to use only specific primitives to manipulate it.


Andrei
November 14, 2012
On 11/14/12 9:15 AM, David Nadlinger wrote:
> On Wednesday, 14 November 2012 at 14:16:57 UTC, Andrei Alexandrescu wrote:
>> On 11/14/12 1:20 AM, Walter Bright wrote:
>>> On 11/13/2012 11:37 PM, Jacob Carlborg wrote:
>>>> If the compiler should/does not add memory barriers, then is there a
>>>> reason for
>>>> having it built into the language? Can a library solution be enough?
>>>
>>> Memory barriers can certainly be added using library functions.
>>
>> The compiler must understand the semantics of barriers such as e.g. it
>> doesn't hoist code above an acquire barrier or below a release barrier.
>
> Again, this is true, but it would be a fallacy to conclude that
> compiler-inserted memory barriers for »shared« are required due to this
> (and it is »shared« we are discussing here!).
>
> Simply having compiler intrinsics for atomic loads/stores is enough,
> which is hardly »built into the language«.

Compiler intrinsics ====== built into the language.

Andrei

November 14, 2012
On Wednesday, 14 November 2012 at 17:31:07 UTC, David Nadlinger wrote:
> Thus, »we«, meaning on a language level, don't need to change anything about the current situations, […]

Let me clarify that: we don't necessarily need to tack any extra semantics onto the language beyond what we currently have. However, what we must indeed do is clarify/specify the implicit consensus on which the current implementations are built. We really need a »The D Memory Model«-style document.

David
November 14, 2012
On 11/14/12 9:31 AM, David Nadlinger wrote:
> On Wednesday, 14 November 2012 at 15:08:35 UTC, Andrei Alexandrescu wrote:
>> Sorry, I was imprecise. We need to (a) define intrinsics for loading
>> and storing data with high-level semantics (a short list: acquire,
>> release, acquire+release, and sequentially-consistent) and THEN (b)
>> implement the needed code generation appropriately for each
>> architecture. Indeed on x86 there is little need to insert fence
>> instructions, BUT there is a definite need for the compiler to prevent
>> certain reorderings. That's why implementing shared data operations
>> (whether implicit or explicit) as sheer library code is NOT possible.
>
> Sorry, I didn't see this message of yours before replying (the perils of
> threaded news readers…).
>
> You are right about the fact that we need some degree of compiler
> support for atomic instructions. My point was that is it already
> available, otherwise it would have been impossible to implement
> core.atomic.{atomicLoad, atomicStore} (for DMD inline asm is used, which
> prohibits compiler code motion).

Yah, the whole point here is that we need something IN THE LANGUAGE DEFINITION about atomicLoad and atomicStore. NOT IN THE IMPLEMENTATION.

THIS IS VERY IMPORTANT.

> Thus, »we«, meaning on a language level, don't need to change anything
> about the current situations, with the possible exception of adding
> finer-grained control to core.atomic.MemoryOrder/mysnc [1]. It is the
> duty of the compiler writers to provide the appropriate means to
> implement druntime on their code generation infrastructure – and indeed,
> the situation in DMD could be improved, using inline asm is hitting a
> fly with a sledgehammer.

That is correct. My point is that compiler implementers would follow some specification, and that specification would state that atomicLoad and atomicStore have special properties setting them apart from any other functions.

> David
>
>
> [1] I am not sure where the point of diminishing returns is here,
> although it might make sense to provide the same options as C++11. If I
> remember correctly, D1/Tango supported a lot more levels of
> synchronization.

We could start with sequential consistency and then explore riskier/looser policies.


Andrei
November 14, 2012
On 14 November 2012 17:50, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 11/14/12 9:15 AM, David Nadlinger wrote:
>>
>> On Wednesday, 14 November 2012 at 14:16:57 UTC, Andrei Alexandrescu wrote:
>>>
>>> On 11/14/12 1:20 AM, Walter Bright wrote:
>>>>
>>>> On 11/13/2012 11:37 PM, Jacob Carlborg wrote:
>>>>>
>>>>> If the compiler should/does not add memory barriers, then is there a
>>>>> reason for
>>>>> having it built into the language? Can a library solution be enough?
>>>>
>>>>
>>>> Memory barriers can certainly be added using library functions.
>>>
>>>
>>> The compiler must understand the semantics of barriers such as e.g. it doesn't hoist code above an acquire barrier or below a release barrier.
>>
>>
>> Again, this is true, but it would be a fallacy to conclude that compiler-inserted memory barriers for »shared« are required due to this (and it is »shared« we are discussing here!).
>>
>> Simply having compiler intrinsics for atomic loads/stores is enough, which is hardly »built into the language«.
>
>
> Compiler intrinsics ====== built into the language.
>
> Andrei
>

Not necessarily. For example, printf is a compiler intrinsic for GDC, but it's not built into the language in the sense that the compiler *provides* the codegen for it. The compiler is aware of what it is and what it does, though, so it can perform relevant optimisations around the use of it.


Regards,
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
November 14, 2012
On 2012-11-14 18:36, Andrei Alexandrescu wrote:

> The hypothesis that atomic primitives can be implemented as a library.

I don't know these kind of things, that's why I'm asking.

-- 
/Jacob Carlborg
November 14, 2012
On 2012-11-14 18:40, Andrei Alexandrescu wrote:

> Memory ordering must be built into the language and understood by the
> compiler.

Ok, thanks for the explanation.

-- 
/Jacob Carlborg