November 14, 2012
On 14/11/2012 13:23, David Nadlinger wrote:
> On Wednesday, 14 November 2012 at 00:04:56 UTC, deadalnix wrote:
>> That is what Java's volatile does. It has several use cases, including
>> valid double-checked locking (it has to be noted that this idiom is
>> currently used incorrectly in druntime, which proves both its usefulness
>> and that it requires language support) and the Disruptor, which I wanted
>> to implement for message passing in D but couldn't because of lack of
>> support at the time.
>
> What stops you from using core.atomic.{atomicLoad, atomicStore}? I don't
> know whether there might be a weird spec loophole which could
> theoretically lead to them being undefined behavior, but I'm sure that
> they are guaranteed to produce the right code on all relevant compilers.
> You can even specify the memory order semantics if you know what you are
> doing (although this used to trigger a template resolution bug in the
> frontend, no idea if it works now).
>
> David

It is a solution now (it wasn't at the time).

The main drawback of that solution is that the compiler can't optimize thread-local reads/writes independently of shared reads/writes. That is a wasted opportunity.
November 14, 2012
On 11/13/12 11:37 PM, Jacob Carlborg wrote:
> On 2012-11-13 23:22, Walter Bright wrote:
>
>> But I do see enormous value in shared in that it logically (and rather
>> forcefully) separates thread-local code from multi-thread code. For
>> example, see the post here about adding a destructor to a shared struct,
>> and having it fail to compile. The complaint was along the lines of
>> shared being broken, whereas I viewed it along the lines of shared
>> pointing out a logic problem in the code - what does destroying a struct
>> accessible from multiple threads mean? I think it must be clear that
>> destroying an object can only happen in one thread, i.e. the object must
>> become thread local in order to be destroyed.
>
> If the compiler should/does not add memory barriers, then is there a
> reason for having it built into the language? Can a library solution be
> enough?

The compiler must be involved in this so as not to perform certain reorderings.

Andrei

November 14, 2012
On 11/14/12 1:19 AM, Walter Bright wrote:
> On 11/13/2012 11:56 PM, Jonathan M Davis wrote:
>> Being able to have double-checked locking work would be valuable, and
>> having
>> memory barriers would reduce race condition weirdness when locks
>> aren't used
>> properly, so I think that it would be desirable to have memory barriers.
>
> I'm not saying "memory barriers are bad". I'm saying that having the
> compiler blindly insert them for shared reads/writes is far from the
> right way to do it.

Let's not be hasty. That approach works for Java and C#, and is allowed in C++.

Andrei


November 14, 2012
On 11/14/12 1:20 AM, Walter Bright wrote:
> On 11/13/2012 11:37 PM, Jacob Carlborg wrote:
>> If the compiler should/does not add memory barriers, then is there a
>> reason for
>> having it built into the language? Can a library solution be enough?
>
> Memory barriers can certainly be added using library functions.

The compiler must understand the semantics of barriers, e.g. so that it doesn't hoist code above an acquire barrier or sink code below a release barrier.

Andrei


November 14, 2012
On 11/14/12 1:31 AM, Jacob Carlborg wrote:
> On 2012-11-14 10:20, Walter Bright wrote:
>
>> Memory barriers can certainly be added using library functions.
>
> Is there then any real advantage of having it directly in the language?

It's not an advantage, it's a necessity.

Andrei

November 14, 2012
On 11/14/2012 01:42 PM, Michel Fortin wrote:
> On 2012-11-14 10:30:46 +0000, Timon Gehr <timon.gehr@gmx.ch> said:
>
>> On 11/14/2012 04:12 AM, Michel Fortin wrote:
>>> On 2012-11-13 19:54:32 +0000, Timon Gehr <timon.gehr@gmx.ch> said:
>>>
>>>> On 11/12/2012 02:48 AM, Michel Fortin wrote:
>>>>> I feel like the concurrency aspect of D2 was rushed in the haste of
>>>>> having it ready for TDPL. Shared, deadlock-prone synchronized
>>>>> classes[1]
>>>>> as well as destructors running in any thread (thanks GC!) plus a
>>>>> couple
>>>>> of other irritants makes the whole concurrency scheme completely
>>>>> flawed
>>>>> if you ask me. D2 needs a near complete overhaul on the concurrency
>>>>> front.
>>>>>
>>>>> I'm currently working on a big code base in C++. While I do miss D
>>>>> when
>>>>> it comes to working with templates as well as for its compilation
>>>>> speed
>>>>> and a few other things, I can't say I miss D much when it comes to
>>>>> anything touching concurrency.
>>>>>
>>>>> [1]: http://michelf.ca/blog/2012/mutex-synchonization-in-d/
>>>>
>>>> I am always irritated by shared-by-default static variables.
>>>
>>> I tend to have very little global state in my code,
>>
>> So do I. A thread-local static variable does not imply global state.
>> (The execution stack is static.) Eg. in a few cases it is sensible to
>> use static variables as implicit arguments to avoid having to pass
>> them around by copying them all over the execution stack.
>>
>> private int x = 0;
>>
>> int foo(){
>>      int xold = x;
>>      scope(exit) x = xold;
>>      x = new_value;
>>      bar(); // reads x
>>      return baz(); // reads x
>> }
>
> I'd consider that poor style.

I'd consider this a poor statement to make. Universally quantified assertions require more rigorous justification.

"In a few cases" it is not, even if it is poor style "most of the time".

> Use a struct to encapsulate the state, then make bar, and baz member functions of that struct.

They could eg. be virtual member functions of a class already.

> Using a local-scoped struct would work with pure,

It might.

> be more efficient

Not necessarily.

> (accessing thread-local variables takes more cycles),

It can be accessed sparsely; copying the struct pointer around is work too; and the fastest access path in a proper alternative design would potentially be even slower.

> and be less error-prone while refactoring.

If done in such a way that it makes refactoring error prone, it is to be considered poor style.


November 14, 2012
On 11/14/12 4:23 AM, David Nadlinger wrote:
> On Wednesday, 14 November 2012 at 00:04:56 UTC, deadalnix wrote:
>> That is what Java's volatile does. It has several use cases, including
>> valid double-checked locking (it has to be noted that this idiom is
>> currently used incorrectly in druntime, which proves both its usefulness
>> and that it requires language support) and the Disruptor, which I wanted
>> to implement for message passing in D but couldn't because of lack of
>> support at the time.
>
> What stops you from using core.atomic.{atomicLoad, atomicStore}? I don't
> know whether there might be a weird spec loophole which could
> theoretically lead to them being undefined behavior, but I'm sure that
> they are guaranteed to produce the right code on all relevant compilers.
> You can even specify the memory order semantics if you know what you are
> doing (although this used to trigger a template resolution bug in the
> frontend, no idea if it works now).
>
> David

This is a simplification of what should be going on. The core.atomic.{atomicLoad, atomicStore} functions must be intrinsics so that the compiler generates sequentially consistent code for them (i.e. does not perform certain reorderings). Then there are loads and stores with weaker ordering semantics (acquire, release, acquire/release, and consume).

Andrei
November 14, 2012
On 11/14/12 4:47 AM, Jacob Carlborg wrote:
> On 2012-11-14 11:38, Walter Bright wrote:
>
>> Not that I can think of.
>
> Then we might want to remove it since it's either not working or
> basically everyone has misunderstood how it should work.

Actually this hypothesis is false.

Andrei

November 14, 2012
On 14-11-2012 15:14, Andrei Alexandrescu wrote:
> On 11/14/12 1:19 AM, Walter Bright wrote:
>> On 11/13/2012 11:56 PM, Jonathan M Davis wrote:
>>> Being able to have double-checked locking work would be valuable, and
>>> having
>>> memory barriers would reduce race condition weirdness when locks
>>> aren't used
>>> properly, so I think that it would be desirable to have memory barriers.
>>
>> I'm not saying "memory barriers are bad". I'm saying that having the
>> compiler blindly insert them for shared reads/writes is far from the
>> right way to do it.
>
> Let's not be hasty. That approach works for Java and C#, and is allowed in C++.
>
> Andrei
>
>

I need some clarification here: by memory barrier, do you mean x86's mfence, sfence, and lfence? Because, as Walter said, inserting those blindly when they are unnecessary can lead to terrible performance, since it practically murders pipelining.

(And note that you can't optimize this either; since the dependencies memory barriers are supposed to express are subtle and not detectable by a compiler, the compiler would always have to insert them because it can't know when it would be safe not to.)

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
November 14, 2012
On 14/11/2012 15:39, Alex Rønne Petersen wrote:
> On 14-11-2012 15:14, Andrei Alexandrescu wrote:
>> On 11/14/12 1:19 AM, Walter Bright wrote:
>>> On 11/13/2012 11:56 PM, Jonathan M Davis wrote:
>>>> Being able to have double-checked locking work would be valuable, and
>>>> having
>>>> memory barriers would reduce race condition weirdness when locks
>>>> aren't used
>>>> properly, so I think that it would be desirable to have memory
>>>> barriers.
>>>
>>> I'm not saying "memory barriers are bad". I'm saying that having the
>>> compiler blindly insert them for shared reads/writes is far from the
>>> right way to do it.
>>
>> Let's not be hasty. That approach works for Java and C#, and is allowed in C++.
>>
>> Andrei
>>
>>
>
> I need some clarification here: by memory barrier, do you mean x86's
> mfence, sfence, and lfence? Because, as Walter said, inserting those
> blindly when they are unnecessary can lead to terrible performance,
> since it practically murders pipelining.
>

In fact, x86 is mostly sequentially consistent due to its memory model. It only requires an mfence when a shared store is followed by a shared load.

See http://g.oswego.edu/dl/jmm/cookbook.html for more information on the barriers required on different architectures.

> (And note that you can't optimize this either; since the dependencies
> memory barriers are supposed to express are subtle and not detectable by
> a compiler, the compiler would always have to insert them because it
> can't know when it would be safe not to.)
>

The compiler is aware of what is thread-local and what isn't. That means it can fully optimize thread-local stores and loads (e.g. by doing register promotion or reordering them across shared stores/loads).

This has a cost, indeed, but it is useful, and Walter's solution of casting away shared once a mutex is acquired is always available.