November 14, 2012
On 14/11/2012 00:48, Alex Rønne Petersen wrote:
> On 14-11-2012 00:43, Alex Rønne Petersen wrote:
>> On 14-11-2012 00:38, Andrei Alexandrescu wrote:
>>> On 11/13/12 3:28 PM, Alex Rønne Petersen wrote:
>>>> On 13-11-2012 23:33, Andrei Alexandrescu wrote:
>>>>> shared int x;
>>>>> ...
>>>>> x = 4;
>>>>>
>>>>> You'll need to use x.store(4) instead.
>>>>
>>>> Is that meant to be an atomic store, or just a regular, but explicit,
>>>> store?
>>>
>>> Atomic and sequentially consistent.
>>>
>>>
>>> Andrei
>>
>> OK, but then we have the problem I presented in the OP: This only works
>> for certain types, on certain architectures, for certain processors, ...
>>
>> So, we could limit shared load/store to only work on certain types and
>> require all architectures that D compilers target to provide those.
>> *But* this means that shared on any non-primitive types becomes
>> essentially useless and will in 99% of cases just be casted away. On the
>> other hand, if we make it implementation-defined, people end up writing
>> highly unportable code. So, (unless anyone can come up with better
>> alternatives), I think guaranteeing atomic load/store for a certain set
>> of types is the most sensible way forward.
>>
>> FWIW, these are the types and type categories I'd expect shared
>> load/store to work on, on any architecture:
>>
>> * ubyte, byte
>> * ushort, short
>> * uint, int
>> * ulong, long
>> * float, double
>> * pointers
>> * slices
>> * references
>> * function pointers
>> * delegates
>>
>
> Scratch that, make it this:
>
> * ubyte, byte
> * ushort, short
> * uint, int
> * ulong, long
> * float, double
> * pointers
> * references
> * function pointers
>
> Slices and delegates can't be loaded/stored atomically because very few
> architectures provide instructions to atomically load/store 16 bytes of
> data (required on 64-bit; 32-bit would be fine since that's just 8
> bytes, but portability is king). This is also why ucent, cent, and real
> are not included in the list.
>

That list sounds more reasonable.
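
To make the size argument concrete, here is a minimal sketch that checks at compile time which of the listed categories occupy one machine word and which occupy two. The helper name `isDoubleWord` is hypothetical, introduced only for this illustration; the checks themselves rely on nothing beyond `.sizeof`.

```d
// Hypothetical helper: true when T occupies exactly two machine words.
enum bool isDoubleWord(T) = T.sizeof == 2 * size_t.sizeof;

// One word each: plain pointers and function pointers.
static assert((int*).sizeof == size_t.sizeof);
static assert((void function()).sizeof == size_t.sizeof);

// Two words each: a slice is length + pointer, a delegate is
// context pointer + function pointer.
static assert(isDoubleWord!(int[]));
static assert(isDoubleWord!(void delegate()));

// Fixed at 8 bytes regardless of the target's word size.
static assert(long.sizeof == 8 && double.sizeof == 8);

void main() {}
```

On a 64-bit target the two-word types come to 16 bytes, which is the case the amended list excludes.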
November 14, 2012
On 11/13/2012 2:33 PM, Andrei Alexandrescu wrote:
> As long as a cast is required along the way, we can't claim victory. I need to
> think about that scenario.

Our car doesn't have an electric starter yet, but it's still better than a horse :-)

November 14, 2012
On 14-11-2012 01:09, deadalnix wrote:
> On 14/11/2012 00:43, Alex Rønne Petersen wrote:
>> On 14-11-2012 00:38, Andrei Alexandrescu wrote:
>>> On 11/13/12 3:28 PM, Alex Rønne Petersen wrote:
>>>> On 13-11-2012 23:33, Andrei Alexandrescu wrote:
>>>>> shared int x;
>>>>> ...
>>>>> x = 4;
>>>>>
>>>>> You'll need to use x.store(4) instead.
>>>>
>>>> Is that meant to be an atomic store, or just a regular, but explicit,
>>>> store?
>>>
>>> Atomic and sequentially consistent.
>>>
>>>
>>> Andrei
>>
>> OK, but then we have the problem I presented in the OP: This only works
>> for certain types, on certain architectures, for certain processors, ...
>>
>> So, we could limit shared load/store to only work on certain types and
>> require all architectures that D compilers target to provide those.
>> *But* this means that shared on any non-primitive types becomes
>> essentially useless and will in 99% of cases just be casted away. On the
>> other hand, if we make it implementation-defined, people end up writing
>> highly unportable code. So, (unless anyone can come up with better
>> alternatives), I think guaranteeing atomic load/store for a certain set
>> of types is the most sensible way forward.
>>
>> FWIW, these are the types and type categories I'd expect shared
>> load/store to work on, on any architecture:
>>
>> * ubyte, byte
>> * ushort, short
>> * uint, int
>> * ulong, long
>> * float, double
>> * pointers
>> * slices
>> * references
>> * function pointers
>> * delegates
>>
>
> I wouldn't expect it to work for delegates, long, ulong, double and
> slices on every arch. If it does work, that is awesome, and adds to my
> determination that this is the thing to do.

8-byte atomic loads/stores are doable on all major architectures.
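
For illustration, a minimal sketch of an explicit, sequentially consistent 8-byte store and load using the existing `core.atomic` functions rather than the proposed `x.store(4)` syntax. The names `counter`, `publish`, and `current` are hypothetical.

```d
import core.atomic : atomicLoad, atomicStore, MemoryOrder;

shared long counter; // hypothetical 8-byte shared variable

void publish(long value)
{
    // MemoryOrder.seq requests a sequentially consistent store.
    atomicStore!(MemoryOrder.seq)(counter, value);
}

long current()
{
    // Sequentially consistent load of the same variable.
    return atomicLoad!(MemoryOrder.seq)(counter);
}

void main()
{
    publish(4);
    assert(current() == 4);
}
```

Whether this lowers to a single instruction or to something heavier depends on the target; the sketch only shows the source-level form.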

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
November 14, 2012
On 11/13/2012 3:43 PM, Alex Rønne Petersen wrote:
> FWIW, these are the types and type categories I'd expect shared load/store to
> work on, on any architecture:
>
> * ubyte, byte
> * ushort, short
> * uint, int
> * ulong, long
> * float, double
> * pointers
> * slices
> * references
> * function pointers
> * delegates
>

Not going to portably work on long, ulong, double, slices, or delegates.

(The compiler should issue an error where it won't work, and allow it where it does, letting the user decide what to do about the non-working cases.)
November 14, 2012
On 14-11-2012 02:33, Walter Bright wrote:
> On 11/13/2012 3:43 PM, Alex Rønne Petersen wrote:
>> FWIW, these are the types and type categories I'd expect shared
>> load/store to
>> work on, on any architecture:
>>
>> * ubyte, byte
>> * ushort, short
>> * uint, int
>> * ulong, long
>> * float, double
>> * pointers
>> * slices
>> * references
>> * function pointers
>> * delegates
>>
>
> Not going to portably work on long, ulong, double, slices, or delegates.
>
> (The compiler should issue an error where it won't work, and allow it
> where it does, letting the user decide what to do about the non-working
> cases.)

I amended that (see my other post). 8-byte loads/stores can be done atomically on all relevant architectures today. Andrei linked a page a while back that explained how to do it on x86, ARM, MIPS, and PowerPC (if memory serves), but I can't seem to find it again...

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
November 14, 2012
On 11/13/2012 4:04 PM, deadalnix wrote:
> That is what Java's volatile does. It has several use cases, including valid
> double-checked locking (it has to be noted that this idiom is used incorrectly in
> druntime ATM,

Please, please file a bug report about this, rather than a vague statement here. If there already is one, please post its number.


> So sequentially consistent reads/writes are useful.

Sure, I agree with that.


> This struct stuff doesn't make any sense to me. Java, C# and many other multithreaded
> languages have everything shared and are still able to have finalizers of some
> sort.

I understand, though, that they take steps to ensure that the finalizer is run in one thread and no other thread still has access to it - i.e. it is converted back to a local reference.

> Why couldn't a shared object be destroyed? Why should it be destroyed in a
> specific thread, as it can only refer to shared data because of transitivity?

How can you destroy an object in one thread when another thread is holding live references to it? (Well, how can you destroy it without causing corruption bugs, that is.)
November 14, 2012
On 11/13/12 3:48 PM, Alex Rønne Petersen wrote:
> Slices and delegates can't be loaded/stored atomically because very few
> architectures provide instructions to atomically load/store 16 bytes of
> data (required on 64-bit; 32-bit would be fine since that's just 8
> bytes, but portability is king). This is also why ucent, cent, and real
> are not included in the list.

When I wrote TDPL I looked at the contemporary architectures and it seemed all were or were about to support double-word atomic ops. So the intent is to allow shared delegates and slices.

Are there any architectures today that don't support double-word load, store, and CAS?


Andrei
November 14, 2012
On 14/11/2012 02:39, Walter Bright wrote:
> On 11/13/2012 4:04 PM, deadalnix wrote:
>> That is what Java's volatile does. It has several use cases, including
>> valid double-checked locking (it has to be noted that this idiom is used
>> incorrectly in druntime ATM,
>
> Please, please file a bug report about this, rather than a vague
> statement here. If there already is one, please post its number.
>

http://d.puremagic.com/issues/show_bug.cgi?id=6607
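
For reference, a minimal sketch of what a valid double-checked lock looks like when the flag is read and written with sequentially consistent atomics. This is an illustration only, not the druntime code the bug report is about; `Registry`, `instance`, `ready`, and `registry` are hypothetical names.

```d
import core.atomic : atomicLoad, atomicStore;

final class Registry // hypothetical lazily created object
{
    int value = 42;
}

private __gshared Registry instance; // written only while holding the lock
private shared bool ready;           // publication flag, accessed atomically

Registry registry()
{
    // First check, outside the lock: sequentially consistent load.
    if (!atomicLoad(ready))
    {
        synchronized // one global mutex for this statement
        {
            // Second check, inside the lock.
            if (!atomicLoad(ready))
            {
                instance = new Registry;
                // Publish only after the object is fully constructed.
                atomicStore(ready, true);
            }
        }
    }
    return instance;
}

void main()
{
    assert(registry() is registry());
    assert(registry().value == 42);
}
```

The idiom breaks when the first check is a plain, non-atomic read, because nothing then orders it against the construction of the object.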

>
>> So sequentially consistent reads/writes are useful.
>
> Sure, I agree with that.
>
>
>> This struct stuff doesn't make any sense to me. Java, C# and many other
>> multithreaded languages have everything shared and are still able to have
>> finalizers of some sort.
>
> I understand, though, that they take steps to ensure that the finalizer
> is run in one thread and no other thread still has access to it - i.e.
> it is converted back to a local reference.
>
>> Why couldn't a shared object be destroyed? Why should it be destroyed
>> in a specific thread, as it can only refer to shared data because of
>> transitivity?
>
> How can you destroy an object in one thread when another thread is holding
> live references to it? (Well, how can you destroy it without causing
> corruption bugs, that is.)

Why would you destroy something that isn't dead yet?
November 14, 2012
On 11/13/12 5:29 PM, Walter Bright wrote:
> On 11/13/2012 2:33 PM, Andrei Alexandrescu wrote:
>> As long as a cast is required along the way, we can't claim victory. I
>> need to think about that scenario.
>
> Our car doesn't have an electric starter yet, but it's still better than
> a horse :-)

Please don't. This is "we're doing better than C++" in disguise and exactly the wrong frame of mind. I find few things more negatively disruptive than lulling into a false sense of achievement.

Andrei
November 14, 2012
On 11/13/12 5:33 PM, Alex Rønne Petersen wrote:
> 8-byte atomic loads/stores are doable on all major architectures.

We're looking at 128-bit load, store, and CAS for 64-bit machines.

Andrei