November 14, 2012
On 14-11-2012 02:52, Andrei Alexandrescu wrote:
> On 11/13/12 3:48 PM, Alex Rønne Petersen wrote:
>> Slices and delegates can't be loaded/stored atomically because very few
>> architectures provide instructions to atomically load/store 16 bytes of
>> data (required on 64-bit; 32-bit would be fine since that's just 8
>> bytes, but portability is king). This is also why ucent, cent, and real
>> are not included in the list.
>
> When I wrote TDPL I looked at the contemporary architectures and it
> seemed all were or were about to support double-word atomic ops. So the
> intent is to allow shared delegates and slices.
>
> Are there any architectures today that don't support double-word load,
> store, and CAS?
>
>
> Andrei

I do not know of a single architecture apart from x86 that supports > 8-byte load/store/CAS (and come to think of it, I'm not so sure x86 actually can do 16-byte load/store, only CAS). So while a shared delegate is doable in 32-bit, it isn't really in 64-bit.

(I deliberately talk in terms of bytes here because that's the nomenclature most architecture manuals use from what I've seen.)

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
November 14, 2012
Le 14/11/2012 02:36, Alex Rønne Petersen a écrit :
> On 14-11-2012 02:33, Walter Bright wrote:
>> On 11/13/2012 3:43 PM, Alex Rønne Petersen wrote:
>>> FWIW, these are the types and type categories I'd expect shared
>>> load/store to
>>> work on, on any architecture:
>>>
>>> * ubyte, byte
>>> * ushort, short
>>> * uint, int
>>> * ulong, long
>>> * float, double
>>> * pointers
>>> * slices
>>> * references
>>> * function pointers
>>> * delegates
>>>
>>
>> Not going to portably work on long, ulong, double, slices, or delegates.
>>
>> (The compiler should issue an error where it won't work, and allow it
>> where it does, letting the user decide what to do about the non-working
>> cases.)
>
> I amended that (see my other post). 8-byte loads/stores can be done
> atomically on all relevant architectures today. Andrei linked a page a
> while back that explained how to do it on x86, ARM, MIPS, and PowerPC
> (if memory serves), but I can't seem to find it again...
>

http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
November 14, 2012
On 11/13/12 5:58 PM, Alex Rønne Petersen wrote:
> On 14-11-2012 02:52, Andrei Alexandrescu wrote:
>> On 11/13/12 3:48 PM, Alex Rønne Petersen wrote:
>>> Slices and delegates can't be loaded/stored atomically because very few
>>> architectures provide instructions to atomically load/store 16 bytes of
>>> data (required on 64-bit; 32-bit would be fine since that's just 8
>>> bytes, but portability is king). This is also why ucent, cent, and real
>>> are not included in the list.
>>
>> When I wrote TDPL I looked at the contemporary architectures and it
>> seemed all were or were about to support double-word atomic ops. So the
>> intent is to allow shared delegates and slices.
>>
>> Are there any architectures today that don't support double-word load,
>> store, and CAS?
>>
>>
>> Andrei
>
> I do not know of a single architecture apart from x86 that supports >
> 8-byte load/store/CAS (and come to think of it, I'm not so sure x86
> actually can do 16-byte load/store, only CAS). So while a shared
> delegate is doable in 32-bit, it isn't really in 64-bit.

Intel does 128-bit atomic load and store, see http://www.intel.com/content/www/us/en/processors/itanium/itanium-architecture-software-developer-rev-2-3-vol-2-manual.html, "4.5 Memory Datum Alignment and Atomicity".

Andrei

November 14, 2012
On 14-11-2012 03:00, deadalnix wrote:
> Le 14/11/2012 02:36, Alex Rønne Petersen a écrit :
>> On 14-11-2012 02:33, Walter Bright wrote:
>>> On 11/13/2012 3:43 PM, Alex Rønne Petersen wrote:
>>>> FWIW, these are the types and type categories I'd expect shared
>>>> load/store to
>>>> work on, on any architecture:
>>>>
>>>> * ubyte, byte
>>>> * ushort, short
>>>> * uint, int
>>>> * ulong, long
>>>> * float, double
>>>> * pointers
>>>> * slices
>>>> * references
>>>> * function pointers
>>>> * delegates
>>>>
>>>
>>> Not going to portably work on long, ulong, double, slices, or delegates.
>>>
>>> (The compiler should issue an error where it won't work, and allow it
>>> where it does, letting the user decide what to do about the non-working
>>> cases.)
>>
>> I amended that (see my other post). 8-byte loads/stores can be done
>> atomically on all relevant architectures today. Andrei linked a page a
>> while back that explained how to do it on x86, ARM, MIPS, and PowerPC
>> (if memory serves), but I can't seem to find it again...
>>
>
> http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

Thanks, exactly that. No MIPS, though. I guess I'm going to have to go dig through their manuals.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
November 14, 2012
On 14-11-2012 03:02, Andrei Alexandrescu wrote:
> On 11/13/12 5:58 PM, Alex Rønne Petersen wrote:
>> On 14-11-2012 02:52, Andrei Alexandrescu wrote:
>>> On 11/13/12 3:48 PM, Alex Rønne Petersen wrote:
>>>> Slices and delegates can't be loaded/stored atomically because very few
>>>> architectures provide instructions to atomically load/store 16 bytes of
>>>> data (required on 64-bit; 32-bit would be fine since that's just 8
>>>> bytes, but portability is king). This is also why ucent, cent, and real
>>>> are not included in the list.
>>>
>>> When I wrote TDPL I looked at the contemporary architectures and it
>>> seemed all were or were about to support double-word atomic ops. So the
>>> intent is to allow shared delegates and slices.
>>>
>>> Are there any architectures today that don't support double-word load,
>>> store, and CAS?
>>>
>>>
>>> Andrei
>>
>> I do not know of a single architecture apart from x86 that supports >
>> 8-byte load/store/CAS (and come to think of it, I'm not so sure x86
>> actually can do 16-byte load/store, only CAS). So while a shared
>> delegate is doable in 32-bit, it isn't really in 64-bit.
>
> Intel does 128-bit atomic load and store, see
> http://www.intel.com/content/www/us/en/processors/itanium/itanium-architecture-software-developer-rev-2-3-vol-2-manual.html,
> "4.5 Memory Datum Alignment and Atomicity".
>
> Andrei
>

That's Itanium, though, not x86. Itanium is a fairly high-end, enterprise-class thing, so that's not very surprising.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
November 14, 2012
On 2012-11-13 19:54:32 +0000, Timon Gehr <timon.gehr@gmx.ch> said:

> On 11/12/2012 02:48 AM, Michel Fortin wrote:
>> I feel like the concurrency aspect of D2 was rushed in the haste of
>> having it ready for TDPL. Shared, deadlock-prone synchronized classes[1]
>> as well as destructors running in any thread (thanks GC!) plus a couple
>> of other irritants makes the whole concurrency scheme completely flawed
>> if you ask me. D2 needs a near complete overhaul on the concurrency front.
>> 
>> I'm currently working on a big code base in C++. While I do miss D when
>> it comes to working with templates as well as for its compilation speed
>> and a few other things, I can't say I miss D much when it comes to
>> anything touching concurrency.
>> 
>> [1]: http://michelf.ca/blog/2012/mutex-synchonization-in-d/
> 
> I am always irritated by shared-by-default static variables.

I tend to have very little global state in my code, so shared-by-default is not something I have to fight with very often. I do agree that thread-local is a better default.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/

November 14, 2012
On Tuesday, November 13, 2012 14:33:50 Andrei Alexandrescu wrote:
> As long as a cast is required along the way, we can't claim victory. I need to think about that scenario.

At this point, I don't see how it could be otherwise. Having the shared equivalent of const would just lead to that being used everywhere and defeat the purpose of shared in the first place. If it's not segregated, it's not doing its job. But that leaves us with most functions not working with shared, which is also a problem. Templates are a partial solution, but they obviously don't work for everything.

In general, I would expect that all uses of shared would be protected by a mutex or synchronized block or other similar construct. It's just going to cause problems to do otherwise. There are some cases where if you can guarantee that writes and reads are atomic, you're fine skipping the mutexes, but those are relatively rare, particularly when you consider the issues in making anything but extremely trivial writes or reads atomic.

That being the case, it doesn't really seem all that unreasonable to me for it to be normal to have to cast shared to non-shared to pass to functions, as long as all of that code is protected with a mutex or another, similar construct - though if those functions aren't pure, you _could_ run into entertaining problems when a non-shared reference to the data gets squirreled away somewhere in those function calls.

But we seem to have contradictory requirements here: we're trying to segregate shared from normal, thread-local stuff, yet we still want to be able to use shared with functions intended for non-shared data.

- Jonathan M Davis
November 14, 2012
On Tuesday, November 13, 2012 22:12:12 Michel Fortin wrote:
> I tend to have very little global state in my code, so shared-by-default is not something I have to fight with very often. I do agree that thread-local is a better default.

Thread-local by default is a _huge_ step forward, and in hindsight, it seems pretty ridiculous that a language would do anything else. Shared by default is just too horrible.

- Jonathan M Davis
November 14, 2012
On 2012-11-13 23:22, Walter Bright wrote:

> But I do see enormous value in shared in that it logically (and rather
> forcefully) separates thread-local code from multi-thread code. For
> example, see the post here about adding a destructor to a shared struct,
> and having it fail to compile. The complaint was along the lines of
> shared being broken, whereas I viewed it along the lines of shared
> pointing out a logic problem in the code - what does destroying a struct
> accessible from multiple threads mean? I think it must be clear that
> destroying an object can only happen in one thread, i.e. the object must
> become thread local in order to be destroyed.

If the compiler shouldn't (or doesn't) add memory barriers, then is there a reason for having shared built into the language? Could a library solution be enough?

-- 
/Jacob Carlborg
November 14, 2012
On Tuesday, November 13, 2012 14:22:07 Walter Bright wrote:
> I'm just not convinced that having the compiler add memory barriers:
> 
> 1. will result in correctly working code, when done by programmers who have only an incomplete understanding of memory barriers, which would be about 99.9% of us.
> 
> 2. will result in efficient code

Being able to have double-checked locking work would be valuable, and having memory barriers would reduce race condition weirdness when locks aren't used properly, so I think that it would be desirable to have memory barriers. If there's a major performance penalty, though, that might be a reason not to do it. Certainly, there's no question that adding memory barriers alone won't eliminate the need for mutexes or synchronized blocks or whatnot. shared's primary benefit is in logically separating normal code from code that must share data across threads, and in making it possible for the compiler to optimize based on the fact that it knows that a variable is thread-local.

- Jonathan M Davis