[dmd-concurrency] shared arrays
January 27, 2010
Consider a shared array:

shared int[] x;

What can be done with it? It's tricky; the array has two words worth of info and we can't assume 128 bits can be atomically written on a 64-bit machine.

So you can't assign the entire array. If we assume no assignment, then element access with bounds checking becomes possible. But then we need to provide a method for changing a shared array. There are a few possibilities:

(a) recommend an extra indirection

shared int[]* x;

A pointer can be reassigned, so whenever you want to mess with the array you reallocate it.
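
A minimal sketch of (a), assuming druntime's core.atomic atomicLoad/atomicStore and a strict copy-on-write discipline (the names here are illustrative, not a proposed API):

import core.atomic;

shared(int[]*) handle;  // the only shared state is one word-sized pointer

void publish(int[] newContents)
{
    // Build the replacement privately; no other thread can see it yet.
    auto cell = new int[][](1);
    cell[0] = newContents.dup;
    // Publish with a single word-sized atomic pointer store.
    atomicStore(handle, cast(shared) &cell[0]);
}

int[] snapshot()
{
    // Grab the pointer once, then copy the slice it points to. Casting
    // shared away is tolerable here only because published arrays are
    // never mutated in place, only replaced wholesale.
    auto p = atomicLoad(handle);
    return (cast(int[]) *p).dup;
}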

(b) recommend you put the array in a class

class A
{
     int[] x;
}

Then, inside synchronized methods of shared(A) objects, the type of x changes from shared(int[]) to shared(int)[] and can be manipulated.
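
For concreteness, a sketch of (b) as it could be written today, using the usual workaround of casting shared away inside a locked region (the automatic shared(int[]) -> shared(int)[] peeling is the design intent, not something the compiler enforces yet):

class A
{
    private int[] x;

    void append(int v) shared
    {
        synchronized (this)
        {
            // Inside the lock it is safe to treat the field as unshared.
            auto self = cast(A) this;
            self.x ~= v;
        }
    }

    int opIndex(size_t i) shared
    {
        synchronized (this)
        {
            auto self = cast(A) this;
            return self.x[i];
        }
    }
}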

(c) recommend the yet-unwritten class std.concurrency.SharedArray.

Any thoughts are welcome.


Andrei
January 27, 2010
On 2010-01-27, at 18:25, Andrei Alexandrescu wrote:

> Consider a shared array:
> 
> shared int[] x;
> 
> What can be done with it? It's tricky; the array has two words worth of info and we can't assume 128 bits can be atomically written on a 64-bit machine.

Isn't that a great case for a lock pool controlled with a hash table? If you have dismissed that option, why?


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



January 27, 2010
On Wed, 27 Jan 2010 19:19:46 -0500, Michel Fortin <michel.fortin at michelf.com> wrote:

> On 2010-01-27, at 18:25, Andrei Alexandrescu wrote:
>
>> Consider a shared array:
>>
>> shared int[] x;
>>
>> What can be done with it? It's tricky; the array has two words worth of info and we can't assume 128 bits can be atomically written on a 64-bit machine.
>
> Isn't that a great case for a lock pool controlled with a hash table? If you have dismissed that option, why?
>

Locking for a simple assignment is costly, doesn't scale well, and tends to turn parallel programs into serial ones.

Anyway, the lack of atomic 128-bit loads is equally troublesome. Not to mention that an atomic read has to be done at every indexing of the variable.

I did write down a simple single cas/xchg routine below, but it suffers from tearing and inconsistencies if/when two or more writers assign to x.

Given:
shared int[] x;
shared int[] y;
x = y;

Then:
if(y.length < x.length) { // shrink then set
    x.length = y.length;
    x.ptr    = y.ptr;
} else { // set then expand
    x.ptr    = y.ptr;
    x.length = y.length;
}
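
For example, two writers racing through the "set then expand" branch can interleave so that x ends up pairing one array's pointer with the other's length (illustration only):

writer 1: x.ptr    = a.ptr;
writer 2: x.ptr    = b.ptr;
writer 2: x.length = b.length;
writer 1: x.length = a.length;   // x is now {b.ptr, a.length}: torn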

Some searching turned up an answer to the issue, from http://www.kvraudio.com/forum/viewtopic.php?p=3935218. tony tony chopper wrote, quoting an earlier claim:

    > That was true on the 486, but since the Pentium all writes/reads up to 64 bits are guaranteed to be atomic if they are naturally aligned: words to 2 bytes, DWORDs to 4, QWORDs to 8.

    How could this be true? Depending on the compiler, a QWORD read/write will involve two 32-bit accesses; how could this be atomic?

And the reply:

    Indeed, the situation is that a single load/store involving aligned access (or a cache line, but it's easier to just align) will be atomic. If you do a 16-byte load/store with SSE on an aligned address, it's just as atomic (ok, this is from memory, I didn't double-check it, as I don't immediately recall needing more than 32 bits moved from thread to thread on the fly). If the compiler splits a 16-byte load/store into four 32-bit loads/stores, it's now four loads/stores and not atomic in any way. I think nollock is thinking at the opcode level.

    What this means: yes, you can move double (and 16-byte) data around with atomic loads and stores, but the advice about avoiding it still holds, unless one knows for sure how a particular compiler compiles different things. An aligned 32-bit load/store at least will always be atomic (for the current generation of compilers and processors; if they ever remove 32-bit access from the ISA, a 32-bit store will probably become load+modify+store on 64-bit data, which is no longer atomic; I find that rather unlikely, though).


So my read on this is that, since all 64-bit x86 CPUs support SSE2, atomic reads/writes can be done using 128-bit SSE memory ops on aligned data. All that's needed is align(16) support in DMD (align(8) would also be appreciated).
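
As a sketch only, assuming a 64-bit build, align(16) doing what is asked of it (the very feature requested above), and DMD's inline assembler accepting SSE2 mnemonics, a 16-byte store might look like:

align(16) struct SliceRep
{
    size_t length;  // D array layout: the length word first...
    int*   ptr;     // ...then the data pointer
}

void store16(SliceRep* dst, SliceRep* src)
{
    // One movdqa load and one movdqa store: each is a single 16-byte
    // memory op on an aligned address, which is what the discussion
    // above claims makes it atomic. Both dst and src must be 16-byte
    // aligned or movdqa faults.
    asm
    {
        mov RAX, src;
        movdqa XMM0, [RAX];
        mov RCX, dst;
        movdqa [RCX], XMM0;
    }
}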
January 27, 2010
Michel Fortin wrote:
> On 2010-01-27, at 18:25, Andrei Alexandrescu wrote:
> 
>> Consider a shared array:
>> 
>> shared int[] x;
>> 
>> What can be done with it? It's tricky; the array has two words worth of info and we can't assume 128 bits can be atomically written on a 64-bit machine.
> 
> Isn't that a great case for a lock pool controlled with a hash table? If you have dismissed that option, why?

I'm unsure whether that much magic is warranted, and I'm afraid that "arrays that suck" is the most likely result. Array primitive operations aren't quite meant to be individually synchronized, and in my experience higher-level APIs are almost always necessary.

Andrei
January 27, 2010
Robert Jacques wrote:
> I did write down a simple single cas/xchg routine below, but it suffers from tearing and inconsistencies if/when two or more writers assign to x.
> 
> Given:
> shared int[] x;
> shared int[] y;
> x = y;
> 
> Then:
> if(y.length < x.length) { // shrink then set
>    x.length = y.length;
>    x.ptr    = y.ptr;
> } else { // set then expand
>    x.ptr    = y.ptr;
>    x.length = y.length;
> }

Yah, this is problematic because the array passes through states it's never supposed to be in.

> So my read on this is that, since all 64-bit x86 CPUs support SSE2, atomic reads/writes can be done using 128-bit SSE memory ops on aligned data. All that's needed is align(16) support in DMD (align(8) would also be appreciated).

I've read about 128-bit SSE operations. I'm not sure to what extent we could rely on them being there. Besides, we'd need to make sure all arrays (or at least all shared arrays) are properly aligned. I'm open to this, but I'd like to collect more data and ideas before we commit to working on it.


Andrei

January 27, 2010
On Wed, 27 Jan 2010 21:33:42 -0500, Andrei Alexandrescu <andrei at erdani.com> wrote:
> Robert Jacques wrote:
>> So my read on this is that, since all 64-bit x86 CPUs support SSE2, atomic reads/writes can be done using 128-bit SSE memory ops on aligned data. All that's needed is align(16) support in DMD (align(8) would also be appreciated).
>
> I've read about 128-bit SSE operations. I'm not sure to what extent we could rely on them being there.

While SSE2 is not standard on x86 platforms, it is on x86-64, so you can assume it's there.

January 27, 2010
On Wed, Jan 27, 2010 at 7:19 PM, Michel Fortin <michel.fortin at michelf.com> wrote:

> On 2010-01-27, at 18:25, Andrei Alexandrescu wrote:
>
> > Consider a shared array:
> >
> > shared int[] x;
> >
> > What can be done with it? It's tricky; the array has two words worth of info and we can't assume 128 bits can be atomically written on a 64-bit machine.
>
> Isn't that a great case for a lock pool controlled with a hash table? If you have dismissed that option, why?
>
>
I think the larger question is: what are the common use cases for shared? It seems dangerous to rely on per-field atomicity too much for ordinary tasks; clearly (int32)++ is supportable without heroic measures, and (int[][]) ~= (int[]) probably is not, or is at least harder and trickier. If you don't understand modern CPU architecture, the rules are not self-evident, which makes this topic pretty much restricted to people with advanced skills, as these things are reckoned nowadays. Is shared just for building synchronization tools and special-purpose things like rapid-access stacks, or is it a tool every developer is expected to use routinely?
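
The easy case, at least, is already covered; a sketch using druntime's core.atomic (assuming its atomicOp interface):

import core.atomic;

shared int counter;

void bump()
{
    // A single hardware atomic read-modify-write (or a short CAS loop);
    // no locks and no fence gymnastics required of the user.
    atomicOp!"+="(counter, 1);
}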

It may be that our profile of who is supposed to be using it for what will answer a lot of questions about what we need to support, how fast it has to be, and what trade-offs are appropriate.  Or at least, I will be less confused (or more likely, confused for a different reason).

Kevin
January 28, 2010
(b), with (c) being a fully usable implementation of (b). Reallocating an array on every assignment is as bad as taking a lock on an object, if not worse.

Also, consider that assigning to the length of an array is bad news too: it's a function call. I think straight shared arrays are a bad idea in general.
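
To make that concrete: the innocuous-looking assignment is lowered by the compiler into a druntime call. The hook name below (_d_arraysetlengthT) is shown for illustration; exact signatures vary by version:

shared int[] x;

x.length = 10;
// roughly becomes:
//   _d_arraysetlengthT(typeid(int[]), 10, cast(void[]*) &x);
// a call that may allocate, copy, and write both words of x;
// nothing a single atomic instruction can express.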

-Steve



----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> 
> Consider a shared array:
> 
> shared int[] x;
> 
> What can be done with it? It's tricky; the array has two words worth of info and we can't assume 128 bits can be atomically written on a 64-bit machine.
> 
> So you can't assign the entire array. If we assume no assignment, then element access with bounds checking becomes possible. But then we need to provide a method for changing a shared array. There are a few possibilities:
> 
> (a) recommend an extra indirection
> 
> shared int[]* x;
> 
> A pointer can be reassigned, so whenever you want to mess with the array you reallocate it.
> 
> (b) recommend you put the array in a class
> 
> class A
> {
>     int[] x;
> }
> 
> Then, inside synchronized methods of shared(A) objects, the type of x changes from shared(int[]) to shared(int)[] and can be manipulated.
> 
> (c) recommend the yet-unwritten class std.concurrency.SharedArray.
> 
> Any thoughts are welcome.
> 
> 
> Andrei