January 14, 2010 [dmd-concurrency] shared arrays

Posted in reply to Michel Fortin

Michel Fortin wrote:
> On 2010-01-14, at 22:01, Andrei Alexandrescu wrote:
>
>> Michel Fortin wrote:
>>> On 2010-01-14, at 18:07, Andrei Alexandrescu wrote:
>>>> Unfortunately, this(this) is called after the bits have been copied, so if there was any tearing, it already happened. I don't know how we can solve this.
>>> Where is the copy-constructor when we need one? :-)
>> I realized that fortunately that's not a problem: during memcpy, the target is not yet shared. Whew. So it all holds water.
>>
>> this(this) shared { ... }
>>
>> should work.
>
> But the source is shared. Couldn't it be updated while memcpy does its work, creating an incoherent state? Something like: memcpy copies half of it, context switch, other thread update first and last bytes, context switch, memcpy finishes its work. Result, last byte of the copy doesn't match first byte.

I just meant that we're not constrained by the memcpy prior to this(this).

I don't know whether we should disallow such copies. Probably we should, but I don't have a strong case. If we disable it, cowboys will be unhappy. If we enable it, others will.

The problem is, if we disable even "this(this) shared" there's no chance to copy a shared object, even if you're willing to fix the inconsistencies.

>>> Now, the really tricky question is: should this one be copyable when shared?
>>>
>>> struct D { immutable string a; long b; }
>>>
>>> Since 'a' is immutable, you don't need to copy it atomically, only 'b' requires an atomic copy. So copying the struct "atomically" is possible by copying 'a' normally and 'b' atomically. Now, is that going to be supported?
>> But long isn't atomically copyable. Did you mean int?
>
> Well, that depends on the architecture. I meant a type which is atomically copyable, yes.

Speaking of which: is it reasonable to assume that all modern 32-bit architectures have a 64-bit atomic assign? How about a 64-bit atomic CAS?

Andrei
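For what it's worth, here is a minimal sketch of the per-field copy Michel describes, written as a free helper rather than as this(this). The atomicLoad call and the assumption that the target has a 64-bit atomic load (the very question above) are assumptions for illustration, not part of the proposal:

    import core.atomic;

    struct D { immutable string a; long b; }

    // Hypothetical helper: 'a' is immutable, so a plain copy cannot tear;
    // 'b' is the only field that needs an atomic load.
    D snapshot(ref shared D src)
    {
        return D(src.a, atomicLoad(src.b));
    }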
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Andrei Alexandrescu

On Thu, Jan 14, 2010 at 10:20 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
> Michel Fortin wrote:
>> On 2010-01-14, at 22:01, Andrei Alexandrescu wrote:
>>
>>> Michel Fortin wrote:
>>>> On 2010-01-14, at 18:07, Andrei Alexandrescu wrote:
>>>>> Unfortunately, this(this) is called after the bits have been copied, so if there was any tearing, it already happened. I don't know how we can solve this.
>>>> Where is the copy-constructor when we need one? :-)
>>> I realized that fortunately that's not a problem: during memcpy, the target is not yet shared. Whew. So it all holds water.
>>>
>>> this(this) shared { ... }
>>>
>>> should work.
>>
>> But the source is shared. Couldn't it be updated while memcpy does its work, creating an incoherent state? Something like: memcpy copies half of it, context switch, other thread update first and last bytes, context switch, memcpy finishes its work. Result, last byte of the copy doesn't match first byte.
>
> I just meant that we're not constrained by the memcpy prior to this(this).
>
> I don't know whether we should disallow such copies. Probably we should, but I don't have a strong case. If we disable it, cowboys will be unhappy. If we enable it, others will.
>
> The problem is, if we disable even "this(this) shared" there's no chance to copy a shared object, even if you're willing to fix the inconsistencies.
>
>>>> Now, the really tricky question is: should this one be copyable when shared?
>>>>
>>>> struct D { immutable string a; long b; }
>>>>
>>>> Since 'a' is immutable, you don't need to copy it atomically, only 'b' requires an atomic copy. So copying the struct "atomically" is possible by copying 'a' normally and 'b' atomically. Now, is that going to be supported?
>>> But long isn't atomically copyable. Did you mean int?
>>
>> Well, that depends on the architecture. I meant a type which is atomically copyable, yes.
>
> Speaking of which: is it reasonable to assume that all modern 32-bit architectures have a 64-bit atomic assign? How about a 64-bit atomic CAS?
>
> Andrei

Could you create a global array of something like 256 spinlocks, and then hash the pointer of whatever you are trying to modify to find the right lock? The shared operation (read or write) would consist of getting the lock, accessing the object in question, then releasing the lock. This requires two CAS-like operations to lock an object of any size, rather than one CAS for four bytes. With 256 such locks, we should be able to keep a lot of threads busy even with the occasional accidental contention.

The first tricky thing is how to deal with thread death during a shared operation. If you use the thread number as what is swapped into the spinlock, you can clean up when threads die.

The second tricky thing is nested objects, e.g. how can you tell if the object you are fiddling with is locked at a higher level? I think you can do so by dividing memory into non-overlapping regions (say, cache-line sized, which I think is 64 bytes) and locking all of those that the object overlaps instead of locking by object. This multiplies the number of CAS steps for larger objects, and you need to be careful with the hashing of the pointer (you need a hash function such that a 256-byte object can't deadlock against itself due to colliding hashes), but I think it could still work.

The final bit of the design is that if your architecture can do atomic ops on objects of 8 or 16 bytes, then you can just use a normal CAS for that specific object, since normal lock-free designs tend to fall into the case where the compiler knows the object is that small anyway.

Note that some parts of the above rely on knowing the object size. This might be tricky in some special cases, but usually I would think it is available to the compiler, and when it isn't you could use the GC to find it, as with dynamic array expansion.

Kevin
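A rough sketch of the 256-spinlock table Kevin describes, purely illustrative; the names, the cache-line shift, and the use of core.atomic's cas/atomicStore are assumptions:

    import core.atomic;

    shared size_t[256] locks;   // 0 means free; otherwise holds the owner's thread id

    size_t stripeOf(const void* p)
    {
        // Drop the low 6 bits so everything in one 64-byte cache line
        // maps to the same stripe, then fold the address into the table.
        return (cast(size_t) p >> 6) & (locks.length - 1);
    }

    void acquire(const void* p, size_t threadId)
    {
        auto i = stripeOf(p);
        while (!cas(&locks[i], cast(size_t) 0, threadId)) { /* spin */ }
    }

    void release(const void* p)
    {
        atomicStore(locks[stripeOf(p)], cast(size_t) 0);
    }

Storing the owning thread's id rather than a plain flag is what makes cleanup after thread death possible: a dying thread's stripes can be identified and released.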
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Andrei Alexandrescu

On 2010-01-14, at 22:20, Andrei Alexandrescu wrote:

> Michel Fortin wrote:
>> But the source is shared. Couldn't it be updated while memcpy does its work, creating an incoherent state? Something like: memcpy copies half of it, context switch, other thread update first and last bytes, context switch, memcpy finishes its work. Result, last byte of the copy doesn't match first byte.
>
> I just meant that we're not constrained by the memcpy prior to this(this).
>
> I don't know whether we should disallow such copies. Probably we should, but I don't have a strong case. If we disable it, cowboys will be unhappy. If we enable it, others will.
>
> The problem is, if we disable even "this(this) shared" there's no chance to copy a shared object, even if you're willing to fix the inconsistencies.

I think we should disable it. It's much easier to add the feature later, once we have a proper solution, than to remove a feature later because it doesn't work right.

Also, saying there is no chance to copy a shared object is a bit much. You can always define your own copy function, or a "copy constructor" (that you'd have to call explicitly):

    struct A {
        int a, b, c;
        this(const ref shared A) {
            ...
        }
    }

    shared A a1;
    A a2 = A(a1);

> Speaking of which: is it reasonable to assume that all modern 32-bit architectures have a 64-bit atomic assign? How about a 64-bit atomic CAS?

No idea, but I doubt it.

You could probably do atomic reads and writes using the floating-point registers (at the risk of raising a floating-point exception), but that's a little contorted.

--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Kevin Bealer

Darnit, you're right. I dunno what I was thinking.
On Jan 14, 2010, at 1:21 PM, Kevin Bealer wrote:
> I'd suggest setting the high bit of the length instead. If I have this scenario:
>
> char[] foo = "abcd".dup;
> char[] bar = foo[1..2];
> Then bar is an array with an odd address.
>
> Kevin
> On Thu, Jan 14, 2010 at 10:16 AM, Sean Kelly <sean at invisibleduck.org> wrote:
> On Jan 14, 2010, at 4:10 AM, Steve Schveighoffer wrote:
>
> > Having implemented the array append patch to fix stomping, and reading the dmd-concurrency debate, I realized that what I implemented is no good for shared arrays.
> >
> > In fact, I wonder how shared arrays can support any array operations
>
> I sent an email about this a while back. In short, if we're going to allow array ops on shared arrays at all I think they'll have to use atomic ops to "lock" the array for the length of the update. Basically, set the 1 bit of the ptr field to indicate the array is locked. This gets tricky when more than one shared array is involved because the lock has to be acquired on each, and because different ops may try to lock arrays in different orders, if a spinlock acquire times out the code will have to release all the locks it's acquired, wait some random interval, and try again. Pretty complicated stuff if the goal is just to support shared array ops.
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Michel Fortin

----- Original Message ----
> From: Michel Fortin <michel.fortin at michelf.com>
> To: Discuss the concurrency model(s) for D <dmd-concurrency at puremagic.com>
> Sent: Fri, January 15, 2010 8:20:48 AM
> Subject: Re: [dmd-concurrency] shared arrays
>
> On 2010-01-14, at 22:20, Andrei Alexandrescu wrote:
>
>> Michel Fortin wrote:
>>> But the source is shared. Couldn't it be updated while memcpy does its work, creating an incoherent state? Something like: memcpy copies half of it, context switch, other thread update first and last bytes, context switch, memcpy finishes its work. Result, last byte of the copy doesn't match first byte.
>>
>> I just meant that we're not constrained by the memcpy prior to this(this).
>>
>> I don't know whether we should disallow such copies. Probably we should, but I don't have a strong case. If we disable it, cowboys will be unhappy. If we enable it, others will.
>>
>> The problem is, if we disable even "this(this) shared" there's no chance to copy a shared object, even if you're willing to fix the inconsistencies.
>
> I think we should disable it. It's much easier to add the feature later when we have a proper solution than removing a feature later because it doesn't work right.

It depends on how you define "work right." I don't think shared is meant to fix all race issues. It can't possibly, in fact, because you can simply decompose a shared copy to get the erroneous behaviour you want to prevent:

    struct T { int x; int y; }

    shared(T) t;

    void foo()
    {
        //T myt = t;  // oops, the compiler doesn't let me do this, so I'll work around it:
        T myt = T(t.x, t.y);
    }

> Also, saying there is no chance to copy a shared object is a bit much. You can always define your own copy function, or a "copy constructor" (that you'd have to call explicitly):
>
>     struct A {
>         int a, b, c;
>         this(const ref shared A) {
>             ...
>         }
>     }
>
>     shared A a1;
>     A a2 = A(a1);

It's an artificial limitation -- the copy is still possible, just with uglier syntax. Now, with the new @disable property, you can force users to call your thread-safe implementation without resorting to this(this). Then you can create a truly contained thread-safe type.

So I think it's OK to allow this(this) in cases where developers want to optimize or are sure the type is uniquely accessible (via an external lock). I think it's better for the compiler not to make assumptions about the semantic meaning of a struct or array.

-Steve
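A minimal sketch of the @disable approach Steve mentions; the type and method names are invented for illustration, and the atomicLoad call assumes core.atomic handles an 8-byte load on the target:

    import core.atomic;

    struct Gate
    {
        private long payload;

        @disable this(this);        // forbid implicit, possibly torn, copies

        // The only way to copy the state out is an explicitly thread-safe snapshot.
        long snapshot() shared
        {
            return atomicLoad(payload);
        }
    }

With the postblit disabled, even Gate g2 = g1; on unshared instances is rejected, so authors who want strict containment can opt into it without the language having to ban this(this) shared outright.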
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Andrei Alexandrescu

On Jan 14, 2010, at 7:20 PM, Andrei Alexandrescu wrote:
>
> Speaking of which: is it reasonable to assume that all modern 32-bit architectures have a 64-bit atomic assign? How about a 64-bit atomic CAS?

For x86, I think this is reasonable today. I recall there being an issue a while back with some CPUs (I can't recall if they were from Intel or AMD) that were missing this instruction, but I think those days are gone. Here's a page about Windows and 8-byte CAS I ran across that seems relevant:

http://www.geoffchappell.com/viewer.htm?doc=studies/windows/km/cpu/cx8.htm
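For reference, a minimal D sketch of the 64-bit CAS being discussed, assuming core.atomic.cas is available and maps to LOCK CMPXCHG8B on 32-bit x86:

    import core.atomic;

    shared long counter;

    // Try to bump 'counter' from an expected value; returns true on success.
    bool tryIncrement(long expected)
    {
        return cas(&counter, expected, expected + 1);
    }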
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Michel Fortin

On Fri, 15 Jan 2010 08:20:48 -0500, Michel Fortin <michel.fortin at michelf.com> wrote:
> On 2010-01-14, at 22:20, Andrei Alexandrescu wrote:
>
>> Michel Fortin wrote:
>>> But the source is shared. Couldn't it be updated while memcpy does its work, creating an incoherent state? Something like: memcpy copies half of it, context switch, other thread update first and last bytes, context switch, memcpy finishes its work. Result, last byte of the copy doesn't match first byte.
>>
>> I just meant that we're not constrained by the memcpy prior to this(this).
>>
>> I don't know whether we should disallow such copies. Probably we should, but I don't have a strong case. If we disable it, cowboys will be unhappy. If we enable it, others will.
>>
>> The problem is, if we disable even "this(this) shared" there's no chance to copy a shared object, even if you're willing to fix the inconsistencies.
>
> I think we should disable it. It's much easier to add the feature later when we have a proper solution than removing a feature later because it doesn't work right.
>
> Also, saying there is no chance to copy a shared object is a bit much. You can always define your own copy function, or a "copy constructor" (that you'd have to call explicitly):
>
> struct A {
> int a, b, c;
> this(const ref shared A) {
> ...
> }
> }
>
> shared A a1;
> A a2 = A(a1);
>
>
>> Speaking of which: is it reasonable to assume that all 32-bit modern architectures have a 64-bit atomic assign? How about 64-bit atomic CAS?
>
> No idea, but I doubt it.
>
> You could probably do atomic read and writes using the floating point registers (at the risk of raising a floating point exception), but that's a little contorted.
From Wikipedia:
"Early AMD64 processors lacked the CMPXCHG16B instruction, which is an
extension of the CMPXCHG8B instruction present on most post-486
processors. Similar to CMPXCHG8B, CMPXCHG16B allows for atomic operations
on 128-bit double quadword (or oword) data types. This is useful for
parallel algorithms that use compare and swap on data larger than the size
of a pointer, common in lock-free and wait-free algorithms. Without
CMPXCHG16B one must use workarounds, such as a critical section or
alternative lock-free approaches"
So while 32-bit and 64-bit CPUs should have a 64-bit CAS, some 64-bit CPUs won't have a 128-bit CAS. And since so many things (like arrays) go from 64 bits to 128 bits when you switch from 32-bit DMD to 64-bit DMD, I think it would be erroneous for the language design to assume the presence of a double-width CAS.
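The array point is easy to see from the two-word slice layout D uses; the struct below is only an illustration of the sizes involved:

    // A D slice is a (length, pointer) pair: 8 bytes on 32-bit targets,
    // 16 bytes on 64-bit targets -- the latter is exactly the case that
    // needs CMPXCHG16B (or some other strategy) to update atomically.
    struct Slice
    {
        size_t length;
        void*  ptr;
    }

    static assert(Slice.sizeof == 2 * size_t.sizeof);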
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Robert Jacques

Robert Jacques wrote:
> From wikipedia:
> "Early AMD64 processors lacked the CMPXCHG16B instruction, which is an
> extension of the CMPXCHG8B instruction present on most post-486
> processors. Similar to CMPXCHG8B, CMPXCHG16B allows for atomic
> operations on 128-bit double quadword (or oword) data types. This is
> useful for parallel algorithms that use compare and swap on data larger
> than the size of a pointer, common in lock-free and wait-free
> algorithms. Without CMPXCHG16B one must use workarounds, such as a
> critical section or alternative lock-free approaches"
>
> So while 32-bit and 64-bit CPUs should have a 64-bit CAS, some 64-bit CPUs won't have a 128-bit CAS. And since so many things (like arrays) go from 64-bits to 128-bits when you switch from 32-bit DMD to 64-bit DMD, I think assuming the presence of a double width CAS to be erroneous in the language design.
Yah, I was asking whether we can assume that long and double are atomically copyable. (But arrays won't be.)
Andrei
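A tiny sketch of what "atomically copyable" would buy here, assuming core.atomic's atomicLoad/atomicStore handle 8-byte scalars on the target; the double goes through its bit pattern, since atomic primitives are usually defined on integers:

    import core.atomic;

    shared long  ticks;
    shared ulong dbits;   // a double smuggled through its 64-bit pattern

    void publish(long t, double d)
    {
        atomicStore(ticks, t);
        atomicStore(dbits, *cast(ulong*) &d);
    }

    double readD()
    {
        ulong bits = atomicLoad(dbits);
        return *cast(double*) &bits;
    }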
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Sean Kelly

Okay, no big deal. So we set either ptr or len to size_t.max. Either should be safe as a sentinel.
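A rough sketch of how the sentinel scheme might read; the field names, the delegate wrapper, and the core.atomic calls are assumptions for illustration only:

    import core.atomic;

    struct RawArray { size_t length; void* ptr; }   // assumed slice layout

    // Spin until 'length' can be swapped to the sentinel, run the array
    // operation with the real length, then restore it.
    void withLockedLength(shared(RawArray)* a, scope void delegate(size_t) op)
    {
        size_t len;
        do
        {
            len = atomicLoad(a.length);
        } while (len == size_t.max || !cas(&a.length, len, size_t.max));
        scope(exit) atomicStore(a.length, len);     // unlock
        op(len);
    }

As Steve notes in the next message, the slice header itself may live in unshared memory, so the sentinel would have to sit somewhere that is actually shared for this to work.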
On Jan 15, 2010, at 5:37 AM, Sean Kelly wrote:
> Darnit, you're right. I dunno what I was thinking.
>
> On Jan 14, 2010, at 1:21 PM, Kevin Bealer wrote:
>
>> I'd suggest setting the high bit of the length instead. If I have this scenario:
>>
>> char[] foo = "abcd".dup;
>> char[] bar = foo[1..2];
>> Then bar is an array with an odd address.
>>
>> Kevin
>> On Thu, Jan 14, 2010 at 10:16 AM, Sean Kelly <sean at invisibleduck.org> wrote:
>> On Jan 14, 2010, at 4:10 AM, Steve Schveighoffer wrote:
>>
>>> Having implemented the array append patch to fix stomping, and reading the dmd-concurrency debate, I realized that what I implemented is no good for shared arrays.
>>>
>>> In fact, I wonder how shared arrays can support any array operations
>>
>> I sent an email about this a while back. In short, if we're going to allow array ops on shared arrays at all I think they'll have to use atomic ops to "lock" the array for the length of the update. Basically, set the 1 bit of the ptr field to indicate the array is locked. This gets tricky when more than one shared array is involved because the lock has to be acquired on each, and because different ops may try to lock arrays in different orders, if a spinlock acquire times out the code will have to release all the locks it's acquired, wait some random interval, and try again. Pretty complicated stuff if the goal is just to support shared array ops.
January 15, 2010 [dmd-concurrency] shared arrays

Posted in reply to Sean Kelly

You can't use anything in the array struct itself to lock the array data because the struct data is not necessarily shared.
-Steve
----- Original Message ----
> From: Sean Kelly <sean at invisibleduck.org>
> To: Discuss the concurrency model(s) for D <dmd-concurrency at puremagic.com>
> Sent: Fri, January 15, 2010 12:26:56 PM
> Subject: Re: [dmd-concurrency] shared arrays
>
> Okay, no big deal. So we set either ptr or len to size_t.max. Either should be safe as a sentinel.
>
> On Jan 15, 2010, at 5:37 AM, Sean Kelly wrote:
>
> > Darnit, you're right. I dunno what I was thinking.
> >
> > On Jan 14, 2010, at 1:21 PM, Kevin Bealer wrote:
> >
> >> I'd suggest setting the high bit of the length instead. If I have this
> scenario:
> >>
> >> char[] foo = "abcd".dup;
> >> char[] bar = foo[1..2];
> >> Then bar is an array with an odd address.
> >>
> >> Kevin
> >> On Thu, Jan 14, 2010 at 10:16 AM, Sean Kelly wrote:
> >> On Jan 14, 2010, at 4:10 AM, Steve Schveighoffer wrote:
> >>
> >>> Having implemented the array append patch to fix stomping, and reading the
> dmd-concurrency debate, I realized that what I implemented is no good for shared arrays.
> >>>
> >>> In fact, I wonder how shared arrays can support any array operations
> >>
> >> I sent an email about this a while back. In short, if we're going to allow
> array ops on shared arrays at all I think they'll have to use atomic ops to "lock" the array for the length of the update. Basically, set the 1 bit of the ptr field to indicate the array is locked. This gets tricky when more than one shared array is involved because the lock has to be acquired on each, and because different ops may try to lock arrays in different orders, if a spinlock acquire times out the code will have to release all the locks it's acquired, wait some random interval, and try again. Pretty complicated stuff if the goal is just to support shared array ops.