January 05, 2010
(er...)

primatives -> primitives
primitives -> intrinsics
Kevin
On Tue, Jan 5, 2010 at 5:30 PM, Kevin Bealer <kevinbealer at gmail.com> wrote:

> That's what I was guessing.  But then I thought maybe there might be some magic that either inserted a lock around the block of instructions from the first read to the last write, or else that detected these primatives and emitted a warning.  Not that I'm suggesting that approach, as a general solution is probably impractical / full of landmines.  And I guess getting lock free systems right is normally considered an "experts only" kind of thing, except maybe the one-pointer versions like linked list.
>
> Kevin
>
> On Tue, Jan 5, 2010 at 1:21 PM, Sean Kelly <sean at invisibleduck.org> wrote:
>
>>  On Jan 5, 2010, at 10:14 AM, Kevin Bealer wrote:
>>
>> > On Mon, Jan 4, 2010 at 6:58 PM, Andrei Alexandrescu <andrei at erdani.com>
>> wrote:
>> >
>> > shared int x;
>> > ...
>> > ++x;
>> >
>> > The putative user notices that that doesn't work, so she's like, meh,
>> I'll do this then:
>> >
>> > int y = x;
>> > ++y;
>> > x = y;
>> >
>> > And the user remains with this impression that the D compiler is a bit
>> dumb. Of course that doesn't avoid the race condition though. If the user would have to call atomicIncrement(x) that would be clearly an improvement, but even this would be an improvement:
>> >
>> > int y = sharedRead(x);
>> > ++y;
>> > sharedWrite(y, x);
>> >
>> > When writing such code the user inevitably hits on the documentation for
>> the two intrinsics, which clearly define their guarantees: only the sequence of sharedRead and sharedWrite is preserved. At that point, inspecting the code and understanding how it works is improved.
>> >
>> > ...
>> > Andrei
>> >
>> > Just to clarify, this is still broken, right?  I mean that if two users
>> are calling a method that does this,
>> > the value of x will only get incremented once if their calls to
>> sharedRead/sharedWrite are interleaved?
>>
>> Yes.  You either need to call a special hardware instruction (LOCK INC on
>> x86) or use a CAS loop:
>>
>> void increment( ref shared int x )
>> {
>>    int t;
>>    do {
>>        t = atomicLoad( x );
>>    } while( !atomicStoreIf( x, t + 1, t ) );  // atomicStoreIf is a CAS
>>  }
>>
>> _______________________________________________
>> dmd-concurrency mailing list
>> dmd-concurrency at puremagic.com
>> http://lists.puremagic.com/mailman/listinfo/dmd-concurrency
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/dmd-concurrency/attachments/20100105/82e0410f/attachment.htm>
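[Editorial sketch] Sean's CAS-based `increment` can be modeled in Python. This is a sketch only: `SharedInt`, `atomic_load`, and `atomic_store_if` are hypothetical stand-ins that use a lock to simulate the hardware CAS guarantee; real D code would use compiler intrinsics or inline asm (LOCK CMPXCHG on x86).

```python
import threading

class SharedInt:
    """Toy model of a shared word with a CAS primitive (hypothetical names,
    mirroring the thread's atomicLoad/atomicStoreIf)."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for the hardware guarantee

    def atomic_load(self):
        with self._lock:
            return self._value

    def atomic_store_if(self, new, expected):
        # Compare-and-swap: store `new` only if the current value is `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(x):
    # Sean's CAS loop: retry until nobody raced us between load and store.
    while True:
        t = x.atomic_load()
        if x.atomic_store_if(t + 1, t):
            break
```

Unlike the naive read/increment/write sequence, a lost race here only costs a retry, never a lost update.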
January 06, 2010
2010/1/5, Sean Kelly <sean at invisibleduck.org>:
> On Jan 4, 2010, at 3:58 PM, Andrei Alexandrescu wrote:
>
>> Of course that doesn't avoid the race condition though. If the user would have to call atomicIncrement(x) that would be clearly an improvement, but even this would be an improvement:
>>
>> int y = sharedRead(x);
>> ++y;
>> sharedWrite(y, x);
>>
>> When writing such code the user inevitably hits on the documentation for the two intrinsics, which clearly define their guarantees: only the sequence of sharedRead and sharedWrite is preserved. At that point, inspecting the code and understanding how it works is improved.

Is there a way to make this eventually generalizable to any operation or group of operations that should be performed atomically?

something like:

atomic {
  ++x;
}

or even

atomic ++x;
atomic(++)x;

That would create a temporary doing the work of "y", for the case where you want to specify the operations yourself.

atomic {
  int y = x;
  y++;
  x = y;
}

"y++" won't be performed as an atomic operation because the compiler will detect that y is a variable that is not shared. So it would be about the same thing as the previous.

This could lead to something usable as a basic STM. Is it too hard to implement? Has STM already been discarded for D? I know also that you want to avoid new reserved keywords (what about scope(atomic) :P).

Best Regards,

Álvaro Castro-Castilla
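[Editorial sketch] One way to read the `atomic { ... }` suggestion above is as sugar for a CAS retry loop over a single shared variable. A hedged Python model (the `Cell` type and its `cas` primitive are hypothetical, lock-simulated stand-ins, not real D semantics):

```python
import threading

class Cell:
    """Toy model of a shared word; the lock simulates the hardware CAS."""
    def __init__(self, v=0):
        self.v = v
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self.v

    def cas(self, new, expected):
        with self._lock:
            if self.v != expected:
                return False
            self.v = new
            return True

def atomic_update(cell, body):
    """Desugared `atomic { ... }`: run the block body against a snapshot,
    then CAS the result back, retrying if another thread intervened."""
    while True:
        old = cell.load()
        new = body(old)          # e.g. the `++x` inside the block
        if cell.cas(new, old):
            return new
```

`atomic_update(x, lambda v: v + 1)` then plays the role of `atomic { ++x; }`; the body must be free of side effects since it may run more than once.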
January 05, 2010
Kevin Bealer wrote:
>  And I guess getting lock free systems right is normally
> considered an "experts only" kind of thing, except maybe the one-pointer
> versions like linked list.

Even lock-free slists are heinously difficult to define correctly...

Andrei
January 05, 2010
I read through a few papers on hazard pointers, and I still only know enough gun safety to know that the gun is always loaded.  But I can't help but ask...

What if this code:

void methodName(ref shared int a)
{
    do_stuff();

    int a1 = borrowShared(a);

    auto x = computeX(a1);
    auto y = computeY(a1);
    auto z = computeZ(a1);
    int xyz = combine(x, y, z);

    returnShared(a, xyz);

    more_stuff();
}

Was converted to this (borrowing Sean's example):

void methodName( ref shared int a )
{
   do_stuff();

   int temp1, a1;
   do {
       a1 = temp1 = a;

       auto x = computeX(a1);
       auto y = computeY(a1);
       auto z = computeZ(a1);
       a1 = combine(x, y, z);

   } while( !atomicStoreIf( a, a1, temp1 ) );  // atomicStoreIf is a CAS
   more_stuff();
}

It would only work for single variable modifications of course, and if
computeX,Y,Z is slow, it is probably pointless or a live-lock.  The user
would need to understand that this is really a loop, so maybe a different
syntax is needed:

void methodName(ref shared int a)
{
    do_stuff();

    bool quit = false;

    forshare(a; a1; ! quit) {
        auto x = computeX(a1);
        auto y = computeY(a1);
        auto z = computeZ(a1);
        quit = /* iteration count test or something */;
        a1 = combine(x, y, z);
    }

    more_stuff();
}

I included the quit concept as an optional feature to remind users that this might live-lock; they could of course say "if (...) break" at any time as well.

(I'm sure there are some critics who will say that putting easy to reach lock-free algorithm primitives next to simple to use language features is like selling rubber snakes and real snakes from the same bin.)

Kevin
On Tue, Jan 5, 2010 at 6:58 PM, Andrei Alexandrescu <andrei at erdani.com>wrote:

> Kevin Bealer wrote:
>
>>  And I guess getting lock free systems right is normally
>> considered an "experts only" kind of thing, except maybe the one-pointer
>> versions like linked list.
>>
>
> Even lock-free slists are heinously difficult to define correctly...
>
> Andrei
>
>
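[Editorial sketch] Kevin's `forshare` loop with its `quit` escape amounts to a bounded CAS retry. A Python model (again, `Cell` and `cas` are hypothetical lock-simulated primitives; only the control flow matters here):

```python
import threading

class Cell:
    """Toy shared word; the lock stands in for a hardware CAS primitive."""
    def __init__(self, v=0):
        self.v = v
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self.v

    def cas(self, new, expected):
        with self._lock:
            if self.v != expected:
                return False
            self.v = new
            return True

def forshare(cell, body, max_tries=10):
    """Bounded CAS retry: gives up after max_tries to avoid live-lock,
    mirroring the `quit` test in Kevin's sketch. Returns (ok, value)."""
    for _ in range(max_tries):
        old = cell.load()
        new = body(old)
        if cell.cas(new, old):
            return True, new
    return False, cell.load()
```

The caller must handle the `(False, ...)` case, which makes the possibility of live-lock explicit in the API rather than hidden in a loop.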
January 05, 2010
On 2010-01-05 at 18:31, Álvaro Castro-Castilla wrote:

> Is there a way to make this eventually generalizable to any operation or group of operations that should be performed atomically?
> 
> something like:
> 
> atomic {
>  ++x;
> }
> 
> or even
> 
> atomic ++x;
> atomic(++)x;

I proposed this earlier:

	++atomic(y)

The idea is to create a function template "atomic" that would return a temporary struct. All operators defined for that struct would translate to atomic operations done on y, or memory barriers for read and writes.


> That would create a temporary doing the work of "y", for the case where you want to specify the operations yourself.
> 
> atomic {
>  int y = x;
>  y++;
>  x = y;
> }

This syntax I don't like. While enclosing one operation in an atomic block is fine, here it looks as though all those actions are done in one atomic operation, which isn't at all the case. How hard is it to write this instead:

	int y = atomic(x);
	y++;
	atomic(x) = y;

Here at least it's obvious that the whole isn't an atomic operation. And somehow it also expresses that something could happen to x between the two assignments.


> "y++" won't be performed as an atomic operation because the compiler will detect that y is a variable that is not shared. So it would be about the same thing as the previous.

atomic(y) could be overloaded so that it doesn't do expensive atomic operations when y isn't shared.


> This could lead to something usable as a basic STM. Is it too hard to implement? Has STM already been discarded for D? I know also that you want to avoid new reserved keywords (what about scope(atomic) :P).

I think my proposal is completely implementable right now.

What I'd like from the compiler though is that it disallows all operations on shared variables. This way, you'd be forced to use atomic() with shared variables, which would make it easier to read and understand code using them.

For cases where you don't really want atomic operations, it should be explicit too. Perhaps we could define another function in the same style for reads and writes with no barrier:

	atomic(y) = 10; // with write barrier
	latent(y) = 20; // no write barrier, subject to propagation latency


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
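[Editorial sketch] Michel's `atomic(y)` proposal, a temporary proxy whose operators map one-for-one onto atomic operations, can be modeled in Python. The names and the lock-based "atomicity" are illustrative only; D would return a struct and use real barriers, and since Python has no `++`, an `incr` method stands in for it:

```python
import threading

class Word:
    """Toy shared integer; the lock models the hardware atomicity."""
    def __init__(self, v=0):
        self.v = v
        self._lock = threading.Lock()

class AtomicRef:
    """Temporary proxy: each method is exactly one atomic operation."""
    def __init__(self, word):
        self._word = word

    def load(self):
        with self._word._lock:
            return self._word.v

    def store(self, v):
        with self._word._lock:
            self._word.v = v

    def incr(self, n=1):  # plays the role of ++atomic(y) / atomic(y) += n
        with self._word._lock:
            self._word.v += n
            return self._word.v

def atomic(word):
    # A fresh temporary per use, so it cannot sensibly be stored and
    # reused, echoing Michel's "non-copyable" requirement.
    return AtomicRef(word)
```

In D, `cannot be stored` would be enforced by the type system (a non-copyable struct); here it is only a convention of the sketch.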



January 05, 2010
On Jan 4, 2010, at 5:16 PM, Jason House wrote:

> On Jan 4, 2010, at 8:00 PM, Andrei Alexandrescu <andrei at erdani.com> wrote:
>> 
>> Same here. All - please advise.
> 
> I like this style as long as the compiler isn't completely brain dead and doesn't incorrectly force me to put sharedRead and sharedWrite throughout my code base. I actually have a tweaked set of Tango's atomics in my D2 code base. We can't forget CAS and atomic increment/decrement for shared data.

I created the Tango atomics module and would love to roll it into Druntime in some form.  At the very least I agree that we need a CAS operation.

>>> The only catch with the approach above (and you've mentioned this before) is:
>>>   class A {
>>>       void fnA() shared { x = 5; }
>>>       void fnB() synchronized { x = 6; }
>>>       int x;
>>>   }
>>> I had thought that explicitly labeling variables as shared would sidestep this by requiring ops on the vars to always be atomic.  An alternative (as you've said before) would be to not allow shared and synchronized methods to both be used in a class, but it's pretty common that I'll want to do something like this:
>>>   class A {
>>>       void fnA() synchronized { sharedWrite( flag, true ); }
>>>       void fnB() shared { return sharedRead( flag ); }
>>>       shared flag;
>>>   }
>>> Maybe my explicitly declaring flag as shared somehow provides an exemption?  Or did you have another idea for a way around this?
>> 
>> Hmmm... if a field is shared in a non-shared object, it means you're not using the object's own lock to control access to that field. (You can e.g. assume that other threads have the address of that field.) So that field, inside a synchronized method, will not benefit from the tail-shared exemption and will be subject to all limitations specific to e.g. a shared global bool.
> 
> 
> It's worse than that. Notice how in Sean's code that he didn't replace the sharedWrite call with a more conventional assignment. One really wants to keep full protection with (most) lock free variables. This means always using sharedRead and sharedWrite with them. The compiler should facilitate this...

Yeah, if a shared variable is ever accessed outside a synchronized method then all accesses to it must be made atomic.  This is why I'd previously thought that shared fields would be explicitly labeled as such, and the non-labeled fields could only be accessed in a synchronized method.  I'm personally fine with using sharedRead and sharedWrite in this instance because only a handful of variables would ever be labeled as shared.  Things are obviously quite different if the shared attribute of a class instance extends to all of its fields, and the compiler is expected to optimize away what it can though.
January 05, 2010
On Jan 5, 2010, at 4:37 PM, Kevin Bealer wrote:

> I read through a few papers on hazard pointers, and I still only know enough gun safety to know that the gun is always loaded.  But I can't help but ask...
> 
> What if this code:
> 
> void methodName(ref shared int a)
> {
>     do_stuff();
> 
>     int a1 = borrowShared(a);
> 
>     auto x = computeX(a1);
>     auto y = computeY(a1);
>     auto z = computeZ(a1);
>     int xyz = combine(x, y, z);
> 
>     returnShared(a, xyz);
> 
>     more_stuff();
> }

Strangely enough, this is basically how LL/SC (Load-Linked/Store-Conditional) works.  It's the alternative to CAS that you'll find in some hardware (like SPARC), and I crafted atomicStoreIf() to work as a wrapper for either.

> Was converted to this (borrowing Sean's example):
> 
> void methodName( ref shared int a )
> {
>    do_stuff();
> 
>    int temp1, a1;
>    do {
>        a1 = temp1 = a;
> 
>        auto x = computeX(a1);
>        auto y = computeY(a1);
>        auto z = computeZ(a1);
>        a1 = combine(x, y, z);
> 
>    } while( !atomicStoreIf( a, a1, temp1 ) );  // atomicStoreIf is a CAS
>    more_stuff();
> }
> It would only work for single variable modifications of course, and if computeX,Y,Z is slow, it is probably pointless or a live-lock.

You could use a secondary value as a primitive lock, if you wanted to modify more than one variable.  But at that point you may as well just use a mutex in most cases.

> The user would need to understand that this is really a loop, so maybe a different syntax is needed:
> 
> void methodName(ref shared int a)
> {
>     do_stuff();
> 
>     bool quit = false;
> 
>     forshare(a; a1; ! quit) {
>         auto x = computeX(a1);
>         auto y = computeY(a1);
>         auto z = computeZ(a1);
>         quit = /* iteration count test or something */;
>         a1 = combine(x, y, z);
>     }
> 
>     more_stuff();
> }
> 
> I included the quit concept as an optional feature to remind users that this might live-lock; they could of course say "if (...) break" at any time as well.
> 
> (I'm sure there are some critics who will say that putting easy to reach lock-free algorithm primitives next to simple to use language features is like selling rubber snakes and real snakes from the same bin.)

Given where current research is going, I think it's more likely that this form of lock-free programming will largely disappear before too terribly long.  So I'd be disinclined to add language features specifically intended to make it more usable.  The nice thing about "shared" is that its real purpose is to denote visibility, so the lock-free stuff is added almost as a side-effect.  If things were to change in the industry later on, the language may not have to change too much to adapt.
January 05, 2010
On Jan 5, 2010, at 4:57 PM, Michel Fortin wrote:

> On 2010-01-05 at 18:31, Álvaro Castro-Castilla wrote:
> 
>> Is there a way to make this eventually generalizable to any operation or group of operations that should be performed atomically?
>> 
>> something like:
>> 
>> atomic {
>> ++x;
>> }
>> 
>> or even
>> 
>> atomic ++x;
>> atomic(++)x;
> 
> I proposed this earlier:
> 
> 	++atomic(y)
> 
> The idea is to create a function template "atomic" that would return a temporary struct. All operators defined for that struct would translate to atomic operations done on y, or memory barriers for read and writes.
> 
>> This could lead to something usable as a basic STM. Is it too hard to implement? Has STM already been discarded for D? I know also that you want to avoid new reserved keywords (what about scope(atomic) :P).
> 
> I think my proposal is completely implementable right now.

It is.  The only tricky thing about shared variables, which may or may not be trickier with a struct, is that copying them is kind of weird.  Copying a shared value type makes the result non-shared.  I guess the struct would have a reference to the shared value and be non-copyable for correctness.

> What I'd like from the compiler though is that it disallows all operations on shared variables. This way, you'd be forced to use atomic() with shared variables, which would make it easier to read and understand code using them.

Same here.

> For cases where you don't really want atomic operations, it should be explicit too. Perhaps we could define another function in the same style for reads and writes with no barrier:
> 
> 	atomic(y) = 10; // with write barrier
> 	latent(y) = 20; // no write barrier, subject to propagation latency

In the atomics lib I wrote, it's possible to specify the memory barrier used for each operation.  I'm not sure how this would work into the operator overloading approach though.
January 05, 2010
Michel Fortin wrote:
> On 2010-01-05 at 18:31, Álvaro Castro-Castilla wrote:
> 
>> Is there a way to make this eventually generalizable to any operation or group of operations that should be performed atomically?
>> 
>> something like:
>> 
>> atomic { ++x; }
>> 
>> or even
>> 
>> atomic ++x; atomic(++)x;
> 
> I proposed this earlier:
> 
> ++atomic(y)
> 
> The idea is to create a function template "atomic" that would return a temporary struct. All operators defined for that struct would translate to atomic operations done on y, or memory barriers for read and writes.

I agree with Sean's assessment. About defining atomic(y) and the associated template Atomic(T), I thought some more and it looks like a worthwhile library addition.

Andrei
January 05, 2010
On 2010-01-05 at 20:12, Sean Kelly wrote:

> Michel Fortin wrote:
> 
>> I think my proposal is completely implementable right now.

> 
> It is.  The only tricky thing about shared variables, which may or may not be trickier with a struct, is that copying them is kind of weird.  Copying a shared value type makes the result non-shared.  I guess the struct would have a reference to the shared value and be non-copyable for correctness.


Yes, the struct would hold a reference to the shared value and be non-copyable. It should not be possible to store the temporary struct into a variable; you should be forced to write atomic() once for each atomic operation:

	atomic(y) = 3 + atomic(y); // two distinct atomic operations

	atomic(y) += 3; // this one is actually atomic

	auto z = atomic(y);
	z = 3 + z; // misleading, z should not be allowed to exist


>> For cases where you don't really want atomic operations, it should be explicit too. Perhaps we could define another function in the same style for reads and writes with no barrier:
>> 
>> 	atomic(y) = 10; // with write barrier
>> 	latent(y) = 20; // no write barrier, subject to propagation latency
> 
> In the atomics lib I wrote, it's possible to specify the memory barrier used for each operation. I'm not sure how this would work into the operator overloading approach though.

Perhaps this way:

	atomic(y) = 10;
	atomic!rel(y) = 10;
	atomic!seq(y) = 10;
	atomic!acq(y) = 10;
	atomic!ssb(y) = 10;
	...
	latent(y) = 10;

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
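[Editorial sketch] Michel's distinction between `atomic(y) += 3` (one atomic read-modify-write) and `atomic(y) = 3 + atomic(y)` (two separate atomic operations with a window between them) can be made concrete in Python. The `Word` type is a hypothetical lock-simulated model of a shared integer:

```python
import threading

class Word:
    """Toy shared integer; each method is one atomic operation."""
    def __init__(self, v=0):
        self.v = v
        self._lock = threading.Lock()

    def load(self):                 # atomic read:  int y = atomic(x)
        with self._lock:
            return self.v

    def store(self, v):             # atomic write: atomic(x) = y
        with self._lock:
            self.v = v

    def add(self, n):               # atomic RMW:   atomic(y) += n
        with self._lock:
            self.v += n
            return self.v

def racy_add(word, n):
    # atomic(y) = n + atomic(y): two distinct atomic operations.
    # Another thread may update the word between the load and the store,
    # and its update would then be silently overwritten.
    word.store(n + word.load())
```

Both forms give the same answer single-threaded; only `add` remains correct under contention, which is exactly why the proxy syntax that makes the difference visible is attractive.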