January 06, 2010
2010/1/6 Michel Fortin <michel.fortin at michelf.com>

> On 2010-01-05 at 18:31, Álvaro Castro-Castilla wrote:
>
> > Is there a way to make this eventually generalizable to any operation or group of operations that should be performed atomically?
> >
> > something like:
> >
> > atomic {
> >  ++x;
> > }
> >
> > or even
> >
> > atomic ++x;
> > atomic(++)x;
>
> I proposed this earlier:
>
>        ++atomic(y)
>
> The idea is to create a function template "atomic" that would return a temporary struct. All operators defined for that struct would translate to atomic operations done on y, or memory barriers for read and writes.
>
>
>

Well, of course that makes way more sense if it's a function template :). I was referring to a compiler



>  > that would create a variable doing the work of "y", but in case you
> > want to specify the operations.
> >
> > atomic {
> >  int y = x;
> >  y++;
> >  x = y;
> > }
>
> This syntax I don't like. While enclosing one operation in an atomic block is fine, here it looks as though all those actions are done in one atomic operation, which isn't at all the case. How hard is it to write this instead:
>
>        int y = atomic(x);
>        y++;
>        atomic(x) = y;
>
>

Just to clarify what I meant:
atomic{} would be supported by the compiler for defining critical regions
as transactions. It would group the whole transaction "atomicIncrement". I
understand that generalizing this would be hard, or maybe not possible if
you are calling functions inside that code. However, allowing calls to
pure functions only might be doable.
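The atomic(y) wrapper idea quoted above can be sketched in C++ terms (a hedged illustration, not the proposed D template: `AtomicView` and `atomic_view` are invented names, and `std::atomic<T>` stands in for the compiler support a D version would need):

```cpp
#include <atomic>
#include <cassert>

// A lightweight temporary whose operators forward to atomic
// read-modify-write operations, so "++atomic_view(y)" is one indivisible
// increment rather than a racy load/increment/store pair. In the D
// proposal, the template would wrap a plain shared variable instead of a
// std::atomic.
template <typename T>
struct AtomicView {
    std::atomic<T>& target;
    T operator++()     { return target.fetch_add(1) + 1; } // atomic ++x
    T operator--()     { return target.fetch_sub(1) - 1; } // atomic --x
    operator T() const { return target.load(); }           // atomic read
};

template <typename T>
AtomicView<T> atomic_view(std::atomic<T>& v) { return AtomicView<T>{v}; }
```

Usage would look like `std::atomic<int> y{0}; ++atomic_view(y);` — the wrapper adds no state of its own, it only translates operator syntax into atomic calls.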

Anyway, I just pointed this out because the whole debate is heading toward Message Passing as the main way to do concurrency in D, leaving traditional threads and especially Software Transactional Memory behind. But how easy/possible would it be to implement STM as a library? (Just a question.) I guess threads will be kept, but STM is not being mentioned. My point is that there are situations where MP is not the best/easiest solution. For instance, for multi-agent simulations, you could think that MP would work well. However, when you want to visualize the simulation, you need to duplicate the number of messages and send them to a central thread that processes the whole data set and visualizes it. With normal threading this would be the typical "dirty flag": stop the simulation and bring the data into the visualization thread; with STM you could just read a snapshot of the data and visualize it with no need for locks.

I would say:
classical threads -> message passing -> transactional
indicates a progression from a more computational concept to a more natural
one. This might be highly questionable, but I think of it as abstractions from
the natural world. I could bore you a bit more with analogies and so on, but
I must go back to work. What is clear at this point, however, is that we are
not sure which one of these (or other) concurrency methods is going to prevail
or coexist. Does D want to tie itself to MP only, or create the tools for
various methods? At the least, I think it would be a good decision to keep MP
along with some possibilities for traditional threads. I'm a great supporter
of MP, anyway (that's why I'm here).

Best Regards,

Álvaro Castro-Castilla
January 06, 2010
I didn't invent that syntax; I remembered it from a paper and was looking for it. Here it is:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.9270

Best regards,

Álvaro Castro-Castilla

January 06, 2010
On 2010-01-06 at 7:52, Álvaro Castro-Castilla wrote:

> 
>>> that would create a variable doing the work of "y", but in case you want to specify the operations.
>>> 
>>> atomic {
>>> int y = x;
>>> y++;
>>> x = y;
>>> }
>> 
>> This syntax I don't like. While enclosing one operation in an atomic block is fine, here it looks as though all those actions are done in one atomic operation, which isn't at all the case. How hard is it to write this instead:
>> 
>>       int y = atomic(x);
>>       y++;
>>       atomic(x) = y;
> 
> Just to clarify better what I meant:
> atomic{} would be supported from the compiler for defining critical regions
> as transactions. It would group the whole transaction "atomicIncrement". I
> understand that generalizing this would be hard, or maybe not possible if
> you are calling functions inside that code. However, allowing only calling
> pure functions might be doable.
> 
> Anyway, I just pointed this thing out, because the whole debate heads to Message Passing as the main way to do concurrency in D, leaving traditional threads and especially Software Transactional Memory behind. But how easy/possible would be to implement as a library? (just a question)

Ok, so I understand that you meant "atomic { int y = x; y++; x = y; }" to be some kind of transaction, where the whole becomes atomic. I'd be very happy to see transactional memory in D. I have experience with database transactions, but I'm not sure how familiar I am with the topic of software transactional memory.

Anyway, let's talk about it. I see two ways to implement transactions. One is some sort of copy-on-write, where everything you touch in the transaction is copied: on success you commit, replacing the existing data with your modified copy; on failure (when someone concurrently changed something used in the transaction) you discard the result and start again. The other is to acquire locks on everything you touch: on success you commit and release the locks; on failure (most likely a deadlock) you undo every change and start again. You can even mix these two methods together.

I don't think the language should impose any particular way of doing transactions. In fact, I'd even say any standard transaction machinery in the language should integrate nicely with external transactional engines (databases, mostly).

So here is a simple but still verbose way to implement a transaction:

	void doTransaction(void delegate() transaction) {
		bool done = false;
		while (!done) {
			try {
				transaction();
				done = true;
			}
			catch (ConcurrentAccessException e)
				done = false;
		}
	}

	void myTransaction() pure synchronized {
		auto saved_x = x;
		x = createNew();
		scope(failure) x = saved_x;

		// should detect deadlocks and throw a ConcurrentAccessException
		synchronized (z) {
			void delegate() z_rollback = z.add(x);
			scope(failure) z_rollback();

			// rest of transaction goes here.
		}
	}

	doTransaction(&myTransaction);

Here, if synchronization fails for z in myTransaction because of a deadlock, a deadlock exception is thrown, x is reverted to its previous state, and the transaction is attempted again. Note that it's important that all functions in a transaction be pure: they should only affect the argument you pass to them, otherwise you can't really restore their state.

If the compiler were to help with something, I'd say it should define a "transactional" attribute for methods, which would roll back everything upon failure. Basically, that's the strong exception guarantee: if an exception occurs, everything is left in the same state as it was before. Additionally, you need to be able to roll back the changes later, after the function call, so a transactional function would need to give a rollback delegate to its caller (like z.add() does above).
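The rollback-delegate idea can be sketched outside the compiler, too. A hedged C++ illustration (the `Transaction` type and its members are invented for this example): each step hands back an undo action, and aborting runs them in reverse, which is what gives the strong guarantee.

```cpp
#include <functional>
#include <vector>
#include <cassert>

// Collects one undo action per performed step; abort() unwinds them in
// reverse order, restoring the state from before the transaction.
struct Transaction {
    std::vector<std::function<void()>> undo;

    // Perform a step and remember how to roll it back (the analogue of
    // z.add() returning z_rollback in the D sketch above).
    void step(std::function<void()> apply, std::function<void()> rollback) {
        apply();
        undo.push_back(std::move(rollback));
    }

    void abort() {            // run undo actions, most recent first
        for (auto it = undo.rbegin(); it != undo.rend(); ++it) (*it)();
        undo.clear();
    }
};
```

The caller captures the old value before applying, e.g. `t.step([&]{ x = 2; }, [&x, old = x]{ x = old; });`, so abort() restores x.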

That'd be quite interesting. Unfortunately, I doubt there will be enough time for anything like that.


> I guess
> threads will be kept, but STM is not being mentioned. My point is that there
> are situations where MP is not the best/easiest solution. For instance, for
> multi-agent simulations, you could think that MP would work well. However,
> when you want to visualize the simulation you need to duplicate the number
> of messages and send them to a central thread for processing the whole data
> and visualize it. With normal threading this would be the typical "dirty
> flag", stop the simulation and bring the data into the vis. thread; with STM
> you could just read a snapshot of the data and visualize it with no need for
> locks.

Your comparison of normal threading and STM is interesting. You say: "with STM you could just read a snapshot of the data". This implies that an immutable snapshot of the data exists, which implies some sort of copy-on-write where the simulation copies the data every time it changes. It won't stop the simulation in order to access the data, but it'll likely make it perform slower with all those allocations. It's really a tradeoff between data immutability and mutability. To access mutable data, you need a lock; not so for immutable data.
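The "read a snapshot" pattern being discussed can be made concrete with an atomic pointer swap: the simulation publishes an immutable copy after each step, and the visualization thread grabs whichever snapshot is current without any lock. A hedged C++ sketch (World, publish, and snapshot are illustrative names; the shared_ptr atomic free functions handle both publication and reclamation of old snapshots):

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>
#include <utility>

struct World { std::vector<double> agents; };   // stand-in simulation state

std::shared_ptr<const World> current;   // accessed only via atomic_* below

// Simulation thread: called once per step; old snapshots are freed when the
// last reader drops its reference.
void publish(World next) {
    std::atomic_store(&current,
        std::shared_ptr<const World>(
            std::make_shared<const World>(std::move(next))));
}

// Visualization thread: lock-free read of the current immutable snapshot.
std::shared_ptr<const World> snapshot() {
    return std::atomic_load(&current);
}
```

This is exactly the tradeoff described above: readers never block the simulation, at the cost of allocating a fresh copy per published step.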

So while I agree that supporting transactions would be great, I'm not sure how great it'd be for your use case. Especially since if you allocate a lot of memory for immutable data, the GC will kick in more often and lock the world anyway.

Message passing, where you could subscribe and unsubscribe to various parts of the simulation data, could be a good solution because then the simulation thread has to create copies of only what has been requested.


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



January 06, 2010
Michel Fortin wrote:
> Ok, so I understand that you meant "atomic { int y = x; y++; x = y; }" to be some kind of transaction, where the whole becomes atomic. I'd be very happy to see transactional memory in D. I have experience with database transactions, but I'm not sure how familiar I am with the topic of software transactional memory.
> 
> Anyway, let's talk about it.

Transactional memory simplifies a great deal of multithreaded programming, and there's little not to like about it. However, as far as I know we don't have any STM expert on board who's also willing to implement it for D, which overrides the debate on whether STM will be a prevailing mechanism for doing threaded work. So for the time being we don't plan to add STM capabilities to D.

If one or some on this list are willing to provide a soup-to-nuts design and implementation of STM for D in a matter of weeks, please speak up. Otherwise or until then, I encourage the interested people to discuss it off-list. Thanks.


Andrei

January 06, 2010
> Your comparison of normal threading and STM is interesting. You say: "with STM you could just read a snapshot of the data". This implies that an immutable snapshot of the data exists, which implies some sort of copy-on-write where the simulation copies the data every time it changes. It won't stop the simulation in order to access the data, but it'll likely make it perform slower with all those allocations. It's really a tradeoff between data immutability and mutability. To access mutable data, you need a lock; not so for immutable data.
>
> So while I agree that supporting transactions would be great, I'm not sure how great it'd be for your use case. Especially since if you allocate a lot of memory for immutable data, the GC will kick in more often and lock the world anyway.
>
> Message passing, where you could subscribe and unsubscribe to various parts of the simulation data, could be a good solution because then the simulation thread has to create copies of only what has been requested.
>
>
Well, if you don't care whether the snapshot shows the same time step for all agents, there won't be a big performance impact.

On the other hand, I think a nice combination to look at is Clojure's concurrency model:

"While Vars <http://clojure.org/Vars> ensure safe use of mutable storage locations via thread *isolation*, transactional references (Refs) ensure safe *shared* use of mutable storage locations via a software transactional memory <http://en.wikipedia.org/wiki/Software_transactional_memory> (STM) system. Refs are bound to a single storage location for their lifetime, and only allow mutation of that location to occur within a transaction."

From http://clojure.org/refs

However, I believe a similar effect can be achieved with message passing.

[At this point I received Andrei's post]. It's right to concentrate on message passing, then. I just wanted to bring it into the discussion and hear some points of view.

Best Regards,

Álvaro Castro-Castilla
January 06, 2010
On Jan 6, 2010, at 7:35 AM, Andrei Alexandrescu wrote:

> Michel Fortin wrote:
>> Ok, so I understand that you meant "atomic { int y = x; y++; x = y;
>> }" to be some kind of transaction, where the whole becomes atomic.
>> I'd be very happy to see transactional memory in D. I have experience
>> with database transactions, but I'm not sure how familiar I am with
>> the topic of software transactional memory.
>> Anyway, let's talk about it.
> 
> Transactional memory simplifies a great deal of multithreaded programming, and there's little not to like about it. However, as far as I know we don't have any STM expert on board who's also willing to implement it for D, which overrides the debate on whether STM will be a prevailing mechanism for doing threaded work. So for the time being we don't plan to add STM capabilities to D.
> 
> If one or some on this list are willing to provide a soup-to-nuts design and implementation of STM for D in a matter of weeks, please speak up. Otherwise or until then, I encourage the interested people to discuss it off-list. Thanks.

So how about a quick summary to figure out where we are?  From memory, I think it's pretty well established that:

1. shared denotes visibility.  A global shared variable is visible from all threads, an unlabeled variable is thread-local.
2. Attaching the shared property to a class instance affects method visibility.  shared and synchronized methods are visible, unlabeled ones are not.
3. Attaching the shared property to a value makes that value effectively atomic.  All access and mutation is accomplished via library calls.

I think there was some concern about the shared property of a class instance automatically extending to all of its fields.  Were there other concerns as well?  I know that Jason had mentioned issues with non-statically verifiable shared semantics in general, but that's another ball of wax entirely.
January 06, 2010
On 2010-01-06 at 11:42, Sean Kelly wrote:

> So how about a quick summary to figure out where we are?  From memory, I think it's pretty well established that:
> 
> 1. shared denotes visibility.  A global shared variable is visible from all threads, an unlabeled variable is thread-local.
> 2. Attaching the shared property to a class instance affects method visibility.  shared and synchronized methods are visible, unlabeled ones are not.
> 3. Attaching the shared property to a value makes that value effectively atomic.  All access and mutation is accomplished via library calls.
> 
> I think there was some concern about the shared property of a class instance automatically extending to all of its fields.  Were there other concerns as well?  I know that Jason had mentioned issues with non-statically verifiable shared semantics in general, but that's another ball of wax entirely.

Given all the examples I've seen, I'm a little concerned that synchronization will only work with class members. What about structs and shared globals? Is there going to be a safe way to keep them synchronized? Or is that irrelevant?

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



January 06, 2010
Michel Fortin wrote:
> Given all the examples I've seen, I'm a little concerned that synchronization will only work with class members. What about structs and shared globals? Is there going to be a safe way to keep them synchronized? Or is that irrelevant?

I think it's fair to impose that if you want lock-based synchronization, you must use classes. Mutexes protect data, and currently classes are our only means of associating a mutex with the data it protects.

A remote second is to instantiate and use your own Mutex objects and cowboy things around by using casts.


Andrei
January 06, 2010
On Jan 6, 2010, at 9:54 AM, Michel Fortin wrote:

> On 2010-01-06 at 11:42, Sean Kelly wrote:
> 
>> So how about a quick summary to figure out where we are?  From memory, I think it's pretty well established that:
>> 
>> 1. shared denotes visibility.  A global shared variable is visible from all threads, an unlabeled variable is thread-local.
>> 2. Attaching the shared property to a class instance affects method visibility.  shared and synchronized methods are visible, unlabeled ones are not.
>> 3. Attaching the shared property to a value makes that value effectively atomic.  All access and mutation is accomplished via library calls.
>> 
>> I think there was some concern about the shared property of a class instance automatically extending to all of its fields.  Were there other concerns as well?  I know that Jason had mentioned issues with non-statically verifiable shared semantics in general, but that's another ball of wax entirely.
> 
> Given all the examples I've seen, I'm a little concerned that synchronization will only work with class members. What about structs and shared globals? Is there going to be a safe way to keep them synchronized? Or is that irrelevant?

I think we've already discussed how language-defined value types will work in the discussion regarding class fields, but there are probably a few things worth mentioning:

shared int x; // 1
shared MyClass y; // 2
shared(MyClass) z; // 3

In case 1, the value is visible across all threads and must be operated on using the atomic() routines.  In case 3, the reference is local but the class instance is shared, so only shared or synchronized methods may be called, etc.  In case 2, the reference is shared and the instance is shared as well, so only shared and synchronized methods may be called on the instance, but obtaining the reference must be done via the atomic() routines, i.e. "atomic(y).doSomething()".  In this case it may be worth having the compiler make the atomic() bit implicit if the only thing being done is calling a member function, but that raises questions about what happens with operator overloading, etc.  I'm inclined to say that the atomic() must be present, since it's consistent with how everything else works and better communicates what's actually happening.

For structs I was inclined to say that the compiler could inject a monitor pointer like it does with nested structs, but I think the problems outweigh the benefits--memory layout is affected, things get weird with copying, etc.  So I think Andrei has the right of it in that synchronization is left up to the user.  And since we have a Mutex class in druntime, hopefully this won't be too onerous.

For built-in fancy types, since associative arrays are referenced via a plain old pointer just like classes I think they can work essentially the same way.  A "shared int[string]" can lock and unlock an internal mutex on accesses, and modifying the reference itself could be done using atomic().
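The "internal mutex" idea for a shared associative array can be sketched like this in C++ (SharedMap and its methods are illustrative names, not the eventual D mechanism): each individual access locks, so single operations are safe, though compound expressions still race between calls.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Every access acquires the internal mutex, the analogue of a
// "shared int[string]" locking around each operation.
class SharedMap {
    std::map<std::string, int> data;
    mutable std::mutex m;
public:
    void set(const std::string& k, int v) {
        std::lock_guard<std::mutex> lock(m);
        data[k] = v;
    }
    int get(const std::string& k) const {
        std::lock_guard<std::mutex> lock(m);
        auto it = data.find(k);
        return it == data.end() ? 0 : it->second;   // 0 for missing keys
    }
};
```

Note the limitation mentioned throughout this thread: `m.set("a", m.get("a") + 1)` is two locked operations, not one atomic one, so compound updates still need an external protocol.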

For delegates, I'm not sure what's done today.  I guess there are two cases:

class MyClass {
    void fnA() synchronized {}
    void fnB() shared {}
}

auto x = new MyClass;
auto y = &x.fnA;
auto z = &x.fnB;

I'd guess that the type of y is "delegate() synchronized" and the type of z is "delegate() shared"?  Assuming this is right, then it seems like the mutex would be locked when y is called, and the appropriate thing would happen when z is called as well.

That leaves arrays, which are a tad more complicated because they have both a length and a pointer field.  Let's see:

shared int[] x;
x ~= 42; // 1
x = x[1 .. 2]; // 2
x = [1, 2] ~ x; // 3

Case 1 may require a reallocation with the append, so both the pointer and length will be modified as well as the assignment of 42.  I imagine this could be done lock-free, but it won't be trivial.  Case 2 means both a pointer and length reassignment as well, and case 3 is just a disaster.  The easiest thing would just be to synchronize all shared array operations on the same mutex and document that weird things could still occur in complex expressions.  This would be abysmal in terms of concurrency, but it's about as safe as such things can be.  Hm... let's see if immutable arrays are any better:

shared string x;
shared string y;
x = x ~ 'a'; // 1
x = 'a' ~ x; // 2
x = y;

In cases 1 and 2, all we really need to worry about is the assignment to x.  On recent x86 we should be able to use 8 byte CAS to do it in one shot, but on other architectures the algorithm seems more complicated.  I'd need to poke around to see if there's a published algorithm for updating two values together, but here's a shot at it that uses a special value assigned to the array pointer as a sentinel:

struct Array { size_t len; byte* ptr; }

/**
 * string is a value type, so "shared string" is effectively "shared(Array)".
 * y is a new array created from the expression "'a' ~ x", so there's no
 * contention on it to worry about; perhaps that makes it safe to drop
 * "shared" from the y parameter?  Let's assume so, since it makes the code
 * easier to write.
 */
void assignUnique( ref shared(Array) x, ref Array y ) {
    // use an all-ones pointer as a sentinel to indicate that another atomic
    // assignment is in progress, since we'll likely never see that value in
    // practice (?)
    auto sentinel = cast(byte*) size_t.max;
    byte* p_old;
    do {
        p_old = atomicLoad( x.ptr );
    } while( p_old == sentinel || !atomicStoreIf( x.ptr, sentinel, p_old ) );
    atomicStore( x.len, y.len );
    atomicStore( x.ptr, y.ptr );
}

In case 3 you have to worry about a consistent read from y as well as a consistent write to x.  A double-wide CAS comes to the rescue again for both the read and write, but without it... I dunno.  I guess you could use the same sentinel approach for both the read of y and the write to x, but again, maybe there's something better.  I really don't do much "real" lock-free programming, so I'm not up on all the latest algorithms; I just grok the mechanics.
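The 8-byte-CAS approach mentioned above can be sketched as follows. This is a hedged illustration, not druntime code: on a 32-bit target the {length, pointer} pair occupies 8 bytes, modeled here by packing two 32-bit halves into a uint64_t so a single compare-and-swap replaces both fields at once.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Pack a 32-bit length and a 32-bit pointer value into one CAS-able word.
// A real implementation would reinterpret the slice's actual memory.
inline uint64_t pack(uint32_t len, uint32_t ptr) {
    return (uint64_t(len) << 32) | ptr;
}

// Replace the whole {len, ptr} pair atomically, retrying on contention.
void assignPair(std::atomic<uint64_t>& slice, uint32_t len, uint32_t ptr) {
    uint64_t expected = slice.load();
    while (!slice.compare_exchange_weak(expected, pack(len, ptr))) {
        // 'expected' now holds whatever another thread installed; retry
    }
}
```

Because both fields travel in one word, a reader that loads the word once can never observe a length from one assignment paired with a pointer from another — which is exactly the consistency problem the sentinel scheme above works around.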
January 06, 2010
On Jan 6, 2010, at 11:52 AM, Sean Kelly wrote:
> 
> In cases 1 and 2, all we really need to worry about is the assignment to x.  On recent x86 we should be able to use 8 byte CAS to do it in one shot, but on other architectures the algorithm seems more complicated.  I'd need to poke around to see if there's a published algorithm for updating two values together, but here's a shot at it that uses a special value assigned to the array pointer as a sentinel

I just thought of a better approach.  Simply set the '1' bit of the pointer to indicate that an update is taking place, since it really will never be set under normal circumstances.
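That low-bit tagging idea can be sketched like this (a hedged C++ illustration with invented names; uintptr_t stands in for the array's pointer field): aligned pointers never have bit 0 set, so setting it marks "update in progress", and writers spin until they can claim it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstddef>

const uintptr_t BUSY = 1;   // bit 0: never set on an aligned pointer

// Tag the pointer, update the companion length field, then publish the new
// untagged pointer, releasing the "lock" in the same store.
void update(std::atomic<uintptr_t>& ptr, std::atomic<size_t>& len,
            uintptr_t newPtr, size_t newLen) {
    uintptr_t old = ptr.load();
    for (;;) {
        if (old & BUSY) { old = ptr.load(); continue; } // another update is mid-flight
        if (ptr.compare_exchange_weak(old, old | BUSY))
            break;                                      // we own the update now
    }
    len.store(newLen);   // safe: concurrent writers see the BUSY tag and wait
    ptr.store(newPtr);   // publish the new pointer; clears BUSY in one store
}
```

Readers would need the matching discipline of re-reading until they see an untagged pointer; the sketch covers only the writer side Sean describes.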