Thread overview: Some missing things in the current threading implementation
  Sep 12, 2010  Sönke Ludwig
  Sep 12, 2010  dsimcha
  Sep 15, 2010  Sönke Ludwig
  Sep 12, 2010  Michel Fortin
  Sep 15, 2010  Sönke Ludwig
  Sep 15, 2010  dsimcha
  Sep 12, 2010  Robert Jacques
  Sep 15, 2010  Sönke Ludwig
September 12, 2010
Recently I thought it would be a good idea to try out the new concurrency system once again. Some time back, when 'shared' was still new, I already tried it several times, but since it was completely unusable then, I gave up on it (and, as it seems, so did many others).

Now, however, after TDPL has been released and there is some documentation + std.concurrency, the system should be in a state where it is actually useful, with only some bugs left to fix - nothing that requires inherent changes to the system. The reality is quite different once you step anywhere off the already walked path (defined by the book examples and similar things).

Just for the record, I've done a lot with most kinds of threading schemes (even if the only lockless thing I implemented was a simple Shared/WeakPtr implementation *shiver*). This may very well mean that there are some patterns burned into my head that somehow clash with some ideas behind the current system. But for most of the points I am quite sure that there is no viable alternative if performance and memory consumption are to be anywhere near the optimum.

I apologize for the length of this post, although I already tried to make it as short as possible and left out a lot of details. Also it is very possible that I assume some false things about the concurrency implementation because my knowledge is mostly based only on the NG and the book chapter.

The following problems are those that I found during a one day endeavor to convert some parts of my code base to spawn/shared (not really successful, partly because of the very viral nature of shared).


1. spawn and objects

	Spawn only supports 'function' + some bound parameters. Since taking the address of an object method in D always yields a delegate, it is not possible to call class members without a static wrapper function. This can be quite disturbing when working in an object-oriented style (C++ obviously has the same problem).
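
	To illustrate, a sketch of the kind of static wrapper this forces you to write today (ImageCache/refreshWrapper are made-up names, and the shared-related cast is exactly the ugly part):

	---
		import std.concurrency;

		class ImageCache {
			void refresh(string name) { /* ... */ }
		}

		// spawn() only accepts a function pointer plus bound arguments, and
		// &cache.refresh would be a delegate, so a static wrapper that takes
		// the object explicitly is needed.
		void refreshWrapper(shared(ImageCache) cache, string name)
		{
			(cast(ImageCache)cache).refresh(name); // cast away shared - unsafe, but the usual workaround
		}

		void startRefresh(shared(ImageCache) cache)
		{
			spawn(&refreshWrapper, cache, "background.png");
		}
	---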


2. error messages

	Right now, error messages just state that there is a shared/unshared mismatch somewhere. For a non-shared-expert, this can be a real bummer. You have to know a lot about the implications of 'shared' to be able to correctly interpret these messages and track down the cause. Not very good for a feature that is meant to make threading easier.


3. everything is implicit

	This may seem kind of counter-intuitive, but using 'synchronized' classes and features like setSameMutex - which are absolutely necessary; it would be foolish to neglect the importance of lock-based threading in an object-oriented environment - creates a feeling of climbing without a safety rope. Not stating how you really want to synchronize/lock, and not being able to read directly from the code how this is actually done, just leaves a black-box feeling. This in turn means threading newcomers will not be educated; they just use the system somehow and it magically works. But as soon as you run into problems such as deadlocks, you suddenly have to understand the details, and at that moment you have to read up on and remember everything that is going on in the background - plus everything you would have to know about threading/synchronization in C. I'm not sure if this is the right course here or if there is a better one.


4. steep learning curve - more a high learning wall to climb on

	Resulting from the first points, my feeling is that a newcomer who has not followed the discussions and thoughts about the system here will find himself standing before a very high barrier of material to learn before he can actually put any of it to use. I also imagine this to be a very painful process, because of all the things you discover are not possible and the error messages that potentially make you bang your head against the wall.

	
5. advanced synchronization primitives need to be considered

	Things such as core.sync.condition (the most important one) need to be considered in the 'shared' system. This means there needs to be a condition variable that takes a shared object instead of a mutex, or you have to be able to query an object's mutex.
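
	For comparison, this is what works today at the lower level, where the Mutex is explicit and can therefore be handed to a Condition - the point above is that there is no equivalent hook for the hidden mutex of a 'synchronized'/shared object (sketch):

	---
		import core.sync.mutex;
		import core.sync.condition;

		__gshared Mutex mtx;
		__gshared Condition cond;
		__gshared bool ready;

		shared static this()
		{
			mtx = new Mutex;
			cond = new Condition(mtx);
		}

		void consumer()
		{
			mtx.lock();
			scope (exit) mtx.unlock();
			while (!ready)
				cond.wait();   // releases mtx while blocked, re-locks on wakeup
			// ... consume the protected state ...
		}

		void producer()
		{
			mtx.lock();
			scope (exit) mtx.unlock();
			ready = true;
			cond.notify();
		}
	---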

	
6. temporary unlock

	There are often situations in lock-based programming in which you need to temporarily unlock your mutex, perform some time-consuming external task (disk i/o, ...) and then reacquire the mutex. This feature - which is really important, also because it is really difficult and dirty to work around - needs language support. It could be something like the inverse of a synchronized {} scope, or the possibility to define a special kind of private member function that unlocks the mutex. Inside such blocks the compiler of course has to make sure that the appropriate access rules are not broken (which could be as conservative as disallowing access to any class member).
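
	A sketch of the manual pattern this feature would replace - only possible when the mutex is explicit; with a 'synchronized' class the hidden lock cannot be dropped like this (names are made up):

	---
		import core.sync.mutex;

		__gshared int[] state;   // protected by 'mtx' by convention
		__gshared Mutex mtx;

		void saveToDisk(int[] snapshot) { /* slow disk i/o */ }

		void flush()
		{
			mtx.lock();
			auto snapshot = state.dup;   // work that needs the lock
			mtx.unlock();

			saveToDisk(snapshot);        // deliberately done outside the lock

			mtx.lock();
			scope (exit) mtx.unlock();
			// re-check/merge state here - it may have changed while unlocked
		}
	---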

	
7. optimization of pseudo-shared objects

	Since the sharability/'synchronized' property of an object is already decided at class definition time, for performance reasons it should be possible to somehow disable the mutex for those instances that are only used thread-locally. Maybe it should be necessary to declare objects as "shared C c;" even if the class is defined as "synchronized class C {}", or you get an object without a mutex which is not shared?
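
	Illustrative only - the second variant below shows the proposed behaviour, not what current D does:

	---
		synchronized class Cache {
			private int hits;
			void count() { ++hits; }
		}

		void example()
		{
			auto s = new shared(Cache);   // truly shared: calls go through the mutex
			auto l = new Cache;           // proposed: a thread-local instance that
			l.count();                    //           would skip the mutex entirely
		}
	---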

	
8. very strong split of shared and non-shared worlds

	For container classes in particular it is really nasty that you have to define two versions of the container, one shared and one non-shared, if you want to be able to use it in both contexts and to put non-shared objects into it in a non-shared context. Also, there should really be a way to declare a class to be hygienic, in a way similar to pure, so that it can be used in a synchronized context and store shared objects even though it is not shared itself.

	
9. unique

	Unique objects or chunks of data are really important, not only to be able to check that a cast to 'immutable' is correct, but also to allow passing objects to another thread for computation without making a superfluous copy or doing superfluous computation.
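
	The closest thing available today is assumeUnique (std.exception in Phobos), which documents the intent but shifts the whole correctness burden onto the programmer - a sketch:

	---
		import std.exception : assumeUnique;
		import std.concurrency;

		void printer(immutable(int)[] data) { /* read-only work in another thread */ }

		void sendTable()
		{
			auto buf = new int[1000];
			foreach (i, ref x; buf) x = cast(int)i * 42;

			// No copy is made - but nothing checks that 'buf' really has no
			// other aliases; the programmer just promises it.
			immutable(int)[] frozen = assumeUnique(buf);
			spawn(&printer, frozen);
		}
	---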

	
10. recursive locking

	The only option right now is to have mutexes behave recursively. This makes it easy to avoid deadlocks within the same thread. However, in my experience recursive mutexes are very dangerous, because typically no one takes into account what happens when an algorithm is invoked recursively from the middle of its own computation. This can happen easily in a threaded environment where you often use signals/slots or message passing. A deadlock, or at least an assertion in debug mode, is a good indicator in 90% of the situations that something just happened that should not have. Of course objects with shared mutexes are a different matter - in that case you actually need an ownership relation to do anything useful with non-recursive mutexes.
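
	For reference, D's built-in object monitors are recursive, so the nested call below does not deadlock - convenient, but it is exactly the silent re-entry described above (sketch):

	---
		synchronized class Account {
			private int balance;

			void deposit(int amount) { balance += amount; }

			void transferIn(int amount)
			{
				// re-enters the same object's lock; with a recursive mutex this
				// silently succeeds even when reached from an unexpected path
				deposit(amount);
			}
		}
	---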

	
11. holes in the system

	It seems like there are a lot of ways in which you can still slip in non-shared data into a shared context.

	One example is that you can pass a shared array where a non-shared one is expected - the following
	---
		void fnc(int[] arr);
		void fnc2(){
			shared int[] arr;
			spawn(&fnc, arr);
		}
	---
	
	compiles. This is just a bug and probably easy to fix but what about:
	
	---
		class C {
			private void method();
			private void method2(){
				spawn( void function(C inst){ inst.method(); }, this );
			}
		}
	---
	
	unless private functions are subject to recursive locking (which in turn is usually useless overhead), method() will be invoked in a completely unprotected context. This one has to be fixed somehow in the language. I'm sure there are other things like this.

	
12. more practical examples need to be considered

	It seems right now that all the examples used to explore the features needed in the system are of a rather academic nature: either the most simple i/o, pure functional computation, or maybe a network protocol. However, when it comes to practical high-performance computation on real systems, where memory consumption and low-level performance can really matter, there seems to be quite some no-man's-land here.
	
	Here are some simple examples where I immediately came to a grinding halt:
	
	I. An object loader with background processing
	
		You have a shared class Loader which uses multiple threads to load objects on demand and then fires a signal or returns from its loadObject(x) method.
		
		The problem is that the actual loading of an object must happen outside of a synchronized region of the loader, or you get no parallelism out of this. Also, because of 'spawn' you have to use an external function instead of being able to use a member function directly. Fortunately, in this case that is also the solution: define an external function that takes the arguments needed to load the object, loads it, and then passes it back to the class (a sketch follows after the list below).
		Waiting for finished objects can be implemented using message passing without worry here, because the MP overhead is probably low enough.
		
		Features missing:
			- spawn with methods
			- temporary unlock
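
		A sketch of that workaround (Loader/loadWorker are made-up names, and the extra casts that 'shared' would force on a real shared Loader are glossed over):

		---
			import std.concurrency;

			// External worker function - needed because spawn() cannot take a
			// member-function delegate. The expensive work happens here, outside
			// of any lock, and the result is reported back via message passing.
			void loadWorker(Tid owner, string path)
			{
				// ... slow disk i/o / decoding would happen here ...
				send(owner, path);           // tell the owner which object finished
			}

			class Loader {
				void loadObject(string path)
				{
					spawn(&loadWorker, thisTid, path);
				}

				string waitForFinished()
				{
					return receiveOnly!string(); // blocks until some worker reports
				}
			}
		---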
			
	II. Implementation of a ThreadPool
	
		The majority of applications can very well be broken up into small chunks of work that can be processed in parallel. Instead of using a costly thread-create, run-task, thread-destroy cycle, it would be wise to reuse the threads for later tasks. The implementation of a thread pool that does this is of course a low-level thing, and you could argue that it is ok to use some casts and such here. Anyway, there are quite a few things missing (a minimal sketch follows after the list below).
		
		Features Missing:
			- spawn with methods
			- temporary unlock
			- condition variables (message passing too slow + you need to manage destinations)
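
		A minimal pool along these lines, built directly on core.thread and core.sync and deliberately bypassing 'shared' altogether - which is exactly the kind of escape hatch criticized above (sketch, names made up):

		---
			import core.thread;
			import core.sync.mutex;
			import core.sync.condition;

			class SimplePool {
				private Mutex mtx;
				private Condition cond;
				private void delegate()[] tasks;
				private bool done;
				private Thread[] workers;

				this(size_t threadCount)
				{
					mtx  = new Mutex;
					cond = new Condition(mtx);
					foreach (i; 0 .. threadCount) {
						auto t = new Thread(&workLoop);
						t.start();
						workers ~= t;
					}
				}

				void put(void delegate() task)
				{
					mtx.lock(); scope (exit) mtx.unlock();
					tasks ~= task;
					cond.notify();            // wake one waiting worker
				}

				void finish()
				{
					mtx.lock();
					done = true;
					cond.notifyAll();
					mtx.unlock();
					foreach (t; workers) t.join();
				}

				private void workLoop()
				{
					for (;;) {
						void delegate() task;
						mtx.lock();
						while (tasks.length == 0 && !done)
							cond.wait();      // here a condition variable is indispensable
						if (tasks.length == 0) { mtx.unlock(); return; }
						task  = tasks[0];
						tasks = tasks[1 .. $];
						mtx.unlock();
						task();               // run the task without holding the lock
					}
				}
			}
		---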

	III. multiple threads computing separate parts of an array
		
		Probably the most simple form of parallelism is to perform similar operations on each element of an array (or similar things on regions of the array) and to do this in separate threads.
		The good news is that this works in the current implementation. The bad news is that this is really slow because you have to use atomic operations on the elements or it is unsafe and prone to low-level races. Right now the compiler checks almost nothing.
		The alternative would be to pass unique slices to the threads, but there is currently no way to express that.
		
		To illustrate the current situation, this compiles and runs:

		---
			import std.concurrency;
			import std.stdio;

			void doCompute(size_t offset, int[] arr){ // arr should be shared
				foreach(i, ref el; arr){
					el *= 2; // should be an atomic operation, which would make this useless because of the performance penalty
					writefln("Thread %s computed element %d: %d", thisTid(), i + offset, cast(int)el);
				}
			}

			void waitForThread(Tid thread){
				// TODO: implement in some complex way using messages or maybe there is a simple function for this
			}

			void main(){
				shared int[] myarray = [1, 2, 3, 4];
				Tid[2] threads;
				foreach( i, ref t; threads )
				t = spawn(&doCompute, i*2, myarray[i*2 .. (i+1)*2]); // should error out because the slice is not shared
				foreach( t; threads )
					waitForThread(t);
			}
		---
		
		Features missing:
			- unique
			- some way to safely partition/slice an array and get a set of still unique slices


- Sönke
September 12, 2010
== Quote from Sönke_Ludwig (ludwig@informatik.uni-luebeck.de)'s article
> Now, however, after TDPL has been released and there is some documentation + std.concurrency, the system should be in a state where it is actually useful and only some bugs should be there to fix - which does not include inherent system changes. The reality is quite different once you step anywhere beside the already walked path (defined by the book examples and similar things).

std.concurrency takes the point of view that simplicity and safety should come
first, and performance and flexibility second.  I thoroughly appreciate this post,
as it gives ideas for either improving std.concurrency or creating alternative models.

> I apologize for the length of this post, although I already tried to make it as short as possible and left out a lot of details.

No need to apologize, I think it's great that you're willing to put this much effort into it.

> 1. spawn and objects
> 	Spawn only supports 'function' + some bound parameters. Since taking
> the address of an object method in D always yields a delegate, it is not
> possible to call class members without a static wrapper function. This
> can be quite disturbing when working object oriented (C++ obviously has
> the same problem).

Except in the case of an immutable or shared object this would be unsafe, as it would allow implicit sharing.  I do agree, though, that delegates need to be allowed if they're immutable or shared delegates.  Right now taking the address of a shared/immutable member function doesn't yield a shared/immutable delegate. There are bug reports somewhere in Bugzilla on this.

> 2. error messages
> 	Right now, error messages just state that there is a shared/unshared
> mismatch somewhere. For a non-shared-expert, this can be a real bummer.
> You have to know a lot of implications 'shared' has to be able to
> correctly interpret these messages and track down the cause. Not very
> good for a feature that is meant to make threading easier.

Agreed.  Whenever you run into an unreasonably obtuse error message, a bug report would be appreciated.  Bug reports related to wrong or extremely obtuse error messages are considered "real", though low priority, bugs around here.

> 4. steep learning curve - more a high learning wall to climb on
> 	Resulting from the first points, my feeling tells me that a newcomer,
> who has not followed the discussions and thoughts about the system here,
> will see himself standing before a very high barrier of material to
> learn, before he can actually put anything of it to use. Also I imagine
> this to be a very painful process because of all the things that you
> discover are not possible or those error messages that potentially make
> you banging your head against the wall.

True, but I think this is just a fact of life when dealing with concurrency in general.  Gradually (partly due to the help of people like you pointing out the relevant issues) the documentation, etc. will improve.

> 5. advanced synchronization primitives need to be considered
> 	Things such as core.sync.condition (the most important one) need to be
> considered in the 'shared'-system. This means there needs to be a
> condition variable that takes a shared object instead of a mutex or you
> have to be able to query an objects mutex.

The whole point of D's flagship concurrency model is that you're supposed to use message passing for most things.  Therefore, lock-based programming is kind of half-heartedly supported.  It sounds like you're looking for a low-level model (which is available via core.thread and core.sync, though it isn't the flagship model).  std.concurrency is meant to be a high-level model useful for simple, safe everyday concurrency, not the **only** be-all-and-end-all model of multithreading in D.

> 6. temporary unlock
> 	There are often situations when you do lock-based programming, in which
> you need to temporarily unlock your mutex, perform some time consuming
> external task (disk i/o, ...) and then reaquire the mutex. For this
> feature, which is really important also because it is really difficult
> and dirty to work around it, needs language support, could be something
> like the inverse of a synchronized {} scope or the possibility to define
> a special kind of private member function that unlocks the mutex. Then,
> inside whose blocks the compiler of course has to make sure that the
> appropriate access rules are not broken (could be as conservative as
> disallowing access to any class member).

Again, the point of std.concurrency is to be primarily message passing-based.  It really sounds like what you want is a lower-level model.  Again, it's available, but it's not considered the flagship model.

> 7. optimization of pseudo-shared objects
> 	Since the sharability/'synchronized' of an object is already decided at
> class definition time, for performance reasons it should be possible to
> somehow disable the mutex of those instances that are only used thread
> locally. Maybe it should be necessary to declare objects as "shared C
> c;" even if the class is defined as "synchronized class C {}" or you
> will get an object without a mutex which is not shared?

Agreed.  IMHO locks should only be taken on a synchronized object if its compile-time type is shared.  Casting away shared should result in locks not being used.

> 9. unique
> 	Unique objects or chunks of data are really important not only to be
> able to check that a cast to 'immutable' is correct, but also to allow
> for passing objects to another thread for computations without making a
> superfluous copy or doing superfluous computation.

A Unique type is in std.typecons.  I don't know how well it currently works, but I agree that we need a way to express uniqueness to make creating immutable data possible.

> 11. holes in the system
> 	It seems like there are a lot of ways in which you can still slip in
> non-shared data into a shared context.
> 	One example is that you can pass a shared array
> 	---
> 		void fnc(int[] arr);
> 		void fnc2(){
> 			shared int[] arr;
> 			spawn(&fnc, arr);
> 		}
> 	---
> 	compiles. This is just a bug and probably easy to fix but what about:

Definitely just a bug.

> 	---
> 		class C {
> 			private void method();
> 			private void method2(){
> 				spawn( void function(C inst){ inst.method(); }, this );
> 			}
> 		}
> 	---

Just tested this, and it doesn't compile.

> 	II. Implementation of a ThreadPool
> 		The majority of applications can very well be broken up into small
> chunks of work that can be processed in parallel. Instead of using a
> costly thread-create, run task, thread-destroy cycle, it would be wise
> to reuse the threads for later tasks. The implementation of a thread
> pool that does this is of course a low-level thing and you could argue
> that it is ok to use some casts and such stuff here. Anyway, there are
> quite some things missing here.

My std.parallelism module that's currently being reviewed for inclusion in Phobos has a thread pool and task parallelism, though it is completely unsafe (i.e. it allows implicit sharing and will not be allowed in @safe code).  std.concurrency was simply not designed for pull-out-all-stops parallelism, and pull-out-all-stops parallelism is inherently harder than basic concurrency to make safe.  I've given up making most std.parallelism safe, but I think I may be able to make a few islands of it safe.  The question is whether those islands would allow enough useful things to be worth the effort.  See the recent safe asynchronous function calls thread.  Since it sounds like you need something like this, I'd sincerely appreciate your comments on this module.  The docs are at:

http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html

Code is at:

http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d


> 	III. multiple threads computing separate parts of an array
> 		Probably the most simple form of parallelism is to perform similar
> operations on each element of an array (or similar things on regions of
> the array) and to do this in separate threads.
> 		The good news is that this works in the current implementation. The
> bad news is that this is really slow because you have to use atomic
> operations on the elements or it is unsafe and prone to low-level races.
> Right now the compiler checks almost nothing.

Also in the proposed std.parallelism module, though completely unsafe because it needs to be fast.

September 12, 2010
I must say I agree with most of your observations. Here are some comments...

On 2010-09-12 09:35:42 -0400, Sönke Ludwig <ludwig@informatik.uni-luebeck.de> said:

> 3. everything in implicit
> 
> 	This may seem kind of counter-intuitive, but using 'synchronized' classes and features like setSameMutex - which are deadly necessary, it is stupid to neglect the importance of lock based threading in an object oriented environment - creates a feeling of climbing without a safety rope. Not stating how you really want to synchronize/lock and not being able to directly read  from the code how this is really done just leaves a black-box feeling. This in turn means threading newcomers will not be educated, they just use the system somehow and it magically works. But as soon as you get problems such as deadlocks, you suddenly have to understand the details and in this moment you have to read up and remember everything that is going on in the background - plus everything you would have to know about threading/synchronization in C. I'm not sure if this is the right course here or if there is any better one.

I'm a little uncomfortable with implicit synchronization too.

Ideally you should do as little as possible from inside a synchronized statement, and be careful about what functions you call (especially if they take some other lock). But the way synchronized classes work, they basically force you to do the reverse -- put everything under the lock -- and you don't have much control over it. Implicit synchronization is good for the simple getter/setter case, but for longer functions they essentially encourage a bad practice.


> 6. temporary unlock
> 
> 	There are often situations when you do lock-based programming, in which you need to temporarily unlock your mutex, perform some time consuming external task (disk i/o, ...) and then reaquire the mutex. For this feature, which is really important also because it is really difficult and dirty to work around it, needs language support, could be something like the inverse of a synchronized {} scope or the possibility to define a special kind of private member function that unlocks the mutex. Then, inside whose blocks the compiler of course has to make sure that the appropriate access rules are not broken (could be as conservative as disallowing access to any class member).

Well, you can work around this by making another shared class that wraps the synchronized class, and making the things you want to happen inside a synchronized block into member functions of the synchronized class. But that certainly is a lot of trouble. I'd tend to say implicit synchronization is the problem.
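
Roughly the workaround being described, as an approximate sketch (made-up names; the idea is that only the short state accesses live in the synchronized class, while the slow work happens in the wrapper without the lock held):

---
synchronized class Inner {
	private int value;
	int  get()      { return value; }   // runs under the object's lock
	void set(int v) { value = v; }      // ditto
}

class Outer {
	private shared Inner inner;

	this() { inner = new shared(Inner); }

	void process()
	{
		auto v = inner.get();        // lock taken and released here
		auto r = expensiveWork(v);   // no lock held during the slow part
		inner.set(r);                // lock taken and released again
	}

	private static int expensiveWork(int v) { return v * 2; }
}
---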


> 9. unique
> 
> 	Unique objects or chunks of data are really important not only to be able to check that a cast to 'immutable' is correct, but also to allow for passing objects to another thread for computations without making a superfluous copy or doing superfluous computation.

Indeed, no-aliasing guarantees are important and useful, and not only for multithreading. But unique as a type modifier also introduces other complexities to the language, and I can understand why it was chosen not to add it to D2. I still wish we had it.


> 11. holes in the system
> 
> 	It seems like there are a lot of ways in which you can still slip in non-shared data into a shared context.

Your examples are just small bugs in spawn. They'll eventually get fixed.

If you want a real big hole in the type system, look at the destructor problem.
<http://d.puremagic.com/issues/show_bug.cgi?id=4621>

Some examples of bugs that slip by because of it:
<http://d.puremagic.com/issues/show_bug.cgi?id=4624>


> 12. more practical examples need to be considered
> 
> [...]
> 
> 	III. multiple threads computing separate parts of an array

If we had a no-aliasing guarantee in the type system (unique), we could make a "splitter" function that splits a unique array at the right positions and returns unique chunks which can be accessed independently by different cores with no races. You could then send each chunk to a different thread with correctness assured. Without this no-aliasing guarantee you can still implement such a splitter function, but you're bound to use casts when using it (or suffer the penalty of atomic operations).
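
For illustration, a cast-based splitter as it could be written today (sketch; the names are made up, and the reliance on casts is precisely the problem):

---
import std.concurrency;

// Splits 'arr' into n non-overlapping chunks. The chunks really are
// disjoint, but the type system cannot see that, so the caller has to
// cast before handing them to spawn() - the hole that 'unique' would close.
int[][] splitter(int[] arr, size_t n)
{
	int[][] chunks;
	auto step = arr.length / n;
	foreach (i; 0 .. n)
		chunks ~= arr[i * step .. (i + 1 == n) ? arr.length : (i + 1) * step];
	return chunks;
}

void worker(shared(int)[] chunk)
{
	foreach (ref x; chunk)
		x = x * 2;   // no atomics needed, because the chunks never overlap
}

void distribute(int[] data)
{
	foreach (chunk; splitter(data, 2))
		spawn(&worker, cast(shared(int)[]) chunk);   // the unavoidable cast
}
---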

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

September 12, 2010
On Sun, 12 Sep 2010 09:35:42 -0400, Sönke Ludwig <ludwig@informatik.uni-luebeck.de> wrote:
> 9. unique
>
> 	Unique objects or chunks of data are really important not only to be able to check that a cast to 'immutable' is correct, but also to allow for passing objects to another thread for computations without making a superfluous copy or doing superfluous computation.

Unique (or for those with an Occam background 'mobile') has several proponents in the D community (myself included). It was seriously considered for inclusion in the type system, but Walter found several issues with it on a practical level. If I recall correctly, Walter's exact issues weren't made public, but probably stem from the fact that unique/mobile types in other languages are generally library defined and are 'shallow'. They exist as a 'please use responsibly' / 'here be dragons' feature. For unique to be safe, it needs to be transitive, but this severely limits the objects that can be represented. For example, a doubly-linked-list can not be unique. Unique has been integrated into the Clean and Mercury functional languages (or so says Wikipedia), so there might be reasonable solutions to these problems.
September 15, 2010
>> 1. spawn and objects
>> 	Spawn only supports 'function' + some bound parameters. Since taking
>> the address of an object method in D always yields a delegate, it is not
>> possible to call class members without a static wrapper function. This
>> can be quite disturbing when working object oriented (C++ obviously has
>> the same problem).
>
> Except in the case of an immutable or shared object this would be unsafe, as it
> would allow implicit sharing.  I do agree, though, that delegates need to be
> allowed if they're immutable or shared delegates.  Right now taking the address of
> a shared/immutable member function doesn't yield a shared/immutable delegate.
> There are bug reports somewhere in Bugzilla on this.
>

Good to know that there are already bug reports. I remember the discussion about allowing shared(delegate) or immutable(delegate), and this would be a possible solution. However, I still find the idea that those attributes are bound to the delegate type awkward and wrong, as a delegate is typically supposed to hide away the internally used objects/state, and this is just a special case for direct member-function delegates (what about an inline (){ obj.method(); }?). Also, const is not part of the delegate type and does not have to be, because it can be checked at delegate creation time. But this is probably a topic of its own.

>> 2. error messages
>> 	Right now, error messages just state that there is a shared/unshared
>> mismatch somewhere. For a non-shared-expert, this can be a real bummer.
>> You have to know a lot of implications 'shared' has to be able to
>> correctly interpret these messages and track down the cause. Not very
>> good for a feature that is meant to make threading easier.
>
> Agreed.  Whenever you run into an unreasonably obtuse error message, a bug report
> would be appreciated.  Bug reports related to wrong or extremely obtuse error
> messages are considered "real", though low priority, bugs around here.
>

I will definitely file bug reports when I continue in this area, I just wanted to stress how important the error messages are in this part of the language, because the root cause is most often very non-obvious compared to other type-system errors.

>> 4. steep learning curve - more a high learning wall to climb on
>> 	Resulting from the first points, my feeling tells me that a newcomer,
>> who has not followed the discussions and thoughts about the system here,
>> will see himself standing before a very high barrier of material to
>> learn, before he can actually put anything of it to use. Also I imagine
>> this to be a very painful process because of all the things that you
>> discover are not possible or those error messages that potentially make
>> you banging your head against the wall.
>
> True, but I think this is just a fact of life when dealing with concurrency in
> general.  Gradually (partly due to the help of people like you pointing out the
> relevant issues) the documentation, etc. will improve.

...plus that even for someone who is already experienced with threading in other languages, there is a lot to learn now in D if you go the 'shared' path instead of the C++/__gshared path.

>
>> 5. advanced synchronization primitives need to be considered
>> 	Things such as core.sync.condition (the most important one) need to be
>> considered in the 'shared'-system. This means there needs to be a
>> condition variable that takes a shared object instead of a mutex or you
>> have to be able to query an objects mutex.
>
> The whole point of D's flagship concurrency model is that you're supposed to use
> message passing for most things.  Therefore, lock-based programming is kind of
> half-heartedly supported.  It sounds like you're looking for a low-level model
> (which is available via core.thread and core.sync, though it isn't the flagship
> model).  std.concurrency is meant to be a high-level model useful for simple, safe
> everyday concurrency, not the **only** be-all-and-end-all model of multithreading
> in D.
>
>> 6. temporary unlock
>> 	There are often situations when you do lock-based programming, in which
>> you need to temporarily unlock your mutex, perform some time consuming
>> external task (disk i/o, ...) and then reaquire the mutex. For this
>> feature, which is really important also because it is really difficult
>> and dirty to work around it, needs language support, could be something
>> like the inverse of a synchronized {} scope or the possibility to define
>> a special kind of private member function that unlocks the mutex. Then,
>> inside whose blocks the compiler of course has to make sure that the
>> appropriate access rules are not broken (could be as conservative as
>> disallowing access to any class member).
>
> Again, the point of std.concurrency is to be primarily message passing-based.  It
> really sounds like what you want is a lower-level model.  Again, it's available,
> but it's not considered the flagship model.
>

Agreed that the flagship model is message passing, and to a degree I think that is quite reasonable (except that object orientation + message passing comes up a bit short IMO). However, I think the support for the rest is a bit too half-hearted if you have to use casts for everything. There are quite a few low-hanging fruits where a simple syntax or library extension could increase flexibility without sacrificing safety or adding complexity.

>> 11. holes in the system
>
> Just tested this, and it doesn't compile.
>

Forgot the 'shared' in that example:

---
import std.concurrency;

synchronized class Test {
	void publicMethod(){
		spawn( function void(shared Test inst){ inst.privateMethod(); }, this );
	}
	
	private void privateMethod(){
	}
}
---

>> 	II. Implementation of a ThreadPool
>
> My std.parallelism module that's currently being reviewed for inclusion in Phobos
> has a thread pool and task parallelism, though it is completely unsafe (i.e. it
> allows implicit sharing and will not be allowed in @safe code).  std.concurrency
> was simply not designed for pull-out-all-stops parallelism, and pull-out-all-stops
> parallelism is inherently harder than basic concurrency to make safe.  I've given
> up making most std.parallelism safe, but I think I may be able to make a few
> islands of it safe.  The question is whether those islands would allow enough
> useful things to be worth the effort.  See the recent safe asynchronous function
> calls thread.  Since it sounds like you need something like this, I'd sincerely
> appreciate your comments on this module.  The docs are at:
>
> http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html
>
> Code is at:
>
> http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d
>
>
>> 	III. multiple threads computing separate parts of an array
>
> Also in the proposed std.parallelism module, though completely unsafe because it
> needs to be fast.
>

I will definitely be looking into std.parallelism (I already have a thread pool, but that right now is not really sophisticated, mostly because of the previous lack of some advanced synchronization primitives).
September 15, 2010
On 12.09.2010 17:54, Robert Jacques wrote:
> On Sun, 12 Sep 2010 09:35:42 -0400, Sönke Ludwig
> <ludwig@informatik.uni-luebeck.de> wrote:
>> 9. unique
>>
>> Unique objects or chunks of data are really important not only to be
>> able to check that a cast to 'immutable' is correct, but also to allow
>> for passing objects to another thread for computations without making
>> a superfluous copy or doing superfluous computation.
>
> Unique (or for those with an Occam background 'mobile') has several
> proponents in the D community (myself included). It was seriously
> considered for inclusion in the type system, but Walter found several
> issues with it on a practical level. If I recall correctly, Walter's
> exact issues weren't made public, but probably stem from the fact that
> unique/mobile types in other languages are generally library defined and
> are 'shallow'. They exist as a 'please use responsibly' / 'here be
> dragons' feature. For unique to be safe, it needs to be transitive, but
> this severely limits the objects that can be represented. For example, a
> doubly-linked-list can not be unique. Unique has been integrated into
> the Clean and Mercury functional languages (or so says Wikipedia), so
> there might be reasonable solutions to these problems.

Unique indeed seems to be a complicated problem when you want to make it flexible for the case of nested objects. However, I think it might already be very useful, even if it works well only with POD types and arrays. It would be interesting to collect different use cases and see what is really needed here so that an overall solution can be created later.
September 15, 2010
On 12.09.2010 17:00, Michel Fortin wrote:
>> 9. unique
>>
>> Unique objects or chunks of data are really important not only to be
>> able to check that a cast to 'immutable' is correct, but also to allow
>> for passing objects to another thread for computations without making
>> a superfluous copy or doing superfluous computation.
>
> Indeed, no-aliasing guaranties are important and useful, and not only
> for multithreading. But unique as a type modifier also introduce other
> complexities to the language, and I can understand why it was chosen not
> to add it to D2. I still wish we had it.
>
>
>> 11. holes in the system
>>
>> It seems like there are a lot of ways in which you can still slip in
>> non-shared data into a shared context.
>
> Your examples are just small bugs in spawn. They'll eventually get fixed.
>
> If you want a real big hole in the type system, look at the destructor
> problem.
> <http://d.puremagic.com/issues/show_bug.cgi?id=4621>
>
> Some examples of bugs that slip by because of it:
> <http://d.puremagic.com/issues/show_bug.cgi?id=4624>
>
>
>> 12. more practical examples need to be considered
>>
>> [...]
>>
>> III. multiple threads computing separate parts of an array
>
> If we had a no-aliasing guaranty in the type system (unique), we could
> make a "splitter" function that splits a unique array at the right
> positions and returns unique chunks which can be accessed independently
> by different cores with no race. You could then send each chunk to a
> different thread with correctness assured. Without this no-aliasing
> guaranty you can still implement this splitter function, but you're
> bound to use casts when using it (or suffer the penalty of atomic
> operations).
>

If the language allows creating an array, splitting it, and processing the chunks in separate threads - all without any cast in the user part of the code, and with the user code safely checked - I think everything would be fine. Of course a full solution in the language would be ideal, but my worry is more that in general you have to leave the checked part of the type system so often that all that type checking might be completely useless, since only the most simple threading constructs are checked. In that sense a library solution that hides the casts and still guarantees (almost) safe behaviour would already be a huge step forward.

Maybe a UniqueArray(T) type that is library-checked and that you can pass through spawn() would be a sufficient solution to at least this problem. It could make sure that T is a POD type and that only operations are allowed that still guarantee uniqueness of the elements.
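
Very roughly, such a type could look like this (sketch only; UniqueArray and its members are made up, and a real version would have to plug many more escape routes):

---
import std.concurrency;
import std.traits : hasAliasing;

// Library-checked "unique" array: the cast is hidden inside, and the API
// tries to never hand out a second reference to the same memory.
struct UniqueArray(T) if (!hasAliasing!T)
{
	private T[] data;

	static UniqueArray fromLength(size_t n) { return UniqueArray(new T[n]); }

	ref T opIndex(size_t i) { return data[i]; }
	size_t length() const   { return data.length; }

	// Transfers ownership of the payload; this instance is emptied so that
	// no alias survives on the sending side.
	shared(T)[] release()
	{
		auto tmp = data;
		data = null;
		return cast(shared(T)[]) tmp;
	}
}

void worker(shared(int)[] chunk)
{
	foreach (ref x; chunk) x += 1;
}

void example()
{
	auto a = UniqueArray!int.fromLength(100);
	foreach (i; 0 .. a.length) a[i] = cast(int)i;
	spawn(&worker, a.release());   // after this call 'a' no longer aliases the data
}
---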
September 15, 2010
== Quote from Sönke_Ludwig (ludwig@informatik.uni-luebeck.de)'s article
> If the language allows for creating an array, splitting it and
> processing the chunks in separate threads - and that without any cast in
> the user part of the code + the user code is safely checked - I think
> everything would be fine. Of course a full solution in the language
> would be ideal, but my worries are more that in general you have to
> leave the checked part of the type system so often, that all that type
> checking might be completely useless as only the most simple threading
> constructs are checked. In that way a library solution that hides the
> casts and still guarantees (almost) safe behaviour would already be a
> huge step forward.
> Maybe a UniqeArray(T) type that is library checked and that you can pass
> through spawn() would be a sufficient solution to at least this problem.
> It could make sure that T is POD and that only operations are allowed
> that still guarantee uniqueness of the elements.

I thought about making a safe std.parallelism.map().  (There's currently an unsafe
one.)   It's do-able under some limited circumstances but there are a few roadblocks:

1.  The array passed in would have to be immutable, which also would make it very difficult to make map() work on generic ranges.

2.  The return value of the mapping function would not be allowed to have unshared aliasing.

3.  No user-supplied buffers for writing the result to.

A safe parallel foreach just Ain't Gonna Work (TM) because the whole point of foreach is that it takes a delegate and everything reachable from the stack frame is visible in all worker threads.