Data Parallelism

I realize it's probably a little late to be jumping into the concurrency game.  I've been mostly staying on the sidelines, reading this list here and there, since the only multithreading I understand well or personally have any use for is data parallelism/use-every-core-I-have.  However, from reading the TDPL drafts I am getting concerned that D is moving towards message passing as the "one true way", and if this doesn't do what you need you're left to cowboy everything anyhow.  I've been slowly hacking away, improving my Parallelfuture library (which currently does basically cowboy everything), and am wondering if some of the main architects (Sean, Andrei) could take a look at it and see if some relatively small changes could be made to either the library or the language to make this library reasonably safe so it could be treated as a first-class citizen.

Ideally I'd like to make this thing safe(r) and get it into Phobos, though I don't know if it could be made to play nicely with the message passing/sharing model that's been discussed here.  I'd like to avoid the situation where message passing becomes the "one true way" of doing multithreading in D and the concurrency model still forces you to cowboy just about everything if message passing isn't what you need.

The docs for Parallelfuture are at: http://cis.jhu.edu/~dsimcha/parallelFuture.html

The code is at: http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

January 30, 2010

[dmd-concurrency] Data Parallelism

Posted by Robert Jacques
in reply to David Simcha

On Mon, 25 Jan 2010 09:24:00 -0500, David Simcha <dsimcha at gmail.com> wrote:

> I realize it's probably a little late to be jumping into the concurrency game.  I've been mostly staying on the sidelines, reading this list here and there, since the only multithreading I understand well or personally have any use for is data parallelism/use-every-core-I-have.  However, from reading the TDPL drafts I am getting concerned that D is moving towards message passing as the "one true way", and if this doesn't do what you need you're left to cowboy everything anyhow.  I've been slowly hacking away, improving my Parallelfuture library (which currently does basically cowboy everything), and am wondering if some of the main architects (Sean, Andrei) could take a look at it and see if some relatively small changes could be made to either the library or the language to make this library reasonably safe so it could be treated as a first-class citizen.
>
> Ideally I'd like to make this thing safe(r) and get it into Phobos, though I don't know if it could be made to play nicely with the message passing/sharing model that's been discussed here.  I'd like to avoid the situation where message passing becomes the "one true way" of doing multithreading in D and the concurrency model still forces you to cowboy just about everything if message passing isn't what you need.
>
> The docs for Parallelfuture are at: http://cis.jhu.edu/~dsimcha/parallelFuture.html
>
> The code is at: http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d

I agree that TDPL needs to acknowledge more than just shared state and message passing. For example, OpenMP, Intel's Threading Building Blocks, NVIDIA's CUDA and Apple's libdispatch are all very popular. The oft-cited Patterns for Parallel Programming outlines six major patterns (IIRC): pipelines, message passing, events, tasks, data-parallel and divide-and-conquer, of which the last three are addressed by your library. And David Callahan / Herb Sutter have talked about the pillars of concurrency: Responsiveness and Isolation Via Asynchronous Agents (MPI), Throughput and Scalability Via Concurrent Collections (OpenMP, tasks) and Consistency Via Safely Shared Resources (locks).

As for safety, I think that using shared delegates, and where appropriate const delegates, will allow the library to be (mostly) safe. Indeed, D should allow this paradigm to be an order of magnitude safer than any of the example solutions I listed above. (For Michel: shared/const delegates are not in the language yet, so you can't expect the library to use them yet.)

API notes
- Is there a reason for runCallable to be in the public API?
- task.done needs some documentation. Also, is it thread/race safe?
- Calling a wait routine to get a value seems less than intuitive and is
non-standard for task/future APIs. Task should have a .value method which
performs the default join method (workWait?) and returns the value. Also,
to be consistent with core.thread, the word 'join', not 'wait', should be
used. (This was a terminology change between the phobos runtime and
druntime.)
- Tasks that are stack allocated should finalize in their function calls.
The documentation seems to indicate this, but it should be clearer.
- task should be able to take a delegate using lazy syntax, i.e. auto value
= task( x + y );
- Pool.waitStop should be pool.join to be consistent with thread.join, etc.
- The 'scan' primitive should also probably be included in addition to
map/reduce/task/etc. Details are here
http://en.wikipedia.org/wiki/Prefix_sum.
- Just as there is parallel and map, there should be a reduction option
for reduce. i.e.:

foreach(a,b; pool.reduction( result, values, 100)) {
     return a + b;
}
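
For comparison with the foreach-style reduction suggested above, here is a minimal sketch of a parallel reduction written against std.parallelism, the Phobos module that later grew out of this library. That module is an assumption here, since it is not what the thread is reviewing, and the sketch does not use the proposed pool.reduction syntax. The helper name parallelSum is hypothetical.

```d
import std.array : array;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums the values in parallel on the default pool.
// The explicit seed (0L) is used as the starting value for every work
// unit and for the final combination, so the operator must be associative
// and the seed must be its identity.
long parallelSum(long[] values)
{
    return taskPool.reduce!"a + b"(0L, values);
}

void main()
{
    auto values = iota(1L, 1001L).array; // the numbers 1 through 1000
    assert(parallelSum(values) == 500_500);
    writeln(parallelSum(values));
}
```

Here the reduction operator is passed as a template argument rather than as a foreach body, which is the main difference from the form proposed in the bullet.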

- There should also be a way to create multiple tasks, do them in parallel and immediately join. The return value should be contained in a tuple. The advantage of this over individual tasks is the ability to use const delegates instead of shared delegates once they are added to the language.

auto values = pool.tasks( a + b, c + d );
assert(values[0] == a + b);
assert(values[1] == c + d);
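
The pool.tasks call above is a proposal, not an existing API. As a rough sketch of the same idea (start several tasks, join them all, collect the results in a tuple), the following uses task, taskPool.put and yieldForce from std.parallelism, the Phobos module that later grew out of this library. Using that module is an assumption, and the names add and addBoth are hypothetical helpers for illustration only.

```d
import std.parallelism : task, taskPool;
import std.stdio : writeln;
import std.typecons : tuple;

// Hypothetical unit of work; any function would do.
int add(int x, int y)
{
    return x + y;
}

// Submits both additions to the default pool, joins both, and returns
// the two results together in a tuple.
auto addBoth(int a, int b, int c, int d)
{
    auto first = task!add(a, b);
    auto second = task!add(c, d);
    taskPool.put(first);
    taskPool.put(second);
    // yieldForce blocks until the task is done and returns its value,
    // which is roughly the join-and-get-value behaviour asked for above.
    return tuple(first.yieldForce, second.yieldForce);
}

void main()
{
    auto values = addBoth(1, 2, 3, 4);
    assert(values[0] == 3);
    assert(values[1] == 7);
    writeln(values[0], " ", values[1]);
}
```

Unlike the proposed lazy-expression form, each task here names a function and its arguments explicitly, so nothing is captured by a delegate.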
