Thread overview
std.parallelism changes done
March 24, 2011
I've finished all of the changes that were discussed in the initial std.parallelism review.  I know I said I needed more time than this, but honestly, I hit a best-case scenario.  I had more time than I anticipated to work on it *and* the changes (especially fixing the exception handling issue) took less time than I anticipated.

I would say that the documentation has now improved "radically", like Andrei suggested it needs to.  I want to thank Andrei and Michel Fortin for their extremely useful suggestions, and apologize for getting defensive at times.  I felt compelled to defend my initial design in cases where I shouldn't have, due to a combination of time pressure and a misunderstanding of the review process.  Andrei's suggestions in particular led to a tremendous improvement of the documentation and, to a lesser extent, an improvement of the API.

In addition to improving the documentation, I added Task.executeInNewThread() to allow Task to be useful without a TaskPool.  (Should this have a less verbose name?)  I also fixed some exception handling bugs, implemented exception chaining for exceptions thrown concurrently, and fixed some silliness with respect to seed values in reduce().

One thing Andrei mentioned that I'm really not sure about is what to do with TaskPool.join().  My example for it is still terrible, because I think it's an evolutionary artifact.  It was useful in earlier designs that were never released and didn't have high-level data parallelism primitives.  I never use it, don't have any good use cases for it and would be inclined to remove it entirely.  Andrei seems to have some good use cases in mind, but he has not detailed any that I believe are reasonably implementable and I'm not sure whether they could be solved better using the higher-level data parallelism primitives.

As far as the vote, I know I asked for more time and to allow another module ahead of me, and of course I'll honor that.  I'd like to un-git stash this review as soon as isemail is done, though.  I anticipate a decent amount more suggestions now, given how major the changes have been, but most will probably be minor things like documentation clarifications and renaming stuff.

The new docs are at http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html .
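For context, a minimal sketch of the new executeInNewThread() usage, based on the linked docs (names follow that documentation; treat this as an untested illustration, not part of the review):

```d
import std.parallelism, std.stdio;

ulong fib(uint n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

void main() {
    // Create a Task without any TaskPool involved.
    auto t = task!fib(30);

    // Run it in a dedicated new thread.
    t.executeInNewThread();

    // Do other work here, then block until the result is ready.
    // An exception thrown concurrently inside the task would be
    // rethrown (and chained) here.
    writeln(t.yieldForce);
}
```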
March 24, 2011
On Thu, 24 Mar 2011 05:32:26 +0100, dsimcha <dsimcha@yahoo.com> wrote:

> In addition to improving the documentation, I added Task.executeInNewThread() to allow Task to be useful without a TaskPool.   (Should this have a less verbose name?)

spawnAndRun?


-- 
Simen
March 24, 2011
Am 24.03.2011 05:32, schrieb dsimcha:
> In addition to improving the documentation, I added
> Task.executeInNewThread() to allow Task to be useful without a TaskPool.
> (Should this have a less verbose name?)

The threading system I designed for the company I work for uses a priority per task to control which tasks can overtake others. A special priority is out-of-band (the name may be debatable), which guarantees that the task will run in its own thread so it can safely wait for other tasks. However, the threads that process OOB tasks are also cached in the thread pool and reused for new OOB tasks. Only if the number of parallel OOB tasks exceeds a specific limit are new threads created and destroyed. This can save quite a bit of time for those tasks.

Both kinds of priority have been very useful, and I would suggest putting at least the executeInNewThread() method into TaskPool to be able to make such an optimization later.

The task priority thing in general may only be necessary for complex applications with user interaction, where you have to satisfy certain interactivity requirements. I wouldn't be too sad if this is not implemented now, but it would be good to keep it in mind as a possible later improvement.

Sönke
March 24, 2011
dsimcha:

> and apologize for getting defensive at times.

It happens to mammals, don't worry.


> The new docs are at http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html .

>    real getTerm(int i) {
>        immutable x = ( i - 0.5 ) * delta;
>        return delta / ( 1.0 + x * x ) ;
>    }
>    immutable pi = 4.0 * taskPool.reduce!"a + b"(
>        std.algorithm.map!getTerm(iota(n))
>    );

For the examples I suggest using q{a + b} instead of "a + b".

When D gains a good implementation of conditional purity, I think taskPool.reduce and taskPool.map may be restricted to accept only pure functions to map and pure iterables to work on.


>template map(functions...)
>    Eager parallel map.

So in Phobos you'd have std.algorithm.map, which is lazy, and taskPool.map, which is eager. I think it's not so good to have two functions with the same name but subtly different purposes. I have suggested to Andrei to add a std.algorithm.amap, that is array(map()):
http://d.puremagic.com/issues/show_bug.cgi?id=5756
So I suggest renaming "taskPool.map" to "taskPool.amap" and "taskPool.lazyMap" to "taskPool.map".


>    auto squareRoots = new float[numbers.length];
>    taskPool.map!sqrt(numbers, squareRoots);

Currently, to do the same thing with the normal std.algorithm.map you need something like:
auto squareRoots = new float[numbers.length];
copy(map!sqrt(numbers), squareRoots);


> A semi-lazy parallel map that can be used for pipelining.

The idea of such vectorized laziness may give a performance enhancement even for the serial std.algorithm.map.

In the module documentation I'd like to see a graph that shows how the parallel map/reduce/foreach scale as the number of cores goes from 1 to 2 to 4 to 8 (or more) :-)

Thank you for this very nice Phobos module.

Bye,
bearophile
March 24, 2011
Hm, depending on the way the pool is used, it might be a better default to have the number of threads equal to the number of CPU cores. In my experience the control thread mostly either waits for tasks or processes messages and blocks in between, so it rarely uses a full core, wasting the available computation time in this case.

However, I'm not really sure if it is like this for the majority of all applications or if there are more cases where the control thread will continue to do computations in parallel. Maybe we could collect some opinions on this?

On another note, I would like to see a rough description of what the default workUnitSize is, depending on the size of the input. Otherwise it feels rather uncomfortable to use this version of parallel().
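For reference, the overload that takes an explicit work unit size sidesteps that uncertainty; a sketch along the lines of the linked docs (untested):

```d
import std.parallelism;
import std.math : log;

void main() {
    auto logs = new double[10_000];

    // Explicit work unit size: each task handles 100 consecutive
    // elements, so nothing depends on the default heuristic.
    foreach (i, ref elem; taskPool.parallel(logs, 100))
        elem = log(i + 1.0);
}
```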

Another small addition would be to state whether the object returned by asyncBuf is an InputRange and which useful methods it has (some kind of progress counter could also be useful here).

Btw., sorry if anything of this has already been discussed. I have missed the previous discussion unfortunately.

Sönke
March 24, 2011
On 03/24/2011 05:32 AM, dsimcha wrote:
> One thing Andrei mentioned that I'm really not sure about is what to do with
> TaskPool.join().  My example for it is still terrible, because I think it's an
> evolutionary artifact.  It was useful in earlier designs that were never
> released and didn't have high-level data parallelism primitives.  I never use
> it, don't have any good use cases for it and would be inclined to remove it
> entirely.  Andrei seems to have some good use cases in mind, but he has not
> detailed any that I believe are reasonably implementable and I'm not sure
> whether they could be solved better using the higher-level data parallelism
> primitives.

If I may have a suggestion: just let it aside. Instead of piling up features, just have the core available, and let actual needs show up in real life.
An exception may be when a given potential feature would require major re-design. Then, better to anticipate with a design that could accomodate it. Else, better to apply the famous phrase that design is finished when nothing is left to remove. Very true (even more in PL design where it's about impossible to remove a feature once released, as an after-thought).

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

March 24, 2011
On 2011-03-24 03:29:52 -0400, Sönke Ludwig <ludwig@informatik.uni-luebeck.de> said:

> Hm depending on the way the pool is used, it might be a better default to have the number of threads equal the number of cpu cores. In my experience the control thread is mostly either waiting for tasks or processing messages and blocking in between so it rarely uses a full core, wasting the available computation time in this case.
> 
> However, I'm not really sure if it is like this for the majority of all applications or if there are more cases where the control thread will continue to do computations in parallel. Maybe we could collect some opinions on this?

The current default is good for command line applications, where the main thread generally blocks while you're doing your work. The default you're proposing is good when you're using the task pool to pile up tasks to perform in the background, which is generally what you do in an event-driven application. The current default keeps things simpler for the simpler programs, which are more linear in nature.

My use case is like yours: an event-driven main thread that starts tasks to be performed in the background.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

March 24, 2011
On 2011-03-24 03:00:01 -0400, Sönke Ludwig <ludwig@informatik.uni-luebeck.de> said:

> Am 24.03.2011 05:32, schrieb dsimcha:
>> In addition to improving the documentation, I added
>> Task.executeInNewThread() to allow Task to be useful without a TaskPool.
>> (Should this have a less verbose name?)
> 
> The threading system I designed for the company I work for uses a priority per task to control which tasks can overtake others. A special priority is out-of-band (the name may be debatable), which guarantees that the task will run in its own thread so it can safely wait for other tasks. However, the threads that process OOB tasks are also cached in the thread pool and reused for new OOB tasks. Only if the number of parallel OOB tasks exceeds a specific limit are new threads created and destroyed. This can save quite a bit of time for those tasks.
> 
> Both kinds of priority have been very useful, and I would suggest putting at least the executeInNewThread() method into TaskPool to be able to make such an optimization later.
> 
> The task priority thing in general may only be necessary for complex applications with user interaction, where you have to satisfy certain interactivity requirements. I wouldn't be too sad if this is not implemented now, but it would be good to keep it in mind as a possible later improvement.

Do you think having multiple task pools each with a different thread priority would do the trick? Simply put tasks in the task pool with the right priority...  I had a similar use case in mind and this is what I proposed in the previous discussion.
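A sketch of that idea; note that the `priority` setter used here is hypothetical at this point in the discussion (later versions of std.parallelism do expose such a property on TaskPool):

```d
import std.parallelism;
import core.thread : Thread;

void backgroundWork() { /* long-running, low-priority job */ }

void main() {
    auto normalPool = new TaskPool;    // latency-sensitive tasks
    auto lowPool    = new TaskPool(2); // background tasks

    // Hypothetical: lower the OS priority of lowPool's worker threads.
    lowPool.priority = Thread.PRIORITY_MIN;

    auto t = task!backgroundWork();
    lowPool.put(t);   // runs whenever a low-priority worker is free
    t.yieldForce;

    normalPool.finish();
    lowPool.finish();
}
```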

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

March 24, 2011
On 03/24/2011 05:32 AM, dsimcha wrote:
> [...]
>
> The new docs are at http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html .

About the doc: very good. I could understand most of it, while knowing nearly nothing about parallelism prior to reading.
2 details:
* highlight key words only on their first occurrence (bold only)
* wrong doc for Task.isPure (it gets a copy of Task.args' doc)

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

March 24, 2011
On 3/24/2011 3:00 AM, Sönke Ludwig wrote:
> Am 24.03.2011 05:32, schrieb dsimcha:
>> In addition to improving the documentation, I added
>> Task.executeInNewThread() to allow Task to be useful without a TaskPool.
>> (Should this have a less verbose name?)
>
> The threading system I designed for the company I work for uses priority
> per task to control which tasks can overtake others. A special priority
> is out-of-bands (the name my be debatable), which will guarantee that
> the task will run in its own thread so it can safely wait for other
> tasks.

This may not be an issue in the std.parallelism design.  A TaskPool task can safely wait on other tasks.  What prevents this from causing a deadlock is that calling yieldForce, spinForce, or waitForce on a task that has not started executing yet will execute the task immediately in the thread that tried to force it, regardless of where it is in the queue.
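A minimal illustration of that rule (API names per the linked docs; untested sketch):

```d
import std.parallelism;

int slowSquare(int x) { return x * x; }

void main() {
    auto t = task!slowSquare(7);
    taskPool.put(t);

    // If no worker has dequeued t yet, yieldForce executes it right
    // here in the forcing thread instead of waiting on the queue,
    // which is why a pool task waiting on another task can't deadlock.
    assert(t.yieldForce == 49);
}
```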

> However, those threads that process OOB tasks are also cached in
> the thread pool and reused for new OOB tasks. Only if the number of
> parallel OOB tasks exceeds a specific limit are new threads created
> and destroyed. This can save quite a bit of time for those tasks.

Unfortunately this is not implementable without a massive overhaul of TaskPool.  There are some baked-in assumptions that the number of worker threads in a pool will not change after the pool's creation.  Furthermore, I'm not sure how worker-local storage could be implemented efficiently without this restriction.

>
> Both kinds of priority have been very useful and I would suggest to put
> at least the executeInNewThread() method into ThreadPool to be later
> able to make such an optimization.

Can you elaborate on this?  The whole point of executeInNewThread() was supposed to be that a TaskPool is not needed for simple cases.