[phobos] std.parallelism: Request for review/comment
August 29, 2010
  Since there seems to be interest in my parallelfuture library becoming
std.parallelism even if it uses unchecked implicit sharing, I've cleaned
up the code, removed anything that could possibly be construed as having
been borrowed from Tango (which previously was just a few tiny,
unoriginal ASM snippets) and improved the documentation.  I'm requesting
review of it.  The code is available at:

http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d

A draft of how the documentation would look is available at:

http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html

BTW, how are we working the review process for new modules?  Do they get reviewed here first and then in the NG, or should they be posted directly to the NG, or what?

--Dave
August 29, 2010
I have tried it before, and based on that (unless you have made substantial changes since its first incarnation as 'parallelfuture') I say it's good to go.

I only have two suggestions:

 1. Can we call it std.parallel instead?

 2. IMO, the killer feature of the module is the parallel foreach.
Therefore, I think parallel() should be moved to the top of the module,
so it's the first thing one sees upon reading the documentation.

-Lars


August 29, 2010
  This is basically the latest incarnation of ParallelFuture, but with
improved documentation.  A lot has changed, though, since the first
incarnation.  Here are some of the more important changes:

1.  Parallel map and reduce.
2.  Merge Task and Future.
3.  foreach(i, elem; pool.parallel(range)) now works.
4.  ref foreach now works even for non-random access ranges.  (It's
implemented by incrementally copying pointers into an array.)
5.  Worker-local storage.
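
To illustrate point 3, user code looks roughly like this (untested sketch; the constructor argument and the finish() call are placeholders for whatever the draft docs actually specify):

    import std.parallelism;   // the module under review

    void main()
    {
        auto pool = new TaskPool(4);        // 4 worker threads; ctor signature assumed
        auto squares = new double[1_000];

        // Point 3: both the index and the element are available in the loop body.
        foreach (i, ref x; pool.parallel(squares))
            x = i * i;

        pool.finish();                      // shut the pool down; name assumed
    }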


August 31, 2010
Point (3) is pretty cool.  I just used your module for my current project at work, and the ability to get the index made the code a lot nicer.

Another question:  Why have you chosen the default number of work units to be just two units per thread?  In my experience, it's not uncommon that calculations are harder on some parts of the range than others, and then there is a risk of some cores running out of work to do.  I'd think that having more work units, 3-4 per thread, say, would allow for better distribution of work between cores.
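
For example (untested sketch; I'm assuming the extra argument to parallel() is the number of elements per work unit, as the draft docs seem to describe):

    import std.parallelism;

    void scaleAll(double[] data)
    {
        auto pool = new TaskPool();

        // Smaller work units give the pool more chances to rebalance when some
        // parts of the range are much more expensive than others.
        foreach (ref x; pool.parallel(data, 64))   // ~64 elements per work unit
            x *= 2.0;

        pool.finish();
    }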

-Lars


On Sun, 2010-08-29 at 17:32 -0400, David Simcha wrote:
> This is basically the latest incarnation of ParallelFuture, but with improved documentation.  A lot has changed, though, since the first incarnation.  Here are some of the more important changes:
> 
> 1.  Parallel map and reduce.
> 2.  Merge Task and Future.
> 3.  foreach(i, elem; pool.parallel(range)) now works.
> 4.  ref foreach now works even for non-random access ranges.  (It's
> implemented by incrementally copying pointers into an array.)
> 5.  Worker-local storage.


August 31, 2010
  On 8/31/2010 6:22 AM, Lars Tandle Kyllingstad wrote:
> Point (3) is pretty cool.  I just used your module for my current project at work, and the ability to get the index made the code a lot nicer.
>
> Another question:  Why have you chosen the default number of work units to be just two units per thread?  In my experience, it's not uncommon that calculations are harder on some parts of the range than others, and then there is a risk of some cores running out of work to do.  I'd think that having more work units, 3-4 per thread, say, would allow for better distribution of work between cores.
>
> -Lars

Good point.  I should probably change this, as the more I think about it the more I realize that I never use the default for the reason you mention.  It seemed like a good idea in iteration 1, and then I just never reconsidered.
August 31, 2010
A few general comments:

I'm wondering whether the use of a TaskPool should always be explicit.  Perhaps a default pool should be considered to exist, and calling parallel() instead of pool.parallel() will use that?  It's rare for apps to have multiple distinct pools anyway, so this may be an easy way to help obscure a tiny bit more of the implementation.

Rather than making the use of 'break' inside parallel foreach undefined, why not say that it aborts the processing of the current block?

Ideally, the syntax for map() and reduce() should match std.algorithm.  Not sure what this means for blockSize though.  Maybe it should be a template parameter?


Regarding the code...

There are a lot of places where you do:

    lock();
    doSomething();
    unlock();

This isn't exception safe, which isn't a big deal since the enclosed code won't throw anyway, but it's still bad form.  Use this instead:

    synchronized (mutex) {
        doSomething();
    }

There's some black magic in the Mutex class that makes it work with the synchronized statement, so it will do exactly the same thing but be exception-safe as well.  You can even do:

    class C { this() { mutex = new Mutex(this); } }

to make the mutex the enclosing object's monitor so that "synchronized void func() {}" will lock the mutex.  This may turn out to be necessary when/if shared is applied to the code (which will require core.sync to work properly with shared, etc... something I'm going to take care of shortly).
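
For example, something along these lines (sketch only; Queue, push and pushMany are made-up names, but Mutex and the synchronized statement come from core.sync.mutex and the language itself):

    import core.sync.mutex;

    class Queue
    {
        private Mutex mtx;
        private int[] items;

        this()
        {
            mtx = new Mutex(this);   // make mtx this object's monitor
        }

        // Locks mtx on entry, unlocks on scope exit, even if the body throws.
        synchronized void push(int item)
        {
            items ~= item;
        }

        void pushMany(int[] newItems)
        {
            synchronized (mtx)       // block form; also exception-safe
            {
                items ~= newItems;
            }
        }
    }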

When waiting on condition variables, there's a remote chance that wait() will return when there isn't actually something to process.  Instead of:

    if (!condition)
        wait();

use:

    while (!condition)
        wait();

I saw the "if" approach in "AbstractTask* pop()"... I think that's the only one.

In "ReturnType spinWait()" you throw "exception" if it exists but don't clear the variable.  Will this be a problem?

That's all I saw in a quick scan of the code.  I'll trust that there are no races and play with it in a bit.
August 31, 2010
On Tue, Aug 31, 2010 at 3:07 PM, Sean Kelly <sean at invisibleduck.org> wrote:

> A few general comments:
>
> I'm wondering whether the use of a TaskPool should always be explicit.
>  Perhaps a default pool should be considered to exist, and calling
> parallel() instead of pool.parallel() will use that?  It's rare for apps to
> have multiple distinct pools anyway, so this may be an easy way to help
> obscure a tiny bit more of the implementation.
>

I could create a *default* lazily initialized Singleton for the TaskPool (a few other people I've showed this code to have requested this, too, so maybe it's a trend), but I absolutely insist on being able to manually initialize it.  First of all, this gives the user the ability to manually decide how many threads to use.  Secondly, I've had use cases where all threads in a pool are blocked waiting for the same really expensive computation to complete, so I fire up a new pool to do that really expensive computation in parallel.
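
Something like this is the general idea (rough sketch; taskPool and defaultPool are placeholder names, and a real version would probably want a cheaper fast path than taking a lock on every access):

    import std.parallelism;   // for TaskPool; the module under review

    private __gshared TaskPool defaultPool;

    @property TaskPool taskPool()
    {
        synchronized   // one global lock for the lazy init; simple but not the fastest
        {
            if (defaultPool is null)
                defaultPool = new TaskPool();   // default worker count
            return defaultPool;
        }
    }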


>
> Rather than making the use of 'break' inside parallel foreach undefined, why not say that it aborts the processing of the current block?
>

Eventually I'll define it to do something, but I kept it undefined for now because I don't want to pin down what it does yet.  Ideally I'd like it to not only stop the current block, but end the processing of the whole rest of the parallel foreach loop.  This is more difficult to implement, though.


>
> Ideally, the syntax for map() and reduce() should match std.algorithm.  Not
> sure what this means for blockSize though.  Maybe it should be a template
> parameter?
>

It does match std.algorithm as far as I can tell, as long as you use the default block size and allow a buffer to be automatically allocated.  If it doesn't, let me know, because I meant to make it match.
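
e.g. the two calls should read the same apart from the pool (sketch; block size and buffer left at their defaults, and square is just a placeholder function):

    import std.algorithm : map;
    import std.parallelism;

    double square(double x) { return x * x; }

    void demo(double[] nums)
    {
        auto serial = map!square(nums);      // std.algorithm

        auto pool = new TaskPool();
        auto par  = pool.map!square(nums);   // same call shape, default block size/buffer
        pool.finish();
    }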


>
>
> Regarding the code...
>
> There are a lot of places where you do:
>
>    lock();
>    doSomething();
>    unlock();
>
> This isn't exception safe, which isn't a big deal since the enclosed code won't throw anyway, but it's still bad form.  Use this instead:
>
>    synchronized (mutex) {
>        doSomething();
>    }
>

I'm aware that that's not exception safe, but I did it anyhow because I knew that the enclosed code wouldn't throw.  The reason it's like this is because I was experimenting with different locking mechanisms, such as making everything spinlock-based, at one point.  Unless there are strong objections, I prefer to leave it like this so that I can easily experiment with different locking mechanisms in the future.

>
> There's some black magic in the Mutex class that makes it work with the synchronized statement, so it will do exactly the same thing but be exception-safe as well.  You can even do:
>
>    class C { this() { mutex = new Mutex(this); } }
>
> to make the mutex the enclosing object's monitor so that "synchronized void func() {}" will lock the mutex.  This may turn out to be necessary when/if shared is applied to the code (which will require core.sync to work properly with shared, etc... something I'm going to take care of shortly).
>
> When waiting on condition variables, there's a remote chance that wait() will return when there isn't actually something to process.  Instead of:
>
>    if (!condition)
>        wait();
>
> use:
>
>    while (!condition)
>        wait();
>
> I saw the "if" approach in "AbstractTask* pop()"... I think that's the only
> one.
>

Thanks, will do.  How does this happen, though?


>
> In "ReturnType spinWait()" you throw "exception" if it exists but don't clear the variable.  Will this be a problem?
>

I can't see why it would be a problem.  Also, if for some crazy reason the user handles the exception and then tries to get at the return value again, IMHO the correct behavior is to throw it again.  If there's something I'm missing, please let me know.
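
Roughly, the shape of the behavior being discussed is this (not the module's actual code, and the real thing needs atomics/fences around 'done'; this is just to show why rethrowing on a second call falls out naturally):

    struct DemoTask(R)
    {
        R returnValue;
        Throwable exception;
        bool done;

        R spinWait()
        {
            while (!done) { }          // spin until a worker marks the task done
            if (exception !is null)
                throw exception;        // deliberately left set: a later call rethrows
            return returnValue;
        }
    }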
September 01, 2010
Previously, I wrote a future module:
http://bitbucket.org/repeatedly/scrap/src/tip/future.d
It is not production level; it was just for asynchronous calculation in my own tests.

I think parallel features are very important (these days, many cores are the default).
In addition, some products provide parallel-processing frameworks (MongoDB provides MapReduce for batch manipulation).
So I want such features in Phobos.
This is a good candidate if no other module exists.

At the moment I am busy. Sorry, please wait for my review.


Masahiro


August 31, 2010
On Aug 31, 2010, at 12:41 PM, David Simcha wrote:
> 
> On Tue, Aug 31, 2010 at 3:07 PM, Sean Kelly <sean at invisibleduck.org> wrote: A few general comments:
> 
> I'm wondering whether the use of a TaskPool should always be explicit.  Perhaps a default pool should be considered to exist, and calling parallel() instead of pool.parallel() will use that?  It's rare for apps to have multiple distinct pools anyway, so this may be an easy way to help obscure a tiny bit more of the implementation.
> 
> I could create a default lazily initialized Singleton for the TaskPool (a few other people I've showed this code to have requested this, too, so maybe it's a trend), but I absolutely insist on being able to manually initialize it.  First of all, this gives the user the ability to manually decide how many threads to use.  Secondly, I've had use cases where all threads in a pool are blocked waiting for the same really expensive computation to complete, so I fire up a new pool to do that really expensive computation in parallel.

Right.  I've added some fancy behavior to the TaskPool itself so the thread count is a bit more dynamic, but that tends to overcomplicate the code.  Your approach is probably better.

> Rather than making the use of 'break' inside parallel foreach undefined, why not say that it aborts the processing of the current block?
> 
> 
> When waiting on condition variables, there's a remote chance that wait() will return when there isn't actually something to process.  Instead of:
> 
>    if (!condition)
>        wait();
> 
> use:
> 
>    while (!condition)
>        wait();
> 
> I saw the "if" approach in "AbstractTask* pop()"... I think that's the only one.
> 
> Thanks, will do.  How does this happen, though?

Theoretical mumbo-jumbo, mostly.  Here's the wiki entry: http://en.wikipedia.org/wiki/Spurious_wakeup.  For the most part, it's just good programming practice to double-check the invariant on wakeup.

> 
> In "ReturnType spinWait()" you throw "exception" if it exists but don't clear the variable.  Will this be a problem?
> 
> I can't see why it would be a problem.  Also, if for some crazy reason the user handles the exception and then tries to get at the return value again, IMHO the correct behavior is to throw it again.  If there's something I'm missing, please let me know.

I hadn't spent enough time with the code to know what the correct behavior should be, so I asked.
August 31, 2010
On Tue, 31 Aug 2010, Sean Kelly wrote:

> On Aug 31, 2010, at 12:41 PM, David Simcha wrote:
> > 
> > When waiting on condition variables, there's a remote chance that wait() will return when there isn't actually something to process.  Instead of:
> > 
> >    if (!condition)
> >        wait();
> > 
> > use:
> > 
> >    while (!condition)
> >        wait();
> > 
> > I saw the "if" approach in "AbstractTask* pop()"... I think that's the only one.
> > 
> > Thanks, will do.  How does this happen, though?
> 
> Theoretical mumbo-jumbo, mostly.  Here's the wiki entry: http://en.wikipedia.org/wiki/Spurious_wakeup.  For the most part, it's just good programming practice to double-check the invariant on wakeup.

Very much not theoretical.  I've seen it happen more than a little.  A lot of it can be blamed on using notify-all rather than notify-one style behavior.  The other way it can happen is when signals (for things like GC pause/resume, or pipe, or ...) break in-progress sleeps.

It's one of that class of issues that tends not to happen in small, simple apps, so people don't learn the risks until later, when weird things happen in some large or complex app.

Anyway, gotta do it, it does happen.

Later,
Brad