Thread overview
std.parallelism equivalents for posix fork and multi-machine processing
Laeeth Isharc (May 13, 2015)
weaselcat (May 13, 2015)
Laeeth Isharc (May 13, 2015)
John Colvin (May 14, 2015)
Laeeth Isharc (May 14, 2015)
Daniel Murphy (May 14, 2015)
Laeeth Isharc (May 14, 2015)
Laeeth Isharc (May 14, 2015)
Laeeth Isharc (May 15, 2015)
May 13, 2015
Is there value to having equivalents to the std.parallelism approach that works with processes rather than threads, and makes it easy to manage tasks over multiple machines?

I took a look at std.parallelism and it's beyond what I can do for now.  But it seems like this might be a useful project, and not one of unmanageable difficulty...
May 13, 2015
On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
> Is there value to having equivalents to the std.parallelism approach that works with processes rather than threads, and makes it easy to manage tasks over multiple machines?

I'm not sure if you're asking because of this thread, but see

http://forum.dlang.org/thread/tczkndtepnvppggzmews@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org

Python outperforming D because it doesn't have to deal with synchronization headaches. I found D to be way faster when reimplemented with fork, but having to use the stdc API is ugly (IMO).
May 13, 2015
On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:
> On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
>> Is there value to having equivalents to the std.parallelism approach that works with processes rather than threads, and makes it easy to manage tasks over multiple machines?
>
> I'm not sure if you're asking because of this thread, but see
>
> http://forum.dlang.org/thread/tczkndtepnvppggzmews@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org
>
> Python outperforming D because it doesn't have to deal with synchronization headaches. I found D to be way faster when reimplemented with fork, but having to use the stdc API is ugly (IMO).

Yes, that is what spurred me to post, but it had been on my mind for a while (especially the multi-machine stuff).
May 14, 2015
"Laeeth Isharc"  wrote in message news:ejbhesbstgazkxnpvqsl@forum.dlang.org...

> Is there value to having equivalents to the std.parallelism approach that works with processes rather than threads, and makes it easy to manage tasks over multiple machines?
>
> I took a look at std.parallelism and it's beyond what I can do for now. But it seems like this might be a useful project, and not one of unmanageable difficulty...

Yes, there is enormous value.  It's just waiting for someone to do it. 

May 14, 2015
On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:
> On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
>> Is there value to having equivalents to the std.parallelism approach that works with processes rather than threads, and makes it easy to manage tasks over multiple machines?
>
> I'm not sure if you're asking because of this thread, but see
>
> http://forum.dlang.org/thread/tczkndtepnvppggzmews@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org
>
> Python outperforming D because it doesn't have to deal with synchronization headaches. I found D to be way faster when reimplemented with fork, but having to use the stdc API is ugly (IMO).

It was also easy to get D very fast by just being a little more eager with IO and reducing the enormous number of little allocations being made.
May 14, 2015
On Thursday, 14 May 2015 at 16:33:46 UTC, John Colvin wrote:
> On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:
>> On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
>>> Is there value to having equivalents to the std.parallelism approach that works with processes rather than threads, and makes it easy to manage tasks over multiple machines?
>>
>> I'm not sure if you're asking because of this thread, but see
>>
>> http://forum.dlang.org/thread/tczkndtepnvppggzmews@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org
>>
>> Python outperforming D because it doesn't have to deal with synchronization headaches. I found D to be way faster when reimplemented with fork, but having to use the stdc API is ugly (IMO).
>
> It was also easy to get D very fast by just being a little more eager with IO and reducing the enormous number of little allocations being made.

Yes, thank you for your highly educational rewrite; I personally very much appreciate your taking the trouble to do it.  Perhaps this should be turned (by you or someone else) into a mini case study on the wiki of how to write idiomatic and efficient D code.  Or maybe just put up the slides from your forthcoming talk (which I look forward to watching once it is up).

It's good to know D can in fact deliver on the implicit promise in a real use case with not too much work.  (Yes, naively written code was a bit slow when dealing with millions of lines, but in which language of comparable flexibility would that not be true?)  It's also interesting that your code was idiomatic.  (I was reading up on Scala, which seems beautiful in many ways, but it is terribly disturbing to see that the idiomatic way often seems to be the most inefficient, at least as things stood a couple of years ago.)

But, even so, I think having a wrapper for fork and an API for multiprocessing (which you could then hook up to, e.g., the Digital Ocean and AWS APIs) would be rather helpful.
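For illustration, here is a minimal sketch of the kind of fork wrapper described above. It is written in Python rather than D and uses only the POSIX calls `os.fork` and `os.pipe`, so it assumes a Unix-like system. `fork_map` and its `nprocs` parameter are hypothetical names, not an existing library API. The input is split into contiguous chunks, each forked child computes its chunk and sends the pickled results back over its own pipe, and the parent reassembles them in input order.

```python
import os
import pickle


def fork_map(func, items, nprocs=4):
    """Hypothetical helper: apply func to every item using forked children.

    POSIX only. Items are split into at most nprocs contiguous chunks; each
    child computes one chunk and writes the pickled results to a pipe.
    """
    items = list(items)
    chunk = max(1, -(-len(items) // nprocs))  # ceiling division
    children = []  # (pid, read end of that child's pipe)
    for start in range(0, len(items), chunk):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process
            os.close(r)
            status = 1
            try:
                data = pickle.dumps([func(x) for x in items[start:start + chunk]])
                with os.fdopen(w, "wb") as out:
                    out.write(data)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        os.close(w)  # parent keeps only the read end
        children.append((pid, r))
    results = []
    for pid, r in children:  # read in start order so results match input order
        with os.fdopen(r, "rb") as inp:
            results.extend(pickle.loads(inp.read()))
        os.waitpid(pid, 0)
    return results
```

Because the children are forked rather than freshly spawned, `func` can be a lambda or closure; only the results are serialized. Spreading the work over several machines would mean replacing the pipes with sockets or a job queue, which this sketch does not attempt.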

I spoke with a friend of mine at one of the most admired/hated Wall Street firms, one of the smartest quants I know, who has now moved into portfolio management.  He was doing a study on tick data going back to 2000.  I asked him how long it took to run on his firm's infrastructure.  An hour!  And the operations were pretty simple.  I think it should take only a couple of minutes.  It would be nice to show an example of spinning up 100 Digital Ocean instances from a spreadsheet, running the numbers not just on one security but on every relevant security, and having a nice summary appear back in the sheet within a couple of minutes.

The reason speed matters is that long waits interfere with rapid iteration and the creative thought process.  In a market environment you may well have forgotten what you wanted after an hour...


Laeeth.
May 14, 2015
On Thursday, 14 May 2015 at 10:15:48 UTC, Daniel Murphy wrote:
> "Laeeth Isharc"  wrote in message news:ejbhesbstgazkxnpvqsl@forum.dlang.org...
>
>> Is there value to having equivalents to the std.parallelism approach that works with processes rather than threads, and makes it easy to manage tasks over multiple machines?
>>
>> I took a look at std.parallelism and it's beyond what I can do for now. But it seems like this might be a useful project, and not one of unmanageable difficulty...
>
> Yes, there is enormous value.  It's just waiting for someone to do it.

To start the process off (because small beginnings are better than no beginning): what are the key features of processes vs threads one would need to bear in mind when designing such a thing?  Because I spent the past couple of decades in a different field, multiprocessing passed me by, so I am only now slowly catching up.
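One key difference can be shown in a few lines. This is sketched in Python, assumes a POSIX system, and uses hypothetical helper names. Threads share a single address space, so a worker thread's write is visible to its parent. `fork()` instead gives the child a copy-on-write copy of memory, so the child's write never reaches the parent. Results from a process-based design therefore have to be sent back explicitly through pipes, files or sockets, which is also part of why such designs extend more naturally to multiple machines.

```python
import os
import threading


def thread_sees_update():
    """Threads share one address space: the worker's write is visible."""
    state = [0]

    def work():
        state[0] = 42

    t = threading.Thread(target=work)
    t.start()
    t.join()
    return state[0]


def process_sees_update():
    """fork() copies the address space: the child's write stays private."""
    state = [0]
    pid = os.fork()
    if pid == 0:  # child: modifies only its own copy, then exits
        state[0] = 42
        os._exit(0)
    os.waitpid(pid, 0)  # parent: wait for the child to finish
    return state[0]
```

Calling `thread_sees_update()` returns 42, while `process_sees_update()` returns 0.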
May 14, 2015
On Thursday, 14 May 2015 at 20:06:55 UTC, Laeeth Isharc wrote:
> To start the process off (because small beginnings are better than no beginning): what are the key features of processes vs threads one would need to bear in mind when designing such a thing?  Because I spent the past couple of decades in a different field, multiprocessing passed me by, so I am only now slowly catching up.

"nobody" understands multiprocessing. Or rather… you need to understand the hardware and the concrete problem space first. There are no general solutions.
May 14, 2015
On Thursday, 14 May 2015 at 20:15:38 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 14 May 2015 at 20:06:55 UTC, Laeeth Isharc wrote:
>> To start the process off (because small beginnings are better than no beginning): what are the key features of processes vs threads one would need to bear in mind when designing such a thing?  Because I spent the past couple of decades in a different field, multiprocessing passed me by, so I am only now slowly catching up.
>
> "nobody" understands multiprocessing. Or rather… you need to understand the hardware and the concrete problem space first. There are no general solutions.

Yes, I certainly understand that it is a highly specialist and complex area where the best minds in the world do not yet have the answers.  So if one were addressing the problem from an academic computer science perspective, perhaps one would arrive at a different answer.

My own is a pragmatic commercial one.  I have some problems which perhaps scale quite well, and rather than write them using fork directly, I would rather have a higher-level wrapper along the lines of std.parallelism.  Perhaps such a thing would be flawed and limited, but often something is better than nothing, even if not perfect.  And I mention it on the forum only because I have usually found that the problems I face turn out to be faced by many others too.

If you have any thoughts on what should be considered, I would very much appreciate them.  (And I owe you a response on our last suspended discussion, but haven't had time of late).


Laeeth.
May 14, 2015
On Thursday, 14 May 2015 at 20:28:20 UTC, Laeeth Isharc wrote:
> My own is a pragmatic commercial one.  I have some problems which perhaps scale quite well, and rather than write them using fork directly, I would rather have a higher-level wrapper along the lines of std.parallelism.

Languages like Chapel and extended versions of C++ have built-in support for parallel computing that is relatively effortless and designed by experts (Cray, IBM, etc.) to cover common patterns in demanding batch processing, for those who want something higher-level than plain C++ (or, in this case, D, which is pretty much the same thing).

However, you could consider combining single-threaded processes in D with, e.g., Python as a supervising process if the datasets allow it. You'll find lots of literature on inter-process communication (IPC) for Unix. Performance will be lower, but your own productivity might be higher; YMMV.
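A minimal sketch of that arrangement follows: a Python supervisor talking to single-threaded worker processes over stdin/stdout pipes. `WORKER_CMD` is a hypothetical stand-in (a one-line Python program that sums the integers it reads) for what would be a compiled D binary, and `supervise` is likewise a hypothetical name. All workers are started before any output is read, so they run concurrently. The sketch assumes each worker prints only a small result after consuming all of its input; a worker that produced lots of output early could make this simple write-then-read order block.

```python
import subprocess
import sys

# Hypothetical stand-in for a compiled single-threaded D worker: it reads one
# integer per line from stdin and prints their sum once stdin is closed.
WORKER_CMD = [
    sys.executable,
    "-c",
    "import sys; print(sum(int(line) for line in sys.stdin if line.strip()))",
]


def supervise(shards, cmd=WORKER_CMD):
    """Run one worker process per shard and return their results in order."""
    procs = [
        subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        for _ in shards
    ]
    # Hand every worker its whole shard first; closing stdin signals EOF.
    for proc, shard in zip(procs, shards):
        proc.stdin.write("".join(f"{x}\n" for x in shard))
        proc.stdin.close()
    # Then collect the (small) results, checking each worker's exit status.
    results = []
    for proc in procs:
        out = proc.stdout.read()
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError(f"worker exited with status {proc.returncode}")
        results.append(int(out))
    return results
```

For example, `supervise([[1, 2, 3], [10, 20], []])` starts three workers and returns one sum per shard.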

> Perhaps such a thing would be flawed and limited, but often something is better than nothing, even if not perfect.  And I mention it on the forum only because I have usually found that the problems I face turn out to be faced by many others too.

You need momentum to get from a raw state to something polished, so you essentially need a larger community that both has experience with the topic and needs such a framework, in order to end up with a sensible one that stays maintained.

If you can get away with it, the most common simplistic approach seems to be map-reduce, because it is easy to distribute over many machines and there are frameworks that do the tedious bits for you.
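A minimal in-process sketch of the map-reduce shape, using a word count (the function names are hypothetical). The map step turns each chunk of lines into partial counts independently of every other chunk, and the reduce step merges partial results with an associative operation. Here the map step simply runs sequentially through Python's built-in `map`; in a real deployment that is the step a framework, or a process-based wrapper of the kind discussed in this thread, would distribute across processes or machines.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Map: count the words in one chunk of lines.

    Depends only on its own chunk, so each call could run in a separate
    process or on a separate machine.
    """
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reduce_phase(left, right):
    """Reduce: merge two partial counts. Addition of positive counts is
    associative, so partial results can be combined in any grouping."""
    return left + right


def word_count(lines, chunk_size=2):
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    partials = map(map_phase, chunks)  # the step a framework would distribute
    return reduce(reduce_phase, partials, Counter())


print(word_count(["a b a", "b c", "a"]))  # Counter({'a': 3, 'b': 2, 'c': 1})
```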

> If you have any thoughts on what should be considered, I would very much appreciate them.  (And I owe you a response on our last suspended discussion, but haven't had time of late).

Nah, you owe me nothing ;-). And I also have no time atm. ;-)

Ola.