Parallel processing and further use of output

I've run into an issue, which I guess could be resolved easily, if I knew how...

[CODE]
    ulong i = 0;
    foreach (f; parallel(iota(1, 1000000+1)))
    {
        i += f;
    }
    thread_joinAll();
    i.writeln;
[/CODE]

It's basically an example which adds all the numbers from 1 to 1000000 and should therefore give 500000500000. Running the above code gives 205579930677, leaving out "thread_joinAll()" the output is 210161213519.

I suspect there's some sort of data race. Any hint how to get this straight?

On Saturday, 26 September 2015 at 12:18:16 UTC, Zoidberg wrote:
> I've run into an issue, which I guess could be resolved easily, if I knew how...
>
> [CODE]
>     ulong i = 0;
>     foreach (f; parallel(iota(1, 1000000+1)))
>     {
>         i += f;
>     }
>     thread_joinAll();
>     i.writeln;
> [/CODE]
>
> It's basically an example which adds all the numbers from 1 to 1000000 and should therefore give 500000500000. Running the above code gives 205579930677, leaving out "thread_joinAll()" the output is 210161213519.
>
> I suspect there's some sort of data race. Any hint how to get this straight?

Here's a correct version:
----
import std.parallelism, std.range, std.stdio, core.atomic;

void main()
{
    shared ulong i = 0;
    foreach (f; parallel(iota(1, 1000000+1)))
    {
        i.atomicOp!"+="(f);
    }
    i.writeln;
}
----

> Here's a correct version:
>
> import std.parallelism, std.range, std.stdio, core.atomic;
> void main()
> {
>     shared ulong i = 0;
>     foreach (f; parallel(iota(1, 1000000+1)))
>     {
>         i.atomicOp!"+="(f);
>     }
>     i.writeln;
> }

Thanks! Works fine. So are "shared" and "atomic" a must?

September 26, 2015

Re: Parallel processing and further use of output

Posted by anonymous
in reply to Zoidberg


On Saturday 26 September 2015 14:18, Zoidberg wrote:

> I've run into an issue, which I guess could be resolved easily, if I knew how...
> 
> [CODE]
>      ulong i = 0;
>      foreach (f; parallel(iota(1, 1000000+1)))
>      {
>          i += f;
>      }
>      thread_joinAll();
>      i.writeln;
> [/CODE]
> 
> It's basically an example which adds all the numbers from 1 to 1000000 and should therefore give 500000500000. Running the above code gives 205579930677, leaving out "thread_joinAll()" the output is 210161213519.
> 
> I suspect there's some sort of data race. Any hint how to get this straight?

Definitely a race, yeah. You need to prevent two += operations happening concurrently.
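
To make the race concrete: `i += f` is a read, an add, and a write, and two threads can interleave those steps. The sketch below replays one such unlucky interleaving sequentially in a single thread, so it is deterministic. The function name `lostUpdateDemo` and the variable names are made up for illustration; they are not from the thread.

```d
import std.stdio : writeln;

// Hypothetical helper: replays, step by step, what can happen when two
// threads both run `i += f` on the same variable without synchronization.
ulong lostUpdateDemo()
{
    ulong i = 0;

    ulong seenByA = i;   // thread A reads 0 (it wants to add 1)
    ulong seenByB = i;   // thread B reads 0 before A has written back
    i = seenByA + 1;     // thread A writes 1
    i = seenByB + 2;     // thread B writes 2 and discards A's update

    return i;            // 2, although 0 + 1 + 2 == 3 was intended
}

void main()
{
    writeln(lostUpdateDemo()); // prints 2
}
```

Every update lost this way makes the final sum too small, which is consistent with both reported results being below 500000500000.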

You can use core.atomic.atomicOp!"+=" instead of plain +=:
----
    shared ulong i = 0;
    foreach (f; parallel(iota(1, 1000000+1)))
    {
        import core.atomic: atomicOp;
        i.atomicOp!"+="(f);
    }
----
i is shared because atomicOp requires a shared variable. I'm not sure what the implications of that are, if any.

Alternatively, you could use `synchronized`:
----
    ulong i = 0;
    foreach (f; parallel(iota(1, 1000000+1)))
    {
        synchronized i += f;
    }
----
I'm pretty sure atomicOp is faster, though.

On Saturday, 26 September 2015 at 12:33:45 UTC, anonymous wrote:
>     foreach (f; parallel(iota(1, 1000000+1)))
>     {
>         synchronized i += f;
>     }

Is this valid syntax? I've never seen synchronized used like this before.

On Saturday, 26 September 2015 at 13:09:54 UTC, Meta wrote:
> On Saturday, 26 September 2015 at 12:33:45 UTC, anonymous wrote:
>>     foreach (f; parallel(iota(1, 1000000+1)))
>>     {
>>         synchronized i += f;
>>     }
>
> Is this valid syntax? I've never seen synchronized used like this before.

Atomic worked perfectly and reasonably fast. "Synchronized" may work as well, but I had to abort the execution prior to finishing because it seemed horribly slow.

The std.parallelism.reduce documentation provides an example of a parallel sum.

This works:
----
auto sum3 = taskPool.reduce!"a + b"(iota(1.0,1000001.0));
----
This results in a compile error:
----
auto sum3 = taskPool.reduce!"a + b"(iota(1UL,1000001UL));
----
I believe there was discussion of this problem recently ...
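
For reference, here is the double-based call as a complete program. This is only a sketch: the wrapper name `parallelSumDouble` and its parameter are invented here. The result can be compared exactly against n(n+1)/2 because every term and every partial sum is an integer well below 2^53, so nothing is rounded in a double.

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical wrapper around the double-based reduce call quoted above.
// Sums 1.0, 2.0, ..., n using the default task pool.
double parallelSumDouble(uint n)
{
    return taskPool.reduce!"a + b"(iota(1.0, n + 1.0));
}

void main()
{
    immutable sum = parallelSumDouble(1_000_000);

    // Closed form n(n+1)/2 for n = 1_000_000.
    assert(sum == 500_000_500_000.0);

    writeln(cast(ulong) sum);
}
```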

On Saturday, 26 September 2015 at 13:09:54 UTC, Meta wrote:
> On Saturday, 26 September 2015 at 12:33:45 UTC, anonymous wrote:
>>     foreach (f; parallel(iota(1, 1000000+1)))
>>     {
>>         synchronized i += f;
>>     }
>
> Is this valid syntax? I've never seen synchronized used like this before.

I'm sure it's valid.

A mutex is created for that instance of synchronized. I.e., only one thread can execute that piece of code at a time.

If you're missing the braces, they're optional for single statements, as usual.

http://dlang.org/statement.html#SynchronizedStatement
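
For comparison, this is the same loop with the optional braces written out, wrapped in a small complete program. Treat it as a sketch: the function name `synchronizedSum` is invented for illustration, and the range is deliberately much smaller than in the original post because the lock is taken once per element.

```d
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums 1..n in a parallel foreach, guarding the
// shared accumulator with a synchronized statement.
ulong synchronizedSum(int n)
{
    ulong i = 0;
    foreach (f; parallel(iota(1, n + 1)))
    {
        // Equivalent to `synchronized i += f;`, with braces spelled out.
        // Only one thread at a time can be inside this block.
        synchronized
        {
            i += f;
        }
    }
    return i;
}

void main()
{
    writeln(synchronizedSum(10_000)); // 10_000 * 10_001 / 2 == 50005000
}
```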

btw, on my Core i5, in a debug build:

reduce (using double): 11 msec
non_parallel: 37 msec
parallel with atomicOp: 123 msec

So that is the reason for using parallel reduce, assuming the ulong range thing will get fixed.
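
If you want to keep the foreach style but avoid paying for an atomic operation on every element, one option not mentioned in the thread so far is std.parallelism's worker-local storage: each worker thread adds into its own private slot, and the slots are combined once at the end. The following is a minimal sketch under that assumption. The function name `workerLocalSum` is invented for illustration, and no timings are claimed for it.

```d
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums 1..n with one private accumulator per worker
// thread, so the loop body needs neither atomics nor a lock.
ulong workerLocalSum(int n)
{
    // One ulong slot per worker thread of the default pool, each starting at 0.
    auto partials = taskPool.workerLocalStorage(0UL);

    foreach (f; parallel(iota(1, n + 1)))
    {
        partials.get += f;   // touches only the current thread's slot
    }

    // All workers are done here; combine the per-thread results serially.
    ulong total = 0;
    foreach (p; partials.toRange)
    {
        total += p;
    }
    return total;
}

void main()
{
    writeln(workerLocalSum(1_000_000));
}
```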

This is a work-around to get a ulong result without having the ulong as the range variable.
----
ulong getTerm(int i)
{
    return i;
}
auto sum4 = taskPool.reduce!"a + b"(std.algorithm.map!getTerm(iota(1000000001)));
----
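
Applied to the original problem (summing 1 to 1000000), the same work-around could look like the sketch below. The names `toUlong` and `parallelSumViaMap` are made up for this example; the technique is the map-over-int-range trick from the post above.

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Same idea as getTerm above: widen each int term to ulong before summing.
ulong toUlong(int i)
{
    return i;
}

// Hypothetical helper: the range variable stays int, but the reduction
// runs over ulong values, so the result is a ulong.
ulong parallelSumViaMap(int n)
{
    return taskPool.reduce!"a + b"(map!toUlong(iota(1, n + 1)));
}

void main()
{
    writeln(parallelSumViaMap(1_000_000));
}
```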
