September 26, 2015 Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
I've run into an issue, which I guess could be resolved easily, if I knew how...
[CODE]
ulong i = 0;
foreach (f; parallel(iota(1, 1000000+1)))
{
i += f;
}
thread_joinAll();
i.writeln;
[/CODE]
It's basically an example which adds all the numbers from 1 to 1000000 and should therefore give 500000500000. Running the above code gives 205579930677, leaving out "thread_joinAll()" the output is 210161213519.
I suspect there's some sort of data race. Any hint how to get this straight?
|
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to Zoidberg | On Saturday, 26 September 2015 at 12:18:16 UTC, Zoidberg wrote:
> I've run into an issue, which I guess could be resolved easily, if I knew how...
>
> [CODE]
> ulong i = 0;
> foreach (f; parallel(iota(1, 1000000+1)))
> {
> i += f;
> }
> thread_joinAll();
> i.writeln;
> [/CODE]
>
> It's basically an example which adds all the numbers from 1 to 1000000 and should therefore give 500000500000. Running the above code gives 205579930677, leaving out "thread_joinAll()" the output is 210161213519.
>
> I suspect there's some sort of data race. Any hint how to get this straight?
Here's a correct version:
import std.parallelism, std.range, std.stdio, core.atomic;
void main()
{
shared ulong i = 0;
foreach (f; parallel(iota(1, 1000000+1)))
{
i.atomicOp!"+="(f);
}
i.writeln;
}
|
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | > Here's a correct version:
>
> import std.parallelism, std.range, std.stdio, core.atomic;
> void main()
> {
> shared ulong i = 0;
> foreach (f; parallel(iota(1, 1000000+1)))
> {
> i.atomicOp!"+="(f);
> }
> i.writeln;
> }
Thanks! Works fine. So "shared" and "atomic" are a must?
|
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to Zoidberg | On Saturday 26 September 2015 14:18, Zoidberg wrote:
> I've run into an issue, which I guess could be resolved easily, if I knew how...
>
> [CODE]
> ulong i = 0;
> foreach (f; parallel(iota(1, 1000000+1)))
> {
> i += f;
> }
> thread_joinAll();
> i.writeln;
> [/CODE]
>
> It's basically an example which adds all the numbers from 1 to 1000000 and should therefore give 500000500000. Running the above code gives 205579930677, leaving out "thread_joinAll()" the output is 210161213519.
>
> I suspect there's some sort of data race. Any hint how to get this straight?
Definitely a race, yeah. You need to prevent two += operations happening concurrently.
You can use core.atomic.atomicOp!"+=" instead of plain +=:
----
shared ulong i = 0;
foreach (f; parallel(iota(1, 1000000+1)))
{
import core.atomic: atomicOp;
i.atomicOp!"+="(f);
}
----
i is shared because atomicOp requires a shared variable. I'm not sure what the implications of that are, if any.
Alternatively, you could use `synchronized`:
----
ulong i = 0;
foreach (f; parallel(iota(1, 1000000+1)))
{
synchronized i += f;
}
----
I'm pretty sure atomicOp is faster, though.
|
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to anonymous | On Saturday, 26 September 2015 at 12:33:45 UTC, anonymous wrote:
> foreach (f; parallel(iota(1, 1000000+1)))
> {
> synchronized i += f;
> }
Is this valid syntax? I've never seen synchronized used like this before.
|
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to Meta | On Saturday, 26 September 2015 at 13:09:54 UTC, Meta wrote:
> On Saturday, 26 September 2015 at 12:33:45 UTC, anonymous wrote:
>> foreach (f; parallel(iota(1, 1000000+1)))
>> {
>> synchronized i += f;
>> }
>
> Is this valid syntax? I've never seen synchronized used like this before.
Atomic worked perfectly and reasonably fast. "Synchronized" may work as well, but I had to abort the execution prior to finishing because it seemed horribly slow.
|
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to Zoidberg | std.parallelism.reduce documentation provides an example of a parallel sum.
This works:
auto sum3 = taskPool.reduce!"a + b"(iota(1.0,1000001.0));
This results in a compile error:
auto sum3 = taskPool.reduce!"a + b"(iota(1UL,1000001UL));
I believe there was discussion of this problem recently ...
|
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to Meta | On Saturday, 26 September 2015 at 13:09:54 UTC, Meta wrote:
> On Saturday, 26 September 2015 at 12:33:45 UTC, anonymous wrote:
>> foreach (f; parallel(iota(1, 1000000+1)))
>> {
>> synchronized i += f;
>> }
>
> Is this valid syntax? I've never seen synchronized used like this before.
I'm sure it's valid. A mutex is created for that instance of synchronized. I.e., only one thread can execute that piece of code at a time.
If you're missing the braces, they're optional for single statements, as usual.
http://dlang.org/statement.html#SynchronizedStatement
|
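For comparison, here is a minimal sketch of the same statement written with explicit braces, which should behave identically to the brace-less `synchronized i += f;`. The function name `sumSynchronized` and the small range of 1 to 1000 are assumptions made for illustration (the small range keeps the run short, since the lock is taken once per element); they do not come from the thread.

```d
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums 1..n, guarding every update with a mutex.
ulong sumSynchronized(int n)
{
    ulong total = 0;
    foreach (f; parallel(iota(1, n + 1)))
    {
        // Braced form of `synchronized total += f;`.
        // Only one thread at a time can be inside this block.
        synchronized
        {
            total += f;
        }
    }
    return total;
}

void main()
{
    writeln(sumSynchronized(1000)); // 1 + 2 + ... + 1000 = 500500
}
```

Because every iteration has to take the same lock, the threads mostly wait on each other, which fits the slowness reported earlier in the thread.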
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | btw, on my corei5, in debug build,
reduce (using double): 11msec
non_parallel: 37msec
parallel with atomicOp: 123msec
so, that is the reason for using parallel reduce, assuming the ulong range thing will get fixed.
|
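One way to keep the `foreach` style while avoiding a contended update on every element is to give each worker thread its own accumulator and combine them at the end. The sketch below uses `taskPool.workerLocalStorage` from std.parallelism for that; the function name `sumPerWorker` is hypothetical, and this is offered as a possible alternative under those assumptions, not as something measured in this thread.

```d
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums 1..n using one private accumulator per worker.
ulong sumPerWorker(int n)
{
    // One ulong slot per worker thread, each initialised to 0.
    auto partial = taskPool.workerLocalStorage(0UL);

    foreach (f; parallel(iota(1, n + 1)))
    {
        // `get` returns a reference to the calling thread's own slot,
        // so no atomic operation or lock is needed here.
        partial.get += f;
    }

    // The parallel loop has finished, so the slots can be read safely.
    ulong total = 0;
    foreach (p; partial.toRange)
    {
        total += p;
    }
    return total;
}

void main()
{
    writeln(sumPerWorker(1_000_000)); // expected: 500000500000
}
```

This is roughly what a parallel reduce does internally: each worker sums its own share, and only the few partial results are combined serially.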
September 26, 2015 Re: Parallel processing and further use of output | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | This is a work-around to get a ulong result without having the ulong as the range variable.
ulong getTerm(int i)
{
return i;
}
auto sum4 = taskPool.reduce!"a + b"(std.algorithm.map!getTerm(iota(1000000001)));
|
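Put together as a complete program, the work-around might look like the sketch below. It assumes the range from the original question (1 to 1000000) rather than `iota(1000000001)`, and the wrapper name `sumWithReduce` is hypothetical; `getTerm` is taken from the post above.

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Widens each int term to ulong so the reduction accumulates in ulong.
ulong getTerm(int i)
{
    return i;
}

// Hypothetical wrapper: parallel sum of 1..n via taskPool.reduce.
ulong sumWithReduce(int n)
{
    // The range variable stays int; only the mapped elements are ulong.
    return taskPool.reduce!"a + b"(map!getTerm(iota(1, n + 1)));
}

void main()
{
    writeln(sumWithReduce(1_000_000)); // expected: 500000500000
}
```

No `shared`, atomic operation, or lock appears here because the loop body never writes to a variable that other threads can see.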