Thread overview
Parallel For
Jun 15, 2021
seany
Jun 15, 2021
seany
Jun 15, 2021
jfondren
Jun 15, 2021
seany
Jun 15, 2021
Ali Çehreli
Jun 15, 2021
seany
Jun 16, 2021
z
Jun 16, 2021
jfondren
June 15, 2021

I know that c# has parallel for like this .

I know that D has parallel foreach like this.

I want to do the following :

Given 4 sets , A = {a_1, a_2, ... }; B = {b_1, b_2, ... } ; C = {c_1 , ... } ; D = {d_1, ... } - I need to make all Cartesian products, such as {a_1, b_1, c_1, d_1 }; {a_1, b_1, c_1, d_2}; ... {a_n, b_n, c_n, d_n} ; then run an operation on the elements .

Then, I need to extract the maximum and minimum values of the results - either by storing them in an array, or by some other suitable means.

My attempt to solve can be seen in this minimal example :

    import std.stdio;
    import std.math;
    import std.stdio;
    import std.conv;
    import std.format;
    import std.math;
    import std.algorithm;
    import std.net.curl;
    import std.json;
    //import dlib.image;
    import std.path;
    import std.array;
    import std.net.curl;
    import core.stdc.stdlib;
    import std.datetime;
    import std.file;
    import opmix.dup;
    import std.parallelism;


    void main() {

            int[] a = [1,2,3,4,5,6,7,8,9];
            int[] b = [11,12,13,14,15,16,17,18];

            int[] c ;
            foreach(aa; parallel (a)) {
                    foreach (bb; parallel(b)) {

                            c ~= aa+bb;

                    }

            }
            writeln(c);

    }

I expect the output : [ 12, 13, 14 ....27 ] ( not necessarily in this order - but that is okey).

I am getting the output : [12, 13, 14, 15, 16, 17, 18, 19, 14, 15, 16, 17, 18, 19, 20, 21, 15, 16, 17, 18, 19, 20, 21, 22, 0, 0, 0, 0, 0, 0, 0, 17, 18, 19, 20, 21, 22, 23, 24, 19, 20, 21, 22, 18, 0, 0, 0, 16, 21, 18, 22, 23, 24, 25, 20, 21, 22, 19].

In the next run : [12, 13, 14, 15, 16, 17, 18, 19, 14, 15, 16, 17, 18, 19, 20, 21, 15, 16, 17, 18, 19, 20, 21, 22, 16, 17, 18, 19, 20, 21, 22, 23, 18, 19, 20, 21, 22, 23, 24, 25, 17, 18, 19, 19, 21, 20, 23, 23, 24, 24, 20, 21, 22, 23, 24, 25, 26, 27]

In the next run : [12, 13, 14, 15, 16, 17, 18, 19, 14, 15, 16, 17, 0, 0, 16, 18, 19, 20, 21, 17, 18, 19, 20, 15, 16, 17, 18, 19, 20, 21, 22, 18, 19, 20, 21, 22, 23, 24, 25, 19, 20, 21, 22, 16, 17, 20, 21, 22, 23, 24, 21, 22, 27, 23]

In the 4th run : [12, 13, 14, 15, 16, 17, 18, 19, 15, 16, 17, 18, 19, 20, 21, 22, 14, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 22, 23, 24, 25, 26, 27, 18, 20, 22, 23, 24, 25, 20, 21, 22, 23, 24]

What am I doing wrong?

Thank you .

June 15, 2021

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:

>

I know that c# has parallel for like this .

[...]

PS :

I need the entire include list - while they are not necessary for this minimal example - they are needed for the ful project. That is why I kept them here.

DMD DMD64 D Compiler v2.096.1

June 15, 2021

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:

>

What am I doing wrong?

add a writeln(c.length); in your inner loop and consider
the output. If you were always pushing to the end of c, then
only unique numbers should be output. But I see e.g. six
occurrences of 0, four of 8 ...

Here's a non-solution:

import std;
import core.sync.mutex;

void main() {
    int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    int[] b = [11, 12, 13, 14, 15, 16, 17, 18];
    int[] c;
    shared Mutex mtx = new shared Mutex();
    foreach (aa; parallel(a)) {
        foreach (bb; parallel(b)) {
            mtx.lock_nothrow();
            c ~= aa + bb;
            mtx.unlock_nothrow();
        }
    }
    writeln(c);
}

That solves the inconsistent access to c, but the parallelism
is probably pointless.

Real solutions might include

  1. c[i] = aa + bb;, with a calculated unique i per assignment.

  2. std.concurrency and message passing to build c?

  3. using core.atomic for i?

June 15, 2021

On Tuesday, 15 June 2021 at 07:41:06 UTC, jfondren wrote:

>

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:

>

[...]

add a writeln(c.length); in your inner loop and consider
the output. If you were always pushing to the end of c, then
only unique numbers should be output. But I see e.g. six
occurrences of 0, four of 8 ...

[...]

My attempt is to make such huge O(N^4) or O(N^5) algorithms faster.

Wouldn't all these calculation of unique i make the program slow? I would like to know how to do this properly.

June 15, 2021
On 6/14/21 11:39 PM, seany wrote:

> I know that D has parallel foreach [like
> this](http://ddili.org/ders/d.en/parallelism.html).

I gave an example of it in my DConf Online 2020 presentation as well:

  https://www.youtube.com/watch?v=dRORNQIB2wA&t=1324s

>                  int[] c ;
>                  foreach(aa; parallel (a)) {
>                          foreach (bb; parallel(b)) {
>
>                                  c ~= aa+bb;

That is violating a parallelism requirement that loop bodies must be independent. (I use a similar example during my presentation above.) You need to either pre-allocate the array (as jfondren said) or not store the elements at all but use them independently in the loop body.

Yes, std.concurrency is another option. I show a "recipe" of usage here:

  https://www.youtube.com/watch?v=dRORNQIB2wA&t=1737s

Ali

June 15, 2021
On Tuesday, 15 June 2021 at 09:09:29 UTC, Ali Çehreli wrote:
> On 6/14/21 11:39 PM, seany wrote:
>
> > [...]
>
> I gave an example of it in my DConf Online 2020 presentation as well:
>
>   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1324s
>
> >                                  [...]
>
> That is violating a parallelism requirement that loop bodies must be independent. (I use a similar example during my presentation above.) You need to either pre-allocate the array (as jfondren said) or not store the elements at all but use them independently in the loop body.
>
> Yes, std.concurrency is another option. I show a "recipe" of usage here:
>
>   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1737s
>
> Ali

Ali Chehreli is an angel, Thank you.
June 16, 2021

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:

>

...

This is the best I could do: https://run.dlang.io/is/dm8LBP
For some reason, LDC refuses to vectorize or even just unroll the nonparallel version, and more than one parallel corrupts the results.
But judging by the results you expected and what you described, you could maybe replace it by a ton of c[] = a[] *operand* b[] operations?
Unless you use conditionals after or do something else that confuses the compiler, it will maybe use SSE/AVX instructions, and at worst use basic loop unrolling.

June 16, 2021

On Wednesday, 16 June 2021 at 06:29:21 UTC, z wrote:

>

On Tuesday, 15 June 2021 at 06:39:24 UTC, seany wrote:

>

...

This is the best I could do: https://run.dlang.io/is/dm8LBP
For some reason, LDC refuses to vectorize or even just unroll the nonparallel version, and more than one parallel corrupts the results.

The same trick as before is useful here: insert a write(i+ii);
before the assignment and see if the output looks reasonable.

Instead of only unique indices, there are many repeats, as of
course (i=0)+(ii=1) == (i=1)+(ii=0)

import std;

void main() {
    int[] a = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    int[] b = [11, 12, 13, 14, 15, 16, 17, 18];
    int[] c = new int[a.length * b.length];
    foreach (i, aa; parallel(a)) {
        foreach (ii, bb; parallel(b)) {
            c[i * b.length + ii] = aa + bb;
        }
    }
    writeln(c);
}