Thread overview
Recommendation for parallelism with nested for loops?
Aug 19, 2022
Shriramana Sharma
Aug 19, 2022
Adam D Ruppe
Aug 19, 2022
bachmeier
Aug 19, 2022
Ali Çehreli
Aug 20, 2022
Christian Köstlin
Aug 20, 2022
Christian Köstlin
August 19, 2022

Hello. I want to parallelize a computation which has two for loops, one nested within another. All inner-loop-param+outer-loop-param combinations can be computed independent of one another.

As I suspected, [https://forum.dlang.org/post/xysyidbkjdinclmrxzgt@forum.dlang.org](this forum post) says that only one loop can be parallelized. Will it be an error or inefficient or useless if I try to do both?

Also, what is the best way to do parallelism in such a situation?

August 19, 2022
On Friday, 19 August 2022 at 01:49:43 UTC, Shriramana Sharma wrote:
> Also, what is the best way to do parallelism in such a situation?

If the inner loops are about the same size for each pass of the outer loop, you can just simply parallel on the outer loop and get the same benefit.

Even if they aren't equal, you'll get decent benefit from parallel on the outer one alone, but not as good since the work won't be balanced.

Still, that's what I'd try.
August 18, 2022
On 8/18/22 18:49, Shriramana Sharma wrote:
> Hello. I want to parallelize a computation which has two for loops

An option is to add tasks individually but I am not sure how wise doing this and I don't know how to determine whether all tasks are completed.

In any case, Roy Margalit's DConf 2022 presentation is very much on topic. :)

  http://dconf.org/2022/index.html#roym

And the following program cannot show any benefit because the tasks are so short.

import std.stdio;
import std.parallelism;
import std.conv;

enum I = 1_000;
enum J = 1_000;

void main() {
  auto results = new int[I * J];

  // In case you want a new TaskPool:
  // auto tp = new TaskPool(totalCPUs);
  // (And use tp. below instead of taskPool.)

  foreach (i; 0 .. I) {
    foreach (j; 0 .. J) {
      taskPool.put(task!foo(i, j, results));
    }
  }

  // WARNING: I am not sure whether one can trust the results are
  //          ready yet. (?)
  //
  // parallel() does call yieldForce() on each task but we don't seem
  // to have that option for tasks that are .put() into the pool. (?)

  enum toPrint = 10;
  writeln(results[0..toPrint]);
  writeln("[...]");
  writeln(results[$-toPrint..$]);
}

void foo(size_t i, size_t j, int[] results) {
  results[i * J + j] = to!int(i * J + j);
}

Ali

August 19, 2022
On Friday, 19 August 2022 at 02:02:57 UTC, Adam D Ruppe wrote:

> Even if they aren't equal, you'll get decent benefit from parallel on the outer one alone, but not as good since the work won't be balanced.

Unless there's some kind of blocking going on in D's implementation, if the number of passes on the outer loop is large enough relative to the number of cores, applying parallel to the outer loop is the best you can do - uneven amounts of work on the inner loop will get spread out across cores. There are always counterexamples, but the ratio of passes to cores needs to be pretty low for further optimizations to have any chance to help.

August 20, 2022
On 19.08.22 03:49, Shriramana Sharma wrote:
> Hello. I want to parallelize a computation which has two for loops, one nested within another. All inner-loop-param+outer-loop-param combinations can be computed independent of one another.
> 
> As I suspected, [https://forum.dlang.org/post/xysyidbkjdinclmrxzgt@forum.dlang.org](this forum post) says that only one loop can be parallelized. Will it be an error or inefficient or useless if I try to do both?
> 
> Also, what is the best way to do parallelism in such a situation?
You could also do a custom range that makes a one-dimensional range (aka iota out of your nested loops) and then process this with parallel.
Another way (more similar to Ali's solution) would be to write the nested loops as you have them, but collect the parameters for the work in an array or something and then process this array in parallel.

Kind regards,
Christian


August 20, 2022
On 20.08.22 12:28, Christian Köstlin wrote:
> On 19.08.22 03:49, Shriramana Sharma wrote:
>> Hello. I want to parallelize a computation which has two for loops, one nested within another. All inner-loop-param+outer-loop-param combinations can be computed independent of one another.
>>
>> As I suspected, [https://forum.dlang.org/post/xysyidbkjdinclmrxzgt@forum.dlang.org](this forum post) says that only one loop can be parallelized. Will it be an error or inefficient or useless if I try to do both?
>>
>> Also, what is the best way to do parallelism in such a situation?
> You could also do a custom range that makes a one-dimensional range (aka iota out of your nested loops) and then process this with parallel.
> Another way (more similar to Ali's solution) would be to write the nested loops as you have them, but collect the parameters for the work in an array or something and then process this array in parallel.
> 
> Kind regards,
> Christian
> 
> 
Actually phobos brings already all that is required (especially the cartesian product):

```d
import std;

void doSomething(int i, string s)
{
    writeln("%s-%s".format(i, s));
}

void main()
{
    foreach (t; cartesianProduct(iota(1, 4), ["abc", "def", "ghi"]).parallel) {
        doSomething(t.expand);
    }
}
```

kind regards,
Christian