Thread overview
Looking for writing parallel foreach kind of statement for nested for-loops
Feb 09, 2013
Sparsh Mittal
Feb 09, 2013
FG
Feb 10, 2013
Sparsh Mittal
Feb 10, 2013
FG
Feb 10, 2013
Philippe Sigaud
Feb 10, 2013
Sparsh Mittal
February 09, 2013
I saw foreach and parallel commands on
http://dlang.org/phobos/std_parallelism.html

I have nested for-loops as follows:

for(int i=1; i< N; i++)
 {
  for(int j=1; j < M; j++)
   {
     func(i,j);
   }
 }

Here func(i,j) is such that there is no dependency or communication b/w different (i,j) tuple values. Thus, problem is very simple.

My question is how can I write a parallel foreach loop (if at all) for such problems in D. I would be grateful for your help.
February 09, 2013
On 2013-02-09 23:32, Sparsh Mittal wrote:
> I saw foreach and parallel commands on
> http://dlang.org/phobos/std_parallelism.html
>
> I have nested for-loops as follows:
>
> for(int i=1; i< N; i++)
>   {
>    for(int j=1; j < M; j++)
>     {
>       func(i,j);
>     }
>   }
>
> Here func(i,j) is such that there is no dependency or communication b/w
> different (i,j) tuple values. Thus, problem is very simple.

Huh?
for(int i=1; i< N; i++)    <==>    foreach(i; iota(1, N))
so you can use:  foreach(i; parallel(iota(1, N))) { ... }

February 10, 2013
> for(int i=1; i< N; i++)    <==>    foreach(i; iota(1, N))
> so you can use:  foreach(i; parallel(iota(1, N))) { ... }
Thanks a lot. This one divides the x-cross-y region by rows. Suppose dimension is 8*12 and 4 parallel threads are there, so current method is dividing by 2*12 to each of 4 threads.

The current reply answers my question, but I was just curious. Can we have a method which divides the 2d region as follows: 8*12 divided into 4*6 to each of 4 threads.





February 10, 2013
On 2013-02-10 02:22, Sparsh Mittal wrote:
> The current reply answers my question, but I was just curious. Can we have a
> method which divides the 2d region as follows: 8*12 divided into 4*6 to each of
> 4 threads.

Think again if you need that. Things start getting pretty ugly. :)

    const uint parts = 2; // in each dimension
    foreach (block; parallel(iota(parts*parts))) {
        uint p1 = block / parts;
        foreach (i; (p1*N/parts)..(p1==parts-1 ? N : (p1+1)*N/parts)) {
            uint p2 = block % parts;
            foreach (j; (p2*M/parts)..(p2==parts-1 ? M : (p2+1)*M/parts)) {
                func(i, j, p1, p2, block);
            }
        }
    }


February 10, 2013
On Sun, Feb 10, 2013 at 8:45 AM, FG <home@fgda.pl> wrote:
> On 2013-02-10 02:22, Sparsh Mittal wrote:
>>
>> The current reply answers my question, but I was just curious. Can we have
>> a
>> method which divides the 2d region as follows: 8*12 divided into 4*6 to
>> each of
>> 4 threads.
>
>
> Think again if you need that. Things start getting pretty ugly. :)

Indeed... Sparsh, any reason you need the calculation to be done on 2d blocks instead of independent slots?
February 10, 2013
>> Think again if you need that. Things start getting pretty ugly. :)
>
Yes, it is not at all intuitive.
> Indeed... Sparsh, any reason you need the calculation to be done on 2d
> blocks instead of independent slots?

For my problem, original answer was fine, since parallel calculations are not at all dependent on each others.

Sometime there are calculations to be done on 2d grid, where calculations are not uniform across the grid(see paper "An overview of parallel visualisation methods for Mandelbrot and Julia sets" where you can see Julia sets parallelization), and hence, dividing in a particular way leads to better load-balancing than others.

Thanks a lot for your answer and time.