September 07, 2015
What's the "right" way to do openmp-style parallelism?
Friends,

I have a program that would be pretty easy to parallelize with an OpenMP pragma in C. I'd like to avoid the performance cost of using message passing, and the `shared` qualifier seems like it's enforcing guarantees I don't need. Essentially, I have:

```d
x = float[imax][jmax]; // x is about 8 GB of floats
for (j = 0; j < jmax; j++) {
    // create some local variables.
    for (i = 0; i < imax; i++) {
        x[j][i] = complicatedFunction(i, x[j-1], other, local, variables);
    }
}
```

In C, I'd just stick a `#pragma omp parallel for` around the inner loop (since the outer loop obviously can't be parallelized).

How should I go about this in D? I want to avoid copying data around if it's possible since these arrays are huge.

Cheers,
Charles.
September 07, 2015
Re: What's the "right" way to do openmp-style parallelism?
Posted in reply to Charles

On Monday, 7 September 2015 at 02:56:04 UTC, Charles wrote:
> […]
> In C, I'd just stick a #pragma omp parallel for around the inner loop (since the outer loop obviously can't be parallelized).
>
> How should I go about this in D? I want to avoid copying data around if it's possible since these arrays are huge.

I believe this is what you want: http://dlang.org/phobos/std_parallelism.html#.parallel. I believe that all you would need to change is to have your inner loop become:

```d
foreach (i, ref f; x[j].parallel) {
    f = complicatedFunction(i, x[j-1], etc...);
}
```

Don't quote me on that, though, as I'm not very experienced with std.parallelism.
September 08, 2015
Re: What's the "right" way to do openmp-style parallelism?
Posted in reply to Charles

On Mon, 2015-09-07 at 02:56 +0000, Charles via Digitalmars-d-learn wrote:
> […]
>
> x = float[imax][jmax]; //x is about 8 GB of floats
> for(j = 0; j < jmax; j++){
>     //create some local variables.
>     for(i = 0; i < imax; i++){
>         x[j][i] = complicatedFunction(i, x[j-1], other, local, variables);
>     }
> }

So as to run things through a compiler, I expanded the code fragment to:

```d
float complicatedFunction(int i, float[] x) pure {
    return 0.0;
}

void main() {
    immutable imax = 10;
    immutable jmax = 10;
    float[imax][jmax] x;
    for (int j = 1; j < jmax; j++) {
        for (int i = 0; i < imax; i++) {
            x[j][i] = complicatedFunction(i, x[j-1]);
        }
    }
}
```

Hopefully this is an accurate representation of the original code. Note the change in the j iteration, since j-1 is being used as an index. Of course I immediately wanted to change this to:

```d
float complicatedFunction(int i, float[] x) pure {
    return 0.0;
}

void main() {
    immutable imax = 10;
    immutable jmax = 10;
    float[imax][jmax] x;
    foreach (int j; 1..jmax) {
        foreach (int i; 0..imax) {
            x[j][i] = complicatedFunction(i, x[j-1]);
        }
    }
}
```

Hopefully this is not now wrong as a representation of the original problem.

> In C, I'd just stick a #pragma omp parallel for around the inner loop (since the outer loop obviously can't be parallelized).

I would hope you meant C++, not C, there. ;-)

I am not sure OpenMP would work to parallelize your C++ (or C) code. Given that complicatedFunction has access to the whole of x[j-1], there could be coupling between x[j-1][m] and x[j-1][n] in the function, which would lead to potentially different results being computed in the sequential and parallel cases. This is not a C/C++/D thing, this is a data coupling thing.

So although Meta suggested a parallel foreach (the D equivalent of the OpenMP parallel for pragma), something along the lines of:

```d
import std.parallelism: parallel;

float complicatedFunction(int i, float[] x) pure {
    return 0.0;
}

void main() {
    immutable imax = 10;
    immutable jmax = 10;
    float[imax][jmax] x;
    foreach (int j; 1..jmax) {
        foreach (int i, ref item; parallel(x[j-1])) {
            x[j][i] = complicatedFunction(i, item);
        }
    }
}
```

(though sadly, this doesn't compile for a reason I can't fathom instantly), this brings into stark relief the fact that there is a potential coupling between x[j-1][m] and x[j-1][n], which means enforcing parallelism here will almost certainly result in the wrong values being calculated.

This is a standard pipeline describable with a map, something along the lines of:

```d
import std.algorithm: map;

float complicatedFunction(int i, float[] x) pure {
    return 0.0;
}

void main() {
    immutable imax = 10;
    immutable jmax = 10;
    float[imax][jmax] x;
    foreach (int j; 1..jmax) {
        x[j] = map!(a => complicatedFunction(a, x[j-1]))(x[j-1]);
    }
}
```

(but this also has a compilation error, which I hope someone can fix…)

This is the step prior to using parallel map, but cast in this way it highlights that, in order to be parallelized at all in any way, complicatedFunction must have no couplings between x[j-1][m] and x[j-1][n]. (I am guessing this is some form of cellular automaton or some Markov process problem?)

> How should I go about this in D? I want to avoid copying data around if it's possible since these arrays are huge.

Indeed. With C, C++, D (Go, Rust, …) you have to use references (aka pointers) and hope you do not get any ownership problems. It might be interesting to see how a language such as Haskell, which has copy semantics but optimizes as much of it away as it can, would fare with this.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200      voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077      xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk     skype: russel_winder
September 08, 2015
Re: What's the "right" way to do openmp-style parallelism?
Posted in reply to Russel Winder

On Tuesday, 8 September 2015 at 05:50:30 UTC, Russel Winder wrote:
> ```d
> void main() {
>     immutable imax = 10;
>     immutable jmax = 10;
>     float[imax][jmax] x;
>     foreach (int j; 1..jmax) {
>         foreach (int i, ref item; parallel(x[j-1])) {
>             x[j][i] = complicatedFunction(i, item);
>         }
>     }
> }
> ```
>
> (though sadly, this doesn't compile for a reason I can't fathom instantly)
Hmm. Shouldn't you instead parallel the outer loop?
September 09, 2015
Re: What's the "right" way to do openmp-style parallelism?
Posted in reply to Dominikus Dittes Scherkl
On Tue, 2015-09-08 at 07:33 +0000, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote:
> […]
> Hmm. Shouldn't you instead parallel the outer loop?

Can't do that because it is a pipeline: the current computation is input to the next one. As far as I can tell, there is no way the code as presented can be parallelized in the general case. If there were some guarantees on complicatedFunction, then it is a different game.

--
Russel.
Copyright © 1999-2021 by the D Language Foundation