June 24, 2021
On 6/24/21 1:41 PM, seany wrote:

> Is there any way to control the number of CPU cores used in
> parallelization ?

Yes. You have to create a task pool explicitly:

import std.parallelism;

void main() {
  enum threadCount = 2;

  // An explicitly created TaskPool uses exactly 'threadCount' worker threads.
  auto myTaskPool = new TaskPool(threadCount);
  scope (exit) {
    myTaskPool.finish();  // wind the pool's threads down on scope exit
  }

  enum workUnitSize = 1; // Or 42 or something else. :)
  foreach (e; myTaskPool.parallel([ 1, 2, 3 ], workUnitSize)) {
    // ...
  }
}
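
As an alternative (a minimal sketch, not from the original post): std.parallelism also exposes a defaultPoolThreads property that sets the worker-thread count of the shared default pool, provided it is assigned before the pool's first use:

import std.parallelism;

void main() {
  // Must run before the first use of taskPool or parallel();
  // after that, the default pool's thread count is fixed.
  defaultPoolThreads = 2;

  foreach (e; parallel([ 1, 2, 3 ])) {
    // runs on the default pool, now limited to 2 worker threads
  }
}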

I've touched on a few parallelism concepts at this point in a presentation:

  https://www.youtube.com/watch?v=dRORNQIB2wA&t=1332s

Ali

June 25, 2021

On Thursday, 24 June 2021 at 21:19:19 UTC, Ali Çehreli wrote:

> [...]

I tried this:

    int[][] pnts;
    pnts.length = fld.length;

    enum threadCount = 2;
    auto prTaskPool = new TaskPool(threadCount);

    scope (exit) {
        prTaskPool.finish();
    }

    enum workUnitSize = 1;

    foreach (i, fLine; prTaskPool.parallel(fld, workUnitSize)) {
        // ...
    }

This is throwing random segfaults.
The CPU has 2 cores, but usage is not going above 37%.

The segfaults happen even much deeper in the program, much further down the line... and their location is random.

June 25, 2021

On Friday, 25 June 2021 at 13:53:17 UTC, seany wrote:

> [...]

PS. Likewise, I am running into bus errors too, sometimes way down the line after these foreach calls have completed...

June 25, 2021
On 6/25/21 6:53 AM, seany wrote:

> I tried this .
>
>                  int[][] pnts ;
>          pnts.length = fld.length;
>
>          enum threadCount = 2;
>          auto prTaskPool = new TaskPool(threadCount);
>
>          scope (exit) {
>              prTaskPool.finish();
>          }
>
>          enum workUnitSize = 1;
>
>          foreach(i, fLine; prTaskPool.parallel(fld, workUnitSize)) {
>                    //....
>                  }
>
>
> This is throwing random segfaults.
> CPU has 2 cores, but usage is not going above 37%

Performance is not guaranteed; it depends on many factors. For example, inserting a writeln() call in the loop would make all threads compete with each other for stdout. There can be many contention points, some of which depend on your program logic. (And "Amdahl's Law" applies.)

Another reason: 1 can be a horrible value for workUnitSize. Try 100, 1000, etc. and see whether that helps with performance.
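
A minimal sketch of both points (the array size and work unit sizes here are arbitrary placeholders, not from the original posts): keep writeln() out of the loop body and time the loop with different work unit sizes:

import std.parallelism;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writeln;

void main() {
    auto data = new double[](1_000_000);

    foreach (wus; [1, 100, 1000]) {
        auto sw = StopWatch(AutoStart.yes);

        // No writeln() in the body: each iteration writes only to
        // its own slot, so the threads never contend for stdout.
        foreach (i, ref d; taskPool.parallel(data, wus))
            d = i * 0.5;

        writeln("workUnitSize ", wus, ": ", sw.peek);
    }
}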

> Even much deeper down in program, much further down the line...
> And the location of segfault is random.

Do you still have two parallel loops? Are both with explicit TaskPool objects? If not, I wonder whether multiple threads are using the convenient 'parallel' function, stepping on each other's toes. (I am not sure about this; perhaps it's safe to do, but I have never tested it.)

It is possible that the segfaults are caused by your code. The functions you showed in your original post (myFunction0() and the others) all work on independent data structures, right?

Ali

June 25, 2021

On Friday, 25 June 2021 at 13:53:17 UTC, seany wrote:

> I tried this:
>
>     int[][] pnts;
>     pnts.length = fld.length;
>
>     enum threadCount = 2;
>     auto prTaskPool = new TaskPool(threadCount);
>
>     scope (exit) {
>         prTaskPool.finish();
>     }
>
>     enum workUnitSize = 1;
>
>     foreach (i, fLine; prTaskPool.parallel(fld, workUnitSize)) {
>         // ...
>     }

A self-contained and complete example would help a lot, but the likely
problem with this code is that you're accessing pnts[y][x] in the
loop, which makes the loop bodies no longer independent because some
of them need to first allocate an int[] to replace the zero-length
pnts[y] that you're starting with.

Consider:

$ rdmd --eval 'int[][] p; p.length = 5; p.map!"a.length".writeln'
[0, 0, 0, 0, 0]
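
One way to keep the loop bodies independent (a sketch under the assumption that fld's inner lengths determine the inner lengths of pnts; the fld literal below is a stand-in for the real input) is to size every inner array before entering the parallel loop:

import std.parallelism;

void main() {
    int[][] fld = [[1, 2], [3, 4, 5], [6]]; // stand-in for the real input

    int[][] pnts;
    pnts.length = fld.length;

    // Allocate every inner array up front, from a single thread, so no
    // loop body has to grow pnts[i] while another thread is running.
    foreach (i, fLine; fld)
        pnts[i] = new int[](fLine.length);

    enum workUnitSize = 1;
    foreach (i, fLine; taskPool.parallel(fld, workUnitSize)) {
        foreach (j, v; fLine)
            pnts[i][j] = v * 2; // each iteration touches only pnts[i]
    }
}
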
June 25, 2021
On Friday, 25 June 2021 at 14:10:52 UTC, Ali Çehreli wrote:

> [...]
>
> Do you still have two parallel loops? Are both with explicit TaskPool objects? [...]

The code without the parallel foreach works fine. No segfault.

In several instances I do have multiple nested loops, but in every case only the outer one is a parallel foreach.

All of them are with an explicit TaskPool definition.

June 25, 2021

On Friday, 25 June 2021 at 14:13:14 UTC, jfondren wrote:

> [...]

This particular location does not cause the segfault.
It is segfaulting further down the line, in a completely unrelated location... Wait, I will try to make an MWP.

June 25, 2021

On Friday, 25 June 2021 at 14:22:25 UTC, seany wrote:

> [...]

Here is the MWP.

Please compile with "dub build -b release --compiler=ldc2". Then to run, please use: ./tracker_ai --filename 21010014-86.ptl

June 25, 2021
On 6/25/21 7:21 AM, seany wrote:

> The code without the parallel foreach works fine. No segfault.

That's very common.

What I meant is: is the code written in a way that works safely in a parallel foreach loop? (i.e. Is the code "independent"?) (But I assume it is, because that has been the common theme in this thread; so there must be something stranger going on.)
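
A minimal sketch of what "independent" means here (the variable names are illustrative, not from the thread):

import std.parallelism;

void main() {
    auto results = new int[](1000);
    int sharedCounter = 0; // shared by all iterations: a dependence

    foreach (i, ref r; taskPool.parallel(results)) {
        // Unsafe: every iteration would mutate the same variable with
        // no synchronization; this kind of hidden sharing is what makes
        // a loop body non-independent.
        // ++sharedCounter;

        // Safe: each iteration writes only to its own element.
        r = cast(int) (i * i);
    }
}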

Ali

June 25, 2021
On Friday, 25 June 2021 at 15:08:38 UTC, Ali Çehreli wrote:
> On 6/25/21 7:21 AM, seany wrote:
>
> > The code without the parallel foreach works fine. No segfault.
>
> That's very common.
>
> What I meant is, is the code written in a way to work safely in a parallel foreach loop? (i.e. Is the code "independent"?) (But I assume it is because it's been the common theme in this thread; so there must be something stranger going on.)
>
> Ali

I have added the MWP. Did you have a chance to look at it?