April 03, 2023

On 4/3/23 6:02 PM, Paul wrote:

>

On Sunday, 2 April 2023 at 15:32:05 UTC, Steven Schveighoffer wrote:

>

It's important to note that parallel doesn't iterate the range in parallel, it just runs the body in parallel limited by your CPU count.
?!?

So for example, if you have:

foreach(i; iota(0, 2_000_000).parallel)
{
   runExpensiveTask(i);
}

The foreach is run on the main thread, gets a 0, then hands off to a task thread runExpensiveTask(0). Then it gets a 1, and hands off to a task thread runExpensiveTask(1), etc. The iteration is not expensive, and is not done in parallel.

On the other hand, what you shouldn't do is:

foreach(i; iota(0, 2_000_000).map!(x => runExpensiveTask(x)).parallel)
{
}

as this will run the expensive task before running any tasks.

> >

If your foreach body takes a global lock (like writeln(i);), then it's not going to run any faster (probably slower actually).
Ok I did have some debug writelns I commented out.

And did it help? Another thing that takes a global lock is memory allocation.

> >

Also make sure you have more than one logical CPU.
I have 8.

It's dependent on the work being done, but you should see a roughly 8x speedup as long as the overhead of distributing tasks is not significant compared to the work being done.

-Steve

April 03, 2023

On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote:

> > >

If your foreach body takes a global lock (like writeln(i);), then it's not going to run any faster (probably slower actually).
Ok I did have some debug writelns I commented out.

And did it help?
No

My program is about 140 lines Steven. Its just one of the Advent of Code challenges. Could I past the whole program here and see what you think?

Thanks for your assistance...much appreciated.

April 03, 2023

On 4/3/23 6:56 PM, Paul wrote:

>

On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote:

> > >

If your foreach body takes a global lock (like writeln(i);), then it's not going to run any faster (probably slower actually).
Ok I did have some debug writelns I commented out.

And did it help?
No

My program is about 140 lines Steven.  Its just one of the Advent of Code challenges.  Could I past the whole program here and see what you think?

Yeah, please post.

-Steve

April 03, 2023

On Monday, 3 April 2023 at 23:13:58 UTC, Steven Schveighoffer wrote:

>

Yeah, please post.

module aoc2215b2;

import std.stdio;
import std.file: readText;
import std.conv: to;
import std.math: abs;
import std.traits;
import std.parallelism;
import std.range;
import core.time: MonoTime;

// Timed main() vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
void main(string[] args) {
auto progStartTime = MonoTime.currTime;
//-----------------------------------------------------------------------------
	string filename = args.length > 1 ? args[1] : "aoc2215a.data";
	CommPair[] commPair;
	ulong x,y;

	// read file that has data sets in the form of x,y coordinate pairs
	// for each sensor-beacon pair.  Create on array of structs to hold
	// this information.
	loadFileDataIntoArrayOfStructs(commPair, filename);
	
	foreach(int lineOfInterest;parallel(iota(0,4_000_001))) {
		Span[] span; // A section of line-of-interest coordinates where no other beacons are present.
		const spanReserve = span.reserve(50);
		createSpansOfNoBeacons(lineOfInterest,commPair,span);

		// if spans overlap, combine them into a single span and mark
		// the other spans !inUse.
		combineOverlappingSpans(span);

		// look for a line that doesn't have 4,000,001 locations accounted for
		if(beaconFreeLocations(span) < 4_000_001) {

			// find the location that is not accounted for
			foreach(ulong i;0..4_000_000) {
				bool found = false;
				foreach(sp;span) {
					if(i >= sp.xLow && i <= sp.xHigh) {
						found = true;
						break;
					}
				}
				if(!found) {
					x = i; y = lineOfInterest;
					break;
				}
			}
		}
	}
writeln(x," ",y);

//-----------------------------------------------------------------------------
auto progEndTime = MonoTime.currTime;
writeln(progEndTime - progStartTime);
}
// Timed main() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^



struct CommPair {
	int sx,sy,bx,by;
	int manhattanDistance;
}

void loadFileDataIntoArrayOfStructs(ref CommPair[] commPair, string filename) {
	import std.regex;
	auto s = readText(filename);
	auto ctr = ctRegex!(`x=(-*\d+), y=(-*\d+):.*x=(-*\d+), y=(-*\d+)`);
	CommPair cptemp;
	foreach (c; matchAll(s, ctr)) {
		cptemp.sx = to!int(c[1]);
		cptemp.sy = to!int(c[2]);
		cptemp.bx = to!int(c[3]);
		cptemp.by = to!int(c[4]);
		cptemp.manhattanDistance = abs(cptemp.sx-cptemp.bx) + abs(cptemp.sy-cptemp.by);
		commPair ~= cptemp;
	}
}

struct Span {
	int xLow, xHigh;
	bool inUse = true;
}

void createSpansOfNoBeacons(int lineOfInterest, CommPair[] commPair,ref Span[] span) {
	foreach(size_t i,cp;commPair) {
		int distanceToLineOfInterest = abs(cp.sy - lineOfInterest);
		if(cp.manhattanDistance >= distanceToLineOfInterest) {
			int xLow  = cp.sx - (cp.manhattanDistance - distanceToLineOfInterest);
			int xHigh = cp.sx + (cp.manhattanDistance - distanceToLineOfInterest);
			span ~= Span(xLow,xHigh);
		}
	}
}

void combineOverlappingSpans(ref Span[] span) {
	bool combinedSpansThisCycle = true;
	while(combinedSpansThisCycle) {
		combinedSpansThisCycle = false;
		for(size_t i=0; i < span.length-1; i++) {
			if(!span[i].inUse) continue;
		
			for(size_t j=i+1; j < span.length; j++) {
				if(!span[j].inUse) continue;

				// if one span overlaps with the other, combine them into one span
				if(spanIContainedInSpanJ(span[i],span[j]) || spanJContainedInSpanI(span[i],span[j])) {
					span[i].xLow  = span[i].xLow  < span[j].xLow  ? span[i].xLow  : span[j].xLow;
					span[i].xHigh = span[i].xHigh > span[j].xHigh ? span[i].xHigh : span[j].xHigh;
					span[j].inUse = false;
					combinedSpansThisCycle = true;

					// after combining two spans, perform bounds checking
					// 15 part b limits the search between 0 and 4,000,000
					span[i].xLow  = span[i].xLow  < 0         ? 0         : span[i].xLow;
					span[i].xHigh = span[i].xHigh > 4_000_000 ? 4_000_000 : span[i].xHigh;
				}
			}
		}
	}
}

bool spanIContainedInSpanJ (Span spanI, Span spanJ) {
	return 	(spanI.xLow >= spanJ.xLow && spanI.xLow <= spanJ.xHigh) ||
			(spanI.xHigh>= spanJ.xLow && spanI.xHigh<= spanJ.xHigh);
}

bool spanJContainedInSpanI (Span spanI, Span spanJ) {
	return 	(spanJ.xLow >= spanI.xLow && spanJ.xLow <= spanI.xHigh) ||
			(spanJ.xHigh>= spanI.xLow && spanJ.xHigh<= spanI.xHigh);
}

int beaconFreeLocations(ref Span[] span) {
	int noBeaconCount = 0;
	foreach(sp;span) if(sp.inUse) noBeaconCount += sp.xHigh - sp.xLow + 1;

	return noBeaconCount;
}
April 03, 2023

On 4/3/23 7:22 PM, Paul wrote:

>
// Timed main() vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
void main(string[] args) {
auto progStartTime = MonoTime.currTime;
//-----------------------------------------------------------------------------
     string filename = args.length > 1 ? args[1] : "aoc2215a.data";
     CommPair[] commPair;
     ulong x,y;

     // read file that has data sets in the form of x,y coordinate pairs
     // for each sensor-beacon pair.  Create on array of structs to hold
     // this information.
     loadFileDataIntoArrayOfStructs(commPair, filename);

     foreach(int lineOfInterest;parallel(iota(0,4_000_001))) {
         Span[] span; // A section of line-of-interest coordinates where no other beacons are present.
         const spanReserve = span.reserve(50);
         createSpansOfNoBeacons(lineOfInterest,commPair,span);

         // if spans overlap, combine them into a single span and mark
         // the other spans !inUse.
         combineOverlappingSpans(span);

         // look for a line that doesn't have 4,000,001 locations accounted for
         if(beaconFreeLocations(span) < 4_000_001) {

             // find the location that is not accounted for
             foreach(ulong i;0..4_000_000) {
                 bool found = false;
                 foreach(sp;span) {
                     if(i >= sp.xLow && i <= sp.xHigh) {
                         found = true;
                         break;
                     }
                 }
                 if(!found) {
                     x = i; y = lineOfInterest;
                     break;
                 }
             }
         }
     }
writeln(x," ",y);

So I just quoted your main loop.

I am assuming that this O(n^2) algorithm doesn't actually run for all iterations, because that wouldn't be feasible (16 trillion iterations is a lot).

This means that I'm assuming a lot of cases do not run the second loop. Everything you do besides prune the second loop is mostly allocating an array of Span types. This means most of the parallel loops are allocating, and doing nothing else. As I said earlier, allocations need a global lock of the GC.

What you need to do probably, is to avoid these allocations per loop.

The easiest thing I can think of is to store the Span array as a static array of the largest array you need (i.e. the length of commPair), and then slice it instead of appending.

So what you need is inside createSpansOfNoBeacons, take as a reference a ref Span[MAX_SPANS], and have it return a Span[] that is a slice of that which was "alocated".

See if this helps.

FWIW, I did the AoC 2022 as well, and I never needed parallel execution. Looking at my solution comment in reddit, this one I sort of punted by knowing I could exit as soon as the answer is found (my solution runs in 2.5s on my input). But I recommend (once you are done), reading this post, it is a really cool way to look at it:

https://www.reddit.com/r/adventofcode/comments/zmcn64/2022_day_15_solutions/j0hl19a/?context=8&depth=9

-Steve

April 04, 2023

On Monday, 3 April 2023 at 22:24:18 UTC, Steven Schveighoffer wrote:

>

So for example, if you have:

foreach(i; iota(0, 2_000_000).parallel)
{
   runExpensiveTask(i);
}

The foreach is run on the main thread, gets a 0, then hands off to a task thread runExpensiveTask(0). Then it gets a 1, and hands off to a task thread runExpensiveTask(1), etc. The iteration is not expensive, and is not done in parallel.

On the other hand, what you shouldn't do is:

foreach(i; iota(0, 2_000_000).map!(x => runExpensiveTask(x)).parallel)
{
}

as this will run the expensive task before running any tasks.

I don't understand what foreach() does :)

import std.range, std.algorithm : map;
import std.stdio, std.parallelism;
//import sdb.sequences : RowlandSequence_v2;/*
struct RowlandSequence_v2 {
  import std.numeric : gcd;

  long b, r, a = 3;
  enum empty = false;
  auto front() => a;
  void popFront() {
    long result = 1;
    while(result == 1) {
      result = gcd(r++, b);
      b += result;
    }
    a = result;
  }
}//*/

enum BP : long {
   // s, f, r, b = 7, /* <- beginning
   s = 178, r = 1993083484, b =  5979250449,//*/
   len = 190
}

void main()
{
  with(BP) {
    long[len] arr;
    auto range = RowlandSequence_v2(b, r);

    s.iota(len)
     .map!((a){
                range.popFront();
                return arr[a] = range.front();
              }
          )
     .parallel
     .writeln;
  }
} /* PRINTS:

ParallelForeach!(MapResult!(__lambda3, Result))(std.parallelism.TaskPool, [5, 3, 73, 157, 7, 5, 3, 13, 3986167223, 3, 7, 73], 1)

*/

Is it necessary to enclose the code in foreach()? I invite Ali to tell me! Please explain why parallel isn't running.

"Ben anlamıyor, foreach ne yapıyor 😀 Kodu foreach() içine almak şart mı? Ali'yi davet ediyorum, bana anlatması için! Paralel() niye çalışmıyor, lütfen açıklayın hocam. Mümkünse Türkçe!" in Turkish.

SDB@79

April 04, 2023
On 4/4/23 02:24, Salih Dincer wrote:

> I don't understand what `foreach()` does :)

Hm. I forgot whether 'parallel' works only with 'foreach'. But there are various other algorithms in std.parallelism that may be more useful with range algorithm chains:

  https://dlang.org/phobos/std_parallelism.html

> in Turkish

I don't have time to experiment more at this time but I have the following chapters, which includes some of those other algorithms as well:

  http://ddili.org/ders/d/kosut_islemler.html

  http://ddili.org/ders/d.en/parallelism.html

Ali

April 04, 2023

On 4/4/23 5:24 AM, Salih Dincer wrote:

>

Is it necessary to enclose the code in foreach()? I invite Ali to tell me! Please explain why parallel isn't running.

parallel is a shortcut to TaskPool.parallel, which is indeed a foreach-only construct, it does not return a range.

I think what you want is TaskPool.map:

// untested, just looking at the
taskPool.map!(/* your map function here */)
   (s.iota(len)).writeln;

Can't use pipelining with it, because it is a member function.

https://dlang.org/phobos/std_parallelism.html#.TaskPool.map

-Steve

April 04, 2023

On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote:

>

parallel is a shortcut to TaskPool.parallel, which is indeed a foreach-only construct, it does not return a range.

I think what you want is TaskPool.map:

// untested, just looking at the
taskPool.map!(/* your map function here */)
   (s.iota(len)).writeln;

I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting.

long[50] arr;
RowlandSequence_v2 range;

auto next(long a)
{
  range.popFront();
  return arr[a] = range.front();
}

void main()
{
    range = RowlandSequence_v2(7, 2);
    taskPool.map!next(iota(50))/*
    s.iota(50)
     .map!next
     .parallel//*/
     .writeln;
}

On Tuesday, 4 April 2023 at 13:18:01 UTC, Ali Çehreli wrote:

>

I don't have time to experiment more at this time but I have the following chapters, which includes some of those other algorithms as well:

http://ddili.org/ders/d/kosut_islemler.html

I read it, thanks...

SDB@79

April 04, 2023

On 4/4/23 11:34 AM, Salih Dincer wrote:

>

On Tuesday, 4 April 2023 at 14:20:20 UTC, Steven Schveighoffer wrote:

>

parallel is a shortcut to TaskPool.parallel, which is indeed a foreach-only construct, it does not return a range.

I think what you want is TaskPool.map:

// untested, just looking at the
taskPool.map!(/* your map function here */)
   (s.iota(len)).writeln;

I tried, thanks but it goes into infinite loop. For example, the first 50 of the sequence should have been printed to the screen immediately without waiting.

long[50] arr;
RowlandSequence_v2 range;

auto next(long a)
{
   range.popFront();
   return arr[a] = range.front();
}

void main()
{
     range = RowlandSequence_v2(7, 2);
     taskPool.map!next(iota(50))/*
     s.iota(50)
      .map!next
      .parallel//*/
      .writeln;
}

Keep in mind that arr and range are thread-local, and so will be different states for different tasks.

Though I don't really know what you are doing there.

-Steve