June 18, 2020
I have an array of input data that I'm looping over and, based on some condition, generating new items that are appended onto a target array (which may already contain data). Since the creation of new items is quite expensive, I'm thinking of parallelizing it with parallel foreach.

To avoid data races, my thought is to have each generated item appended to a thread-specific temporary array; after the parallel foreach, these temporary arrays get sequentially appended to the target array. Something like this:

	Item[] targetArray = ...; // already contains data
	Item[][nThreads] tmp;
	foreach (elem; input.parallel) {
		if (condition(elem)) {
			auto output = expensiveComputation(elem);
			tmp[threadId] ~= output;
		}
	}
	foreach (a; tmp)
		targetArray ~= a;

Is there an easy way to achieve this with std.parallelism?  I looked over the API but there doesn't seem to be any obvious way for a task to know which thread it's running in, in order to know which tmp array it should append to.  If possible I'd like to avoid having to manually assign tasks to threads.


T

-- 
Questions are the beginning of intelligence, but the fear of God is the beginning of wisdom.
June 19, 2020
On Thursday, 18 June 2020 at 14:43:54 UTC, H. S. Teoh wrote:
> Is there an easy way to achieve this with std.parallelism?  I looked over the API but there doesn't seem to be any obvious way for a task to know which thread it's running in, in order to know which tmp array it should append to.  If possible I'd like to avoid having to manually assign tasks to threads.

There's an example of exactly this in std.parallelism: https://dlang.org/phobos/std_parallelism.html#.TaskPool.workerIndex

In short:

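    import std.parallelism : parallel, taskPool;

    Item[] targetArray = ...; // already contains data
    // One slot per worker thread, plus slot 0 for any thread outside
    // the pool (workerIndex is 0 for those, e.g. the main thread)
    Item[][] tmp = new Item[][taskPool.size+1];
    foreach (elem; input.parallel) {
        if (condition(elem)) {
            auto output = expensiveComputation(elem);
            // Each thread taking part in the loop appends only to its own slot
            tmp[taskPool.workerIndex] ~= output;
        }
    }
    // Back on a single thread: merge the per-thread results
    foreach (a; tmp)
        targetArray ~= a;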
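
If you want to try it out, here is a minimal self-contained sketch of the same pattern. The `Item` alias, the bodies of `condition` and `expensiveComputation`, and the `appendComputed` wrapper are hypothetical stand-ins I made up for illustration; only `parallel`, `taskPool.size` and `taskPool.workerIndex` come from std.parallelism.

```d
import std.parallelism : parallel, taskPool;

// Hypothetical stand-ins for the names used in the question.
alias Item = int;
bool condition(int elem) { return elem % 3 == 0; }
Item expensiveComputation(int elem) { return elem * elem; }

// Hypothetical wrapper: appends the computed items to targetArray.
Item[] appendComputed(Item[] targetArray, int[] input)
{
    // workerIndex is 0 for any thread that is not part of the pool
    // (such as the thread that starts the loop) and 1 .. size for the
    // pool's worker threads, hence size + 1 slots.
    auto tmp = new Item[][](taskPool.size + 1);

    foreach (elem; input.parallel)
    {
        if (condition(elem))
            tmp[taskPool.workerIndex] ~= expensiveComputation(elem);
    }

    // Sequential merge once the parallel loop has finished.
    foreach (a; tmp)
        targetArray ~= a;
    return targetArray;
}

void main()
{
    import std.algorithm : sort;
    import std.array : array;
    import std.range : iota;
    import std.stdio : writeln;

    auto result = appendComputed([-1], iota(10).array);
    // The order of the appended items depends on scheduling,
    // so sort before printing to get a stable result.
    sort(result);
    writeln(result); // [-1, 0, 9, 36, 81]
}
```

Note that the appended items come out grouped by thread, not in input order, so sort afterwards (or carry the input index along) if the order matters.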

--
  Simen