Jump to page: 1 25  
Page
Thread overview
May 06
I have been using std.parallelism and that has worked quite nicely but it is not fully utilising all the cpu resources in my computation so I though it could be good to run it concurrently to see if I can get better performance. However I am very new to std.concurrency and the baby version of the code I am trying to run:

```
void main()
{
  import std.concurrency;
  import std.stdio: writeln;

  void process(double x, double y, long i, shared(double[]) z)
  {
    z[i] = x*y;
  }
  long n = 100;
  shared(double[]) z = new double[n];
  for(long i = 0; i < n; ++i)
  {
    spawn(&process, cast(double)(i), cast(double)(i + 1), i, z);
  }
  writeln("z: ", z);
}
```


Illicits the following error:

```
onlineapp.d(14): Error: template std.concurrency.spawn cannot deduce function from argument types !()(void delegate(double x, double y, long i, shared(double[]) z) pure nothrow @nogc @safe, double, double, long, shared(double[])), candidates are:
/dlang/dmd/linux/bin64/../../src/phobos/std/concurrency.d(460):   
     spawn(F, T...)(F fn, T args)
  with F = void delegate(double, double, long, shared(double[])) pure nothrow @nogc @safe,
       T = (double, double, long, shared(double[]))
  must satisfy the following constraint:
       isSpawnable!(F, T)
```


May 06
On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:
> [...]

The problem here is that `process` is a delegate, not a function. The compiler *should* know it's a function, but for some reason it does not. Making the function static, or moving it outside of the scope of main, will fix it.

For reference, this will spawn 100 threads to do a simple computation so probably not what you would want, I expect. But I suppose this is just example code and the underlying computation is much more expensive ?
May 06
On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:
>> [...]
>
> The problem here is that `process` is a delegate, not a function. The compiler *should* know it's a function, but for some reason it does not. Making the function static, or moving it outside of the scope of main, will fix it.

I moved the `process` function out of main and it is now running but it prints out

```
z: [nan, 2, nan, 12, 20, nan, nan, nan, nan, 90, nan, 132, nan, nan, 210, nan, nan, nan, nan, nan, nan, nan, nan, nan, 600, nan, nan, nan, nan, nan, 930, 992, 1056, nan, 1190, nan, nan, nan, nan, nan, 1640, 1722, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 3080, nan, nan, 3422, 3540, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 8010, nan, nan, nan, nan, nan, nan, 9312, nan, nan, 9900]
```
Is there something I need to do to wait for each thread to finish computation?

> For reference, this will spawn 100 threads to do a simple computation so probably not what you would want, I expect. But I suppose this is just example code and the underlying computation is much more expensive ?

Yes, that's exactly what I want the actual computation I'm running is much more expensive and much larger. It shouldn't matter if I have like 100_000_000 threads should it? The threads should just be queued until the cpu works on it?

Thanks
May 06
06.05.2020 06:25, data pulverizer пишет:
> 
> ```
> onlineapp.d(14): Error: template std.concurrency.spawn cannot deduce function from argument types !()(void delegate(double x, double y, long i, shared(double[]) z) pure nothrow @nogc @safe, double, double, long, shared(double[])), candidates are:
> /dlang/dmd/linux/bin64/../../src/phobos/std/concurrency.d(460):      spawn(F, T...)(F fn, T args)
>    with F = void delegate(double, double, long, shared(double[])) pure nothrow @nogc @safe,
>         T = (double, double, long, shared(double[]))
>    must satisfy the following constraint:
>         isSpawnable!(F, T)
> ```
> 

I think the problem is in `process` attributes (error message you posted is strange, is it the full message?)
Make your `process` function a template one to let the compiler to deduce its attributes. Or set them manually.
May 05
On 5/5/20 8:41 PM, data pulverizer wrote:> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer wrote:

> Is there something I need to do to wait for each thread to finish
> computation?

thread_joinAll(). I have an example here:

  http://ddili.org/ders/d.en/concurrency.html#ix_concurrency.thread_joinAll

Although I understand that you're experimenting with std.concurrency, I want to point out that there is also std.parallelism, which may be better suited in many cases. Again, here are some examples:

  http://ddili.org/ders/d.en/parallelism.html

Ali

May 06
On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
>
> Is there something I need to do to wait for each thread to finish computation?

Yeah, you need to synchronize so that your main thread wait on all the other threads to finish.
Look up `Thread.join`.

> Yes, that's exactly what I want the actual computation I'm running is much more expensive and much larger. It shouldn't matter if I have like 100_000_000 threads should it? The threads should just be queued until the cpu works on it?

It does matter quite a bit. Each thread has its own resources allocated to it, and some part of the language will need to interact with *all* threads, e.g. the GC.
In general, if you want to parallelize something, you should aim to have as many threads as you have cores. Having 100M threads will mean you have to do a lot of context switches. You might want to look up the difference between tasks and threads.
May 06
On Wednesday, 6 May 2020 at 03:56:04 UTC, Ali Çehreli wrote:
> On 5/5/20 8:41 PM, data pulverizer wrote:> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
> >> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer
> wrote:
>
> > Is there something I need to do to wait for each thread to
> finish
> > computation?
>
> thread_joinAll(). I have an example here:
>
>   http://ddili.org/ders/d.en/concurrency.html#ix_concurrency.thread_joinAll

This worked nicely thank you very much

> ... I want to point out that there is also std.parallelism, which may be better suited in many cases.

I actually started off using std.parallelism and it worked well but the CPU usage on all the threads was less than half on my system monitor meaning there is more performance to be wrung out of my computer, which is why I am now looking into spawn. When you suggested using thread_joinAll() I saw that is in `core.thread.osthread` module. It might be shaving the yak this point but I have tried using `Thread` instead of `spawn`:

```
void process(double x, double y, long i, shared(double[]) z)
{
  z[i] = x*y;
}

void main()
{
  import core.thread.osthread;
  import std.stdio: writeln;

  long n = 100;
  shared(double[]) z = new double[n];
  for(long i = 0; i < n; ++i)
  {
    auto proc = (){
      process(cast(double)(i), cast(double)(i + 1), i, z);
      return;
    };
    new Thread(&proc).start();
  }
  thread_joinAll();
  writeln("z: ", z);
}
```
and I am getting the following error:

```
onlineapp.d(20): Error: none of the overloads of this are callable using argument types (void delegate() @system*), candidates are:
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(646):        core.thread.osthread.Thread.this(void function() fn, ulong sz = 0LU)
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(671):        core.thread.osthread.Thread.this(void delegate() dg, ulong sz = 0LU)
/dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(1540):        core.thread.osthread.Thread.this(ulong sz = 0LU)
```



May 06
On Wednesday, 6 May 2020 at 04:04:14 UTC, Mathias LANG wrote:
> On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
>> Yes, that's exactly what I want the actual computation I'm running is much more expensive and much larger. It shouldn't matter if I have like 100_000_000 threads should it? The threads should just be queued until the cpu works on it?
>
> It does matter quite a bit. Each thread has its own resources allocated to it, and some part of the language will need to interact with *all* threads, e.g. the GC.
> In general, if you want to parallelize something, you should aim to have as many threads as you have cores. Having 100M threads will mean you have to do a lot of context switches. You might want to look up the difference between tasks and threads.

Sorry, I meant 10_000 not 100_000_000 I square the number by mistake because I'm calculating a 10_000 x 10_000 matrix it's only 10_000 tasks, so 1 task does 10_000 calculations. The actual bit of code I'm parallelising is here:

```
auto calculateKernelMatrix(T)(AbstractKernel!(T) K, Matrix!(T) data)
{
  long n = data.ncol;
  auto mat = new Matrix!(T)(n, n);

  foreach(j; taskPool.parallel(iota(n)))
  {
    auto arrj = data.refColumnSelect(j).array;
    for(long i = j; i < n; ++i)
    {
      mat[i, j] = K.kernel(data.refColumnSelect(i).array, arrj);
      mat[j, i] = mat[i, j];
    }
  }
  return mat;
}
```

At the moment this code is running a little bit faster than threaded simd optimised Julia code, but as I said in an earlier reply to Ali when I look at my system monitor, I can see that all the D threads are active and running at ~ 40% usage, meaning that they are mostly doing nothing. The Julia code runs all threads at 100% and is still a tiny bit slower so my (maybe incorrect?) assumption is that I could get more performance from D. The method `refColumnSelect(j).array` is (trying to) reference a column from a matrix (1D array with computed index referencing) which I select from the matrix using:

```
return new Matrix!(T)(data[startIndex..(startIndex + nrow)], [nrow, 1]);
```

If I use the above code, I am I wrong in assuming that the sliced data (T[]) is referenced rather than copied? That so if I do:

```
auto myData = data[5...10];
```

myData is referencing elements [5..10] of data and not creating a new array with elements data[5..10] copied?
May 06
06.05.2020 07:25, data pulverizer пишет:
> On Wednesday, 6 May 2020 at 03:56:04 UTC, Ali Çehreli wrote:
>> On 5/5/20 8:41 PM, data pulverizer wrote:> On Wednesday, 6 May 2020 at 03:33:12 UTC, Mathias LANG wrote:
>> >> On Wednesday, 6 May 2020 at 03:25:41 UTC, data pulverizer
>> wrote:
>>
>> > Is there something I need to do to wait for each thread to
>> finish
>> > computation?
>>
>> thread_joinAll(). I have an example here:
>>
>> http://ddili.org/ders/d.en/concurrency.html#ix_concurrency.thread_joinAll
> 
> This worked nicely thank you very much
> 
>> ... I want to point out that there is also std.parallelism, which may be better suited in many cases.
> 
> I actually started off using std.parallelism and it worked well but the CPU usage on all the threads was less than half on my system monitor meaning there is more performance to be wrung out of my computer, which is why I am now looking into spawn. When you suggested using thread_joinAll() I saw that is in `core.thread.osthread` module. It might be shaving the yak this point but I have tried using `Thread` instead of `spawn`:
> 
> ```
> void process(double x, double y, long i, shared(double[]) z)
> {
>    z[i] = x*y;
> }
> 
> void main()
> {
>    import core.thread.osthread;
>    import std.stdio: writeln;
> 
>    long n = 100;
>    shared(double[]) z = new double[n];
>    for(long i = 0; i < n; ++i)
>    {
>      auto proc = (){
>        process(cast(double)(i), cast(double)(i + 1), i, z);
>        return;
>      };

proc is already a delegate, so &proc is a pointer to the delegate, just pass a `proc` itself

>      new Thread(&proc).start();
>    }
>    thread_joinAll();
>    writeln("z: ", z);
> }
> ```
> and I am getting the following error:
> 
> ```
> onlineapp.d(20): Error: none of the overloads of this are callable using argument types (void delegate() @system*), candidates are:
> /dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(646):        core.thread.osthread.Thread.this(void function() fn, ulong sz = 0LU)
> /dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(671):        core.thread.osthread.Thread.this(void delegate() dg, ulong sz = 0LU)
> /dlang/dmd/linux/bin64/../../src/druntime/import/core/thread/osthread.d(1540):        core.thread.osthread.Thread.this(ulong sz = 0LU)
> ```
> 
> 
> 

May 06
On Wednesday, 6 May 2020 at 04:52:30 UTC, data pulverizer wrote:
> myData is referencing elements [5..10] of data and not creating a new array with elements data[5..10] copied?

Just checked this and can confirm that the data is not being copied so that is not the source of cpu idling: https://ddili.org/ders/d.en/slices.html

« First   ‹ Prev
1 2 3 4 5