January 13, 2022
On Thu, Jan 13, 2022 at 09:13:11PM +0000, jmh530 via Digitalmars-d wrote:
> On Thursday, 13 January 2022 at 20:58:25 UTC, H. S. Teoh wrote:
[...]
> > I'm not 100% sure why .parallel is @system, but I suspect it's because of potential issues with race conditions, since it does not prevent you from writing to the same local variable from multiple threads. If pointers are updated this way, it could lead to memory corruption problems.
[...]
> Could it be made @safe when used with const/immutable variables?

Apparently not, as Petar already pointed out.

But even besides access to non-shared local variables, there's also the long-standing issue that a function that receives a delegate cannot have stricter attributes than the delegate itself, i.e.:

	// NG: @safe function fun cannot call @system delegate dg.
	void fun(scope void delegate() @system dg) @safe {
		dg();
	}

	// You have to do this instead (i.e., delegate must be
	// restricted to be @safe):
	void fun(scope void delegate() @safe dg) @safe {
		dg();
	}

There's currently no way to express that the @safety of fun depends solely on the @safety of dg, such that if you pass in a @safe delegate, then fun should be regarded as @safe and allowed to be called from @safe code.

This is a problem because .parallel is implemented using .opApply, which takes a delegate argument. It accepts an unqualified delegate in order to be usable with both @system and @safe delegates. But this unfortunately means it must be @system, and therefore uncallable from @safe code.
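A reduced model of the situation (hypothetical names; the actual std.parallelism code is more involved):

```d
// Reduced model: opApply takes an unqualified (hence @system) delegate
// so it can serve both @safe and @system loop bodies, but calling that
// delegate forces opApply itself to be @system.
struct ParallelModel   // hypothetical, stands in for ParallelForeach
{
    int[] data;

    int opApply(scope int delegate(ref int) dg)   // implicitly @system
    {
        foreach (ref x; data)
            if (auto r = dg(x)) return r;
        return 0;
    }
}

void main() @safe
{
    auto r = ParallelModel([1, 2, 3]);
    // foreach (ref x; r) { x *= 2; }  // Error: @safe main cannot call @system opApply
}
```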

Various proposals to fix this have been brought up before, but Walter either doesn't fully understand the issue, or else has some reasons he's not happy with the proposed solutions.  In fact he has proposed something that goes the *opposite* way to what should be done in order to address this problem.  Since both were shot down in the forum discussions, we're stuck at the current stalemate. :-(


T

-- 
Once bitten, twice cry...
January 13, 2022

On Thursday, 13 January 2022 at 21:13:11 UTC, jmh530 wrote:
> On Thursday, 13 January 2022 at 20:58:25 UTC, H. S. Teoh wrote:
>> [snip]
>> I'm not 100% sure why .parallel is @system, but I suspect it's because of potential issues with race conditions, since it does not prevent you from writing to the same local variable from multiple threads. If pointers are updated this way, it could lead to memory corruption problems.
>>
>> T
>
> Could it be made @safe when used with const/immutable variables?

For data to be @safe-ly accessible across threads, it must have no "unshared aliasing", meaning that shared(const(T)) and immutable(T) are OK, but plain T and const(T) are not.
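The distinction can be checked with the std.traits helper:

```d
import std.traits : hasUnsharedAliasing;

// Types with no unshared mutable indirection may cross threads:
static assert(!hasUnsharedAliasing!(immutable(int[])));
static assert(!hasUnsharedAliasing!(shared(const(int[]))));

// Plain T and const(T) still alias unshared data:
static assert( hasUnsharedAliasing!(int[]));
static assert( hasUnsharedAliasing!(const(int[])));
```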

The reason the .parallel example above was not @safe is that the body of the foreach is passed as a delegate to ParallelForeach.opApply, and the problem is that delegates can access unshared mutable data through their closure. If the @safe-ty holes regarding delegates were closed, presumably we could add a ParallelForeach.opApply overload that takes a @safe delegate, and then the whole main function could be marked @safe.
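The shape of such an overload set might look like this (a hypothetical sketch, not the actual Phobos code):

```d
// Hypothetical sketch of a dual-overload opApply; not actual Phobos code.
struct ParallelForeachSketch
{
    int[] data;

    // Today's overload: accepts any loop body, so it must stay @system.
    int opApply(scope int delegate(ref int) dg) @system
    {
        foreach (ref x; data)
            if (auto r = dg(x)) return r;
        return 0;
    }

    // Additional overload for @safe bodies. With the delegate-closure
    // holes fixed, the implementation could be vetted and marked @trusted.
    int opApply(scope int delegate(ref int) @safe dg) @trusted
    {
        foreach (ref x; data)
            if (auto r = dg(x)) return r;
        return 0;
    }
}
```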

I think back when the module was under active development, the authors did carefully consider the @safe-ty aspects, as they have written code that conditionally enables some function overloads to be @trusted, depending on the parameters they receive. But in the end it was the best they could do given the state of the language at the time. Most likely the situation has improved enough that more of the API could be made (at least conditionally) @safe.

You can check the various comments explaining the situation:

January 13, 2022

On Thursday, 13 January 2022 at 21:44:13 UTC, H. S. Teoh wrote:
> On Thu, Jan 13, 2022 at 09:13:11PM +0000, jmh530 via Digitalmars-d wrote:
>> On Thursday, 13 January 2022 at 20:58:25 UTC, H. S. Teoh wrote:
>> [...]
>> Could it be made @safe when used with const/immutable variables?
>
> Apparently not, as Petar already pointed out.
>
> But even besides access to non-shared local variables, there's also the long-standing issue that a function that receives a delegate cannot have stricter attributes than the delegate itself, i.e.:
>
>     // NG: @safe function fun cannot call @system delegate dg.
>     void fun(scope void delegate() @system dg) @safe {
>         dg();
>     }
>
>     // You have to do this instead (i.e., delegate must be
>     // restricted to be @safe):
>     void fun(scope void delegate() @safe dg) @safe {
>         dg();
>     }
>
> There's currently no way to express that the @safety of fun depends solely on the @safety of dg, such that if you pass in a @safe delegate, then fun should be regarded as @safe and allowed to be called from @safe code.
>
> This is a problem because .parallel is implemented using .opApply, which takes a delegate argument. It accepts an unqualified delegate in order to be usable with both @system and @safe delegates. But this unfortunately means it must be @system, and therefore uncallable from @safe code.
>
> Various proposals to fix this have been brought up before, but Walter either doesn't fully understand the issue, or else has some reasons he's not happy with the proposed solutions. In fact he has proposed something that goes the *opposite* way to what should be done in order to address this problem. Since both were shot down in the forum discussions, we're stuck at the current stalemate. :-(
>
> T

There are two DIPs that aim to address the attribute propagation problem:

January 13, 2022

On Thursday, 13 January 2022 at 21:39:07 UTC, Bruce Carneal wrote:
> 1. Any resultant increase in support load would fall on one volunteer (that is not me) and

Yes, that is not a good situation…

The caveat is that if fewer people use dcompute, then fewer people will help out with it, then it will take more time to reach a state where it is "ready"…

Showing how/when dcompute improves performance on standard desktop computers might make more people interested in participating.

Are there any performance benchmarks on modest hardware (e.g. a standard MacBook, iMac, or Mac mini)? Benchmarks that compare dcompute to CPU code with auto-vectorization (SIMD)?

January 13, 2022

On Thursday, 13 January 2022 at 20:38:19 UTC, Bruce Carneal wrote:
> Ethan might have a sufficiently compelling economic case for promoting dcompute to his company in the relatively near future. Nicholas recently addressed their need for access to the texture hardware and fitting within their work flow, but there may be other requirements... An adoption by a world class game studio would, of course, be very good news but I think Ethan is slammed (perpetually, and in a mostly good way, I think) so it might be a while.

As a former GPGPU guy: can you explain in what ways dcompute improves life over using CUDA and OpenCL through DerelictCL/DerelictCUDA? (I used to maintain them, and I think nobody ever used them.) Using the APIs directly seems to offer the most control, with no special compiler support needed.

January 13, 2022
On Thursday, 13 January 2022 at 21:44:13 UTC, H. S. Teoh wrote:
> [snip]
> Various proposals to fix this has been brought up before, but Walter either doesn't fully understand the issue, or else has some reasons he's not happy with the proposed solutions.  In fact he has proposed something that goes the *opposite* way to what should be done in order to address this problem.  Since both were shot down in the forum discussions, we're stuck at the current stalemate. :-(
>
>
> T

Thanks for the detailed explanation. Maybe the new DIPs can make a better effort at the outset to communicate the issue (e.g. with an example such as this one).
January 13, 2022

On Thursday, 13 January 2022 at 21:51:10 UTC, Petar Kirov [ZombineDev] wrote:
> [snip]

Thanks for the detailed explanation.

January 14, 2022

On Thursday, 13 January 2022 at 21:06:45 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 13 January 2022 at 20:38:19 UTC, Bruce Carneal wrote:
>> I know, right? Ridiculously big opportunity/effort ratio for dlang and near zero awareness...
>
> If dcompute is here to stay, why not put it in the official documentation for D as an "optional" part of the spec?
>
> I honestly assumed that it was unsupported and close to dead as I had not heard much about it for a long time.

I suppose that's my fault for not marketing it more. The code generation is tested in LDC's CI pipelines, so that is unlikely to break, and the library is built on slow-moving APIs that are also unlikely to break. Just because it doesn't get a lot of commits doesn't mean it's going to stop working.

As for specification, I think that would be a wasted effort and too constraining. On the compiler side, it mostly uses the existing LDC infrastructure with (more than) a few hacks to get everything to stick together, and it is too heavily dependent on LDC and LLVM internals to be part of the D spec. On the runtime side, I fear a specification would either be too constraining or end up out of sync with the implementation.

January 14, 2022

On Thursday, 13 January 2022 at 23:28:01 UTC, Guillaume Piolat wrote:
> As a former GPGPU guy: can you explain in what ways dcompute improves life over using CUDA and OpenCL through DerelictCL/DerelictCUDA (I used to maintain them and I think nobody ever used them). Using the API directly seems to offer the most control to me, and no special compiler support.

It is entirely possible to use dcompute as simply a wrapper over OpenCL/CUDA and benefit from the enhanced usability that it offers (e.g. querying OpenCL API objects for their properties is faaaar simpler and less error prone with dcompute) because it exposes the underlying API objects 1:1, and you can always get the raw pointer and do things manually if you need to. Also dcompute uses DerelictCL/DerelictCUDA underneath anyway (thanks for them!).

If you're thinking of "special compiler support" as what CUDA does with its <<<>>>, then no: dcompute does all of that not with special help from the compiler, but only with the metaprogramming and reflection available to any other D program. It's D all the way down to the API calls. (Obviously there is special compiler support to turn D code into compute kernels.)

The main benefit of dcompute is turning kernel launches into type safe one-liners, as opposed to brittle, type unsafe, paragraphs of code.

January 14, 2022

On Thursday, 13 January 2022 at 23:28:01 UTC, Guillaume Piolat wrote:
> On Thursday, 13 January 2022 at 20:38:19 UTC, Bruce Carneal wrote:
>> Ethan might have a sufficiently compelling economic case for promoting dcompute to his company in the relatively near future. Nicholas recently addressed their need for access to the texture hardware and fitting within their work flow, but there may be other requirements... An adoption by a world class game studio would, of course, be very good news but I think Ethan is slammed (perpetually, and in a mostly good way, I think) so it might be a while.
>
> As a former GPGPU guy: can you explain in what ways dcompute improves life over using CUDA and OpenCL through DerelictCL/DerelictCUDA (I used to maintain them and I think nobody ever used them). Using the API directly seems to offer the most control to me, and no special compiler support.

For me there were several things, including:

  1. the dcompute kernel invocation was simpler, made more sense, letting me easily create invocation abstractions to my liking (partially memoized futures in my case but easy enough to do other stuff)

  2. the kernel meta programming was much friendlier generally, of course.

  3. the D nested function capability, in conjunction with better meta programming, enabled great decomposition, intra kernel. You could get the compiler to keep everything within the maximum-dispatch register limit (64) with ease, with readable code.

  4. using the above I found it easy to reduce/minimize memory traffic, an important consideration in that much of my current work is memory bound. Trivial example: use static foreach to logically unroll a window neighborhood algorithm eliminating both unnecessary loads and all extraneous reg-to-reg moves as you naturally mod around.
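To illustrate point 4, here's roughly the shape of the static foreach unrolling (a simplified 1-D sketch, not my actual kernel code):

```d
enum W = 3; // window width

// static foreach unrolls at compile time: straight-line loads, no loop
// counter, and the optimizer can keep the whole window in registers as
// you "mod around" the neighborhood, instead of reloading from memory.
float windowSum(const(float)* p)
{
    float acc = 0;
    static foreach (i; 0 .. W)
        acc += p[i];
    return acc;
}
```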

It's not that you can't do such things in CUDA/C++ (eventually, sometimes, after quite a bit of discomfort, once you acquire your level-bazillion C++ metaprogramming merit badge); it's that it's all so much easier to do in dcompute. You get to save the heroics for something else.

I'm sure that new idioms/benefits will emerge with additional use (this was my first dcompute project) but, as you will have noticed :-), I'm already hooked.

WRT OpenCL I don't have much to say. From what I gather people consider OpenCL to be even less hospitable than CUDA, preferring OpenCL mostly (only?) for its non-proprietary status. I'd be interested to hear from OpenCL gurus on this topic.

Finally, if any of the above doesn't make sense, or you'd like to discuss it further, I suggest we meet up at beerconf. I'd also love to talk about data parallel latency sensitive coding strategies, about how we should deal with HW capability variation, about how we can introduce data parallelism to many more in the dlang community, ...