February 27, 2017
On Monday, 27 February 2017 at 13:55:23 UTC, Guillaume Piolat wrote:
> On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson wrote:
>> This will enable writing kernels in D utilising all of D's meta programming goodness across the device divide and will allow launching those kernels with a level of ease on par with CUDA's <<<...>>> syntax.
>
> Interesting to write kernels in D, since a limitation of CUDA is that you need to multiply the entry points to instantiate a template with different arguments, and a limitation of OpenCL C is that you need templates and includes in the first place.
>

Wait, you mean you have to explicitly instantiate every instance of a templated kernel? Ouch. In D, all you need to do is have a reference to it somewhere; taking its .mangleof suffices, and is (part of) how the example below will achieve its elegance.
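For instance, here is a minimal sketch (hypothetical, but using the @kernel attribute and types from the full example further down the thread) of how merely naming an instantiation gets it emitted:

```
@compute(CompileFor.deviceOnly) module sketch;
import ldc.attributes;
import ldc.dcomputetypes;
import dcompute.std.index;

// A templated kernel: no per-instantiation entry-point boilerplate.
@kernel void scale(T)(GlobalPointer!T a, T factor)
{
    auto i = GlobalIndex.x;
    a[i] = a[i] * factor;
}

// Referencing the instantiation anywhere is enough for the compiler to
// emit it; .mangleof yields the symbol name the runtime looks up by.
enum scaleFloatName = scale!float.mangleof;
```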

I should first emphasise the future tense of the second half of the sentence you quoted.

> How does this work?

DCompute (the compiler infrastructure) is currently capable of building .ptx and .spv as part of the compilation process. They can be used directly in any processing pipeline you may already have.

> Does the host code need something like DerelictCL/CUDA to work?

If you want to call the kernel, yes. The eventual goal of DCompute (the D infrastructure) is to fully wrap, unify, and abstract the OpenCL/CUDA runtime libraries (most likely provided by Derelict), and to have something like:

```
Queue q = ...;
Buffer b = ...;
q.enqueue!(myTemplatedKernel!(Foo, bar, baz => myTransform(baz)))(b, other, args);
```
There is no need to wait until DCompute reaches that point to use it, though; you would just have to do the (rather painful) API bashing yourself, roughly as sketched below.
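For the OpenCL side, that API bashing would look something like the following. This is a hypothetical sketch only: it assumes the DerelictCL bindings and an OpenCL 2.1+ driver that accepts SPIR-V via clCreateProgramWithIL, the kernel name is a placeholder for the real mangled symbol, and all error handling is omitted.

```
import derelict.opencl.cl;
import std.file : read;

void launch()
{
    DerelictCL.load(); // 2.x entry points may additionally need a DerelictCL.reload

    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, null);
    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, null);

    cl_int err;
    auto ctx = clCreateContext(null, 1, &device, null, null, &err);

    // Feed the compiler-produced SPIR-V straight to the driver.
    auto il = read("kernels_ocl220_64.spv");
    auto program = clCreateProgramWithIL(ctx, il.ptr, il.length, &err);
    clBuildProgram(program, 1, &device, null, null, null);

    // Kernels are looked up by their mangled name (see .mangleof above).
    auto kernel = clCreateKernel(program, "mangled_kernel_name", &err);
    // ... clSetKernelArg / clEnqueueNDRangeKernel / clFinish as usual ...
}
```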

February 27, 2017
On Monday, 27 February 2017 at 23:02:43 UTC, Nicholas Wilson wrote:
>> Interesting to write kernels in D, since a limitation of CUDA is that you need to multiply the entry points to instantiate a template with different arguments, and a limitation of OpenCL C is that you need templates and includes in the first place.
>>
>
> Wait, you mean you have to explicitly instantiate every instance of a templated kernel? Ouch.

IIRC, that entry-point explosion happens in CUDA when you strictly separate host and device code. Not sure about mixed mode, as I've never used that.


> I should first emphasise the future tense of the second half of the sentence you quoted.
>
>> How does this work?
>
> DCompute (the compiler infrastructure) is currently capable of building .ptx and .spv as part of the compilation process. They can be used directly in any processing pipeline you may already have.

.ptx, got it.

>> Does the host code need something like DerelictCL/CUDA to work?
>
> If you want to call the kernel, yes. The eventual goal of DCompute (the D infrastructure) is to fully wrap, unify, and abstract the OpenCL/CUDA runtime libraries (most likely provided by Derelict), and to have something like:

Interesting.
Let me know if you need more things in the OpenCL bindings.


February 28, 2017
On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson wrote:
> DCompute is an extension to LDC capable of generating code (with no language changes*) for NVIDIA's NVPTX for use with CUDA, SPIRV for use with the OpenCL runtime, and of course the host, all at the same time! It is also possible to share implementation of algorithms across the host and device.
> This will enable writing kernels in D utilising all of D's meta programming goodness across the device divide and will allow launching those kernels with a level of ease on par with CUDA's <<<...>>> syntax. I hope to be giving a talk at DConf2017 about this ;), what it enables us to do, what still needs to be done and future plans.
>
> DCompute supports all of OpenCL except Images and Pipes (support is planned though).
> I haven't done any testing for CUDA, so I'm not sure about the extent of support for it; all of the math stuff works, but I'm not so sure about images/textures.
>
> Many thanks to the ldc team (especially Johan) for their guidance and patience, to Ilya for reminding me that I should upstream my work, and to John Colvin for his DConf2016 talk, which made me think 'surely compiler support can't be too hard'. Ten months later: here it is!
>
> The DCompute compiler is available at the dcompute branch of ldc [0]; you will need my fork of LLVM [1] and the SPIR-V submodule that comes with it [2] as the LLVM to link against. There is also a tool for interconversion [3] (I've mucked up the submodules a bit, sorry; just clone it into 'tools/llvm-spirv', though it's not necessary anyway). The device standard library and drivers (both WIP) are available here [4].
>
> Please send bug reports to their respective components, although I'm sure I'll see them regardless of where they go.
>
> [0]: https://github.com/ldc-developers/ldc/tree/dcompute
> [1]: https://github.com/thewilsonator/llvm/tree/compute
> [2]: https://github.com/thewilsonator/llvm-target-spirv
> [3]: https://github.com/thewilsonator/llvm-tool-spirv
> [4]: https://github.com/libmir/dcompute
>
> * modulo one hack related to resolving intrinsics, because there is no static context (i.e. static if) for the device(s). Basically a 'codegen-time if'.

A simple example, because I forgot to include one:

```
@compute(CompileFor.deviceOnly) module example;
import ldc.attributes;
import ldc.dcomputetypes;
import dcompute.std.index;

@kernel void test(GlobalPointer!float a, GlobalPointer!float b)
{
    auto idx = GlobalIndex.x;  // this work-item's global index
    a[idx] = a[idx] + b[idx];  // element-wise vector addition
}
```

Then compile with `ldc -mdcompute-targets=ocl-220,cuda-500 example.d -I/path/to/dcompute`. This produces two files: kernels_ocl220_64.spv and kernels_cuda500_64.ptx when built in 64-bit mode, or kernels_ocl220_32.spv and kernels_cuda500_32.ptx in 32-bit mode.
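The .ptx can be consumed in much the same way through the CUDA driver API. Again a hypothetical sketch (here via DerelictCUDA); "mangled_test_kernel" stands in for the real mangled symbol of `test` above, and error checks are omitted:

```
import derelict.cuda;

void launchCuda()
{
    DerelictCUDADriver.load();
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Load the compiler-produced PTX and find the kernel by mangled name.
    CUmodule mod;
    cuModuleLoad(&mod, "kernels_cuda500_64.ptx");
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "mangled_test_kernel");

    // With device buffers aDev/bDev prepared via cuMemAlloc + cuMemcpyHtoD:
    // void*[2] args = [&aDev, &bDev];
    // cuLaunchKernel(fn, n/256, 1, 1,  256, 1, 1,  0, null, args.ptr, null);
}
```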