Thread overview
GPGPU progress
5 days ago
Manu
5 days ago
Nicholas Wilson
5 days ago
Nicholas Wilson
5 days ago
I just caught up on the dconf material. It was a really interesting year, and I'm super bummed I missed it!

So I'm just starting out on some GPGPU work, and I love the direction we're going here.

Given this scenario:
I want to write CUDA kernels in D, and execute them in an existing C++ CUDA
app.
Can anyone point me at the getting-started material? How do I achieve this
seemingly simple goal?
Has anyone started on this content? Where are the toolchains? How do I
build a CUDA kernel, and how do I link/load it in my C++ app?

If this doesn't exist, there needs to be a wiki with a getting-started guide covering various usage environments. I don't have a lot of time to spare on detail, so success will depend on my ability to hit the ground running, and that's really just a matter of simple, clear reference material.

How far are we from integration into LDC without using forked compilers?


5 days ago
On Thursday, 18 May 2017 at 05:39:52 UTC, Manu wrote:
> I just caught up on the dconf material. It was a really interesting year, and I'm super bummed I missed it!
>
> So I'm just starting out on some GPGPU work, and I love the direction we're going here.
>
> Given this scenario: I want to write CUDA kernels in D, and execute them
> in an existing C++ CUDA app.

This can be done already, although not all the intrinsics are there yet; they are very simple to add, though. See https://github.com/libmir/dcompute/blob/master/source/dcompute/std/cuda/index.d for how easy it is (PRs welcome!).
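To give a feel for it, here is a rough sketch of what a dcompute CUDA kernel could look like in D. This is an illustration only: the attribute and type names (`@compute`, `@kernel`, `GlobalPointer`, `GlobalIndex`) follow the dcompute repo as of writing and may differ in your checkout.

```d
// Sketch of a dcompute kernel module; names assumed from the dcompute repo.
@compute(CompileFor.deviceOnly) module kernels;
import ldc.dcompute;
import dcompute.std.index;

// A simple SAXPY kernel: res[i] = alpha * x[i] + y[i]
@kernel void saxpy(GlobalPointer!float res,
                   GlobalPointer!float x,
                   GlobalPointer!float y,
                   float alpha, size_t n)
{
    auto i = GlobalIndex.x;   // global thread index, CUDA-style
    if (i >= n) return;       // guard against overshoot
    res[i] = alpha * x[i] + y[i];
}
```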

> Can anyone point me at the getting started material, how do I achieve this
> seemingly simple goal?

If you're only interested in CUDA, I think any LLVM 3.9.1 or 4.0 should work. If you want SPIR-V, get my LLVM fork from https://github.com/thewilsonator/llvm/tree/compute

Get the dcompute branch of LDC (https://github.com/ldc-developers/ldc/tree/dcompute) and its submodules (i.e. druntime; I don't think it points to a branch anymore, because Johan deleted that branch when I merged it into the regular LDC druntime, but it will still work). The dcompute branch of LDC is in the process of being merged into master, so expect some breakage when updating to that.
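The checkout might look something like the following. This is a sketch: the branch name comes from the URL above, and the build commands are the usual LDC CMake flow, which may need extra flags on your system.

```shell
# Clone the dcompute branch of LDC with its submodules (druntime etc.)
git clone --recursive -b dcompute https://github.com/ldc-developers/ldc.git
cd ldc
mkdir build && cd build
# Point LLVM_ROOT_DIR at your LLVM install if CMake doesn't find it
cmake ..
make
```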

> Has anyone started on this content? Where are the toolchains? How do I
> build a CUDA kernel, and how do I link/load it in my C++ app?
>

There is some info on the dcompute wiki (https://github.com/libmir/dcompute/wiki), and I intend to make this into "production quality" documentation (again, PRs welcome, or I can give you write access to the wiki), but I'm very busy at the moment.

When LDC runs, you will get a kernels_cudaxxx_yy.ptx file (where xxx is the CUDA compute capability specified on the command line and yy is 32 or 64 for 32-bit or 64-bit), which should fit somewhere into your existing C++ pipeline.

> If this doesn't exist, there needs to be a wiki with getting started guide covering various usage environments. I don't have a lot of time to spare on detail, so success will depend on my ability to hit the ground running, and that's really just a matter of simple clear reference material.
>
> How far are we from integration into LDC without using forked compilers?

Soon™. It's the first thing I'm going to do (if I don't get it done before then) after I hand in my honours thesis at the start of July. I'm stupidly busy this semester, so don't count on it being done beforehand.

If you have any questions about anything, probably the fastest way to get answers is the libmir or LDC Gitter channels (https://gitter.im/ldc-developers/main , https://gitter.im/libmir/public).

Good luck! I look forward to your experience report.
Nic

5 days ago
On Thursday, 18 May 2017 at 09:07:38 UTC, Nicholas Wilson wrote:
> When ldc runs you will get a kernels_cudaxxx_yy.ptx (where xxx is the CUDA compute capability specified on the command line and yy is 32 or 64 for 32 or 64bit) which should fit somewhere into your existing C++ pipeline.

Whoops, that assumes you have a CUDA driver API pipeline in your C++ code, which, if you're asking, I'm not sure that you have.
If you're using the `kernel<<<...>>>(args)` form to launch your kernels, then you are going to have a lot more work to do in D, because you'll need to use the driver API (http://docs.nvidia.com/cuda/cuda-driver-api/#axzz4hQLA0Zdm).
You'll need to:
* get a device
* create a context from it
* get a stream on that context
* load the PTX module (possibly linking it with other modules to resolve missing symbols)
* compile it for the device
* then launch a kernel from that module on that device, by name, passing the arguments in a void*[].
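The steps above can be sketched in C++ against the real driver API roughly as follows. This is a hedged sketch, not a drop-in implementation: the PTX filename, the kernel name "saxpy" (LDC may emit a mangled name), and the launch geometry are all assumptions, and error handling is reduced to a macro. It also needs a CUDA-capable device and linking against libcuda to actually run.

```cpp
// Sketch of the driver-API sequence: device -> context -> stream ->
// module load (JIT-compiled for the device) -> launch by name via void*[].
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                              \
    do {                                                         \
        CUresult err_ = (call);                                  \
        if (err_ != CUDA_SUCCESS) {                              \
            std::fprintf(stderr, "CUDA error %d at %s:%d\n",     \
                         err_, __FILE__, __LINE__);              \
            std::exit(1);                                        \
        }                                                        \
    } while (0)

int main()
{
    CHECK(cuInit(0));

    // Get a device and create a context from it.
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuCtxCreate(&ctx, 0, dev));

    // Get a stream on that context.
    CUstream stream;
    CHECK(cuStreamCreate(&stream, CU_STREAM_DEFAULT));

    // Load the PTX module; the driver compiles it for this device.
    // (Use cuLinkCreate/cuLinkAddFile to link in other modules first.)
    CUmodule mod;
    CHECK(cuModuleLoad(&mod, "kernels_cuda350_64.ptx"));  // assumed filename

    // Look the kernel up by name.
    CUfunction kernel;
    CHECK(cuModuleGetFunction(&kernel, mod, "saxpy"));    // assumed name

    // Launch it, passing the arguments in a void*[].
    size_t n = 1024;
    float alpha = 2.0f;
    CUdeviceptr x, y, res;
    CHECK(cuMemAlloc(&x,   n * sizeof(float)));
    CHECK(cuMemAlloc(&y,   n * sizeof(float)));
    CHECK(cuMemAlloc(&res, n * sizeof(float)));
    void* args[] = { &res, &x, &y, &alpha, &n };
    CHECK(cuLaunchKernel(kernel,
                         (unsigned)(n / 256), 1, 1,  // grid dims
                         256, 1, 1,                  // block dims
                         0, stream, args, nullptr));
    CHECK(cuStreamSynchronize(stream));

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```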

The sad thing is that it's still nicer than OpenCL, because in OpenCL you have to pass the runtime args (with sizes) one by one to a function.

Hence why I want to automate as much of that as possible.
I hope to have that done ASAP, but I don't have hardware set up to test CUDA at the moment (I have a card in my Windows box, but I don't have a dev environment set up there), and I'll be working on OpenCL at the same time (and there's only so much horrible API I can take in a day).
I'll be working on dcompute part-part-time next semester, though, so I should be able to get a fair bit done, and quite a few others are interested, so that'll speed things up a bit.