DCompute: First kernels run successfully
Nicholas Wilson
September 11, 2017
I'm pleased to announce that I have run the first dcompute kernel and it was a success!

The driver still needs a fair bit of polish to make the API sane and more complete, not to mention more similar to the (untested) OpenCL driver API. But it works!
(Contributions are of course greatly welcomed)

The kernel:
```
@compute(CompileFor.deviceOnly)
module dcompute.tests.dummykernels;

import ldc.dcompute;
import dcompute.std.index;

@kernel void saxpy(GlobalPointer!(float) res,
                   float alpha, GlobalPointer!(float) x,
                   GlobalPointer!(float) y,
                   size_t N)
{
    auto i = GlobalIndex.x;
    if (i >= N) return;
    res[i] = alpha*x[i] + y[i];
}
```

The host code:
```
import std.experimental.allocator : theAllocator;
import std.exception : enforce;
import std.stdio : writeln;

import dcompute.driver.cuda;
import dcompute.tests.dummykernels : saxpy;

Platform.initialise();

auto devs   = Platform.getDevices(theAllocator);
auto ctx    = Context(devs[0]); scope(exit) ctx.detach();

// Change the file to match your GPU.
Program.globalProgram = Program.fromFile("./.dub/obj/kernels_cuda210_64.ptx");
auto q = Queue(false);

enum size_t N = 128;
float alpha = 5.0;
float[N] res, x, y;
foreach (i; 0 .. N)
{
    x[i] = N - i;
    y[i] = i * i;
}
Buffer!(float) b_res, b_x, b_y;
b_res      =  Buffer!(float)(res[]); scope(exit) b_res.release();
b_x        =  Buffer!(float)(x[]);   scope(exit) b_x.release();
b_y        =  Buffer!(float)(y[]);   scope(exit) b_y.release();

b_x.copy!(Copy.hostToDevice); // not quite sold on this interface yet.
b_y.copy!(Copy.hostToDevice);

q.enqueue!(saxpy)            // <-- the main magic happens here
    ([N, 1, 1], [1, 1, 1])   // the grid
    (b_res, alpha, b_x, b_y, N); // the kernel arguments

b_res.copy!(Copy.deviceToHost);
foreach (i; 0 .. N)
    enforce(res[i] == alpha * x[i] + y[i]);
writeln(res[]); // [640, 636, ... 16134]
```

Simple as that!

Dcompute, as always, is at https://github.com/libmir/dcompute and on dub.

To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.

*Or wait for LDC 1.4 release real soon(™).
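If you're not sure whether your LDC build includes the backend, `ldc2 -version` lists the registered LLVM targets, so a quick check might look like this (assuming `ldc2` is on your PATH):

```
# List LDC's registered LLVM targets and look for NVPTX.
# If the backend is enabled, this should print nvptx target entries.
ldc2 -version | grep -i nvptx
```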

Thanks to the LDC folks for putting up with me ;)

Have fun GPU programming,
Nic
September 11, 2017
On Monday, 11 September 2017 at 12:23:16 UTC, Nicholas Wilson wrote:
> I'm pleased to announce that I have run the first dcompute kernel and it was a success!
>

Keep up the good work.
September 11, 2017
Hi Wilson,

Since I believe GPU-CPU hybrid programming is the future, I think you are doing great work for your own future and for D's.

> To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.
>
> *Or wait for LDC 1.4 release real soon(™).

Could you please describe a bit, for beginners like me, how to build a recent LDC?

Is this "NVPTX backend" a cmake option?
And what should I do to make my "CUDA environment" ready? Which packages should I install?

Sorry if my questions are too basic; I hope I will be able to add an example.

Regards
Erdem
September 11, 2017
On 9/11/2017 5:23 AM, Nicholas Wilson wrote:
> I'm pleased to announce that I have run the first dcompute kernel and it was a success!

Excellent!


https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg
September 11, 2017
On Monday, 11 September 2017 at 20:45:43 UTC, kerdemdemir wrote:
> Hi Wilson,
>
> Since I believe GPU-CPU hybrid programming is the future, I think you are doing great work for your own future and for D's.
>
>> To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.
>>
>> *Or wait for LDC 1.4 release real soon(™).
>
> Could you please describe a bit, for beginners like me, how to build a recent LDC?
>
> Is this "NVPTX backend" a cmake option?
> And what should I do to make my "CUDA environment" ready? Which packages should I install?
>
> Sorry if my questions are too basic; I hope I will be able to add an example.
>
> Regards
> Erdem


Hi Erdem

Sorry, I've been a bit busy with uni. To build LDC, clone ldc, run `git submodule update --init`, and then run CMake, setting LLVM_CONFIG to /path/to/llvm/build/bin/llvm-config and LLVM_INTRINSIC_TD_PATH to /path/to/llvm/source/include/llvm/IR.

The NVPTX backend is enabled by setting LLVM's CMake variable LLVM_TARGETS_TO_BUILD to either "all" or "X86;NVPTX" (without the quotes, plus any other archs you want to enable) and then building LLVM with CMake. LDC will pick this up automatically.

I just installed the CUDA SDK in its entirety, but I'm sure you don't need everything from it.
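Putting the steps above together, a build might look roughly like the following sketch (all paths are placeholders; adjust them for your machine, and expect both builds to take a while):

```
# 1. Build LLVM with the NVPTX backend enabled.
mkdir -p /path/to/llvm/build && cd /path/to/llvm/build
cmake /path/to/llvm/source -DLLVM_TARGETS_TO_BUILD="X86;NVPTX"
make -j4

# 2. Build LDC against that LLVM.
git clone --recursive https://github.com/ldc-developers/ldc.git
cd ldc && mkdir build && cd build
cmake .. -DLLVM_CONFIG=/path/to/llvm/build/bin/llvm-config \
         -DLLVM_INTRINSIC_TD_PATH=/path/to/llvm/source/include/llvm/IR
make -j4
```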

September 11, 2017
On Monday, 11 September 2017 at 22:40:02 UTC, Walter Bright wrote:
> On 9/11/2017 5:23 AM, Nicholas Wilson wrote:
>> I'm pleased to announce that I have run the first dcompute kernel and it was a success!
>
> Excellent!
>
>
> https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg

Indeed, let the world domination begin!

I just need to get some OpenCL 2.0 capable hardware to test that and we'll be well on the way.

Also, LDC 1.4 was just released. Yay!