DCompute: First kernels run successfully
Nicholas Wilson
September 11, 2017
I'm pleased to announce that I have run the first dcompute kernel and it was a success!

The driver still needs a fair bit of polish to make the API sane and more complete, not to mention more similar to the (untested) OpenCL driver API. But it works!
(Contributions are of course greatly welcomed)

The kernel:
```
@compute(CompileFor.deviceOnly)
module dcompute.tests.dummykernels;

import ldc.dcompute;
import dcompute.std.index;

@kernel void saxpy(GlobalPointer!(float) res,
                   float alpha, GlobalPointer!(float) x,
                   GlobalPointer!(float) y,
                   size_t N)
{
    auto i = GlobalIndex.x;
    if (i >= N) return;
    res[i] = alpha*x[i] + y[i];
}
```

The host code:
```
import std.experimental.allocator : theAllocator;
import std.exception : enforce;
import std.stdio : writeln;

import dcompute.driver.cuda;
import dcompute.tests.dummykernels : saxpy;

Platform.initialise();

auto devs   = Platform.getDevices(theAllocator);
auto ctx    = Context(devs[0]); scope(exit) ctx.detach();

// Change the file to match your GPU.
Program.globalProgram = Program.fromFile("./.dub/obj/kernels_cuda210_64.ptx");
auto q = Queue(false);

enum size_t N = 128;
float alpha = 5.0;
float[N] res, x, y;
foreach (i; 0 .. N)
{
    x[i] = N - i;
    y[i] = i * i;
}
Buffer!(float) b_res, b_x, b_y;
b_res      =  Buffer!(float)(res[]); scope(exit) b_res.release();
b_x        =  Buffer!(float)(x[]);   scope(exit) b_x.release();
b_y        =  Buffer!(float)(y[]);   scope(exit) b_y.release();

b_x.copy!(Copy.hostToDevice); // not quite sold on this interface yet.
b_y.copy!(Copy.hostToDevice);

q.enqueue!(saxpy)            // <-- the main magic happens here
    ([N, 1, 1], [1, 1, 1])   // the grid
    (b_res, alpha, b_x, b_y, N); // the kernel arguments

b_res.copy!(Copy.deviceToHost);
foreach (i; 0 .. N)
    enforce(res[i] == alpha * x[i] + y[i]);
writeln(res[]); // [640, 636, ... 16134]
```

Simple as that!

Dcompute, as always, is at https://github.com/libmir/dcompute and on dub.

To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.

*Or wait for LDC 1.4 release real soon(™).
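If you're not sure whether your LDC build includes the backend, `ldc2 -version` lists the registered LLVM targets, so a quick check might look like this (assuming `ldc2` is on your PATH):

```
# List LDC's registered LLVM targets and look for NVPTX.
# If the backend is enabled, this should print nvptx target entries.
ldc2 -version | grep -i nvptx
```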

Thanks to the LDC folks for putting up with me ;)

Have fun GPU programming,
Nic
September 11, 2017
On Monday, 11 September 2017 at 12:23:16 UTC, Nicholas Wilson wrote:
> I'm pleased to announce that I have run the first dcompute kernel and it was a success!
>

Keep up the good work.
September 11, 2017
Hi Wilson,

Since I believe GPU-CPU hybrid programming is the future, I think you are doing great work for your own future and for D's.

> To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.
>
> *Or wait for LDC 1.4 release real soon(™).

Could you please describe a bit, for beginners like me, how to build a recent LDC?

Is this "NVPTX backend" a cmake option?
And what should I do to make my "CUDA environment" ready? Which packages should I install?

Sorry if my questions are too basic; I hope I will be able to add an example.

Regards
Erdem
September 11, 2017
On 9/11/2017 5:23 AM, Nicholas Wilson wrote:
> I'm pleased to announce that I have run the first dcompute kernel and it was a success!

Excellent!


https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg
September 11, 2017
On Monday, 11 September 2017 at 20:45:43 UTC, kerdemdemir wrote:
> Hi Wilson,
>
> Since I believe GPU-CPU hybrid programming is the future, I think you are doing great work for your own future and for D's.
>
>> To successfully run the dcompute CUDA test you will need a very recent LDC (less than two days old) with the NVPTX backend* enabled, along with a CUDA environment and an Nvidia GPU.
>>
>> *Or wait for LDC 1.4 release real soon(™).
>
> Could you please describe a bit, for beginners like me, how to build a recent LDC?
>
> Is this "NVPTX backend" a cmake option?
> And what should I do to make my "CUDA environment" ready? Which packages should I install?
>
> Sorry if my questions are too basic; I hope I will be able to add an example.
>
> Regards
> Erdem


Hi Erdem

Sorry, I've been a bit busy with uni. To build LDC, clone ldc, run `git submodule update --init`, and then run CMake, setting LLVM_CONFIG to /path/to/llvm/build/bin/llvm-config and LLVM_INTRINSIC_TD_PATH to /path/to/llvm/source/include/llvm/IR.

The NVPTX backend is enabled by setting LLVM's CMake variable LLVM_TARGETS_TO_BUILD to either "all" or "X86;NVPTX" (without the quotes, plus any other archs you want to enable) and then building LLVM with CMake. LDC will pick this up automatically.

I just installed the CUDA SDK in its entirety, but I'm sure you don't need everything from it.
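Putting the steps above together, a build might look roughly like the following sketch (all paths are placeholders; adjust them for your machine, and expect both builds to take a while):

```
# 1. Build LLVM with the NVPTX backend enabled.
mkdir -p /path/to/llvm/build && cd /path/to/llvm/build
cmake /path/to/llvm/source -DLLVM_TARGETS_TO_BUILD="X86;NVPTX"
make -j4

# 2. Build LDC against that LLVM.
git clone --recursive https://github.com/ldc-developers/ldc.git
cd ldc && mkdir build && cd build
cmake .. -DLLVM_CONFIG=/path/to/llvm/build/bin/llvm-config \
         -DLLVM_INTRINSIC_TD_PATH=/path/to/llvm/source/include/llvm/IR
make -j4
```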

September 11, 2017
On Monday, 11 September 2017 at 22:40:02 UTC, Walter Bright wrote:
> On 9/11/2017 5:23 AM, Nicholas Wilson wrote:
>> I'm pleased to announce that I have run the first dcompute kernel and it was a success!
>
> Excellent!
>
>
> https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg

Indeed, let the world domination begin!

I just need to get some OpenCL 2.0 capable hardware to test that and we'll be well on the way.

Also, LDC 1.4 was just released. Yay!