March 05, 2021
On Friday, 5 March 2021 at 16:41:39 UTC, kinke wrote:
> On Friday, 5 March 2021 at 16:34:07 UTC, kinke wrote:
>> https://run.dlang.io/is/z0k6l9
>
> Small correction (IR generation was fine, compilation wasn't): that intrinsic requires a newer target shader model (-mcpu=sm_70) [and -betterC to avoid ModuleInfo etc.]. Then the asm can be generated and inspected too.

Yep. The above works for me as well. cuda_620 (sm_62, IIUC) is currently the highest target allowed via the --mdcompute-targets interface. I'll work on expanding that.

I'm not sure whether the "i1" type can just plug into gen_gccbuiltins.cpp as previously discussed, but I'll look at that as well.

Thanks.

April 25, 2021

On Friday, 5 March 2021 at 00:03:26 UTC, Bruce Carneal wrote:

> After updating the first line to '@compute(CompileFor.hostAndDevice) module ...' and adding an 'import ldc.dcompute;' line, the runtime/import/ldc/gccbuiltins_nvvm.di file from a current LDC build apparently gives access to all manner of GPU intrinsics.
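As a concrete sketch, the setup described above might look like the following (the module name, kernel, and the intrinsic identifier are illustrative assumptions; the real names come from the generated gccbuiltins_nvvm.di):

```d
// Sketch only: assumes an LDC build whose import tree provides
// ldc/gccbuiltins_nvvm.di; the intrinsic name below is a guess at the
// generated identifier and may differ in the actual .di file.
@compute(CompileFor.hostAndDevice) module scale; // hypothetical module name

import ldc.dcompute;
import ldc.gccbuiltins_nvvm; // generated NVVM (CUDA) intrinsics

@kernel void scaleBy(GlobalPointer!float data, float s)
{
    // Assumed D-side name for llvm.nvvm.read.ptx.sreg.tid.x (thread index).
    auto i = __nvvm_read_ptx_sreg_tid_x();
    data[i] *= s;
}
```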

Hi Bruce,
Why not submit a PR that modifies gen_gccbuiltins.cpp such that it adds the @compute attribute for the relevant intrinsics files?
I think it's OK if gen_gccbuiltins contains some hacks like that. Please add a small compile test case, so we can verify that it won't bitrot in the future.

Wouldn't @compute(CompileFor.deviceOnly) make more sense, because the intrinsics will not be available on normal CPUs anyway?
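A compile-only test along those lines might look like this lit-style sketch (the REQUIRES/RUN lines, the cuda-350 target, and the module name are assumptions modeled on how LDC's test suite invokes the compiler):

```d
// Hypothetical compile test for the @compute-attributed intrinsics module.
// REQUIRES: target_NVPTX
// RUN: %ldc -c -betterC --mdcompute-targets=cuda-350 %s
@compute(CompileFor.deviceOnly) module gccbuiltins_nvvm_compile; // assumed name

import ldc.dcompute;
import ldc.gccbuiltins_nvvm; // should compile for the device target without errors
```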

I hope all your work will land in either LDC or dcompute's repositories, such that others can easily benefit from it.

cheers,
Johan

April 26, 2021

On Sunday, 25 April 2021 at 22:26:06 UTC, Johan Engelen wrote:
> On Friday, 5 March 2021 at 00:03:26 UTC, Bruce Carneal wrote:
>> After updating the first line to '@compute(CompileFor.hostAndDevice) module ...' and adding an 'import ldc.dcompute;' line, the runtime/import/ldc/gccbuiltins_nvvm.di file from a current LDC build apparently gives access to all manner of GPU intrinsics.
>
> Hi Bruce,
> Why not submit a PR that modifies gen_gccbuiltins.cpp such that it adds the @compute attribute for the relevant intrinsics files?
> I think it's OK if gen_gccbuiltins contains some hacks like that. Please add a small compile test case, so we can verify that it won't bitrot in the future.
>
> Wouldn't @compute(CompileFor.deviceOnly) make more sense, because the intrinsics will not be available on normal CPUs anyway?
>
> I hope all your work will land in either LDC or dcompute's repositories, such that others can easily benefit from it.
>
> cheers,
> Johan

Yes, I'll help when the current push is over here, but I think I don't understand enough quite yet. I'm still bumping into limitations/awkwardness in dcompute that should admit simple solutions. At least it feels that way.

One idea from my experience to date is that we can, and probably should, create a simpler (from a programmer's perspective) and finer-grained way to handle multiple targets. Intrinsic selection is part of that, as is library selection.

Also on my mind is how we should handle deployment. For the ultimate in speed we can do AOT per-target specialized compiles and "fat" binaries, but using SPIR-V + Vulkan compute could significantly improve penetration and reduce bloat.

I read a relatively recent thread on an LLVM forum indicating that the Intel guys are pushing a "real" SPIR-V IR effort now, so maybe we can help out there.

Also, I don't know how MLIR should fit into our plans.

I'll be in touch when I get my head above water here. Thanks to you and the rest of the LDC crew for the help so far. Looking forward to advancing dlang on GPUs in the future. It really can be much, much better than C++ in that arena.

Bruce

April 27, 2021

On Monday, 26 April 2021 at 13:20:11 UTC, Bruce Carneal wrote:
> On Sunday, 25 April 2021 at 22:26:06 UTC, Johan Engelen wrote:
>> [...]
>
> Yes, I'll help when the current push is over here, but I think I don't understand enough quite yet. I'm still bumping into limitations/awkwardness in dcompute that should admit simple solutions. At least it feels that way.
>
> [...]

Nice work, thanks for wanting to improve dcompute! I think D has real potential there.
