March 05, 2021
On Friday, 5 March 2021 at 16:41:39 UTC, kinke wrote:
> On Friday, 5 March 2021 at 16:34:07 UTC, kinke wrote:
>> https://run.dlang.io/is/z0k6l9
>
> Small correction (IR generation was fine, compilation wasn't): that intrinsic requires a newer target shader model (-mcpu=sm_70) [and -betterC to avoid ModuleInfo etc.]. Then the asm can be generated and inspected too.

Yep. The above works for me as well. cuda_620 (sm_62, IIUC) is currently the highest target allowed via the --mdcompute-targets interface. I'll work on expanding that.

I'm not sure whether the "i1" type can just plug into gen_gccbuiltins.cpp as previously discussed, but I'll look at that as well.

Thanks.

April 25, 2021

On Friday, 5 March 2021 at 00:03:26 UTC, Bruce Carneal wrote:

> After updating the first line to '@compute(CompileFor.hostAndDevice) module ...' and adding an 'import ldc.dcompute;' line, the runtime/import/ldc/gccbuiltins_nvvm.di file from a current LDC build apparently gives access to all manner of GPU intrinsics.
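As a concrete sketch, the setup described above might look like the following (the module name, kernel, and the intrinsic identifier are illustrative assumptions; the real names come from the generated gccbuiltins_nvvm.di):

```d
// Sketch only: assumes an LDC build whose import tree provides
// ldc/gccbuiltins_nvvm.di; the intrinsic name below is a guess at the
// generated identifier and may differ in the actual .di file.
@compute(CompileFor.hostAndDevice) module scale; // hypothetical module name

import ldc.dcompute;
import ldc.gccbuiltins_nvvm; // generated NVVM (CUDA) intrinsics

@kernel void scaleBy(GlobalPointer!float data, float s)
{
    // Assumed D-side name for llvm.nvvm.read.ptx.sreg.tid.x (thread index).
    auto i = __nvvm_read_ptx_sreg_tid_x();
    data[i] *= s;
}
```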

Hi Bruce,
Why not submit a PR that modifies gen_gccbuiltins.cpp such that it adds the @compute attribute for the relevant intrinsics files?
I think it's OK if gen_gccbuiltins contains some hacks like that. Please add a small compile test case, so we can verify that it won't bitrot in the future.

Wouldn't @compute(CompileFor.deviceOnly) make more sense, because the intrinsics will not be available on normal CPUs anyway?
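A compile-only test along those lines might look like this lit-style sketch (the REQUIRES/RUN lines, the cuda-350 target, and the module name are assumptions modeled on how LDC's test suite invokes the compiler):

```d
// Hypothetical compile test for the @compute-attributed intrinsics module.
// REQUIRES: target_NVPTX
// RUN: %ldc -c -betterC --mdcompute-targets=cuda-350 %s
@compute(CompileFor.deviceOnly) module gccbuiltins_nvvm_compile; // assumed name

import ldc.dcompute;
import ldc.gccbuiltins_nvvm; // should compile for the device target without errors
```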

I hope all your work will land in either LDC or dcompute's repositories, such that others can easily benefit from it.

cheers,
Johan

April 26, 2021

On Sunday, 25 April 2021 at 22:26:06 UTC, Johan Engelen wrote:
> On Friday, 5 March 2021 at 00:03:26 UTC, Bruce Carneal wrote:
>> After updating the first line to '@compute(CompileFor.hostAndDevice) module ...' and adding an 'import ldc.dcompute;' line, the runtime/import/ldc/gccbuiltins_nvvm.di file from a current LDC build apparently gives access to all manner of GPU intrinsics.
>
> Hi Bruce,
> Why not submit a PR that modifies gen_gccbuiltins.cpp such that it adds the @compute attribute for the relevant intrinsics files?
> I think it's OK if gen_gccbuiltins contains some hacks like that. Please add a small compile test case, so we can verify that it won't bitrot in the future.
>
> Wouldn't @compute(CompileFor.deviceOnly) make more sense, because the intrinsics will not be available on normal CPUs anyway?
>
> I hope all your work will land in either LDC or dcompute's repositories, such that others can easily benefit from it.
>
> cheers,
> Johan

Yes, I'll help when the current push is over here, but I think I don't understand enough quite yet. I'm still bumping into limitations/awkwardness in dcompute that should admit simple solutions. At least it feels that way.

One idea from my experience to date is that we can, and probably should, create a simpler (from a programmer's perspective) and finer-grained way to handle multiple targets. Intrinsic selection is part of that, as is library selection.

Also on my mind is how we should handle deployment. For the ultimate in speed we can do AOT per-target specialized compiles and "fat" binaries, but using SPIR-V + Vulkan compute could significantly improve penetration and reduce bloat.

I read a relatively recent thread on an LLVM forum indicating that the Intel guys are pushing a "real" SPIR-V IR effort now, so maybe we can help out there.

Also, I don't know how MLIR should fit into our plans.

I'll be in touch when I get my head above water here. Thanks to you and the rest of the LDC crew for the help so far. Looking forward to advancing dlang on GPUs in the future. It really can be much, much better than C++ in that arena.

Bruce

April 27, 2021

On Monday, 26 April 2021 at 13:20:11 UTC, Bruce Carneal wrote:
> On Sunday, 25 April 2021 at 22:26:06 UTC, Johan Engelen wrote:
>> [...]
>
> Yes, I'll help when the current push is over here, but I think I don't understand enough quite yet. I'm still bumping into limitations/awkwardness in dcompute that should admit simple solutions. At least it feels that way.
>
> [...]

Nice work, thanks for wanting to improve dcompute! I think D has real potential there.
