On 30 May 2017 at 19:54, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

As for embedding in the binary, a post-build step that does

immutable(ubyte)[] ptx_code = cast(immutable(ubyte)[]) import("kernels_cuda620_64.ptx");

should be doable, as should invoking ptxas and doing the same.
Then, provided a consistent naming convention is used, the code can do its magic.
Or the files could just be read from disk.
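
For illustration only, the naming-convention idea might be sketched like this. The helper kernelFileName and the reading of "cuda620_64" as backend, version and pointer width are assumptions made for the example, not an agreed scheme; the import() line is left commented out because it needs the .ptx file to exist on the string-import path (for example via -J).

```d
import std.conv : to;

/// Hypothetical helper: builds the file name of a compiled kernel blob
/// from the backend name, its version and the pointer width.
string kernelFileName(string backend, int ver, int pointerBits)
{
    return "kernels_" ~ backend ~ ver.to!string ~ "_"
         ~ pointerBits.to!string ~ ".ptx";
}

// Evaluated at compile time (CTFE), so the result can be handed to import().
enum ptxFile = kernelFileName("cuda", 620, 64);
static assert(ptxFile == "kernels_cuda620_64.ptx");

// With the file on the string-import path, the embedding step would be:
// static immutable ubyte[] ptxCode = cast(immutable(ubyte)[]) import(ptxFile);

void main()
{
    import std.stdio : writeln;
    writeln(ptxFile);
}
```

Because the name is computed by an ordinary function, the same helper could also be used at run time for the read-from-disk variant.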

Is it possible to convince the compiler to emit code built for the backend target directly into the same object file as the host code?
I feel like this should be possible, along the lines of __attribute__((target(...))), to convince the compiler to generate code for a few functions with targets different from the rest of the module.

Using solutions as you suggest above introduces dependent build sequencing into the build script. Different build systems might prove to be more or less difficult to integrate cleanly, and many people use build-script generators which might need to learn a few new tricks.

Any input with your expertise with CUDA will be much appreciated.

'Expertise' is possibly not the word I'd suggest ;)
But I'll have some established software by that time that I'd love to attempt to port; we can work through the rough edges together when you're available. No rush.