On Friday, 14 January 2022 at 15:17:59 UTC, Ola Fosheim Grøstad wrote:
*nods* For a long time we could expect "home computers" to be Intel/AMD, but then the computing environment changed, and maybe Apple tries to make its own platform stand out as faster than it is by forcing developers to special-case their code for Metal rather than going through a generic API.
I guess FPGAs will be available in entry level machines at some point as well. So, I understand that it will be a challenge to get dcompute to a "ready for the public" stage when there is no multi-person team behind it.
Maybe, though I suspect not for a while; then again, that could be wildly wrong. Anyway, I don't think they will be too difficult to support, provided the vendor in question ships an OpenCL implementation; in that case the only thing to do is support it.
As for manpower, the reason is that I don't have any particular personal need for dcompute these days. I am happy to add features for people who need something in particular, e.g. Vulkan compute shaders or textures, and PRs are welcome. Though if Bruce makes millions and gives me a job, then that will obviously change ;)
But I am not so sure about the apples and oranges aspect of it.
The apples-to-oranges comment was about doing benchmarks of CPU vs. GPU: there are so many factors that make performance comparisons (more) difficult. Is the GPU discrete? How important is latency vs. throughput? How "powerful" is the GPU compared to the CPU? How well suited to the task is the GPU? The list goes on. It's hard enough to do CPU benchmarks in an unbiased way.
If the intention is to say, "look at the speedup you can get for $TASK using $COMMON_HARDWARE", then yeah, that would be possible. It would certainly be possible to do a benchmark of, say, "ease of implementation with comparable performance" of dcompute vs. CUDA, e.g. LoC, verbosity, brittleness etc., since the main advantage of D/dcompute (vs. CUDA) is enumeration of kernel designs for performance. That would give a nice measurable goal to improve usability.
The presentation by Bryce was quite explicitly focusing on making GPU computation available at the same level as CPU computations (sans function pointers). This should be possible for homogeneous memory systems (GPU and CPU sharing the same memory bus) in a rather transparent manner and languages that plan for this might be perceived as being much more productive and performant if/when this becomes reality. And C++23 isn't far away, if they make the deadline.
Definitely. Homogeneous memory is interesting for the ability to make GPUs do the things GPUs are good at and leave the rest to the CPU, without worrying about memory transfer across the PCI-e. That is something CUDA can't take advantage of, on account of Nvidia GPUs being only discrete. I've no idea how caching works in a system like that, though.
It was also interesting to me that ISO C23 will provide custom bit-width integers and that this would make it easier to efficiently compile C code to tighter FPGA logic. I remember that LLVM used to have that in their IR, but I think it was taken out and limited to more conventional bit sizes?
Arbitrary-precision integers are still a part of LLVM, and I presume LLVM IR. The problem with that is, like with address-spaced pointers, D has no way to declare such types. I seem to remember Luís Marques doing something crazy like that (maybe in a DConf presentation?), compiling D to Verilog.
It just shows that being a system-level programming language requires a lot of adaptability over time and frameworks like dcompute cannot ever be considered truly finished.