On Monday, 7 November 2022 at 01:59:03 UTC, Bruce Carneal wrote:
> Here's a simple godbolt example of one of the areas in which gdc solidly outperforms ldc wrt auto-vectorization: simple but not trivial operand gather
> https://godbolt.org/z/ox1vvxd8s
> Compile time target adaptive manual __vector-ization is an answer here if you have no access to SIMT, so not a show stopper, but the code is less readable.
> I'm not sure what the data parallel future should look like wrt language/IR but I'm pretty sure we can do better than praying that the auto vectorizer can dig patterns out of for loops, or throwing ourselves on the manual vectorization grenade, repeatedly.
My "grenade" phrasing above was fun to write but overly dramatic. Manual __vector-ization is more tedious than dangerous, and D (ldc/gdc) gives you quite a bit of help there, including 1) __vector types and 2) compile-time (CT) max vector length introspection.
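A minimal sketch of how those two pieces can fit together, under some loud assumptions: `lanes`, `V` and `axpy` are names made up for illustration, and the `is(__vector(...))` probe is only one possible way to ask the compiler what it accepts (it is the same idiom `core.simd` uses for its own aliases). How closely the accepted width tracks the real hardware width depends on the compiler and the target flags, so treat the result as a hint rather than a guarantee.

```d
// Assumption: probing with is(__vector(...)) is a reasonable stand-in for
// "CT max vector length introspection"; compilers may differ in what they accept.
static if (is(__vector(float[16])))
    enum lanes = 16;
else static if (is(__vector(float[8])))
    enum lanes = 8;
else static if (is(__vector(float[4])))
    enum lanes = 4;
else
    enum lanes = 1; // no float vectors on this target: plain scalar fallback

static if (lanes > 1)
    alias V = __vector(float[lanes]);
else
    alias V = float;

// y = a*x + y over slices of V. Taking V[] (not float[]) sidesteps
// unaligned loads, because V elements carry their own alignment.
void axpy(float a, const(V)[] x, V[] y)
{
    assert(x.length == y.length);
    V va = a; // a scalar initializer broadcasts to every lane
    foreach (i, ref yv; y)
        yv = va * x[i] + yv;
}
```

The body is identical for the vector and scalar cases, which is most of what keeps the tedium manageable: the target-adaptive part is confined to the `static if` chain at the top.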
Also, auto-vectorization does work nicely against simple and/or-conditioned inputs and outputs.
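For a concrete picture of that kind of loop, here is a hedged sketch (the function name `keepInBand` and the banding logic are invented for illustration, not taken from the godbolt link). The output is conditioned on a simple and-ed predicate over the input, with no gather and no cross-iteration dependency. Whether a particular gdc or ldc build actually vectorizes it depends on version, optimization level, target flags and possibly bounds-check settings, so check the generated code rather than taking this on faith.

```d
// Copies src to dst, zeroing every element outside [lo, hi].
// The condition is a side-effect-free "and" of two comparisons, which
// optimizers can often turn into a compare-and-blend instead of a branch.
void keepInBand(const(float)[] src, float[] dst, float lo, float hi)
{
    assert(src.length == dst.length);
    foreach (i, s; src)
        dst[i] = (s >= lo && s <= hi) ? s : 0.0f;
}
```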
I believe there is a lot more to be had in the programmer-friendly data parallelism department, perhaps involving a (major) pivot to MLIR, but I give my considered thanks to those involved in providing what is, from my point of view, already the best option in that arena. Introspection, __vector, auto-vec, dcompute, ... it's a potent toolkit.