Is implementing a clang-style @target_clones capability for LDC a big ask?
Currently, @target_clones only works per function and within a single CPU family, but it can easily be used in modules intended for multiple CPU families by selecting the @target_clones string inside version(CPU_family_predefine) blocks. This might be useful for the front end, the runtime, and Phobos, as well as for code written with auto-vectorization in mind.
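To make the version-based selection concrete, here is a rough sketch of what I have in mind. Note the assumptions: LDC has no @target_clones today, so `target_clones` below is a hypothetical, inert stand-in UDA that only marks where the real attribute would go. The names `cloneSpec` and `sumSquares` and the feature strings are illustrative, not a proposal for exact spelling.

```d
import std.stdio : writeln;

// HYPOTHETICAL stand-in: LDC does not provide @target_clones.
// This inert UDA generates no clones; it only carries the string.
struct target_clones { string spec; }

// Select the clone list per CPU family via predefined version identifiers.
// The feature strings are assumptions chosen for illustration.
version (X86_64)
    enum cloneSpec = "avx2,sse4.2,default";
else version (AArch64)
    enum cloneSpec = "sve2,default";
else
    enum cloneSpec = "default";

// With real support, the compiler would emit one clone per listed target
// plus a resolver that picks among them at load time.
@target_clones(cloneSpec)
float sumSquares(const(float)[] xs)
{
    float acc = 0;
    foreach (x; xs)
        acc += x * x; // simple loop, a candidate for auto-vectorization
    return acc;
}

// The selected string is attached to the function at compile time.
static assert(__traits(getAttributes, sumSquares)[0].spec == cloneSpec);

void main()
{
    writeln(cloneSpec, " -> ", sumSquares([1f, 2f, 3f]));
}
```

The point is that the function body stays identical across CPU families; only the string handed to the attribute changes.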
There are manual workarounds and extensions, such as library versioning with eager or late (LTO) target binding, so it might not be worth more than a modest effort, but I thought I'd ask.