Two years ago there was a Google Summer of Code project to implement these primitives in pure D, for various reasons. The conclusion was that the project wasn't viable, and it was abandoned, but there were some interesting lessons learned. I have now stumbled on some new work in C land on these primitives that might interest people who followed the original project, so I am sharing it here:
Custom ASM implementation that outperforms libc: https://github.com/nadavrot/memset_benchmark
Paper on automatic implementation of these primitives: https://dl.acm.org/doi/pdf/10.1145/3459898.3463904