Manu: The D GC currently allocates them 16-byte aligned (but if you slice the array, you can lose some alignment). On some newer CPUs the penalty for misaligned access is small.
They must be aligned, and their length must be a multiple of N elements.
You often have "n" values, where n is variable. If n is large enough and you are using D vector ops, handling the head and tail doesn't cost much time. If you have very few values, it's much better to use the SIMD code directly.
Maybe later we'll look for some syntax sugar for this.

Well, each is a valid comparison in different situations. I'm not sure how syntax could clearly select the one you want.
Do D intrinsics offer instructions to perform prefetching?
Well, GCC does at least. If you're worried about performance at this level, you're probably already using GCC :)
I think D SIMD programmers will expect something functionally like __builtin_prefetch to be available in D too:
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005fprefetch-3396