Vector ops and SIMD ops are different things. float[4] (or more realistically, float[3]) should NOT be a candidate for automatic SIMD implementation, likewise, simd_type should not have its components individually accessible. These are operations the hardware can not actually perform. So no syntax to worry about, just a type.
I think the good Hara will be able to implement those syntax fixes in a matter of just one day or very few days if a consensus is reached about what actually is to be fixed in D vector ops syntax.
Instead of discussing about *adding* something (register intrinsics) I suggest to discuss about what to fix about the *already present* vector op syntax. This is not a request to just you Manu, but to this whole newsgroup.
And I think this is exactly the wrong approach. A vector is NOT an array of 4 (actually, usually 3) floats. It should not appear as one. This is overly complicated and ultimately wrong way to engage this hardware.
Imagine the complexity in the compiler to try and force float[4] operations into vector arithmetic vs adding a 'v128' type which actually does what people want anyway... What about when when you actually WANT a float[4] array, and NOT a vector?
SIMD units are not float units, they should not appear like an aggregation of float units. They have:
* Different error semantics, exception handling rules, sometimes different precision...
* Special alignment rules.
* Special literal expression/assignment.
* You can NOT access individual components at will.
* May be reinterpreted at any time as float[1] float[4] double[2] short[8] char[16], etc... (up to the architecture intrinsics)
* Can not be involved in conventional comparison logic (array of floats would make you think they could)
*** Can NOT interact with the regular 'float' unit... Vectors as an array of floats certainly suggests that you can interact with scalar floats...
I will use architecture intrinsics to operate on these regs, and put that nice and neatly behind a hardware vector type with version()'s for each architecture, and an API with a whole lot of sugar to make them nice and friendly to use.
My argument is that even IF the compiler some day attempts to make vector optimisations to float[4] arrays, the raw hardware should be exposed first, and allow programmers to use it directly. This starts with a language defined (platform independant) v128 type.