On 16 January 2012 18:48, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
On 1/16/12 10:46 AM, Manu wrote:
A function using float arrays and a function using hardware vectors
should certainly not be the same speed.

My point was that the version using float arrays should opportunistically use hardware ops whenever possible.

I think this is a mistake, because such a piece of code never exists outside of some context. If the surrounding context is all FPU code (and it is, since it's a float array), then swapping between the FPU and SIMD execution units will probably make the function slower than the original (the float array is also unaligned). The SIMD version, however, must exist within a SIMD context, since the API can't implicitly interact with floats; this guarantees that the context of each function matches the one it lives in.
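To make the distinction concrete, here is a minimal sketch in D, assuming a core.simd-style float4 type (which was still being designed at the time of this thread); the function names and the trivial scale operation are purely illustrative. The float[] version leaves the compiler working on unaligned scalar data in an FPU context, while the float4 version forces callers to hand over aligned vectors, so the whole call chain stays on the SIMD unit.

import core.simd;

// Scalar version: operates on an unaligned float slice in an FPU/scalar context.
void scaleScalar(float[] data, float factor)
{
    foreach (ref x; data)
        x *= factor;              // one scalar op per element
}

// SIMD version: float4 parameters force callers to provide aligned vectors,
// so the whole call chain stays in a SIMD context.
void scaleSimd(float4[] data, float4 factor)
{
    foreach (ref v; data)
        v = v * factor;           // one vector op per four elements
}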
This is fundamental to fast vector performance. Using SIMD is an all-or-nothing decision; you can't just mix it in here and there.
You don't go casting back and forth between floats and ints on every other line... obviously it's imprecise, but it's also a major performance hazard. There is no difference here, except that the performance hazard is much worse.
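As a rough sketch of the kind of mixing being warned against (the normalize names and the lane-0 fiddling are hypothetical, not from the thread): pulling a single lane out of a vector to do scalar work on it, then pushing it back, forces data to round-trip between the SIMD and FPU/scalar register files on every iteration, exactly like casting between float and int on every other line.

import core.simd;

// Hazardous mixing (sketch): a per-iteration round trip between units.
void normalizeBad(float4[] vecs)
{
    foreach (ref v; vecs)
    {
        float x = v.array[0];     // vector register -> scalar side
        x = x * 0.5f + 1.0f;      // scalar FPU work
        v.array[0] = x;           // scalar -> back into the vector register
    }
}

// Better (sketch): keep everything in vector registers the whole time.
void normalizeGood(float4[] vecs)
{
    float4 half = [0.5f, 0.5f, 0.5f, 0.5f];
    float4 one  = [1.0f, 1.0f, 1.0f, 1.0f];
    foreach (ref v; vecs)
        v = v * half + one;       // all work stays on the SIMD unit
}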