Manu:
What you're really doing is casting a bunch of vector components to floats,
and then rebuilding a vector, and LLVM can helpfully deal with that.
I would suggest calling a spade a spade and using a swizzle function to
perform a swizzle, instead of code like what you wrote.
Wouldn't this be better:
double2 complexMult(in double2 a, in double2 b) pure nothrow {
    double2 b_flip = b.yx; // or b.swizzle!"yx", if we don't want to include an opDispatch in the basic type
    double2 a_im = a.yy;
    double2 a_re = a.xx;
    double2 aib = a_im * b_flip;
    double2 arb = a_re * b;
    // return [arb[0] - aib[0], arb[1] + aib[1]]; // this final line is tricky... it's not very portable.
    // Maybe:
    return select([-1, 0], arb-aib, arb+aib);
    // Hopefully the x86 optimiser will generate the proper opcode. Or a
    // bunch of other options; a multi-vector shuffle, shift, swizzle, interleave.
}
I think that would be better. More portable, and it eliminates the code
that implies a vector->float->vector cast sequence, which, I maintain,
should be syntactically discouraged at all costs.
You don't want to be giving people bad ideas that it's reasonable code to
write ;)

I see, and you are right.
(If I turn the basic type into a struct containing a double2
aliased-this to the whole structure, the generated code becomes
awful).
A YMM register already contains 8 floats, and SIMD registers will
probably keep growing, maybe to become 1024 bits long. So swizzle
item names like x y z w will not suffice, and some more general
naming scheme is needed.
My experience in writing this kind of code is limited. I will try
your select to see what kind of code LDC2-LLVM generates.
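
For reference, here is a scalar, lane-by-lane sketch in D of what the suggested complexMult computes. It is only an illustration, not the SIMD version: Vec2 is a hypothetical stand-in (a plain double[2], not a real double2 vector type), the swizzles are written out by hand, and I am assuming that select takes its first operand in lanes where the mask is -1 and its second operand where the mask is 0.

```d
// Hypothetical stand-in for double2: lane 0 = real part, lane 1 = imaginary part.
alias Vec2 = double[2];

Vec2 complexMult(in Vec2 a, in Vec2 b) pure nothrow {
    Vec2 bFlip = [b[1], b[0]];   // b.yx
    Vec2 aIm   = [a[1], a[1]];   // a.yy
    Vec2 aRe   = [a[0], a[0]];   // a.xx

    Vec2 aib, arb, diff, sum;
    aib[]  = aIm[] * bFlip[];    // [a.im*b.im, a.im*b.re]
    arb[]  = aRe[] * b[];        // [a.re*b.re, a.re*b.im]
    diff[] = arb[] - aib[];
    sum[]  = arb[] + aib[];

    // Assumed meaning of select([-1, 0], diff, sum):
    // lane 0 comes from diff, lane 1 comes from sum.
    return [diff[0], sum[1]];
}

unittest {
    // (1 + 2i) * (3 + 4i) = -5 + 10i
    assert(complexMult([1.0, 2.0], [3.0, 4.0]) == [-5.0, 10.0]);
}
```

Compiling with -unittest -main runs the check. The result matches the commented-out return line, [arb[0] - aib[0], arb[1] + aib[1]], which is what the select is meant to replace.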