Manu:
Interesting. Almost all his points are what we do already in D.
Always nice to see others come to the same conclusions :)

While trying to write a multiplication of two complex numbers using SSE3 with LDC2 I have found seven or more bugs, which I will discuss elsewhere. But regarding the syntax: in nice code like this, D requires adding ".array" before all those subscripts (code adapted from Fog):
import core.simd; // provides double2

double2 complexMult(in double2 a, in double2 b) pure nothrow {
    double2 b_flip = [b.array[1], b.array[0]]; // swap real and imaginary parts of b
    double2 a_im = [a.array[1], a.array[1]];   // broadcast imaginary part of a
    double2 a_re = [a.array[0], a.array[0]];   // broadcast real part of a
    double2 aib = a_im * b_flip;
    double2 arb = a_re * b;
    return [arb.array[0] - aib.array[0], arb.array[1] + aib.array[1]];
}
A line like this:
double2 b_flip = [b.array[1], b.array[0]];
becomes something like:
pshufd $238, %xmm1, %xmm3
Similarly, all the other lines become single instructions (except the last one, because LDC2 fails to use addsubpd).
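As a side note on reading that output: the immediate operand of pshufd packs four 2-bit selectors, one per destination dword, lowest bits first. Here is a small sketch for decoding it (decodePshufd is a hypothetical helper name, not part of any library). With it, 238 decodes to a broadcast of the upper double, while a plain swap of the two doubles would be 78, so which source line a given instruction corresponds to depends on how the compiler has rearranged the computation.

```d
// Hypothetical helper: decode a pshufd immediate into its four selectors.
// Destination dword i is taken from source dword (imm >> (2 * i)) & 3.
int[4] decodePshufd(ubyte imm) pure nothrow @safe {
    int[4] sel;
    foreach (i; 0 .. 4)
        sel[i] = (imm >> (2 * i)) & 3;
    return sel;
}

unittest {
    // 238 == 0xEE: dwords 2,3 copied to both halves, i.e. [hi, hi] as doubles.
    assert(decodePshufd(238) == [2, 3, 2, 3]);
    // 78 == 0x4E: halves exchanged, i.e. [hi, lo] as doubles.
    assert(decodePshufd(78) == [2, 3, 0, 1]);
}
```

The unittest block runs with something like "dmd -unittest -main".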
I vaguely remember you saying that slow SIMD operations shouldn't have too short a syntax, to avoid giving an illusion of efficiency. But given that the CPU "often" executes such array subscripting and shuffling efficiently, wouldn't it be nicer (and sufficient) to support a simpler syntax like this in D?
double2 complexMult(in double2 a, in double2 b) pure nothrow {
    double2 b_flip = [b[1], b[0]];
    double2 a_im = [a[1], a[1]];
    double2 a_re = [a[0], a[0]];
    double2 aib = a_im * b_flip;
    double2 arb = a_re * b;
    return [arb[0] - aib[0], arb[1] + aib[1]];
}
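
For anyone who wants to check the lane arithmetic without a SIMD-capable compiler, here is a scalar sketch of the same steps. It assumes lane 0 holds the real part and lane 1 the imaginary part; double[2] stands in for double2, and complexMultScalar is a name made up for this illustration. It says nothing about the code a compiler would generate.

```d
// Scalar model of the steps above; double[2] stands in for double2.
// Assumption: element 0 is the real part, element 1 the imaginary part.
double[2] complexMultScalar(in double[2] a, in double[2] b) pure nothrow @safe {
    double[2] bFlip = [b[1], b[0]]; // [b.im, b.re]
    double[2] aIm = [a[1], a[1]];   // [a.im, a.im]
    double[2] aRe = [a[0], a[0]];   // [a.re, a.re]
    double[2] aib;
    double[2] arb;
    aib[] = aIm[] * bFlip[];        // [a.im*b.im, a.im*b.re]
    arb[] = aRe[] * b[];            // [a.re*b.re, a.re*b.im]
    // Subtract in lane 0, add in lane 1: the pattern addsubpd implements.
    return [arb[0] - aib[0], arb[1] + aib[1]];
}

unittest {
    // (1 + 2i) * (3 + 4i) = -5 + 10i
    double[2] r = complexMultScalar([1.0, 2.0], [3.0, 4.0]);
    assert(r == [-5.0, 10.0]);
}
```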