Dear D-ers,
I enjoyed reading some details of incorporating AVX into math code
from Johan Engelen's programming blog post:
http://johanengelen.github.io/ldc/2016/10/11/Math-performance-LDC.html
Basically, one can use the LDC compiler to generate AVX code, nice!
In playing with some variants of his example code, I realized
that there are issues I do not understand. For example, the following
code successfully produces AVX instructions:
// File here is called dotFirst.d
import ldc.attributes : fastmath;

@fastmath
double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length) {
        s += a[i] * b[i];
    }
    return s;
}

double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

void main()
{
    double z = 0.0;
    z = dot(x, y);
}
If we run:
ldc2 -c -output-s -O3 -release dotFirst.d -mcpu=haswell
echo "Results of grep ymm dotFirst.s:"
grep ymm dotFirst.s
The "grep" shows a number of vector instructions, such as:
vfmadd132pd 160(%rcx,%rdi,8), %ymm5, %ymm1
However, subtle changes in the code (such as moving the dot product
function to a module, or even moving the array declarations to before
the dot product function) make the AVX instructions disappear:
import ldc.attributes : fastmath;

@fastmath
double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length) {
...
Now a grep will not find a single ymm.
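One detail that may matter in the variant above: in D, an attribute written without a trailing colon or braces applies only to the declaration that immediately follows it. In the reordered file that declaration is x, so dot would no longer carry @fastmath, and without it the compiler is generally not allowed to reassociate the floating-point sum into vector lanes. A minimal sketch, assuming that is the cause, keeps the arrays first but places the attribute directly on the function (the file name dotSecond.d is hypothetical):

```d
// File here is called dotSecond.d (hypothetical name)
import ldc.attributes : fastmath;

// Arrays declared first; no attribute is attached to them.
double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

// @fastmath sits directly above dot, so it applies to dot and nothing else.
@fastmath
double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length) {
        s += a[i] * b[i];
    }
    return s;
}

void main()
{
    double z = dot(x, y);
}
```

Rerunning the same ldc2 and grep commands on this file is a quick way to check whether the ymm instructions come back.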
My understanding is that LDC needs proper alignment to be able to emit the vector
instructions.
But my question is: how is proper alignment guaranteed? Most importantly,
how is it guaranteed across code split into modules? (There are related stack
alignment issues: 16 bytes?)
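For the alignment part of the question, D has a built-in align attribute. A minimal sketch, assuming the compiler honors align(32) on module-level arrays (an assumption, not something confirmed here), requests 32-byte alignment and prints the address remainder so the result can be checked rather than trusted:

```d
import std.stdio : writeln;

// Assumption: align(32) on a module-level declaration is honored by the compiler.
align(32) double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
align(32) double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

void main()
{
    // A remainder of 0 means the array starts on a 32-byte boundary.
    writeln("x.ptr % 32 = ", (cast(size_t) x.ptr) % 32);
    writeln("y.ptr % 32 = ", (cast(size_t) y.ptr) % 32);
}
```

This only inspects where the data lands; it does not by itself show whether the vectorizer depends on that alignment.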
Best Regards,
James
PS I have come across scattered bits of (sometimes contradictory) information on
AVX/SIMD for dlang. Is there a canonical source for vector info?