Lately I have been playing a little with SIMD in D and with the work-in-progress module std.simd, and I have filed several related bug reports in Bugzilla.
This program compiles with no errors or warnings with ldc2 on 32-bit Windows:
import core.simd;
__gshared immutable int[16]
    a = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    b = [16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1];
__gshared int[16] c;
void main() {
    auto va = cast(int4[])a;
    auto vb = cast(int4[])b;
    auto vc = cast(int4[])c;
    vc[0] = va[0] + vb[0];
}
For main, ldc2 generates this code:
__Dmain:
    subl    $12, %esp
    movl    $16, 8(%esp)
    movl    $4, 4(%esp)
    movl    $16, (%esp)
    calll   __d_array_cast_len
    movdqa  __D5test51ayG16i, %xmm0
    paddd   __D5test51byG16i, %xmm0
    movdqa  %xmm0, __D5test51cG16i
    xorl    %eax, %eax
    addl    $12, %esp
    ret
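It uses the movdqa instruction, which requires its memory operand to be 16-byte aligned and faults otherwise (the SSE2 paddd with a memory operand has the same requirement). So this code assumes that a, b and c are aligned to 16 bytes, but I think there is no guarantee that they are.
This is the LLVM IR generated with ldc2's -output-ll switch (LLVM IR is a kind of nearly universal bytecode for LLVM). Note that the three globals carry no alignment annotation: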
@_D5test51ayG16i = global [16 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16]
@_D5test51byG16i = global [16 x i32] [i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1]
@_D5test51cG16i = global [16 x i32] zeroinitializer
If I add an align(16) annotation to a, b and c, ldc2 emits the corresponding "align 16" in the LLVM IR too:
@_D5test51ayG16i = global [16 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16], align 16
@_D5test51byG16i = global [16 x i32] [i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1], align 16
@_D5test51cG16i = global [16 x i32] zeroinitializer, align 16
If I cast the pointers instead of the arrays, the result is similar, but there is no call to __d_array_cast_len (this is a raw cast, so I don't expect much help from the type system here):
auto va = cast(int4*)a.ptr;
auto vb = cast(int4*)b.ptr;
auto vc = cast(int4*)c.ptr;
I'd like the compiler to give an alignment warning for those cast(int4[]) expressions, or else to avoid movdqa (using an unaligned load such as movdqu) in such cases.
This suggests that __d_array_cast_len should perhaps preserve the alignment of its input pointer and pass it on to the pointer of the output array. It may also mean that the D front-end should track alignment information for each pointer (or at least for pointers used where alignment matters), integrating it into its type system, and perhaps inferring it the way purity is inferred for function templates.
Pointer alignment matters to the CPU, so perhaps the type system of a systems language should keep track of it and enforce that alignments are correct. Maybe this could be done with no extra annotation burden for the programmer. One potential problem is how to reconcile this alignment inference with separate compilation. I think the align() annotations suffice for that.
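Until the compiler does something like that, the check can be done by hand. This is a minimal sketch, assuming a 64-bit x86 target where int4 has 16-byte alignment; isAligned16 and toInt4Checked are hypothetical helpers, not part of druntime or std.simd. It tests the start address and length before reinterpreting an int slice as int4 vectors:

```d
import core.simd;
import std.stdio : writeln;

// Hypothetical helper: true if the address p is a multiple of 16 bytes.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p) % 16 == 0;
}

// Hypothetical helper: reinterprets an int slice as int4 vectors, first
// checking that the start address is 16-byte aligned (so movdqa is safe)
// and that the length is a whole number of vectors.
// Note: assert checks are removed when compiling with -release.
inout(int4)[] toInt4Checked(inout(int)[] arr)
{
    assert(isAligned16(arr.ptr), "slice is not 16-byte aligned");
    assert(arr.length % 4 == 0, "length is not a multiple of 4");
    return (cast(inout(int4)*) arr.ptr)[0 .. arr.length / 4];
}

// Storage with a vector element type, so the compiler has to align it to 16.
__gshared int4[4] storage;

void main()
{
    int[] whole = (cast(int*) storage.ptr)[0 .. 16];
    int[] shifted = whole[1 .. 13]; // starts 4 bytes past an aligned address

    writeln(isAligned16(whole.ptr));   // true
    writeln(isAligned16(shifted.ptr)); // false

    auto v = toInt4Checked(whole);     // passes both checks
    v[0] = v[1] + v[2];
}
```

Passing shifted to toInt4Checked would trip the first assert in a non-release build, instead of faulting later inside a movdqa.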
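The compiler-level tracking proposed above does not exist in D today. A rough library-level approximation is a wrapper type whose only intended way of being built checks the address, so holding a value of that type implies the alignment. AlignedSlice, check and asInt4 below are hypothetical names, and unlike the proposed inference this costs a run-time check plus explicit types at every boundary. It is a sketch assuming a 64-bit x86 target:

```d
import core.simd;
import std.exception : collectException, enforce;
import std.stdio : writeln;

// Hypothetical type: a slice whose start address is a multiple of N bytes.
// Values are meant to be built only through `check`, which verifies the
// address once; functions taking an AlignedSlice!(T, N) can then rely on it.
struct AlignedSlice(T, size_t N)
{
    static assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

    // D's `private` is module-level, so this only hides the field from other
    // modules. A default-initialized value holds a null slice, whose address
    // (0) is trivially aligned.
    private T[] payload;

    static AlignedSlice check(T[] arr)
    {
        enforce((cast(size_t) arr.ptr) % N == 0, "slice is misaligned");
        AlignedSlice s;
        s.payload = arr;
        return s;
    }

    T[] get() { return payload; }
}

// The vector view is offered only for slices already known to be 16-byte
// aligned, which is the precondition movdqa relies on.
int4[] asInt4(AlignedSlice!(int, 16) s)
{
    auto p = s.get();
    enforce(p.length % 4 == 0, "length is not a multiple of 4");
    return (cast(int4*) p.ptr)[0 .. p.length / 4];
}

__gshared int4[4] storage; // vector element type, so the compiler aligns it

void main()
{
    int[] whole = (cast(int*) storage.ptr)[0 .. 16];

    auto v = asInt4(AlignedSlice!(int, 16).check(whole));
    writeln(v.length); // 4

    // A slice starting 4 bytes in is rejected when the wrapper is built,
    // not later inside a vector load.
    bool rejected =
        collectException(AlignedSlice!(int, 16).check(whole[1 .. 13])) !is null;
    writeln(rejected); // true
}
```

Here the alignment lives in the type (AlignedSlice!(int, 16)) rather than in a comment, which is the spirit of the proposal, but the check is explicit and dynamic rather than inferred by the front-end.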