On 16 August 2013 04:39, bearophile <bearophileHUGS@lycos.com> wrote:

The XMM registers I am using are efficient when you feed them memory from arrays aligned to 16 bytes, as the D GC produces. But the YMM registers used by the AVX/AVX2 instructions prefer an alignment of 32 bytes. And the Intel Xeon Phi (MIC) has 512-bit ZMM registers that are efficient when the arrays are aligned to 64 bytes.

When I am not using SIMD code, and I want a small array of little elements, like an array of 10 ushorts, having it aligned to 16 bytes is a waste of space (although it does help the GC reduce fragmentation).

I'd argue that if you're allocating many instances of 10 shorts independently on the heap, then you're probably doing it wrong.

If you're only doing it once or twice, then it's not the significant waste of memory you suggest.

So I have written a small enhancement request, where I suggest that arrays for YMM registers could be allocated with an alignment of 32 bytes:
http://d.puremagic.com/issues/show_bug.cgi?id=10826

This shouldn't be an enhancement request; it should be the rule. __vector() types should be intrinsically aligned to their sizeof. If they are not, that's a bug.
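As a rough sketch of what "aligned to their sizeof" means in practice, the check below compares `.alignof` against `.sizeof` for the `core.simd` vector aliases. The `alignedToSize` template is a hypothetical helper introduced only for this illustration, and the `static if (is(...))` guards assume the aliases may be missing on targets without the corresponding SIMD support.

```d
import core.simd;

/// Hypothetical helper: true when a type's required alignment equals its size.
enum bool alignedToSize(T) = T.alignof == T.sizeof;

// 128-bit vectors (XMM): only checked where the target defines float4.
static if (is(float4))
{
    static assert(float4.sizeof == 16);
    static assert(alignedToSize!float4);
}

// 256-bit vectors (YMM): only checked where the target defines double4,
// e.g. when AVX code generation is enabled.
static if (is(double4))
{
    static assert(double4.sizeof == 32);
    static assert(alignedToSize!double4);
}
```

Note that this only describes the alignment of the vector type itself; it says nothing about where an allocator places the first element of an `int[]`.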

Having the array alignments in the D type system could be useful. To be backward-compatible you also need a generic unknown alignment (like a void* for alignments), so you can assign arrays of any alignment to it. It could be denoted with '0'.

Some rough ideas:

import core.simd: double2, double4;
auto a = new int[10];
static assert(__traits(alignment, a) == 16);
auto b = new int[128]<32>;
static assert(__traits(alignment, b) == 32);
auto c1 = new double2[128];
auto c2 = new double4[64];
static assert(__traits(alignment, c1) == 16);
static assert(__traits(alignment, c2) == 32);

void foo1(int[]<32> a) {
    // Uses YMM registers to modify a
    // ...
}

void foo2(int[] a)
if (__traits(alignment, a) == 32) {
    // Uses YMM registers to modify a
    // ...
}

void foo3(size_t N)(int[]<N> a) {
    static if (N >= 32) {
        // Uses YMM registers to modify a
        // ...
    } else {
        // ...
    }
}

The thing is, each of a/b/c1/c2 is really just: struct { size_t length; T *ptr; }

Those symbol names just refer to the dynamic array struct... It doesn't make a lot of sense to query the alignment of those symbols. __traits(alignment, s) would == sizeof(size_t) every time.
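To make that concrete, here is a minimal sketch using the existing `.sizeof` and `.alignof` properties rather than the proposed `__traits(alignment, ...)`. The `baseAlignment` function is a hypothetical helper added for illustration; it inspects the address stored in the slice, which is the only place the data's alignment actually lives.

```d
// A slice is a (length, pointer) pair, so its own size and alignment
// depend only on the pointer width, never on the element type.
static assert((int[]).sizeof == 2 * size_t.sizeof);
static assert((int[]).alignof == size_t.sizeof);
static assert((double[]).alignof == (int[]).alignof);

/// Hypothetical helper: largest power of two that divides the address,
/// i.e. the alignment the pointed-to data happens to have at run time.
/// Returns 0 for a null pointer.
size_t baseAlignment(const(void)* p) pure nothrow @nogc
{
    immutable addr = cast(size_t) p;
    // Isolate the lowest set bit of the address.
    return addr == 0 ? 0 : addr & (~addr + 1);
}
```

For a slice `a`, `baseAlignment(a.ptr)` is a run-time property of one particular value, which is why it cannot be read off the symbol's type.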

I've thought about this sort of thing many times before... but I'm not convinced. I still think it falls over with the possibility of slicing, and separate compilation.

The thing you're asking for is alignment of the base of the array, not alignment of elements within the array or alignment of the dynamic array structure itself.

Alignment of the base of the array isn't really expressible as part of the type. It's just a request to the allocator.
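The slicing problem can be sketched as follows. `isAligned` and `alignedInts` are hypothetical helpers written for this example; `alignedInts` simply over-allocates from the GC and slices forward to a suitably aligned element, which is one possible way to "request" a base alignment, not something the runtime provides.

```d
/// True when address p is a multiple of `alignment` (a power of two).
bool isAligned(const(void)* p, size_t alignment) pure nothrow @nogc
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}

/// Hypothetical helper: over-allocate from the GC, then slice forward to
/// the first element whose address is a multiple of `alignment`.
/// Assumes `alignment` is a power of two and at least int.sizeof.
int[] alignedInts(size_t n, size_t alignment)
{
    auto raw = new int[n + alignment / int.sizeof];
    immutable addr = cast(size_t) raw.ptr;
    immutable skip = ((alignment - addr % alignment) % alignment) / int.sizeof;
    return raw[skip .. skip + n];
}

unittest
{
    int[] a = alignedInts(64, 32);
    assert(isAligned(a.ptr, 32));

    // Slicing keeps the type but moves the base pointer by 4 bytes.
    int[] s = a[1 .. $];
    static assert(is(typeof(s) == typeof(a)));
    assert(!isAligned(s.ptr, 32));
}
```

Both `a` and `s` are plain `int[]`, so a function compiled separately and taking `int[]` has no way of knowing which of the two it received, short of checking the pointer at run time.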

I use a mallocAligned() function in C. Sadly, I think this is one of the mistakes of a discrete 'new' operator, which has a syntax that doesn't lend itself to arbitrary parameters.
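A D counterpart to that kind of function might look like the sketch below. It is not the mallocAligned() mentioned above; `allocAligned` and `freeAligned` are hypothetical names, and the example assumes a Phobos version that ships `std.experimental.allocator.mallocator.AlignedMallocator`, which was added well after this thread.

```d
import std.experimental.allocator.mallocator : AlignedMallocator;

/// Hypothetical helper: allocate n elements of T whose base address is a
/// multiple of `alignment` (a power of two). The memory is uninitialized,
/// comes from the C heap, and is not scanned by the GC.
/// Returns an empty slice if the allocation fails.
T[] allocAligned(T)(size_t n, uint alignment)
{
    void[] raw = AlignedMallocator.instance.alignedAllocate(n * T.sizeof, alignment);
    return cast(T[]) raw;
}

/// Release a block obtained from allocAligned. Pass the original slice,
/// not a sub-slice of it.
void freeAligned(T)(T[] block)
{
    AlignedMallocator.instance.deallocate(block);
}
```

Usage would be along the lines of `auto v = allocAligned!int(128, 32); scope (exit) freeAligned(v);`. The alignment is an ordinary run-time argument to the allocator here, which is exactly what the `new` syntax has no room for.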