On 20 June 2013 21:58, bearophile <bearophileHUGS@lycos.com> wrote:

Andrei Alexandrescu:

http://youtube.com/watch?v=q_39RnxtkgM

Very nice.

- - - - - - - - - - - - - - - - - - -

Slide 3:

In practice, say we have iterative code like this:

int data[100];

for(int i = 0; i < data.length; ++i) {
    data[i] += 10;
}

For code like that in D we have vector ops:

int[100] data;
data[] += 10;

Regarding vector ops: they are currently implemented in handwritten asm that uses SIMD where possible. Once std.simd is in good shape, I think the array ops can be rewritten (and their missing parts filled in) in a higher-level style of coding.

I was trying to illustrate a process. Not so much a comment on D array syntax.

The problem with auto-SIMD applied to array operations is that D doesn't guarantee that arrays are aligned, nor that their lengths are multiples of N elements. That means the compiler loses the opportunity to make the assumptions that make the biggest performance difference.

For the fast path, arrays must be aligned and their lengths must be multiples of N elements. By using explicit SIMD types, you're forced to adhere to those rules as a programmer, and the compiler can optimise properly.

As the programmer, you take on the responsibility for handling misalignment and stragglers, and can perhaps make less conservative choices than the compiler would.
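
A rough sketch of what handling misalignment and stragglers by hand can look like, assuming a target where core.simd's int4 is available (x86_64, for example). The function name addToAll is made up for illustration. The head and tail are processed one element at a time, and only the aligned middle goes through the vector type:

```d
import core.simd;

// Hypothetical helper: adds `value` to every element of `data`.
void addToAll(int[] data, int value)
{
    enum lanes = 4;                 // ints per 16-byte vector
    enum alignment = int4.sizeof;   // 16 bytes

    // Head: scalar elements up to the first 16-byte boundary.
    immutable misalign = cast(size_t) data.ptr % alignment;
    size_t head = misalign ? (alignment - misalign) / int.sizeof : 0;
    if (head > data.length)
        head = data.length;
    foreach (ref x; data[0 .. head])
        x += value;

    // Body: whole vectors, aligned and a multiple of `lanes` by construction.
    immutable vectors = (data.length - head) / lanes;
    immutable bodyEnd = head + vectors * lanes;
    int4 broadcast;
    foreach (i; 0 .. lanes)
        broadcast.array[i] = value;
    auto vecs = cast(int4[]) data[head .. bodyEnd];
    foreach (ref v; vecs)
        v += broadcast;

    // Tail: the stragglers that do not fill a whole vector.
    foreach (ref x; data[bodyEnd .. $])
        x += value;
}
```

Calling it on a deliberately offset slice such as `arr[1 .. $]` exercises all three parts: a short scalar head, the vector body, and a scalar tail.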

- - - - - - - - - - - - - - - - - - -

Slide 22:

Comparisons:
Full suite of comparisons.
Can produce bit-masks, or boolean 'any'/'all' logic.

Maybe a little compiler support (for the syntax) would help here.

Well, each is a valid comparison in different situations. I'm not sure how syntax could clearly select the one you want.
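
To make the ambiguity concrete, here is a scalar sketch of three things `a < b` could mean for a 4-lane vector. The names lessMask, anyLess and allLess are invented for this example, and plain float[4] arrays stand in for real SIMD types; an actual SIMD implementation would map each variant to different instructions:

```d
// Lane-wise compare: each lane is all ones (-1) where a < b, otherwise 0.
int[4] lessMask(const float[4] a, const float[4] b)
{
    int[4] mask;
    foreach (i; 0 .. 4)
        mask[i] = a[i] < b[i] ? -1 : 0;
    return mask;
}

// True if at least one lane satisfies a < b.
bool anyLess(const float[4] a, const float[4] b)
{
    auto mask = lessMask(a, b);
    foreach (lane; mask)
        if (lane != 0)
            return true;
    return false;
}

// True only if every lane satisfies a < b.
bool allLess(const float[4] a, const float[4] b)
{
    auto mask = lessMask(a, b);
    foreach (lane; mask)
        if (lane == 0)
            return false;
    return true;
}
```

The mask form is what you feed into a branch-free select, while the any/all forms feed an ordinary `if`, so a single operator has no obvious way to tell which result the caller wants.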

- - - - - - - - - - - - - - - - - - -

Slide 26:

Always pass vectors by value.

Unfortunately it seems a bad idea to give a warning if you pass one of those by reference.

And I don't think it should. Passing by ref isn't 'wrong', you just shouldn't do it if you care about performance.
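
For illustration, a small sketch of the two styles, assuming a target where core.simd's float4 is available. The function names are hypothetical, and how much the difference matters depends on the ABI and on whether the compiler inlines the call:

```d
import core.simd;

// By value: on typical ABIs the arguments and the result can stay
// in vector registers across the call.
float4 scaleByValue(float4 v, float4 s)
{
    return v * s;
}

// By reference: perfectly legal, but the caller's vector must live in
// memory so that its address can be passed, which usually costs a
// load and a store around the call.
void scaleByRef(ref float4 v, float4 s)
{
    v = v * s;
}
```

Both produce the same values; the by-value form is simply the one that gives the optimiser the least to work around.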

- - - - - - - - - - - - - - - - - - -

Slide 27:

3. Use ‘leaf’ functions where possible.

I am not sure how useful it would be to enforce leaf functions with a @leaf annotation.

I don't think it would be useful. It should only be considered a general rule when people are very specifically considering performance above all else.

It's just a very important detail to be aware of when optimising your code, particularly so when you're dealing with maths code (often involving simd).

- - - - - - - - - - - - - - - - - - -

Slide 32:

Experiment with prefetching?

Are D intrinsics offering instructions to perform prefetching?

Well, GCC does at least. If you're worried about performance at this level, you're probably already using GCC :)

- - - - - - - - - - - - - - - - - - -

LDC2 supports SIMD on Windows32 too.

So for this code:

void main() {
    alias double2 = __vector(double[2]);
    auto a = new double[200];
    auto b = cast(double2[])a;
    double2 tens = [10.0, 10.0];
    b[] += tens;
}

LDC2 compiles it to:

    movl    $200, 4(%esp)
    movl    $__D11TypeInfo_Ad6__initZ, (%esp)
    calll   __d_newarrayiT
    movl    %edx, %esi
    movl    %eax, (%esp)
    movl    $16, 8(%esp)
    movl    $8, 4(%esp)
    calll   __d_array_cast_len
    testl   %eax, %eax
    je      LBB0_3
    movapd  LCPI0_0, %xmm0
    .align  16, 0x90
LBB0_2:
    movapd  (%esi), %xmm1
    addpd   %xmm0, %xmm1
    movapd  %xmm1, (%esi)
    addl    $16, %esi
    decl    %eax
    jne     LBB0_2
LBB0_3:
    xorl    %eax, %eax
    addl    $12, %esp
    popl    %esi
    ret

It uses addpd, which operates on two doubles at a time.

Sure... did I say this wasn't supported somewhere? Sorry if I gave that impression.

- - - - - - - - - - - - - - - - - - -

The Reddit thread contains a link to this page, a compiler for a C variant from Intel that's optimized for SIMD:
http://ispc.github.io/

Some of the syntax of ispc:

- - - - - -

The first of these statements is cif, indicating an if statement that is expected to be coherent. The usage of cif in code is just the same as if:

cif (x < y) {
...
} else {
...
}

cif provides a hint to the compiler that you expect that most of the executing SPMD programs will all have the same result for the if condition.

Along similar lines, cfor, cdo, and cwhile check to see if all program instances are running at the start of each loop iteration; if so, they can run a specialized code path that has been optimized for the "all on" execution mask case.

This is interesting. I didn't know about this.