Search
```Porting some code from C to D I found the inline assembler very convenient. This is the C code (using an external NASM file):

// dot_product returns dot product t*w of n elements.  n is rounded
// up to a multiple of 8.  Result is scaled down by 8 bits.
#ifdef NOASM  // no assembly language
int dot_product(short *t, short *w, int n) {
int sum=0;
n=(n+7)&-8;
for (int i=0; i<n; i+=2) {
if (lol >= 21567) printf("dp %d %d %d %d %d %d\n", n, i, t[i], w[i], t[i+1], w[i+1]);
sum+=(t[i]*w[i]+t[i+1]*w[i+1]) >> 8;
}
return sum;
}
#else  // The NASM version uses MMX and is about 8 times faster.
extern "C" int dot_product(short *t, short *w, int n);  // in NASM
#endif

In D, I can move the ASM inside the function, so there is no need for two declarations:

extern (C) int dot_product(short *t, short *w, const int n) {
version (D_InlineAsm_X86_64) asm {
naked;
mov RCX, RDX;            // n
mov RAX, RDI;            // a
mov RDX, RSI;            // b
cmp RCX, 0;
jz done;
sub RAX, 16;
sub RDX, 16;
pxor XMM0, XMM0;         // sum = 0
loop:                        // each loop sums 4 products
movdqa XMM1, [RAX+RCX*2];// put parital sums of vector product in xmm1
sub RCX, 8;
ja loop;
movdqa XMM1, XMM0;       // add 4 parts of xmm0 and return in eax
psrldq XMM1, 8;
movdqa XMM1, XMM0;
psrldq XMM1, 4;
movq RAX, XMM0;
done:
ret;
} else {
int sum = 0;
for (int i = 0; i < n; i += 4) {
sum += (t[i  ]*w[i  ] + t[i+1]*w[i+1]) >> 8;
sum += (t[i+2]*w[i+2] + t[i+3]*w[i+3]) >> 8;
}
return sum;
}
}

This example also shows, how 'naked' should probably not be applied to the function declaration, because it contains non-asm code as well. (It could be "naked asm" though.) For compatibility with GDC (and in fact the original NASM code), I used extern(C) here as the parameter passing strategy.
This may also serve as a practical use case for vector operations.
```
```On 1/22/2012 2:38 AM, Marco Leise wrote:
> Porting some code from C to D I found the inline assembler very convenient. This
> is the C code (using an external NASM file):

The original impetus for doing an inline assembler is all the nightmares we had trying to support the endless variety of assemblers that users had, all of which had different bugs and different syntax.
```