I was refactoring some code, changed a parameter from pass-by-value to pass-by-pointer, and saw performance drop by 50%. This is a highly reduced example of what I found, but basically passing something into a function by reference or pointer seems to make the compilers (it affects both DMD and LDC) treat it as if it's volatile and must be reloaded from memory on every use. This also inhibits auto-vectorization of the code by LDC.
https://d.godbolt.org/z/oonq1drd9
void fillBP(uint* value, uint* dest)
{
    dest[0] = *value;
    dest[1] = *value;
    dest[2] = *value;
    dest[3] = *value;
}
codegen DMD -->
push RBP
mov RBP,RSP
mov ECX,[RSI]
mov [RDI],ECX
mov EDX,[RSI]
mov 4[RDI],EDX
mov R8D,[RSI]
mov 8[RDI],R8D
mov R9D,[RSI]
mov 0Ch[RDI],R9D
pop RBP
ret
codegen LDC -->
mov eax, dword ptr [rdi]
mov dword ptr [rsi], eax
mov eax, dword ptr [rdi]
mov dword ptr [rsi + 4], eax
mov eax, dword ptr [rdi]
mov dword ptr [rsi + 8], eax
mov eax, dword ptr [rdi]
mov dword ptr [rsi + 12], eax
ret
void fillBV(uint value, uint* dest)
{
    dest[0] = value;
    dest[1] = value;
    dest[2] = value;
    dest[3] = value;
}
codegen DMD -->
push RBP
mov RBP,RSP
mov [RDI],ESI
mov 4[RDI],ESI
mov 8[RDI],ESI
mov 0Ch[RDI],ESI
pop RBP
ret
codegen LDC -->
movd xmm0, edi
pshufd xmm0, xmm0, 0
movdqu xmmword ptr [rsi], xmm0
ret
Interestingly, if you copy the value into a local first, like this...
void fillBP(uint* value, uint* dest)
{
    uint tmp = *value;
    dest[0] = tmp;
    dest[1] = tmp;
    dest[2] = tmp;
    dest[3] = tmp;
}
You get code identical to the by-value versions (apart from the initial load from memory).
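The same trick should carry over to filling a whole buffer. Below is a minimal sketch of that, not code from the original project: `fillSlice` is a hypothetical name, and the codegen for it has not been checked on godbolt, so whether it vectorizes is an assumption. It only shows the pattern of reading through the pointer once and storing the local copy.

```d
import std.algorithm.searching : all;

// Hypothetical generalization of the workaround: fill a whole slice.
void fillSlice(const(uint)* value, uint[] dest)
{
    // Read through the pointer exactly once ...
    immutable uint tmp = *value;

    // ... then store the local copy. No store to dest can change tmp,
    // so the compiler has no reason to reload it inside the loop.
    foreach (ref d; dest)
        d = tmp;
}

void main()
{
    uint v = 7;
    auto buf = new uint[](8);

    fillSlice(&v, buf);

    // Every element now holds the value that was read once from &v.
    assert(buf.all!(x => x == 7));
}
```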
I'm not a compiler guy, so maybe there's some rationale for this that I don't know, but it seems like the compiler should be able to read "*value" once and cache it.
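One possible rationale, offered here as an assumption rather than something confirmed by the DMD or LDC developers, is pointer aliasing. Nothing in the signature tells the compiler that `dest` and `value` point to different memory, so a store through `dest` might change `*value`, and the safe choice is to reload it. The sketch below uses two hypothetical functions, `bumpReload` and `bumpCached`, which are not from the original code. They store `*value + 1` instead of `*value` so that the effect of an overlap is visible in the output.

```d
import std.stdio : writeln;

// Hypothetical variant of fillBP: every store writes *value + 1, so a
// store that lands on *value changes what the next load sees.
void bumpReload(uint* value, uint* dest)
{
    dest[0] = *value + 1;
    dest[1] = *value + 1; // reloads *value, which dest[0] may have changed
}

// Same function with the load hoisted into a local, as in the workaround.
void bumpCached(uint* value, uint* dest)
{
    uint tmp = *value;
    dest[0] = tmp + 1;
    dest[1] = tmp + 1;
}

void main()
{
    uint[2] a = [10, 0];
    uint[2] b = [10, 0];

    // value points at dest[0], so the two pointers overlap.
    bumpReload(&a[0], a.ptr);
    bumpCached(&b[0], b.ptr);

    writeln(a); // prints [11, 12]
    writeln(b); // prints [11, 11]
}
```

When the pointers overlap, the two functions give different results, so a compiler cannot turn the first into the second without proof that they never overlap. Copying into `tmp` by hand is the programmer stating that the single read is intended.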