July 13, 2013 A new calling convention in VS2013

Through Reddit I've found an article about vector-calling-convention added to VS2013:

http://blogs.msdn.com/b/vcblog/archive/2013/07/12/introducing-vector-calling-convention.aspx

So I have written what I think is a similar D program:

    import core.stdc.stdio, core.simd;

    struct Particle { float4 x, y; }

    Particle addParticles(in Particle p1, in Particle p2) pure nothrow {
        return Particle(p1.x + p2.x, p1.y + p2.y);
    }

    // BUG 10627 and 10523
    //alias Particle2 = float4[2];
    //Particle2 addParticles(in Particle2 p1, in Particle2 p2) {
    //    return p1[] + p2[];
    //}

    void main() {
        auto p1 = Particle([1, 2, 3, 4], [10, 20, 30, 40]);
        printf("%f %f %f %f %f %f %f %f\n",
               p1.x.array[0], p1.x.array[1], p1.x.array[2], p1.x.array[3],
               p1.y.array[0], p1.y.array[1], p1.y.array[2], p1.y.array[3]);

        auto p2 = Particle([100, 200, 300, 400], [1000, 2000, 3000, 4000]);
        printf("%f %f %f %f %f %f %f %f\n",
               p2.x.array[0], p2.x.array[1], p2.x.array[2], p2.x.array[3],
               p2.y.array[0], p2.y.array[1], p2.y.array[2], p2.y.array[3]);

        auto p3 = addParticles(p1, p2);
        printf("%f %f %f %f %f %f %f %f\n",
               p3.x.array[0], p3.x.array[1], p3.x.array[2], p3.x.array[3],
               p3.y.array[0], p3.y.array[1], p3.y.array[2], p3.y.array[3]);
    }

I have compiled with the latest ldc2 (Windows32):

    ldc2 -O5 -disable-inlining -release -vectorize-slp -vectorize-slp-aggressive -output-s test.d

The resulting X86 asm:

    __D4test12addParticlesFNaNbxS4test8ParticlexS4test8ParticleZS4test8Particle:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movaps  40(%ebp), %xmm0
        movaps  56(%ebp), %xmm1
        addps   8(%ebp), %xmm0
        addps   24(%ebp), %xmm1
        movups  %xmm1, 16(%eax)
        movups  %xmm0, (%eax)
        movl    %ebp, %esp
        popl    %ebp
        ret     $64

    __Dmain:
        ...
        movaps  160(%esp), %xmm0
        movaps  176(%esp), %xmm1
        movaps  %xmm1, 48(%esp)
        movaps  %xmm0, 32(%esp)
        movaps  128(%esp), %xmm0
        movaps  144(%esp), %xmm1
        movaps  %xmm1, 16(%esp)
        movaps  %xmm0, (%esp)
        leal    96(%esp), %eax
        calll   __D4test12addParticlesFNaNbxS4test8ParticlexS4test8ParticleZS4test8Particle
        subl    $64, %esp
        movss   96(%esp), %xmm0
        movss   100(%esp), %xmm1
        movss   104(%esp), %xmm2
        movss   108(%esp), %xmm3
        movss   112(%esp), %xmm4
        movss   116(%esp), %xmm5
        movss   120(%esp), %xmm6
        movss   124(%esp), %xmm7
        cvtss2sd %xmm7, %xmm7
        movsd   %xmm7, 60(%esp)
        cvtss2sd %xmm6, %xmm6
        movsd   %xmm6, 52(%esp)
        cvtss2sd %xmm5, %xmm5
        movsd   %xmm5, 44(%esp)
        cvtss2sd %xmm4, %xmm4
        movsd   %xmm4, 36(%esp)
        cvtss2sd %xmm3, %xmm3
        movsd   %xmm3, 28(%esp)
        cvtss2sd %xmm2, %xmm2
        movsd   %xmm2, 20(%esp)
        cvtss2sd %xmm1, %xmm1
        movsd   %xmm1, 12(%esp)
        cvtss2sd %xmm0, %xmm0
        movsd   %xmm0, 4(%esp)
        movl    $_.str3, (%esp)
        calll   ___mingw_printf
        xorl    %eax, %eax
        movl    %ebp, %esp
        popl    %ebp
        ret

Are those vector calling conventions useful for D too?

Bye,
bearophile

July 13, 2013 Re: A new calling convention in VS2013

Posted in reply to bearophile

On Saturday, 13 July 2013 at 10:36:01 UTC, bearophile wrote:
> Through Reddit I've found an article about vector-calling-convention added to VS2013:
> http://blogs.msdn.com/b/vcblog/archive/2013/07/12/introducing-vector-calling-convention.aspx
> -- snip --
>
> Are those vector calling conventions useful for D too?
>
> Bye,
> bearophile

I'd vote for not adding more fluff that makes the ABI differences between compilers greater. But it certainly looks like it would be useful if you wish to save the time taken to copy the vector from XMM registers onto the stack and back again when passing values around.

July 14, 2013 Re: A new calling convention in VS2013

Posted in reply to bearophile

On 13.07.2013 12:35, bearophile wrote:
> The resulting X86 asm:
>
> __D4test12addParticlesFNaNbxS4test8ParticlexS4test8ParticleZS4test8Particle:
>
>     pushl   %ebp
>     movl    %esp, %ebp
>     andl    $-16, %esp
>     subl    $16, %esp
>     movaps  40(%ebp), %xmm0
>     movaps  56(%ebp), %xmm1
>     addps   8(%ebp), %xmm0
>     addps   24(%ebp), %xmm1
>     movups  %xmm1, 16(%eax)
>     movups  %xmm0, (%eax)

I think it would be more important if dmd actually used the XMM registers correctly for computations. As you can see from the disassembly, dmd generates code that always adds/moves from/to memory and does not stay within the registers at all.

http://d.puremagic.com/issues/show_bug.cgi?id=10226

Until dmd uses the XMM registers correctly it doesn't make much sense to add a special calling convention for this purpose.

Kind Regards
Benjamin Thaut

July 14, 2013 Re: A new calling convention in VS2013

Posted in reply to Benjamin Thaut

Benjamin Thaut:

> http://d.puremagic.com/issues/show_bug.cgi?id=10226

I see there are codegen inefficiencies.

> Until dmd uses the XMM registers correctly it doesn't make much sense to add a special calling convention for this purpose.

I don't agree, because:
- Even if DMD codegen is far from perfect, it's a good idea to improve all things in parallel. Generally improving things (or fixing bugs) gives a better result if you adopt a pipelined development approach.
- A vector calling convention is meant to be usable on other compilers too, like LDC2, that have better codegen. (The asm I have shown in this thread comes from LDC2 because dmd doesn't even use SIMD registers on Windows 32 bit).

Bye,
bearophile

July 14, 2013 Re: A new calling convention in VS2013

Posted in reply to bearophile

Calling convention optimizations can probably be done during whole program optimization, which 1) is usable for computation-intensive applications anyway, and 2) guarantees that those fastcall functions are invisible to external code, so there's no incompatibility.

July 14, 2013 Re: A new calling convention in VS2013

Posted in reply to Kagamin

Kagamin:

> Calling convention optimizations can probably be done during whole program optimization, which 1) usable for computation-intensive applications anyway, 2) guarantees invisibility of those fastcall functions to external code so there's no incompatibility.

In D you can tag a free function as "private" to make it module-private. Maybe in this case the D compiler is free to use any kind of calling convention for it.

Bye,
bearophile

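The "private" idea above can be sketched as follows. This is only an illustration under stated assumptions: `addImpl` is a hypothetical helper name, and nothing in this thread establishes that any D compiler actually picks a different calling convention for such functions. The point is just that a module-level `private` function can only be named from inside its own module, so the compiler can in principle see all of its call sites, while the public wrapper keeps the standard ABI for outside callers.

```d
import core.simd;

struct Particle { float4 x, y; }

// Hypothetical module-private helper: only code in this module can name it,
// so in principle all of its call sites are visible to the compiler.
// (Assumption: its address is never taken and handed to other modules.)
private Particle addImpl(in Particle p1, in Particle p2) pure nothrow
{
    return Particle(p1.x + p2.x, p1.y + p2.y);
}

// Public wrapper: this is what other modules call, so it would have to
// keep the standard D calling convention.
Particle addParticles(in Particle p1, in Particle p2) pure nothrow
{
    return addImpl(p1, p2);
}
```

Splitting the function this way keeps any non-standard convention an internal detail of the module, which matches Kagamin's point about invisibility to external code.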
July 14, 2013 Re: A new calling convention in VS2013

Posted in reply to bearophile

On 14.07.2013 14:11, bearophile wrote:
> Benjamin Thaut:
>
>> http://d.puremagic.com/issues/show_bug.cgi?id=10226
>
> I see there are codegen inefficiencies.
>
>
>> Until dmd uses the XMM registers correctly it doesn't make much sense
>> to add a special calling convention for this purpose.
>
> I don't agree, because:
> - Even if DMD codegen is far from perfect, it's a good idea to
> improve all things in parallel. Generally improving things (or fixing
> bugs) gives a better result if you adopt a pipelined development approach.
> - A vector calling convention is meant to be usable on other compilers
> too, like LDC2, that have better codegen. (The asm I have shown in this
> thread comes from LDC2 because dmd doesn't even use SIMD registers on
> Windows 32 bit).
>
> Bye,
> bearophile

I just wanted to say that there are currently bigger fish to fry than micro-optimization through calling conventions. (GC, allocators, all the bugs...)

Did you compile the shown code with optimization enabled or is that a debug build? If it is optimized I'm going to be disappointed by LDC's codegen.

Kind Regards
Benjamin Thaut

July 14, 2013 Re: A new calling convention in VS2013

Posted in reply to Benjamin Thaut

Benjamin Thaut:

> I just wanted to say that there are currently bigger fish to fry than micro-optimization through calling conventions. (GC, allocators, all the bugs...)

I understand and I agree. On the other hand I think there are things that (if desired) are better introduced sooner, even though some important bugs are not yet fixed, because they shape the future of D a bit.

> Did you compile the shown code with optimization enabled or is that a debug build? If it is optimized I'm going to be disappointed by LDC's codegen.

If you take a look at the original post, I have used:

    ldc2 -O5 -disable-inlining -release -vectorize-slp -vectorize-slp-aggressive -output-s test.d

I think that's about the maximum optimization. I have also added some aggressive optimization switches introduced only in the latest LLVM version (if you remove them the resulting asm of addParticles is about the same, but some of the instructions inside __Dmain are reordered less well).

Bye,
bearophile

July 16, 2013 Re: A new calling convention in VS2013

Posted in reply to Benjamin Thaut

Benjamin Thaut:

> If it is optimized I'm going to be disappointed by LDC's codegen.

Why?

Bye,
bearophile

Copyright © 1999-2021 by the D Language Foundation