January 15, 2012 SIMD benchmark | ||||
---|---|---|---|---|
| ||||
I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux. Anyhow, it's good enough now to play around with. Consider it alpha quality. Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.
-----------------------
import core.simd;

void test1a(float[4] a) { }

void test1()
{
    float[4] a = 1.2;
    a[] = a[] * 3 + 7;
    test1a(a);
}

void test2a(float4 a) { }

void test2()
{
    float4 a = 1.2;
    a = a * 3 + 7;
    test2a(a);
}

import std.stdio;
import std.datetime;

int main()
{
    test1();
    test2();
    auto b = comparingBenchmark!(test1, test2, 100);
    writeln(b.point);
    return 0;
}
|
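For anyone wanting to repeat the comparison, here is a minimal sketch of the same two code paths. It is not the original poster's harness. It assumes `benchmark` from `std.datetime.stopwatch` in place of the `comparingBenchmark` call above, and it assumes a target where `core.simd` provides `float4` (x86-64, for example). The names `scaleArray`, `scaleVector` and `compareTimings` are made up for illustration. The vector version spells out the constants as `float4` values so the sketch does not depend on scalar operands being accepted in vector arithmetic.

```d
import core.simd;
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.math : fabs;

// Array-operation version: element-wise math on a static array.
float[4] scaleArray()
{
    float[4] a = 1.2f;      // every element starts at 1.2
    a[] = a[] * 3 + 7;
    return a;
}

// Vector-type version: the same math on a float4 SIMD value.
// Assumption: float4 is available on the target.
float4 scaleVector()
{
    float4 a = 1.2f;        // scalar initializer fills all four lanes
    float4 three = 3.0f;
    float4 seven = 7.0f;
    a = a * three + seven;
    return a;
}

// Hypothetical helper: times both versions for the given iteration count.
// An optimizing compiler may remove some or all of this work, so treat
// the resulting numbers with care.
Duration[2] compareTimings(uint iterations)
{
    return benchmark!(scaleArray, scaleVector)(iterations);
}

unittest
{
    // Both versions should produce roughly 1.2 * 3 + 7 = 10.6 per element.
    auto arr = scaleArray();
    auto vec = scaleVector();
    foreach (i; 0 .. 4)
    {
        assert(fabs(arr[i] - 10.6f) < 1e-5f);
        assert(fabs(vec.array[i] - 10.6f) < 1e-5f);
    }
}
```

Calling `compareTimings(1_000_000)` from a `main` function and printing the two `Duration` values gives a rough comparison. As the later replies in this thread show, the ratio depends heavily on the compiler and the optimisation flags.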
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 1/14/2012 10:56 PM, Walter Bright wrote:
> as there's a serious lack of source code to
> test it with.
Here's what there is at the moment. Needs much more.
https://github.com/D-Programming-Language/dmd/blob/master/test/runnable/testxmm.d
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 15/01/12 6:56 AM, Walter Bright wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
> Anyhow, it's good enough now to play around with. Consider it alpha
> quality. Expect bugs - but make bug reports, as there's a serious lack
> of source code to test it with.
You sure you want proper bug reports for this? There still seem to be a lot of issues.
For example, none of these work for me (OSX 64-bit).
----
int4 a = 2; // backend/cod2.c 2630
----
int4 a = void;
int4 b = void;
a = b; // segfault
----
int4 a = void;
a = simd(XMM.PXOR, a, a); // segfault
----
I could go on and on really. Very little seems to work at my end.
Actually, looking at the auto-tester, I'm not alone. Just seems to be OSX though.
http://d.puremagic.com/test-results/index.ghtml
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux. Anyhow, it's good enough now to play around with. Consider it alpha quality. Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.
I get 20+ speedup without optimisations with GDC on that small test. :)
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
On 15 January 2012 16:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
>> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux. Anyhow, it's good enough now to play around with. Consider it alpha quality. Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.
>
> I get 20+ speedup without optimisations with GDC on that small test. :)
>
Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above. My oh my...
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Peter Alexander | On 1/15/2012 3:49 AM, Peter Alexander wrote:
> Actually, looking at the auto-tester, I'm not alone. Just seems to be OSX though.
Yeah, it's just OSX. I had the test for that platform inadvertently disabled, gak.
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Iain Buclaw | On 1/15/2012 10:10 AM, Iain Buclaw wrote:
> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
> with -O2 and above. My oh my...
Woo-hoo!
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
| On 15 January 2012 20:10, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> On 15 January 2012 16:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> > On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
> >> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux. Anyhow, it's good enough now to play around with. Consider it alpha quality.
> >> Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.
> >
> > I get 20+ speedup without optimisations with GDC on that small test. :)
> >
>
> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above. My oh my...
Oh my indeed.
Haha, well I'm sure that's a fairly artificial result, but yes, this is why
I've been harping on for months that it's a bare necessity to provide
language support :P
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Iain Buclaw | Iain Buclaw:
> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above. My oh my...
Please, show me the assembly code produced, with its relative D source :-)
Bye,
bearophile
|
January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On 15 January 2012 19:01, bearophile <bearophileHUGS@lycos.com> wrote:
> Iain Buclaw:
>
>> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above. My oh my...
>
> Please, show me the assembly code produced, with its relative D source :-)
>
> Bye,
> bearophile
D code:
----
import core.simd;

void test2a(float4 a) { }

float4 test2()
{
    float4 a = 1.2;
    a = a * 3 + 7;
    test2a(a);
    return a;
}
----
Relevant assembly:
----
.LC5:
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4
_D4test5test2FZNhG4f:
        .cfi_startproc
        movl    $3, %eax
        cvtsi2ss        %eax, %xmm0
        movb    $7, %al
        cvtsi2ss        %eax, %xmm1
        unpcklps        %xmm0, %xmm0
        unpcklps        %xmm1, %xmm1
        movlhps %xmm0, %xmm0
        movlhps %xmm1, %xmm1
        mulps   .LC5(%rip), %xmm0
        addps   %xmm1, %xmm0
        ret
        .cfi_endproc
----
As someone pointed out to me, the only optimisation missing was constant propagation, but that doesn't matter too much for now.
Regards
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
|
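A small sketch to help read the listing above. The four `.long 1067030938` entries at `.LC5` are the IEEE-754 single-precision bit pattern of 1.2 (0x3F99999A), one per lane. The code converts 3 and 7 to float with `cvtsi2ss`, spreads each across a register with `unpcklps` and `movlhps`, then does one `mulps` and one `addps`. The helpers `bitsToFloat` and `laneResult` below are hypothetical names. They only model the per-lane arithmetic in scalar code and are not what the compiler emits.

```d
import std.math : fabs;

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision value.
float bitsToFloat(uint bits)
{
    return *cast(float*)&bits;
}

// Scalar model of what each of the four lanes computes in test2():
// mulps multiplies by 3, addps adds 7.
float laneResult(float x)
{
    return x * 3 + 7;
}

unittest
{
    // 1067030938 == 0x3F99999A, the float nearest to 1.2.
    assert(bitsToFloat(1067030938) == 1.2f);

    // Every lane ends up at roughly 10.6.
    assert(fabs(laneResult(bitsToFloat(1067030938)) - 10.6f) < 1e-5f);
}
```

Since every input is a compile-time constant, constant propagation could presumably reduce the whole function to loading a vector of four 10.6 values, which would be the missing optimisation mentioned above.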
Copyright © 1999-2021 by the D Language Foundation