January 15, 2012
SIMD benchmark
I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
Anyhow, it's good enough now to play around with. Consider it alpha quality. 
Expect bugs - but make bug reports, as there's a serious lack of source code to 
test it with.
-----------------------
import core.simd;

void test1a(float[4] a) { }

void test1()
{
    float[4] a = 1.2;
    a[] = a[] * 3 + 7;
    test1a(a);
}

void test2a(float4 a) { }

void test2()
{
    float4 a = 1.2;
    a = a * 3 + 7;
    test2a(a);
}

import std.stdio;
import std.datetime;

int main()
{
    test1();
    test2();
    auto b = comparingBenchmark!(test1, test2, 100);
    writeln(b.point);
    return 0;
}
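The benchmark above relies on `comparingBenchmark` from `std.datetime`, which later Phobos releases deprecated along with the other `std.datetime` benchmarking helpers. A rough equivalent for a recent compiler might look like the sketch below, assuming `benchmark` from `std.datetime.stopwatch` (it returns one `Duration` per function, each covering `n` calls). The globals `seed`, `sinkArray` and `sinkVector` are hypothetical additions: the input is read from a mutable global and each result is stored, so the arithmetic is not trivially constant-folded or discarded. An aggressive optimizer can still hoist parts of it out of the timing loop, so treat the numbers as rough.

```d
import core.simd;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical helpers (not in the original benchmark): a runtime input and
// global sinks, so the optimizer cannot simply delete or pre-compute the work.
__gshared float seed = 1.2f;
__gshared float[4] sinkArray;
__gshared float4 sinkVector;

void scalarVersion()
{
    float[4] a = seed;      // every element set to seed
    a[] = a[] * 3 + 7;      // D array operation on four floats
    sinkArray = a;
}

void vectorVersion()
{
    float4 a = seed;        // scalar broadcast to all four lanes
    a = a * 3 + 7;          // vector multiply and add on all lanes at once
    sinkVector = a;
}

void main()
{
    // One Duration per function; each times 10 million calls.
    auto times = benchmark!(scalarVersion, vectorVersion)(10_000_000);
    writeln("float[4]: ", times[0]);
    writeln("float4:   ", times[1]);
}
```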
January 15, 2012
Re: SIMD benchmark
On 1/14/2012 10:56 PM, Walter Bright wrote:
> as there's a serious lack of source code to
> test it with.

Here's what there is at the moment. Needs much more.

https://github.com/D-Programming-Language/dmd/blob/master/test/runnable/testxmm.d
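As a starting point for the kind of test source asked for above, here is a minimal self-checking program. It is a sketch, not taken from testxmm.d, and assumes an x86-64 target where `core.simd` provides `float4`; the function names are hypothetical. The values are chosen so every lane result is exact in single precision, which lets the checks use `==`. Build it without `-release` so the asserts stay enabled.

```d
import core.simd;

// Hypothetical regression checks: each one exercises a vector operator
// and compares every lane against the exact scalar result.
void testBroadcastMulAdd()
{
    float4 a = 6.0f;        // scalar initializer broadcasts to all lanes
    float4 b = 2.0f;
    float4 c = a * b + a;   // 6 * 2 + 6 == 18 in every lane
    foreach (x; c.array)
        assert(x == 18.0f);
}

void testSubDiv()
{
    float4 a = 6.0f;
    float4 b = 2.0f;
    float4 d = a - b;       // 4 in every lane
    float4 q = a / b;       // 3 in every lane
    assert(d.array == [4.0f, 4.0f, 4.0f, 4.0f]);
    assert(q.array == [3.0f, 3.0f, 3.0f, 3.0f]);
}

int main()
{
    testBroadcastMulAdd();
    testSubDiv();
    return 0;   // reaching here means every assert passed
}
```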
January 15, 2012
Re: SIMD benchmark
On 15/01/12 6:56 AM, Walter Bright wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
> Anyhow, it's good enough now to play around with. Consider it alpha
> quality. Expect bugs - but make bug reports, as there's a serious lack
> of source code to test it with.

You sure you want proper bug reports for this? There still seem to be a
lot of issues. For example, none of these work for me (OSX 64-bit).

----

int4 a = 2; // internal compiler error: backend/cod2.c 2630

----

int4 a = void;
int4 b = void;
a = b; // segfault

----

int4 a = void;
a = simd(XMM.PXOR, a, a); // segfault

----

I could go on and on really. Very little seems to work at my end.

Actually, looking at the auto-tester, I'm not alone. Just seems to be 
OSX though.

http://d.puremagic.com/test-results/index.ghtml
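The first two reports can be turned into self-checking reductions, which makes a bug report easier to act on and doubles as a regression test once fixed. A sketch, assuming x86-64 and `core.simd`'s `int4`; the function names are hypothetical, and `b` is initialized with a value instead of `void` (a change from the original report) so the copy can be verified. The `simd(XMM.PXOR, ...)` case is left out to keep the sketch to ordinary operators.

```d
import core.simd;

// Report 1: scalar initializer; every lane should hold 2.
int4 broadcastInit()
{
    int4 a = 2;
    return a;
}

// Report 2: plain assignment between vector variables.
// (Here `b` gets a value instead of `= void` so the result can be checked.)
int4 copyVector()
{
    int4 b = 5;
    int4 a = void;
    a = b;
    return a;
}

void main()
{
    assert(broadcastInit().array == [2, 2, 2, 2]);
    assert(copyVector().array == [5, 5, 5, 5]);
}
```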
January 15, 2012
Re: SIMD benchmark
On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
> Anyhow, it's good enough now to play around with. Consider it alpha quality.
> Expect bugs - but make bug reports, as there's a serious lack of source code
> to test it with.

I get 20+ speedup without optimisations with GDC on that small test. :)


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
January 15, 2012
Re: SIMD benchmark
On 15 January 2012 16:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
>> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
>> Anyhow, it's good enough now to play around with. Consider it alpha quality.
>> Expect bugs - but make bug reports, as there's a serious lack of source code
>> to test it with.
>
> I get 20+ speedup without optimisations with GDC on that small test. :)
>

Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
with -O2 and above.  My oh my...



-- 
Iain Buclaw

January 15, 2012
Re: SIMD benchmark
On 1/15/2012 3:49 AM, Peter Alexander wrote:
> Actually, looking at the auto-tester, I'm not alone. Just seems to be OSX though.

Yeah, it's just OSX. I had the test for that platform inadvertently disabled, gak.
January 15, 2012
Re: SIMD benchmark
On 1/15/2012 10:10 AM, Iain Buclaw wrote:
> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
> with -O2 and above.  My oh my...

Woo-hoo!
January 15, 2012
Re: SIMD benchmark
On 15 January 2012 20:10, Iain Buclaw <ibuclaw@ubuntu.com> wrote:

> On 15 January 2012 16:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> > On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
> >> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
> >> Anyhow, it's good enough now to play around with. Consider it alpha quality.
> >> Expect bugs - but make bug reports, as there's a serious lack of source code
> >> to test it with.
> >
> > I get 20+ speedup without optimisations with GDC on that small test. :)
> >
>
> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
> with -O2 and above.  My oh my...


Oh my indeed.
Haha, well I'm sure that's a fairly artificial result, but yes, this is why
I've been harping on for months that it's a bare necessity to provide
language support :P
January 15, 2012
Re: SIMD benchmark
Iain Buclaw:

> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
> with -O2 and above.  My oh my...

Please, show me the assembly code produced, with its relative D source :-)

Bye,
bearophile
January 15, 2012
Re: SIMD benchmark
On 15 January 2012 19:01, bearophile <bearophileHUGS@lycos.com> wrote:
> Iain Buclaw:
>
>> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
>> with -O2 and above.  My oh my...
>
> Please, show me the assembly code produced, with its relative D source :-)
>
> Bye,
> bearophile

D code:
----
import core.simd;

void test2a(float4 a) { }

float4 test2()
{
  float4 a = 1.2;
  a = a * 3 + 7;
  test2a(a);
  return a;
}
----

Relevant assembly:
----
.LC5:
       .long   1067030938
       .long   1067030938
       .long   1067030938
       .long   1067030938
       .section        .rodata.cst4,"aM",@progbits,4
       .align 4

_D4test5test2FZNhG4f:
       .cfi_startproc
       movl    $3, %eax
       cvtsi2ss        %eax, %xmm0
       movb    $7, %al
       cvtsi2ss        %eax, %xmm1
       unpcklps        %xmm0, %xmm0
       unpcklps        %xmm1, %xmm1
       movlhps %xmm0, %xmm0
       movlhps %xmm1, %xmm1
       mulps   .LC5(%rip), %xmm0
       addps   %xmm1, %xmm0
       ret
       .cfi_endproc
----

As someone pointed out to me, the only optimisation missing was
constant propagation, but that doesn't matter too much for now.
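The `.LC5` constant in the listing decodes directly: each `.long 1067030938` is the IEEE-754 single-precision bit pattern of `1.2f` (0x3F99999A), so `mulps .LC5(%rip), %xmm0` multiplies the broadcast 3.0 by four copies of 1.2, and `addps` then adds the broadcast 7.0. The sketch below checks the bit pattern from the D side; `floatBits` is a hypothetical helper, and the closing comment describes what full constant propagation could reduce the function to, not what GDC actually emitted.

```d
import core.simd;
import std.stdio : writefln;

// Hypothetical helper: reinterpret a float's bits as a uint, the same view
// the assembler listing gives of the constant pool entry.
uint floatBits(float f)
{
    return *cast(uint*) &f;
}

void main()
{
    writefln("1.2f -> %d (0x%08X)", floatBits(1.2f), floatBits(1.2f));

    // The computation the listing performs at run time:
    // broadcast 3 and 7, then one mulps and one addps.
    float4 a = 1.2f;
    a = a * 3 + 7;

    // With full constant propagation the function body could reduce to
    // loading a single 16-byte constant holding these four lanes.
    writefln("lanes: %s", a.array);
}
```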

Regards
-- 
Iain Buclaw
