SIMD benchmark
January 15, 2012
I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
Anyhow, it's good enough now to play around with. Consider it alpha quality. Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.
-----------------------
import core.simd;

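// Empty callee: gives the result somewhere to go.
void test1a(float[4] a) { }

// Baseline: array operation on a static array of four floats.
void test1()
{
    float[4] a = 1.2;
    a[] = a[] * 3 + 7;
    test1a(a);
}

void test2a(float4 a) { }

// Same computation using the core.simd vector type.
void test2()
{
    float4 a = 1.2;
    a = a * 3 + 7;
    test2a(a);
}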

import std.stdio;
import std.datetime;

int main()
{
    test1();
    test2();
    auto b = comparingBenchmark!(test1, test2, 100);
    writeln(b.point);
    return 0;
}
January 15, 2012
On 1/14/2012 10:56 PM, Walter Bright wrote:
> as there's a serious lack of source code to
> test it with.

Here's what there is at the moment. Needs much more.

https://github.com/D-Programming-Language/dmd/blob/master/test/runnable/testxmm.d
January 15, 2012
On 15/01/12 6:56 AM, Walter Bright wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
> Anyhow, it's good enough now to play around with. Consider it alpha
> quality. Expect bugs - but make bug reports, as there's a serious lack
> of source code to test it with.

You sure you want proper bug reports for this? There still seem to be a lot of issues. For example, none of these work for me (OSX 64-bit).

----

int4 a = 2; // backend/cod2.c 2630

----

int4 a = void;
int4 b = void;
a = b; // segfault

----

int4 a = void;
a = simd(XMM.PXOR, a, a); // segfault

----

I could go on and on really. Very little seems to work at my end.

Actually, looking at the auto-tester, I'm not alone. Just seems to be OSX though.

http://d.puremagic.com/test-results/index.ghtml
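
For anyone filing reports like the ones above, a self-checking reduced case is usually easier to act on than a bare snippet. The following is only a sketch under some assumptions: it initializes the vectors with defined values instead of `= void` so the check is deterministic, it reads lanes through the `.array` property, and the function name `broadcastAndCopyWork` is made up for illustration. The `static if` guard lets it compile on targets where `int4` is not defined.

```d
import core.simd;

/// Returns true if scalar broadcast and vector copy behave as expected.
/// Also returns true when int4 is unavailable, since there is nothing to check.
/// (Hypothetical helper, not part of any library.)
bool broadcastAndCopyWork()
{
    static if (is(int4))
    {
        int4 a = 2;   // broadcast the scalar to all four lanes
        int4 b = a;   // plain vector copy
        foreach (lane; b.array)
        {
            if (lane != 2)
                return false;
        }
        return true;
    }
    else
    {
        return true;
    }
}

void main()
{
    assert(broadcastAndCopyWork(), "int4 broadcast or copy produced wrong lanes");
}
```

If the compiler crashes or the assert fires, the file plus the exact compiler version and platform should be enough for a report.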
January 15, 2012
On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux. Anyhow, it's good enough now to play around with. Consider it alpha quality. Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.

I get 20+ speedup without optimisations with GDC on that small test. :)


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
January 15, 2012
On 15 January 2012 16:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
>> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux. Anyhow, it's good enough now to play around with. Consider it alpha quality. Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.
>
> I get 20+ speedup without optimisations with GDC on that small test. :)
>

Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above.  My oh my...



-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
January 15, 2012
On 1/15/2012 3:49 AM, Peter Alexander wrote:
> Actually, looking at the auto-tester, I'm not alone. Just seems to be OSX though.

Yeah, it's just OSX. I had the test for that platform inadvertently disabled, gak.
January 15, 2012
On 1/15/2012 10:10 AM, Iain Buclaw wrote:
> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up
> with -O2 and above.  My oh my...

Woo-hoo!

January 15, 2012
On 15 January 2012 20:10, Iain Buclaw <ibuclaw@ubuntu.com> wrote:

> On 15 January 2012 16:59, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> > On 15 January 2012 06:56, Walter Bright <newshound2@digitalmars.com> wrote:
> >> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux. Anyhow, it's good enough now to play around with. Consider it alpha quality. Expect bugs - but make bug reports, as there's a serious lack of source code to test it with.
> >
> > I get 20+ speedup without optimisations with GDC on that small test. :)
> >
>
> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above.  My oh my...


Oh my indeed.
Haha, well I'm sure that's a fairly artificial result, but yes, this is why
I've been harping on for months that it's a bare necessity to provide
language support :P
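
One way to make such a micro-benchmark less artificial, assuming the concern is that an optimizer can discard work whose result is never used, is to run several dependent iterations and store the result somewhere observable. This is a rough sketch, not the benchmark from the original post: the `sink` global, the function names, the iteration counts and the `0.5`/`7` constants are all made up, and it uses `benchmark` from `std.datetime.stopwatch` where the original post used `comparingBenchmark`.

```d
import core.simd;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical sink: storing into a global keeps the result observable.
__gshared float sink;

// Array-operation version: 64 dependent steps of x = x * 0.5 + 7.
void arrayVersion()
{
    float[4] a = 1.2f;
    foreach (i; 0 .. 64)
        a[] = a[] * 0.5f + 7.0f;
    sink = a[0];
}

// Vector version of the same recurrence; falls back to the array
// version on targets where float4 is not defined.
void vectorVersion()
{
    static if (is(float4))
    {
        float4 a = 1.2f;
        float4 k = 0.5f;
        float4 c = 7.0f;
        foreach (i; 0 .. 64)
            a = a * k + c;
        sink = a.array[0];
    }
    else
    {
        arrayVersion();
    }
}

void main()
{
    // Run each function 100_000 times and report the total time of each.
    auto r = benchmark!(arrayVersion, vectorVersion)(100_000);
    writefln("array:  %s us", r[0].total!"usecs");
    writefln("vector: %s us", r[1].total!"usecs");
}
```

The recurrence converges towards 14, so the values stay finite however many iterations are run. A sufficiently aggressive optimizer may still fold the loops, so the numbers are worth checking against the generated assembly.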


January 15, 2012
Iain Buclaw:

> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above.  My oh my...

Please, show me the assembly code produced, with its relative D source :-)

Bye,
bearophile
January 15, 2012
On 15 January 2012 19:01, bearophile <bearophileHUGS@lycos.com> wrote:
> Iain Buclaw:
>
>> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above.  My oh my...
>
> Please, show me the assembly code produced, with its relative D source :-)
>
> Bye,
> bearophile

D code:
----
import core.simd;

void test2a(float4 a) { }

float4 test2()
{
   float4 a = 1.2;
   a = a * 3 + 7;
   test2a(a);
   return a;
}
----

Relevant assembly:
----
.LC5:
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4

_D4test5test2FZNhG4f:
        .cfi_startproc
        movl    $3, %eax
        cvtsi2ss        %eax, %xmm0
        movb    $7, %al
        cvtsi2ss        %eax, %xmm1
        unpcklps        %xmm0, %xmm0
        unpcklps        %xmm1, %xmm1
        movlhps %xmm0, %xmm0
        movlhps %xmm1, %xmm1
        mulps   .LC5(%rip), %xmm0
        addps   %xmm1, %xmm0
        ret
        .cfi_endproc
----

As someone pointed out to me, the only optimisation missing was constant propagation, but that doesn't matter too much for now.
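
To relate the listing back to the source: the `.LC5` constant 1067030938 is 0x3F99999A, the single-precision bit pattern of 1.2, while 3 and 7 are converted and splatted at run time before the `mulps`/`addps`. A small sketch to check this, with a made-up helper name `bitsToFloat`, and with an `enum` standing in for what constant propagation could in principle precompute:

```d
import std.stdio : writefln;

/// Reinterprets a 32-bit pattern as an IEEE 754 single-precision float.
/// (Hypothetical helper for illustration.)
float bitsToFloat(uint bits)
{
    return *cast(float*) &bits;
}

void main()
{
    // .LC5 in the listing holds four copies of this value.
    enum uint lc5 = 1067030938;
    writefln("0x%08X -> %s", lc5, bitsToFloat(lc5));

    // With full constant propagation the whole expression could be
    // evaluated at compile time, as this enum is.
    enum float folded = 1.2f * 3 + 7;
    writefln("folded: %s", folded);
}
```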

Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';