January 15, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On 15 January 2012 19:01, bearophile <bearophileHUGS@lycos.com> wrote:
> Iain Buclaw:
>
>> Correction, 1.5x speed up without, 20x speed up with -O1, 30x speed up with -O2 and above. My oh my...
>
> Please, show me the assembly code produced, with its relative D source :-)
>
> Bye,
> bearophile
For those who can't read AT&T:
----
.LC5:
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .long   1067030938
        .align 16
_D4test5test2FZNhG4f:
        .cfi_startproc
        mov     eax, 3
        cvtsi2ss        xmm0, eax
        mov     al, 7
        cvtsi2ss        xmm1, eax
        unpcklps        xmm0, xmm0
        unpcklps        xmm1, xmm1
        movlhps xmm0, xmm0
        movlhps xmm1, xmm1
        mulps   xmm0, XMMWORD PTR .LC5[rip]
        addps   xmm0, xmm1
        ret
        .cfi_endproc
----
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
|
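For readers following the listing: the four `.long 1067030938` entries at `.LC5` are the IEEE 754 single-precision bit pattern of 1.2, which the code multiplies by a broadcast 3 and then adds a broadcast 7. A minimal sketch that checks the decoding; `bitsToFloat` is a hypothetical helper introduced only for this illustration.

```d
import std.stdio;

/// Hypothetical helper: reinterprets a 32-bit pattern as a float.
float bitsToFloat(uint bits)
{
    return *cast(float*) &bits;
}

void main()
{
    // The value stored four times at .LC5 in the listing above.
    enum uint lc5 = 1067030938;

    // 1067030938 is 0x3F99999A, the nearest float to 1.2.
    assert(lc5 == 0x3F99999A);
    assert(bitsToFloat(lc5) == 1.2f);

    writefln("0x%08X decodes to %s", lc5, bitsToFloat(lc5));
}
```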
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | I just built 32 & 64 bit DMD (latest commit on git tree is
f800f6e342e2d9ab1ec9a6275b8239463aa1cee8)
Using the 32-bit version, I got this error:
Internal error: backend/cg87.c 1702
The 64-bit version went fine.
Previously, both 32 and 64 bit version had no problem.
On 01/15/2012 01:56 PM, Walter Bright wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
> Anyhow, it's good enough now to play around with. Consider it alpha
> quality. Expect bugs - but make bug reports, as there's a serious lack
> of source code to test it with.
> -----------------------
> import core.simd;
>
> void test1a(float[4] a) { }
>
> void test1()
> {
>     float[4] a = 1.2;
>     a[] = a[] * 3 + 7;
>     test1a(a);
> }
>
> void test2a(float4 a) { }
>
> void test2()
> {
>     float4 a = 1.2;
>     a = a * 3 + 7;
>     test2a(a);
> }
>
> import std.stdio;
> import std.datetime;
>
> int main()
> {
>     test1();
>     test2();
>     auto b = comparingBenchmark!(test1, test2, 100);
>     writeln(b.point);
>     return 0;
> }
|
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andre Tampubolon | On 1/16/2012 12:59 AM, Andre Tampubolon wrote:
> I just built 32 & 64 bit DMD (latest commit on git tree is
> f800f6e342e2d9ab1ec9a6275b8239463aa1cee8)
>
> Using the 32-bit version, I got this error:
> Internal error: backend/cg87.c 1702
>
> The 64-bit version went fine.
>
> Previously, both 32 and 64 bit version had no problem.
Which machine?
|
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | Well I only have 1 machine, a laptop running 64 bit Arch Linux.
Yesterday I did a git pull, built both 32 & 64 bit DMD, and this code
compiled fine using those.
But now, the 32 bit version fails.
Walter Bright <newshound2@digitalmars.com> wrote:
> On 1/16/2012 12:59 AM, Andre Tampubolon wrote:
>> I just built 32 & 64 bit DMD (latest commit on git tree is
>> f800f6e342e2d9ab1ec9a6275b8239463aa1cee8)
>>
>> Using the 32-bit version, I got this error:
>> Internal error: backend/cg87.c 1702
>>
>> The 64-bit version went fine.
>>
>> Previously, both 32 and 64 bit version had no problem.
>
> Which machine?
|
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andre Tampubolon | 32 bit SIMD for Linux is not implemented.
It's all 64 bit platforms, and 32 bit OS X.
On 1/16/2012 2:35 AM, Andre Tampubolon wrote:
> Well I only have 1 machine, a laptop running 64 bit Arch Linux.
> Yesterday I did a git pull, built both 32 & 64 bit DMD, and this code
> compiled fine using those.
> But now, the 32 bit version fails.
>
> Walter Bright <newshound2@digitalmars.com> wrote:
>> On 1/16/2012 12:59 AM, Andre Tampubolon wrote:
>>> I just built 32 & 64 bit DMD (latest commit on git tree is
>>> f800f6e342e2d9ab1ec9a6275b8239463aa1cee8)
>>>
>>> Using the 32-bit version, I got this error:
>>> Internal error: backend/cg87.c 1702
>>>
>>> The 64-bit version went fine.
>>>
>>> Previously, both 32 and 64 bit version had no problem.
>>
>> Which machine?
|
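Since vector support depends on the target, code that must also build where it is missing can branch on D's predefined `D_SIMD` version identifier instead of importing `core.simd` unconditionally. A minimal sketch under that assumption; `scaleAndBias4` is a hypothetical name, and whether a particular compiler build sets `D_SIMD` for a given target should be checked rather than assumed.

```d
import std.stdio;

version (D_SIMD)
    import core.simd;

/// Hypothetical helper: returns a * 3 + 7 for four floats.
float[4] scaleAndBias4(float[4] a)
{
    version (D_SIMD)
    {
        // Vector path: copy the four floats into a hardware vector.
        float4 v = void;
        *cast(float[4]*) &v = a;
        float4 three = 3.0f;   // broadcast constants
        float4 seven = 7.0f;
        v = v * three + seven;
        return v.array;
    }
    else
    {
        // Fallback path: ordinary array operation on the local copy.
        a[] = a[] * 3 + 7;
        return a;
    }
}

void main()
{
    float[4] input = 1.2f;
    writeln(scaleAndBias4(input));
}
```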
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter Bright | On 1/15/12 12:56 AM, Walter Bright wrote:
> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
> Anyhow, it's good enough now to play around with. Consider it alpha
> quality. Expect bugs - but make bug reports, as there's a serious lack
> of source code to test it with.
> -----------------------
> import core.simd;
>
> void test1a(float[4] a) { }
>
> void test1()
> {
> float[4] a = 1.2;
> a[] = a[] * 3 + 7;
> test1a(a);
> }
>
> void test2a(float4 a) { }
>
> void test2()
> {
> float4 a = 1.2;
> a = a * 3 + 7;
> test2a(a);
> }
These two functions should have the same speed. The function that ought to be slower is:
void test1()
{
    float[5] a = 1.2;
    float[] b = a[1 .. $];
    b[] = b[] * 3 + 7;
    test1a(a);
}
Andrei
|
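The reason the sliced variant would be expected to be slower can be checked directly: `b` starts one float (4 bytes) past `a`, so the two cannot both sit on the 16-byte boundary that aligned SSE loads require. A minimal sketch; `isAligned16` is a hypothetical helper, not something from druntime or Phobos.

```d
import std.stdio;

/// Hypothetical helper: true if p lies on a 16-byte boundary.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p) % 16 == 0;
}

void main()
{
    float[5] a = 1.2f;
    float[] b = a[1 .. $];   // same slice as in the post above

    // b begins exactly one float past the start of a.
    assert(cast(size_t) b.ptr - cast(size_t) a.ptr == float.sizeof);

    // An offset of 4 bytes is not a multiple of 16, so at most one
    // of the two pointers can be 16-byte aligned.
    assert(!(isAligned16(a.ptr) && isAligned16(b.ptr)));

    writefln("a aligned: %s, b aligned: %s",
             isAligned16(a.ptr), isAligned16(b.ptr));
}
```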
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 16 January 2012 18:17, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 1/15/12 12:56 AM, Walter Bright wrote:
>
>> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
>> Anyhow, it's good enough now to play around with. Consider it alpha
>> quality. Expect bugs - but make bug reports, as there's a serious lack
>> of source code to test it with.
>> -----------------------
>> import core.simd;
>>
>> void test1a(float[4] a) { }
>>
>> void test1()
>> {
>>     float[4] a = 1.2;
>>     a[] = a[] * 3 + 7;
>>     test1a(a);
>> }
>>
>> void test2a(float4 a) { }
>>
>> void test2()
>> {
>>     float4 a = 1.2;
>>     a = a * 3 + 7;
>>     test2a(a);
>> }
>>
>
> These two functions should have the same speed.
A function using float arrays and a function using hardware vectors should certainly not be the same speed.
|
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On 1/16/12 10:46 AM, Manu wrote:
> A function using float arrays and a function using hardware vectors
> should certainly not be the same speed.
My point was that the version using float arrays should opportunistically use hardware ops whenever possible.
Andrei
|
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On Mon, 16 Jan 2012 17:17:44 +0100, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 1/15/12 12:56 AM, Walter Bright wrote:
>> I get a 2 to 2.5 speedup with the vector instructions on 64 bit Linux.
>> Anyhow, it's good enough now to play around with. Consider it alpha
>> quality. Expect bugs - but make bug reports, as there's a serious lack
>> of source code to test it with.
>> -----------------------
>> import core.simd;
>>
>> void test1a(float[4] a) { }
>>
>> void test1()
>> {
>>     float[4] a = 1.2;
>>     a[] = a[] * 3 + 7;
>>     test1a(a);
>> }
>>
>> void test2a(float4 a) { }
>>
>> void test2()
>> {
>>     float4 a = 1.2;
>>     a = a * 3 + 7;
>>     test2a(a);
>> }
>
> These two functions should have the same speed. The function that ought to be slower is:
>
> void test1()
> {
>     float[5] a = 1.2;
>     float[] b = a[1 .. $];
>     b[] = b[] * 3 + 7;
>     test1a(a);
> }
>
>
> Andrei
Unfortunately druntime's array ops are a mess and fail to speed up anything below 16 floats.
Additionally there is overhead for a function call and they have to check alignment at runtime.
martin
|
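To make the runtime cost concrete, here is a rough sketch of what an array operation that "opportunistically" uses vector instructions has to do: test alignment and length on every call, then pick a path. This is not how druntime implements its array ops; `scaleAndBiasInPlace` is a hypothetical name, and the sketch assumes a target where `core.simd`'s `float4` is available (for example 64-bit Linux, as discussed in this thread).

```d
import core.simd;
import std.stdio;

/// Hypothetical helper: computes a[i] = a[i] * 3 + 7 in place.
void scaleAndBiasInPlace(float[] a)
{
    // Runtime checks that a fixed float4 variable never needs.
    immutable aligned = (cast(size_t) a.ptr) % 16 == 0;

    if (aligned && a.length % 4 == 0)
    {
        // Vector path: process four floats per iteration.
        float4 three = 3.0f;
        float4 seven = 7.0f;
        auto p = cast(float4*) a.ptr;
        foreach (i; 0 .. a.length / 4)
            p[i] = p[i] * three + seven;
    }
    else
    {
        // Scalar fallback for unaligned or odd-length data.
        foreach (ref x; a)
            x = x * 3 + 7;
    }
}

void main()
{
    float[8] buf = 1.2f;
    scaleAndBiasInPlace(buf[]);
    writeln(buf);
}
```

For a four-element array, the branch and the call can easily cost as much as the arithmetic itself, which is consistent with the observation above that small arrays see no gain.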
January 16, 2012 Re: SIMD benchmark | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | On 16 January 2012 18:48, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
> On 1/16/12 10:46 AM, Manu wrote:
>
>> A function using float arrays and a function using hardware vectors
>> should certainly not be the same speed.
>>
>
> My point was that the version using float arrays should opportunistically use hardware ops whenever possible.
I think this is a mistake, because such a piece of code never exists outside of some context. If the context it exists within is all FPU code (and it is, it's a float array), then swapping between FPU and SIMD execution units will probably result in the function being slower than the original (also the float array is unaligned).
The SIMD version however must exist within a SIMD context. Since the API can't implicitly interact with floats, this guarantees that the context of each function matches that within which it lives. This is fundamental to fast vector performance.
Using SIMD is an all or nothing decision; you can't just mix it in here and there. You don't go casting back and forth between floats and ints on every other line... obviously it's imprecise, but it's also a major performance hazard. There is no difference here, except the performance hazard is much worse.
|
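A small sketch of what staying "within a SIMD context" can look like in practice: each step takes and returns a `float4`, and the data is converted to plain floats only once, at the boundary. `scaleAndBias` is a hypothetical function made up for this illustration, and the sketch assumes a target where `core.simd`'s `float4` is available.

```d
import core.simd;
import std.stdio;

/// Hypothetical processing step: vector in, vector out.
float4 scaleAndBias(float4 v)
{
    float4 three = 3.0f;   // constants broadcast into vectors
    float4 seven = 7.0f;
    return v * three + seven;
}

void main()
{
    float4 v = 1.2f;

    // Chained steps pass float4 values, with no scalar floats in between.
    v = scaleAndBias(v);
    v = scaleAndBias(v);

    // Leave the vector domain once, when the result is consumed.
    float[4] result = v.array;
    writeln(result);
}
```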