August 05, 2011 Re: From a C++/JS benchmark
Posted in reply to Trass3r
Trass3r:
> > are you willing and able to show me the asm before it gets assembled? (with gcc you do it with the -S switch). (I also suggest to use only the C standard library, with time() and printf() to produce a smaller asm output: http://codepad.org/12EUo16J ).
You are a person of few words :-) Thank you for the asm.
Apparently the program was not compiled in release mode (or with bounds checks disabled; with DMD the two are the same thing, with gdc maybe they are not). The asm does contain calls, but they aren't calls to the next line; they are for the array bounds checks and asserts:
call _d_assert
call _d_array_bounds
call _d_array_bounds
call _d_assert_msg
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_array_bounds
call _d_assert_msg
But I think this doesn't fully explain the low performance; I have seen too many instructions like:
movss DWORD PTR [rsp+32], xmm1
movss DWORD PTR [rsp+16], xmm2
movss DWORD PTR [rsp+48], xmm3
If you want to go on with this exploration, then I suggest you find a way to disable the bounds tests.
Bye,
bearophile
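For readers following along, here is a rough sketch of what those calls guard. In a build with bounds checks enabled, each array index is compared against the array length, and the failure path calls into the runtime (the `_d_array_bounds` calls listed above). The C++ below is only an analogy, not D's runtime and not gdc's output: `Slice`, `checked_at`, `unchecked_at`, `sum_checked` and `sum_unchecked` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a D dynamic array (slice): a pointer plus a length.
struct Slice {
    float* ptr;
    std::size_t length;
};

// Checked indexing: one compare and one rarely taken branch per access.
// The throw plays the role of the runtime's bounds-failure handler.
inline float& checked_at(Slice s, std::size_t i) {
    if (i >= s.length)
        throw std::out_of_range("array index out of bounds");
    return s.ptr[i];
}

// Unchecked indexing: roughly what remains once bounds checks are disabled.
inline float& unchecked_at(Slice s, std::size_t i) {
    return s.ptr[i];
}

// Sums every element through the checked accessor.
float sum_checked(Slice s) {
    float total = 0.0f;
    for (std::size_t i = 0; i < s.length; ++i)
        total += checked_at(s, i);
    return total;
}

// Same loop without the per-element check.
float sum_unchecked(Slice s) {
    float total = 0.0f;
    for (std::size_t i = 0; i < s.length; ++i)
        total += unchecked_at(s, i);
    return total;
}
```

Both functions return the same sum for valid input; the checked one carries an extra compare and branch per element. How much that costs in a given loop depends on the compiler and the surrounding code.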
August 05, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile
> If you want to go on with this exploration, then I suggest you find a way to disable the bounds tests.
Ok, now I get up to 32930000 skinned vertices per second.
Still a bit worse than LDC.
August 05, 2011 Re: From a C++/JS benchmark
Posted in reply to Trass3r
On 04.08.2011 at 04:07, Trass3r <un@known.com> wrote:
>> C++:
>> Skinned vertices per second: 48660000
>>
>> C++ no SIMD:
>> Skinned vertices per second: 42420000
>>
>>
>> D dmd:
>> Skinned vertices per second: 159046
>>
>> D gdc:
>> Skinned vertices per second: 23450000
>
>
> D ldc:
> Skinned vertices per second: 37910000
>
> ldc2 -O3 -release -enable-inlining dver.d
D gdc with added -frelease -fno-bounds-check:
Skinned vertices per second: 37710000
August 05, 2011 Re: From a C++/JS benchmark
Posted in reply to Trass3r
Trass3r:
> >> C++ no SIMD:
> >> Skinned vertices per second: 42420000
>...
> D gdc with added -frelease -fno-bounds-check:
> Skinned vertices per second: 37710000
I'd like to know why the GCC back-end is able to produce a more efficient binary from the C++ code (compared to the D code), but the problem is no longer as large as before.
It seems I've found a benchmark coming from real-world code that's a worst case for DMD (GDC here produces code about 237 times faster than DMD).
Bye,
bearophile
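As a quick check of the ratios discussed in this post, the throughput figures quoted in the thread can be compared directly. This is a small sketch; the constant and function names (`kDmd`, `speedup`, `shortfall_percent`, and so on) are made up for illustration, and only the numbers come from the posts.

```cpp
// Throughput figures quoted in this thread, in skinned vertices per second.
constexpr double kCppNoSimd = 42420000.0;  // C++ without SIMD
constexpr double kDmd       = 159046.0;    // D, dmd
constexpr double kGdcTuned  = 37710000.0;  // D, gdc with -frelease -fno-bounds-check

// How many times faster `faster` is than `slower`.
constexpr double speedup(double faster, double slower) {
    return faster / slower;
}

// By what percentage `candidate` falls short of `baseline`.
constexpr double shortfall_percent(double candidate, double baseline) {
    return (1.0 - candidate / baseline) * 100.0;
}
```

`speedup(kGdcTuned, kDmd)` comes out at roughly 237, matching the figure above, and `shortfall_percent(kGdcTuned, kCppNoSimd)` is roughly 11, which is the remaining gap to the C++ build without SIMD.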
August 05, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile
> I'd like to know why the GCC back-end is able to produce a more efficient binary from the C++ code (compared to the D code), but the problem is no longer as large as before.
I attached both asm versions ;)
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> Trass3r:
> > C++ no SIMD:
> > Skinned vertices per second: 42420000
> >
> ...
> > D gdc:
> > Skinned vertices per second: 23450000
> Are you able and willing to show me the asm produced by gdc? There's a problem there.
> Bye,
> bearophile
Notes from me:
- Options -fno-bounds-check and -frelease can be just as important in GDC as they are in DMD in certain cases.
- You can output asm in Intel dialect using -masm=intel if AT&T is that difficult for you to read. 8-)
I will look into this later from my workstation.
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to Iain Buclaw
Iain Buclaw:
> I will look into this later from my workstation.
The remaining thing to look at is just the small performance difference between the D-GDC version and the C++-G++ version.
Bye,
bearophile
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> Iain Buclaw:
> > I will look into this later from my workstation.
> The remaining thing to look at is just the small performance difference between the D-GDC version and the C++-G++ version.
> Bye,
> bearophile
Three things that helped improve performance in a minor way for me:
1) using pointers over dynamic arrays. (5% speedup)
2) removing the calls to CalVector4's constructor (5.7% speedup)
3) using core.stdc.time over std.datetime. (1.6% speedup)
Point one is pretty well known issue in D as far as I'm aware.
Point two is not an issue with inlining (all methods are marked 'inline'), but it did help remove quite a few movss instructions being emitted.
Point three is interesting, it seems that "sw.peek().msecs" slows down the number of iterations in the while loop.
With those changes, D implementation is still 21% slower than C++ implementation without SIMD.
http://ideone.com/4PP2D
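Point three concerns the clock call that sits inside the benchmark's while loop. A minimal sketch of that loop shape, written in C++ and assuming the benchmark simply counts iterations until a deadline, is shown here; `count_iterations` and `c_time_seconds` are hypothetical names, not taken from the benchmark source. The clock is passed in as a parameter so the loop can be exercised with any time source.

```cpp
#include <ctime>

// Hypothetical helper: runs `work` repeatedly until `now()` reports that
// `seconds` have elapsed, and returns the number of completed iterations.
// Note that `now()` is called once per iteration, so its cost is part of
// what the loop measures.
template <typename Clock, typename Work>
long count_iterations(Clock now, Work work, long seconds) {
    const long start = now();
    long iterations = 0;
    while (now() - start < seconds) {
        work();
        ++iterations;
    }
    return iterations;
}

// Time source built on the C standard library's time(), with
// one-second resolution.
inline long c_time_seconds() {
    return static_cast<long>(std::time(nullptr));
}
```

A call such as `count_iterations(c_time_seconds, work, 1)` would count how many times `work` runs in about a second. Because the clock is polled on every pass, a cheaper clock call leaves more of each second for the work itself, which may be what the 1.6% difference reflects.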
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to Iain Buclaw
Iain Buclaw:
Are you using GDC2-64 bit on Linux?
> Three things that helped improve performance in a minor way for me:
> 1) using pointers over dynamic arrays. (5% speedup)
> 2) removing the calls to CalVector4's constructor (5.7% speedup)
> 3) using core.stdc.time over std.datetime. (1.6% speedup)
>
> Point one is pretty well known issue in D as far as I'm aware.
Really? I don't remember discussions about it. What is its cause?
> Point two is not an issue with inlining (all methods are marked 'inline'), but it did help remove quite a few movss instructions being emitted.
This too is something worth fixing. Is this issue in Bugzilla already?
> Point three is interesting, it seems that "sw.peek().msecs" slows down the number of iterations in the while loop.
This needs to be fixed.
> With those changes, D implementation is still 21% slower than C++ implementation
> without SIMD.
> http://ideone.com/4PP2D
This is a lot still. Thank you for your work. I think all three issues are worth fixing, eventually.
Bye,
bearophile
August 06, 2011 Re: From a C++/JS benchmark
Posted in reply to bearophile
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> Iain Buclaw:
> Are you using GDC2-64 bit on Linux?
GDC2-32 bit on Linux.
> > Three things that helped improve performance in a minor way for me:
> > 1) using pointers over dynamic arrays. (5% speedup)
> > 2) removing the calls to CalVector4's constructor (5.7% speedup)
> > 3) using core.stdc.time over std.datetime. (1.6% speedup)
> >
> > Point one is pretty well known issue in D as far as I'm aware.
> Really? I don't remember discussions about it. What is its cause?
I can't remember the exact discussion, but it was something about a benchmark of passing by value vs passing by ref vs passing by pointer.
> > Point two is not an issue with inlining (all methods are marked 'inline'), but it did help remove quite a few movss instructions being emitted.
> This too is something worth fixing. Is this issue in Bugzilla already?
I don't think it's an issue really. But of course, there is a difference between what you say and what you mean with regards to the code here (that being, with the first version, lots of temp vars get created and moved around the place).
Regards
Iain
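The "what you say versus what you mean" distinction can be illustrated with two source-level shapes that compute the same result. This is a C++ sketch under assumptions: `Vec4` is a hypothetical type loosely modelled on the `CalVector4` named in the thread, and the two functions are made up for illustration. It makes no claim about what gdc or g++ actually emits for either shape.

```cpp
// Hypothetical 4-component vector, loosely modelled on CalVector4.
struct Vec4 {
    float x, y, z, w;
    Vec4() : x(0), y(0), z(0), w(0) {}
    Vec4(float x_, float y_, float z_, float w_) : x(x_), y(y_), z(z_), w(w_) {}
};

// Shape 1: build a temporary through the constructor, then copy it into
// the output slot. The source asks for an extra object to exist.
inline void scale_via_constructor(const Vec4& in, float k, Vec4& out) {
    out = Vec4(in.x * k, in.y * k, in.z * k, in.w * k);
}

// Shape 2: write each field of the output slot directly, with no
// temporary named anywhere in the source.
inline void scale_in_place(const Vec4& in, float k, Vec4& out) {
    out.x = in.x * k;
    out.y = in.y * k;
    out.z = in.z * k;
    out.w = in.w * k;
}
```

Both functions leave the same values in `out`. Whether the temporary in the first shape survives into the generated code is up to the optimizer, which is consistent with the observation above that removing the constructor calls reduced the number of movss instructions in this particular build.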
Copyright © 1999-2021 by the D Language Foundation