August 03, 2011 From a C++/JS benchmark

The benchmark info:
http://chadaustin.me/2011/01/digging-into-javascript-performance/

The code, in C++, JS, Java, C#:
https://github.com/chadaustin/Web-Benchmarks/
The C++/JS/Java code runs on a single core.

D2 version translated from the C# version (the C++ version uses struct inheritance!):
http://ideone.com/kf1tz

Bye,
bearophile

August 03, 2011 Re: From a C++/JS benchmark

Posted in reply to bearophile

03.08.2011 18:20, bearophile:
> The benchmark info:
> http://chadaustin.me/2011/01/digging-into-javascript-performance/
>
> The code, in C++, JS, Java, C#:
> https://github.com/chadaustin/Web-Benchmarks/
> The C++/JS/Java code runs on a single core.
>
> D2 version translated from the C# version (the C++ version uses struct inheritance!):
> http://ideone.com/kf1tz
>
> Bye,
> bearophile

Compilers:
C++: cl /O2 /Oi /Ot /Oy /GT /GL and link /STACK:10240000
Java: Oracle Java 1.6 with hm... Oracle default settings
C#: Csc /optimize+
D2: dmd -O -noboundscheck -inline -release

Type column: working scalar type
Other columns: vertices per second (inaccuracy is about 1%) by language (tests from bearophile's message, C++ test is "skinning_test_no_simd.cpp").

System: Windows XP, Core 2 Duo E6850

-----------------------------------------------------------
Type   | C++        | Java       | C#         | D2
-----------------------------------------------------------
float  | 31_400_000 | 17_000_000 | 14_700_000 |    168_000
double | 32_300_000 | 16_000_000 | 14_100_000 |    166_000
real   | 32_300_000 | no real    | no real    |    203_000
int    | 29_100_000 | 14_600_000 | 14_100_000 | 16_500_000
long   | 29_100_000 |  6_600_000 |  4_400_000 |  5_800_000
-----------------------------------------------------------

The JavaScript vs C++ speed is given at the first link of bearophile's original post: JS is about 10-20 times slower than C++.
Looks like a spiteful joke... In other words: WTF?! JavaScript is about 10 times faster than D in floating point calculations!? Please, tell me that I'm mistaken.

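The "about 10 times faster" estimate follows from the float row of the table combined with the quoted "10-20 times slower than C++" figure for JavaScript. A small sketch of that arithmetic in C++; the constant and function names are made up for illustration, and the JS throughput is only an estimate derived from the quoted ratio, not a measurement:

```cpp
#include <cassert>
#include <cstdio>

// Float-row throughput copied from the table (vertices per second).
constexpr double kCppFloat = 31400000.0;
constexpr double kD2Float  = 168000.0;

// How many times faster throughput `a` is than throughput `b`.
double speedup(double a, double b) {
    assert(b > 0.0);
    return a / b;
}

// JS throughput implied by "JS is N times slower than C++" (an estimate).
double js_estimate(double times_slower_than_cpp) {
    assert(times_slower_than_cpp > 0.0);
    return kCppFloat / times_slower_than_cpp;
}

// Print the three ratios discussed in the post.
void print_ratios() {
    std::printf("C++ vs D2 (float): %.1fx\n", speedup(kCppFloat, kD2Float));
    std::printf("JS vs D2 if JS is 10x slower than C++: %.1fx\n",
                speedup(js_estimate(10.0), kD2Float));
    std::printf("JS vs D2 if JS is 20x slower than C++: %.1fx\n",
                speedup(js_estimate(20.0), kD2Float));
}
```

With these inputs the C++/D2 ratio comes out near 187, and the estimated JS/D2 ratio falls between roughly 9 and 19, which is consistent with the "about 10 times" in the post.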
August 03, 2011 Re: From a C++/JS benchmark

Posted in reply to Denis Shelomovskij

I believe that "long" in this case is 32 bits in C++, and 64 bits in the remaining languages, hence the same result for int and long in C++. Try with "long long" maybe? :)

--
Ziad

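One way to avoid this pitfall in a C++ port is to name the 64-bit scalar type explicitly instead of relying on `long`. A minimal sketch; the alias `BenchLong` and the helper names are hypothetical and not taken from the benchmark sources:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Width of a type in bits.
template <typename T>
constexpr int bits_of() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Hypothetical alias for the benchmark's 64-bit scalar type.
// std::int64_t is exactly 64 bits wherever it is provided.
using BenchLong = std::int64_t;

static_assert(bits_of<BenchLong>() == 64, "BenchLong must be 64 bits");
static_assert(bits_of<long long>() >= 64, "long long is at least 64 bits");
static_assert(bits_of<long>() >= 32, "long is only guaranteed 32 bits");

// True when a plain `long` run would really be a 32-bit run.
bool long_is_32_bit() {
    assert(bits_of<long>() >= 32);
    return bits_of<long>() == 32;
}
```

If `long_is_32_bit()` returns true for the compiler in use, the "long" column for C++ is measuring 32-bit arithmetic and is not comparable to the other languages.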
August 03, 2011 Re: From a C++/JS benchmark

Posted in reply to Ziad Hatahet

03.08.2011 22:15, Ziad Hatahet:
> I believe that "long" in this case is 32 bits in C++, and 64-bits in the
> remaining languages, hence the same result for int and long in C++. Try
> with "long long" maybe? :)

Good! This is my first blunder (it's so easy to completely forget a language design that is illogical (for me)). So, the corrected last row:

-------------------------------------------------------------
Type   | C++        | Java       | C#         | D2
-------------------------------------------------------------
long   |  5_500_000 |  6_600_000 |  4_400_000 |  5_800_000
-------------------------------------------------------------

Java is the fastest "long" language :)

August 03, 2011 Re: From a C++/JS benchmark

Posted in reply to Denis Shelomovskij

> System: Windows XP, Core 2 Duo E6850

Is this Windows XP 32 bit or 64 bit? That will probably make a difference on the longs, I'd expect.

August 03, 2011 Re: From a C++/JS benchmark

Posted in reply to Adam D. Ruppe

On 8/3/11 9:48 PM, Adam D. Ruppe wrote:
>> System: Windows XP, Core 2 Duo E6850
>
> Is this Windows XP 32 bit or 64 bit? That will probably make
> a difference on the longs I'd expect.

It doesn't: long is 32 bits wide on Windows x86_64 too (LLP64).

David

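For readers unfamiliar with the data-model names, a compiler's model can be inferred from the widths of `long` and of a pointer. A rough sketch; the function name `data_model` is made up here, and it only distinguishes the three common cases:

```cpp
#include <cassert>
#include <climits>
#include <string>

// Classify the compiler's data model from the widths of long and void*.
// Anything outside the three common cases is reported as "other".
std::string data_model() {
    const int long_bits = static_cast<int>(sizeof(long) * CHAR_BIT);
    const int ptr_bits  = static_cast<int>(sizeof(void*) * CHAR_BIT);
    assert(long_bits >= 32);  // guaranteed by the C++ standard

    if (ptr_bits == 64 && long_bits == 32) return "LLP64";  // 64-bit Windows
    if (ptr_bits == 64 && long_bits == 64) return "LP64";   // most 64-bit Unix-like systems
    if (ptr_bits == 32 && long_bits == 32) return "ILP32";  // typical 32-bit targets
    return "other";
}
```

Under both "ILP32" and "LLP64" a C++ `long` is 32 bits, which is why switching from 32-bit to 64-bit Windows would not change the C++ "long" row.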
August 03, 2011 Re: From a C++/JS benchmark

Posted in reply to Adam D. Ruppe

03.08.2011 22:48, Adam D. Ruppe wrote:
>> System: Windows XP, Core 2 Duo E6850
>
> Is this Windows XP 32 bit or 64 bit? That will probably make
> a difference on the longs I'd expect.

I meant 32-bit Windows XP (5.1, Build 2600, Service Pack 3), going by what Wikipedia calls "Windows XP".

August 03, 2011 Re: From a C++/JS benchmark

Posted in reply to Denis Shelomovskij

Denis Shelomovskij:

> (tests from bearophile's message, C++ test is "skinning_test_no_simd.cpp").

For a more realistic test I suggest you also time the C++ version that uses the intrinsics (only for float).

> Looks like a spiteful joke... In other words: WTF?! JavaScript is about 10 times faster than D in floating point calculations!? Please, tell me that I'm mistaken.

Languages aren't slow or fast; their implementations produce assembly that's more or less efficient.

A D1 version fit for LDC V1 with Tango:
http://codepad.org/ewDy31UH

Vertices (millions), Linux 32 bit:
C++ no simd: 29.5
D: 27.6

LDC based on DMD v1.057 and llvm 2.6, ldc -O3 -release -inline
G++ V4.3.3, -s -O3 -mfpmath=sse -ffast-math -msse3

It's a bit slower than the C++ version, but for most people that's an acceptable difference (and maybe by porting the C++ code to D instead of the C# one, and by using a more modern LLVM, you can reduce that loss a bit).

Bye,
bearophile

August 04, 2011 Re: From a C++/JS benchmark

Posted in reply to Denis Shelomovskij

> Looks like a spiteful joke... In other words: WTF?! JavaScript is about 10 times faster than D in floating point calculations!? Please, tell me that I'm mistaken.

I'm afraid not. dmd's backend isn't good at floating point calculations.

August 04, 2011 Re: From a C++/JS benchmark

Posted in reply to Trass3r

Trass3r:
> I'm afraid not. dmd's backend isn't good at floating point calculations.

Studying the asm a bit, it's not hard to find the cause, because this benchmark is quite pure (synthetic, although I think it comes from real-world code).

This is what G++ generates from the C++ code without intrinsics (the version that uses SIMD intrinsics looks similar but is shorter):
movl (%eax), %edx
movss 4(%eax), %xmm0
movl 8(%eax), %ecx
leal (%edx,%edx,2), %edx
sall $4, %edx
addl %ebx, %edx
testl %ecx, %ecx
movss 12(%edx), %xmm1
movss 20(%edx), %xmm7
movss (%edx), %xmm5
mulss %xmm0, %xmm1
mulss %xmm0, %xmm7
movss 4(%edx), %xmm6
movss 8(%edx), %xmm4
movss %xmm1, (%esp)
mulss %xmm0, %xmm5
movss 28(%edx), %xmm1
movss %xmm7, 4(%esp)
mulss %xmm0, %xmm6
movss 32(%edx), %xmm7
mulss %xmm0, %xmm1
movss 16(%edx), %xmm3
mulss %xmm0, %xmm7
movss 24(%edx), %xmm2
movss %xmm1, 16(%esp)
mulss %xmm0, %xmm4
movss 36(%edx), %xmm1
movss %xmm7, 8(%esp)
mulss %xmm0, %xmm3
movss 40(%edx), %xmm7
mulss %xmm0, %xmm2
mulss %xmm0, %xmm1
mulss %xmm0, %xmm7
mulss 44(%edx), %xmm0
leal 12(%eax), %edx
movss %xmm7, 12(%esp)
movss %xmm0, 20(%esp)
This is what DMD generates for the same (or quite similar) piece of code:
movsd
mov EAX,068h[ESP]
imul EDX,EAX,030h
add EDX,018h[ESP]
fld float ptr [EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 038h[ESP]
fld float ptr 4[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 03Ch[ESP]
fld float ptr 8[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 040h[ESP]
fld float ptr 0Ch[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 044h[ESP]
fld float ptr 010h[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 048h[ESP]
fld float ptr 014h[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 04Ch[ESP]
fld float ptr 018h[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 050h[ESP]
fld float ptr 01Ch[EDX]
mov CL,070h[ESP]
xor CL,1
fmul float ptr 06Ch[ESP]
fstp float ptr 054h[ESP]
fld float ptr 020h[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 058h[ESP]
fld float ptr 024h[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 05Ch[ESP]
fld float ptr 028h[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 060h[ESP]
fld float ptr 02Ch[EDX]
fmul float ptr 06Ch[ESP]
fstp float ptr 064h[ESP]
I think the DMD back-end already contains logic to use xmm registers as true registers (not as a floating point stack or as temporary holes to push FP values into and pull them out of), so I suspect it wouldn't take too much work to modify it to emit FP asm with a single optimization: just keep the values inside registers. In my uninformed opinion all other FP optimizations are almost insignificant compared to this one :-)
Bye,
bearophile
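
Both listings appear to implement the same source-level pattern: twelve floats of one bone matrix, located at a 48-byte stride from a base pointer, are each multiplied by a single weight. A minimal C++ sketch of that pattern, reconstructed from the listings and not copied from the benchmark sources; the names `Matrix3x4` and `scale_bone` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 3x4 bone matrix: 12 floats, i.e. the 48-byte stride
// seen in the listings (index * 3 * 16 in G++, imul ..., 030h in DMD).
struct Matrix3x4 {
    float m[12];
};

static_assert(sizeof(Matrix3x4) == 48, "expected 12 four-byte floats");

// Return bones[index] with every element multiplied by `weight`.
// The weight is a single local value, so a compiler targeting SSE can
// keep it in one xmm register for all twelve multiplications.
Matrix3x4 scale_bone(const Matrix3x4* bones, std::size_t bone_count,
                     std::size_t index, float weight) {
    assert(index < bone_count);
    const Matrix3x4& bone = bones[index];
    Matrix3x4 out;
    for (std::size_t i = 0; i < 12; ++i) {
        out.m[i] = bone.m[i] * weight;
    }
    return out;
}
```

In the G++ listing the weight stays in `%xmm0` across all twelve `mulss` instructions, while the DMD listing reloads it from `06Ch[ESP]` for every `fmul` and stores each product straight back to the stack, which is the difference the post points at.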
Copyright © 1999-2021 by the D Language Foundation