August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Monday, 3 August 2015 at 16:47:58 UTC, John Colvin wrote:
> gets me down to 0.182s with ldc on OS X
Yeah, I tried dmd with final and didn't get a difference, but with gdc and final (plus -frelease, which is very important for max speed here, since without it the method calls are surrounded by various assertions) I got similar speed to the hand-written one too.
|
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On 8/3/15 12:50 PM, John Colvin wrote:
> On Monday, 3 August 2015 at 16:47:14 UTC, Adam D. Ruppe wrote:
>> You can try a few potential optimizations in the D version yourself
>> and see if it makes a difference.
>>
>> Devirtualization has a very small impact. Test this by making `test`
>> take `SubFoo` and making `bar` final, or making `bar` a stand-alone
>> function.
>>
>> That's not it.
>
> Making SubFoo a final class and test take SubFoo gives a >10x speedup
> for me.
Let's make sure we're all comparing apples to apples here.
FWIW, I suspect the inlining to be the most significant improvement, which is impossible for virtual functions in D.
ALSO, make SURE you are compiling in release mode, so you aren't calling a virtual invariant function before/after every call.
-Steve
|
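For readers without the benchmark at hand, here is a minimal sketch of the change being discussed. The names `Foo`, `SubFoo`, `bar` and `test` come from the thread; the field `val`, the method body, and the split into `testVirtual`/`testFinal` are assumptions made for illustration, not the original poster's code.

```d
import std.stdio : writeln;

class Foo
{
    int val;                          // assumed field, not from the original post

    // Class methods are virtual by default in D.
    void bar() { val = val * 3 + 1; }
}

// A final class cannot be subclassed, so a call made through a SubFoo
// reference has exactly one possible target and can be bound statically.
final class SubFoo : Foo
{
    override void bar() { val = val * 3 + 1; }
}

// Static type is Foo: the compiler has to assume bar may be overridden.
int testVirtual(Foo obj, int repeat)
{
    foreach (i; 0 .. repeat)
        obj.bar();
    return obj.val;
}

// Static type is the final class SubFoo: the call is a candidate for
// devirtualization and inlining.
int testFinal(SubFoo obj, int repeat)
{
    foreach (i; 0 .. repeat)
        obj.bar();
    return obj.val;
}

void main()
{
    // Both variants compute the same value; only the dispatch differs.
    writeln(testVirtual(new SubFoo, 10));
    writeln(testFinal(new SubFoo, 10));
}
```

Whether the second variant is actually faster depends on the compiler and flags, as the rest of the thread shows.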
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Adam D. Ruppe | On Monday, 3 August 2015 at 16:53:30 UTC, Adam D. Ruppe wrote:
> On Monday, 3 August 2015 at 16:47:58 UTC, John Colvin wrote:
>> gets me down to 0.182s with ldc on OS X
>
> Yeah, I tried dmd with final and didn't get a difference, but with gdc and final (plus -frelease, which is very important for max speed here, since without it the method calls are surrounded by various assertions) I got similar speed to the hand-written one too.
ouch, yeah those assertions cause me a 30x slowdown!
|
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 03-Aug-2015 19:54, Steven Schveighoffer wrote:
> On 8/3/15 12:50 PM, John Colvin wrote:
>> On Monday, 3 August 2015 at 16:47:14 UTC, Adam D. Ruppe wrote:
>>> You can try a few potential optimizations in the D version yourself
>>> and see if it makes a difference.
>>>
>>> Devirtualization has a very small impact. Test this by making `test`
>>> take `SubFoo` and making `bar` final, or making `bar` a stand-alone
>>> function.
>>>
>>> That's not it.
>>
>> Making SubFoo a final class and test take SubFoo gives a >10x speedup
>> for me.
>
> Let's make sure we're all comparing apples to apples here.
>
> FWIW, I suspect the inlining to be the most significant improvement,
> which is impossible for virtual functions in D.
Should be trivial in this particular case. You just keep the original virtual call where it cannot be deduced.
>
> ALSO, make SURE you are compiling in release mode, so you aren't calling
> a virtual invariant function before/after every call.
This one is critical. Actually why do we have an extra call for trivial null-check on any object that doesn't even have invariant?
--
Dmitry Olshansky
|
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Monday, 3 August 2015 at 16:50:42 UTC, John Colvin wrote:
> Making SubFoo a final class and test take SubFoo gives a >10x speedup for me.
Right, gdc and ldc will do the aggressive inlining and local data optimizations automatically once they are able to devirtualize the calls (at least when you use the -O flags).
dmd, however, even with -inline, doesn't make the local copy of the variable - it disassembles to this:
08098740 <_D1l4testFC1l6SubFooiZi>:
8098740: 55 push ebp
8098741: 8b ec mov ebp,esp
8098743: 89 c1 mov ecx,eax
8098745: 53 push ebx
8098746: 31 d2 xor edx,edx
8098748: 8b 5d 08 mov ebx,DWORD PTR [ebp+0x8]
809874b: 56 push esi
809874c: 85 c9 test ecx,ecx
809874e: 7e 0f jle 809875f <_D1l4testFC1l6SubFooiZi+0x1f>
8098750: 8b 43 08 mov eax,DWORD PTR [ebx+0x8]
8098753: 8d 74 40 01 lea esi,[eax+eax*2+0x1]
8098757: 42 inc edx
8098758: 89 73 08 mov DWORD PTR [ebx+0x8],esi
809875b: 39 ca cmp edx,ecx
809875d: 7c f1 jl 8098750 <_D1l4testFC1l6SubFooiZi+0x10>
809875f: 8b 43 08 mov eax,DWORD PTR [ebx+0x8]
8098762: 5e pop esi
8098763: 5b pop ebx
8098764: 5d pop ebp
8098765: c2 04 00 ret 0x4
There's no call in there, but there is still indirect memory access for the variable, so it doesn't get the caching benefits of the stack.
It isn't news that dmd's optimizer is pretty bad next to... well, pretty much everyone else nowadays, whether gdc, ldc, or Java, but it is sometimes nice to take a look at why.
The biggest magic of Java IMO here is being CPU cache friendly!
|
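The "local copy" that dmd does not make in the listing above can be written by hand. The following is a sketch only: the field name `val` and both function names are assumptions. The loop body `val * 3 + 1` is read off the `lea esi,[eax+eax*2+0x1]` instruction in the disassembly and is not taken from the original source.

```d
import std.stdio : writeln;

final class SubFoo
{
    int val;   // assumed name for the field stored at offset 0x8 in the listing
}

// Field-based loop: as written, each iteration reads obj.val from memory
// and writes it back, matching the load/store pair in the dmd output.
int testField(SubFoo obj, int repeat)
{
    foreach (i; 0 .. repeat)
        obj.val = obj.val * 3 + 1;
    return obj.val;
}

// Hand-hoisted version: load the field once, iterate on a local that can
// stay in a register, and store the result back once after the loop.
int testLocal(SubFoo obj, int repeat)
{
    int v = obj.val;
    foreach (i; 0 .. repeat)
        v = v * 3 + 1;
    obj.val = v;
    return v;
}

void main()
{
    writeln(testField(new SubFoo, 10));
    writeln(testLocal(new SubFoo, 10));
}
```

Both functions return the same value. The second simply does by hand what the thread says gdc and ldc do automatically once the call is devirtualized and inlined.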
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Dmitry Olshansky | On 8/3/15 12:59 PM, Dmitry Olshansky wrote:
> On 03-Aug-2015 19:54, Steven Schveighoffer wrote:
>> ALSO, make SURE you are compiling in release mode, so you aren't calling
>> a virtual invariant function before/after every call.
>
> This one is critical. Actually why do we have an extra call for trivial
> null-check on any object that doesn't even have invariant?
Actually, the call to the invariant should be avoidable if the object doesn't have one. It should be easy to check the vtable pointer to see if it points at the "default" invariant (which does nothing).
-Steve
|
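To make the invariant overhead visible, here is a small sketch. The class `Counter`, its members, and the global `invariantCalls` are hypothetical names invented for this illustration. The counter inside the invariant is there only to show when the check runs and is not something to do in real code.

```d
import std.stdio : writeln;

int invariantCalls;   // hypothetical global, used only to count invariant runs

class Counter
{
    int val;

    // Checked around public member function calls in non-release builds.
    invariant()
    {
        ++invariantCalls;     // side effect for demonstration only
        assert(val >= 0);
    }

    void bump() { ++val; }
}

void main()
{
    auto c = new Counter;
    foreach (i; 0 .. 5)
        c.bump();

    // Expected to be non-zero in a default build and zero when the
    // program is compiled in release mode, where invariants are omitted.
    writeln("invariant calls: ", invariantCalls);
}
```

Building the same file once normally and once in release mode (for example `-release` with dmd, or `-frelease` with gdc as mentioned above) should show the difference that the posters are measuring.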
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On Monday, 3 August 2015 at 16:47:58 UTC, John Colvin wrote:
> changing two lines:
> final class SubFoo : Foo {
> int test(F)(F obj, int repeat) {
I tried it. DMD is no change, while GDC gets acceptable score.
D(DMD 2.067.1): 2.445
D(GDC 4.9.2/2.066): 0.928
Now I got a hint how to improve the code by hand.
Thanks, John.
But the original Java code that I'm porting is
about 10,000 lines of code.
And the performance is about 3 times different.
Yes! Java is 3 times faster than D in my app.
I hope the future DMD/GDC compiler will do the
similar optimization automatically, not by hand.
Aki.
|
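The two changed lines quoted above turn `test` into a function template, so the parameter keeps the static type used at the call site. A sketch of how that plays out, where the field `val` and the body of `bar` are assumptions rather than the original benchmark:

```d
import std.stdio : writeln;

class Foo
{
    int val;                          // assumed field
    void bar() { val = val * 3 + 1; } // assumed body
}

final class SubFoo : Foo
{
    override void bar() { val = val * 3 + 1; }
}

// F is deduced from the argument's static type at each call site.
int test(F)(F obj, int repeat)
{
    foreach (i; 0 .. repeat)
        obj.bar();
    return obj.val;
}

void main()
{
    auto sub = new SubFoo;   // static type SubFoo
    Foo base = new SubFoo;   // static type Foo, same dynamic type

    writeln(test(sub, 10));  // instantiates test!SubFoo: bar can be bound statically
    writeln(test(base, 10)); // instantiates test!Foo: bar is still a virtual call
}
```

The design point is that existing callers do not need to change, but only calls where the compiler can see the final type get the benefit.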
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Steven Schveighoffer | On 03-Aug-2015 20:05, Steven Schveighoffer wrote:
> On 8/3/15 12:59 PM, Dmitry Olshansky wrote:
>> On 03-Aug-2015 19:54, Steven Schveighoffer wrote:
>
>>> ALSO, make SURE you are compiling in release mode, so you aren't calling
>>> a virtual invariant function before/after every call.
>>
>> This one is critical. Actually why do we have an extra call for trivial
>> null-check on any object that doesn't even have invariant?
>
> Actually, the call to the invariant should be avoidable if the
> object doesn't have one. It should be easy to check the vtable pointer
> to see if it points at the "default" invariant (which does nothing).
>
https://issues.dlang.org/show_bug.cgi?id=14865
--
Dmitry Olshansky
|
August 03, 2015 Re: Why Java (server VM) is faster than D? | ||||
---|---|---|---|---|
| ||||
Posted in reply to aki | On Monday, 3 August 2015 at 17:33:30 UTC, aki wrote:
> On Monday, 3 August 2015 at 16:47:58 UTC, John Colvin wrote:
>> changing two lines:
>> final class SubFoo : Foo {
>> int test(F)(F obj, int repeat) {
>
> I tried it. DMD is no change, while GDC gets acceptable score.
> D(DMD 2.067.1): 2.445
> D(GDC 4.9.2/2.066): 0.928
>
> Now I got a hint how to improve the code by hand.
> Thanks, John.
> But the original Java code that I'm porting is
> about 10,000 lines of code.
> And the performance is about 3 times different.
> Yes! Java is 3 times faster than D in my app.
> I hope the future DMD/GDC compiler will do the
> similar optimization automatically, not by hand.
>
> Aki.
LLVM might be able to achieve Java's optimization for your use case using profile-guided optimization. In principle, it's hard to choose which function to inline without the function call counts, but LLVM has a back-end with sampling support.
http://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
Whether or not this is or will be available soon for D in LDC is a different matter.
|