February 15, 2008 Re: D slower than C++ by a factor of _two_ for simple raytracer (gdc) | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> The weird thing is: even if I inline the one spot where gdc ignores
> its opportunity to inline a function, so that I have the _same_
> call-counts as G++ (as measured with -g -pg), even then, the D code
> is slower. So it doesn't depend on missing inlining opportunities. Or
> am I missing something?
It's often worthwhile to run obj2asm on the output of each, and compare.
| |||
February 15, 2008 Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | Another interesting observation.
If I change all my opFoo's to opFooAssign's, and use those instead, speed goes up from 16s to 13s; indicating that returning large structs (12 bytes/vector) causes a significant speed hit. Still not close to the C++ version though. The weird thing is that all those ops have been inlined (or so says the assembler dump). Weird.
--downs
| |||
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Another interesting observation.
>
> If I change all my opFoo's to opFooAssign's, and use those instead, speed goes up from 16s to 13s; indicating that returning large structs (12 bytes/vector) causes a significant speed hit. Still not close to the C++ version though. The weird thing is that all those ops have been inlined (or so says the assembler dump). Weird.
>
> --downs
Excuse me. 24 bytes.
| |||
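For readers following along, here is a minimal sketch of the two styles being compared. It is not the raytracer's actual vector type: `Vec3`, `plus` and `plusAssign` are hypothetical names, and plain named methods stand in for the opFoo / opFooAssign operator overloads so that the sketch does not depend on a particular D version's operator syntax.

```d
// Hypothetical three-component vector, assumed to resemble the one in the thread.
struct Vec3 {
    double x, y, z;

    // Value-returning style (what the thread calls opFoo):
    // builds a temporary and hands a whole Vec3 back to the caller.
    Vec3 plus(ref Vec3 rhs) {
        Vec3 r;
        r.x = x + rhs.x;
        r.y = y + rhs.y;
        r.z = z + rhs.z;
        return r;
    }

    // In-place style (what the thread calls opFooAssign):
    // updates this vector directly, nothing is returned.
    void plusAssign(ref Vec3 rhs) {
        x += rhs.x;
        y += rhs.y;
        z += rhs.z;
    }
}

// Three doubles at 8 bytes each: 24 bytes, matching the corrected figure.
static assert(Vec3.sizeof == 24);

void main() {
    Vec3 a, b;
    a.x = 1; a.y = 2; a.z = 3;
    b.x = 4; b.y = 5; b.z = 6;

    Vec3 sum = a.plus(b);   // a is unchanged, sum is a fresh struct
    a.plusAssign(b);        // a is overwritten with the same result

    assert(sum.x == a.x && sum.y == a.y && sum.z == a.z);
    assert(a.x == 5 && a.y == 7 && a.z == 9);
}
```

Both calls compute the same values; the only difference is whether a 24-byte result has to be materialised and copied back, which is the cost the post is pointing at.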
February 15, 2008 GDC's std.math still not being inlined | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | Another other observation: GDC's std.math functions still aren't being inlined properly, forcing me to use the intrinsics manually.
That didn't cause the speed difference though.
Still, it would be nice to see it fixed some time soon, seeing as I filed the bug in November :)
--downs
| |||
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Another interesting observation.
>
> If I change all my opFoo's to opFooAssign's, and use those instead, speed goes up from 16s to 13s; indicating that returning large structs (12 bytes/vector) causes a significant speed hit. Still not close to the C++ version though. The weird thing is that all those ops have been inlined (or so says the assembler dump). Weird.
>
> --downs
Yeah, I was about to say the same. See here:
http://paste.dprogramming.com/dpolmzhw
It's ugly, but no struct returning.
On my machine it's about a second slower than g++ (8.9s vs. 7.8s) compiled via:
gdc -fversion=Posix -fversion=Tango -O3 -fomit-frame-pointer -fweb -frelease -finline-functions
and
g++ -O3 -fomit-frame-pointer -fweb -finline-functions
There's probably some other optimizations that could be made. But really I think this comes down to the compiler not being as mature. The stuff that I did should all be done by an optimizing compiler. You're basically tricking the compiler into moving less bits around.
Tim.
| |||
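The linked paste is not reproduced in this thread, so the following is only a guess at the general shape of "no struct returning" code, not Tim's actual source. The names `Vec3`, `add` and `scale` are hypothetical; the idea is that every operation writes its result through a reference to caller-provided storage instead of returning a struct.

```d
// Hypothetical vector type; the real one lives in the linked paste.
struct Vec3 { double x, y, z; }

// Hypothetical helper: writes a + b into dst instead of returning a Vec3.
void add(ref Vec3 a, ref Vec3 b, ref Vec3 dst) {
    dst.x = a.x + b.x;
    dst.y = a.y + b.y;
    dst.z = a.z + b.z;
}

// Hypothetical helper: writes a * s into dst.
void scale(ref Vec3 a, double s, ref Vec3 dst) {
    dst.x = a.x * s;
    dst.y = a.y * s;
    dst.z = a.z * s;
}

void main() {
    Vec3 a, b, tmp, r;
    a.x = 1; a.y = 2; a.z = 3;
    b.x = 4; b.y = 5; b.z = 6;

    // r = (a + b) * 2, spelled out without any struct return values.
    add(a, b, tmp);
    scale(tmp, 2.0, r);

    assert(r.x == 10 && r.y == 14 && r.z == 18);
}
```

The price is the "ugly" part mentioned in the post: an expression such as (a + b) * 2 has to be unrolled by hand into separate calls with an explicit temporary.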
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Tim Burrell | Tim Burrell wrote:
> downs wrote:
>> Another interesting observation.
>>
>> If I change all my opFoo's to opFooAssign's, and use those instead, speed goes up from 16s to 13s; indicating that returning large structs (12 bytes/vector) causes a significant speed hit. Still not close to the C++ version though. The weird thing is that all those ops have been inlined (or so says the assembler dump). Weird.
>>
>> --downs
>
> Yeah, I was about to say the same. See here:
>
> http://paste.dprogramming.com/dpolmzhw
>
> It's ugly, but no struct returning.
>
> On my machine it's about a second slower than g++ (8.9s vs. 7.8s)
> compiled via:
>
> gdc -fversion=Posix -fversion=Tango -O3 -fomit-frame-pointer -fweb -frelease -finline-functions
>
> and
>
> g++ -O3 -fomit-frame-pointer -fweb -finline-functions
>
> There's probably some other optimizations that could be made. But really I think this comes down to the compiler not being as mature. The stuff that I did should all be done by an optimizing compiler. You're basically tricking the compiler into moving less bits around.
>
> Tim.
But even using your compiler flags, I'm still looking at 12.8s (D) vs 8.1s (C++) .. 11.4 (D) vs 7.8 (C++) using -march=nocona.
:ten minutes later:
... Okay, now I'm confused.
Your program is three seconds faster than my op*Assign version.
Is there a generic problem with operator overloading?
I rewrote my version for freestanding functions .. 9.5s :confused: Why do struct members (which are inlined, I checked) take such a speed hit?
Ah well. Let's hope LLVMDC does a better job .. someday.
--downs
| |||
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Tim Burrell | downs:
>If I change all my opFoo's to opFooAssign's, and use those instead, speed goes up from 16s to 13s; indicating that returning large structs (12 bytes/vector) causes a significant speed hit.<
Tim Burrell:
> Yeah, I was about to say the same. See here:
Yep, see my TinyVector ;-)
Bye,
bearophile
| |||
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | "downs" <default_357-line@yahoo.de> wrote in message news:fp4593$1kko$1@digitalmars.com...
> I rewrote my version for freestanding functions .. 9.5s :confused: Why do struct members (which are inlined, I checked) take such a speed hit?
I think other people have come to this bizarre realization as well. It really doesn't make any sense.
Have you compared the assembly of calling a struct member function and calling a free function?
| |||
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Jarrett Billingsley | I ran a comparison of struct vector methods vs freestanding, and the GDC generated assembler code is precisely identical.
Here's my test source:
struct foo {
    double x, y, z;
    void opAddAssign(ref foo bar) {
        x += bar.x; y += bar.y; z += bar.z;
    }
}
void foo_add(ref foo bar, ref foo baz) {
    baz.x += bar.x; baz.y += bar.y; baz.z += bar.z;
}
// prevents overzealous optimization
// really just returns 0, 0, 0
extern(C) foo complex_external_function();
import std.stdio;
void main() {
    foo a = complex_external_function(), b = complex_external_function();
    asm { int 3; }
    a += b;
    asm { int 3; }
    foo c = complex_external_function(), d = complex_external_function();
    asm { int 3; }
    foo_add(d, c);
    asm { int 3; }
    writefln(a, b, c, d);
}
And here are the relevant two bits of assembler.
#APP
int $3
#NO_APP
fldl -120(%ebp)
faddl -96(%ebp)
fstpl -120(%ebp)
fldl -112(%ebp)
faddl -88(%ebp)
fstpl -112(%ebp)
fldl -104(%ebp)
faddl -80(%ebp)
fstpl -104(%ebp)
#APP
int $3
#NO_APP
fldl -72(%ebp)
faddl -48(%ebp)
fstpl -72(%ebp)
fldl -64(%ebp)
faddl -40(%ebp)
fstpl -64(%ebp)
fldl -56(%ebp)
faddl -32(%ebp)
fstpl -56(%ebp)
No difference. But then why the obvious speed difference? Color me confused ._.
--downs
| |||
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> I rewrote my version for freestanding functions .. 9.5s :confused: Why do struct members (which are inlined, I checked) take such a speed hit?
>
My version had a bug. x__X
The correct version takes 11.2s again.
--downs
| |||