May 31, 2013
Slow performance compared to C++, ideas?
Recently I ported a simple ray tracer I wrote in C++11 to D. 
Thanks to the similarity between D and C++ it was almost a line 
by line translation, in other words, very very close. However, 
the D version runs much slower than the C++11 version. On Windows, 
with MinGW GCC and GDC, the C++ version is twice as fast as the D 
version. On OSX, I used Clang++ and LDC, and the C++11 version 
was 4x faster than the D version.  Since the comparisons were between 
compilers that share the same codegen backends I suppose that's a 
relatively fair comparison.  (flags used for GDC: -O3 
-fno-bounds-check -frelease,  flags used for LDC: -O3 -release)

I really like the features offered by D but it's the raw 
performance that's worrying me. From what I read D should offer 
similar performance when doing similar things but my own test 
results are not consistent with this claim. I want to know whether 
this slowness is inherent to the language or it's something I was 
not doing right (very possible because I have only a few days of 
experience with D).

Below is the link to the D and C++ code, in case anyone is 
interested to have a look.

https://dl.dropboxusercontent.com/u/974356/raytracer.d
https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
May 31, 2013
Re: Slow performance compared to C++, ideas?
finalpatch:

> I really like the features offered by D but it's the raw 
> performance that's worrying me.

From my experience if you know what you are doing, you are able 
to write that kind of numerical D code that LDC compiles with a 
performance very close to C++, and sometimes higher. But you need 
to be careful about some things.

Don't do this:
foreach (y; (iota(height)))

Use this, because those abstractions are not for free:
foreach (y;  0 .. height)

Be careful with foreach on arrays of structs, because it performs 
copies that are slow if the structs aren't very small.
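
A minimal sketch of the difference, assuming a hypothetical Sphere struct (the names below are illustrative and not taken from raytracer.d). Iterating with ref avoids the per-element copy:

```d
import std.stdio : writeln;

// Hypothetical struct, big enough that a per-iteration copy is not free.
struct Sphere
{
    double[3] center = [0.0, 0.0, 0.0];
    double radius = 1;
    double[3] color = [0.0, 0.0, 0.0];
}

double sumByValue(Sphere[] spheres)
{
    double total = 0;
    foreach (s; spheres)       // s is a copy of each element
        total += s.radius;
    return total;
}

double sumByRef(Sphere[] spheres)
{
    double total = 0;
    foreach (ref s; spheres)   // s refers to the element in place, no copy
        total += s.radius;
    return total;
}

void main()
{
    auto spheres = new Sphere[](4);
    foreach (i, ref s; spheres)
        s.radius = i + 1.0;    // ref is also needed to modify elements
    assert(sumByValue(spheres) == sumByRef(spheres));
    writeln(sumByRef(spheres));
}
```

Whether the copy actually costs anything depends on the struct size and on how much the optimizer can elide, so it is worth measuring both forms.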

Be careful with classes, because by default their methods are 
virtual. Sometimes in D you want to use structs for performance 
reasons.
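
As a rough illustration, here is a small value type written as a struct. Vec3 and its members are hypothetical names, not necessarily what raytracer.d uses. Struct methods are never virtual, so the compiler is free to inline them:

```d
import std.math : sqrt;

// Hypothetical vector type for illustration only.
struct Vec3
{
    double x, y, z;

    // Component-wise + and -; struct methods are statically dispatched.
    Vec3 opBinary(string op)(const Vec3 rhs) const
        if (op == "+" || op == "-")
    {
        return mixin("Vec3(x " ~ op ~ " rhs.x, y " ~ op ~ " rhs.y, z " ~ op ~ " rhs.z)");
    }

    double dot(const Vec3 rhs) const
    {
        return x * rhs.x + y * rhs.y + z * rhs.z;
    }

    double length() const
    {
        return sqrt(dot(this));
    }
}

void main()
{
    auto a = Vec3(1, 2, 3);
    auto b = Vec3(4, 6, 3);
    auto d = b - a;            // Vec3(3, 4, 0)
    assert(d.length() == 5);
}
```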

Sometimes in inner loops it's better to use a classic for instead 
of a foreach.

LDC needs far more flags to compile a raytracer well. LDC even 
supports link time optimization, but you need even more obscure 
flags.

Also the ending brace of classes and structs doesn't need a 
semicolon in D.

Bye,
bearophile
May 31, 2013
Re: Slow performance compared to C++, ideas?
Hi bearophile,

Thanks for the reply. I changed it to 0..height and it has no 
measurable effect on the runtime.

The reason I used iota(height) was to test 
std.parallelism.parallel. On Windows if I do foreach (y; 
parallel(iota(height))) I do get almost 4x speed up on a quadcore 
computer. However, on OSX, parallel() either does nothing (LDC) 
or makes it slower than single threaded(DMD).
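
For reference, the parallel loop looks roughly like the sketch below. renderRow and the flat float image are hypothetical stand-ins for the real per-scanline work, which is an assumption on my part:

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical stand-in for rendering one scanline.
void renderRow(int y, float[] row)
{
    foreach (x, ref px; row)
        px = cast(float)(x + y);
}

float[] render(int width, int height)
{
    auto image = new float[](width * height);
    // Each iteration writes to its own row slice, so the rows can be
    // processed by different worker threads without synchronization.
    foreach (y; parallel(iota(height)))
        renderRow(y, image[y * width .. (y + 1) * width]);
    return image;
}

void main()
{
    auto img = render(8, 4);
    assert(img[3 * 8 + 5] == 8.0f);   // row 3, column 5
}
```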

On Friday, 31 May 2013 at 01:42:53 UTC, bearophile wrote:
> Don't do this:
> foreach (y; (iota(height)))
>
> Use this, because those abstractions are not for free:
> foreach (y;  0 .. height)
May 31, 2013
Re: Slow performance compared to C++, ideas?
finalpatch:

> Thanks for the reply. I changed it to 0..height and it has no 
> measurable effect on the runtime.

Have you also fixed all the other things? :-) Probably you have 
to keep fixing potentially slow spots until you find the truly 
slow ones.

Bye,
bearophile
May 31, 2013
Re: Slow performance compared to C++, ideas?
I don't know if this is the case with the code in question (I 
have not looked at it), but sometimes there will be a significant 
effect on performance caused by the use of the garbage collector. 
This is an area in need of radical improvements.

You have to minimize situations where there's a lot of 
allocations going on while the GC is enabled because that will 
fire up the GC more often than is required and it can slow down 
your app significantly; a 2x or more performance penalty is 
certainly possible. It can also make performance unpredictable 
with large delays at inappropriate points in the execution.
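
One way to reduce allocation pressure in a hot loop is to hoist the buffer out of it. This is only a sketch with made-up function names, not something taken from the code in question:

```d
// Hypothetical per-sample work that allocates a scratch buffer each time.
double shadeAllocating(size_t n)
{
    double total = 0;
    foreach (i; 0 .. n)
    {
        auto samples = new double[](16);   // GC allocation on every iteration
        samples[] = 1.0;
        foreach (s; samples)
            total += s;
    }
    return total;
}

// Same work, but the scratch buffer lives on the stack and is reused.
double shadeReusing(size_t n)
{
    double total = 0;
    double[16] samples;                    // no GC allocation at all
    foreach (i; 0 .. n)
    {
        samples[] = 1.0;
        foreach (s; samples)
            total += s;
    }
    return total;
}

void main()
{
    assert(shadeAllocating(1000) == shadeReusing(1000));
}
```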

BTW, you should post questions like this into d.learn rather than 
in the general discussion area.

--rt
May 31, 2013
Re: Slow performance compared to C++, ideas?
Hi Rob,

I have tried putting GC.disable() and GC.enable() around the 
rendering call and it made no difference.
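
Roughly what I did, as a sketch. The render function here is a hypothetical stand-in for the actual rendering call:

```d
import core.memory : GC;

// Hypothetical stand-in for the rendering call.
int[] render(size_t n)
{
    int[] pixels;
    foreach (i; 0 .. n)
        pixels ~= cast(int) i;   // appending allocates from the GC heap
    return pixels;
}

int[] renderWithoutCollections(size_t n)
{
    GC.disable();                // suppress automatic collections
    scope (exit) GC.enable();    // re-enable even if render throws
    return render(n);
}

void main()
{
    assert(renderWithoutCollections(1000).length == 1000);
}
```

Note that GC.disable() only suppresses collections; the allocations themselves still happen, and the runtime may still collect if it runs out of memory.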

On Friday, 31 May 2013 at 02:13:36 UTC, Rob T wrote:
> I don't know if this is the case with the code in question (I 
> have not looked at it), but sometimes there will be a 
> significant effect on performance caused by the use of the 
> garbage collector. This is an area in need of radical 
> improvements.
>
> You have to minimize situations where there's a lot of 
> allocations going on while the GC is enabled because that will 
> fire up the GC more often than is required and it can slow down 
> your app significantly; a 2x or more performance penalty is 
> certainly possible. It can also make performance unpredictable 
> with large delays at inappropriate points in the execution.
>
> BTW, you should post questions like this into d.learn rather 
> than in the general discussion area.
>
> --rt
May 31, 2013
Re: Slow performance compared to C++, ideas?
On 5/30/2013 6:26 PM, finalpatch wrote:
> Recently I ported a simple ray tracer I wrote in C++11 to D. Thanks to the
> similarity between D and C++ it was almost a line by line translation, in other
> words, very very close. However, the D version runs much slower than the C++11
> version. On Windows, with MinGW GCC and GDC, the C++ version is twice as fast as
> the D version. On OSX, I used Clang++ and LDC, and the C++11 version was 4x
> faster than the D version.  Since the comparisons were between compilers that share
> the same codegen backends I suppose that's a relatively fair comparison.  (flags
> used for GDC: -O3 -fno-bounds-check -frelease,  flags used for LDC: -O3 -release)

For max speed using dmd, use the flags:

   -O -release -inline -noboundscheck

The -inline is especially important.


> I really like the features offered by D but it's the raw performance that's
> worrying me. From what I read D should offer similar performance when doing
> similar things but my own test results are not consistent with this claim. I want
> to know whether this slowness is inherent to the language or it's something I
> was not doing right (very possible because I have only a few days of experience
> with D).
>
> Below is the link to the D and C++ code, in case anyone is interested to have a
> look.
>
> https://dl.dropboxusercontent.com/u/974356/raytracer.d
> https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
May 31, 2013
Re: Slow performance compared to C++, ideas?
Hi Walter,

Thanks for the reply. I have already tried these flags. However, 
DMD's codegen is lagging behind GCC and LLVM at the moment, so 
even with these flags, the runtime is ~10x longer than the C++ 
version compiled with clang++ (2sec with DMD, 200ms with clang++ 
on a Core2 Mac Pro). I know this is comparing apples to oranges 
though, that's why I was comparing GDC vs G++ and LDC vs Clang++.

On Friday, 31 May 2013 at 02:19:40 UTC, Walter Bright wrote:
> For max speed using dmd, use the flags:
>
>    -O -release -inline -noboundscheck
>
> The -inline is especially important.
May 31, 2013
Re: Slow performance compared to C++, ideas?
On 05/30/2013 11:31 PM, finalpatch wrote:
> Hi Walter,
> 
> Thanks for the reply. I have already tried these flags. However, DMD's codegen is lagging behind GCC and LLVM at the
> moment, so even with these flags, the runtime is ~10x longer than the C++ version compiled with clang++ (2sec with DMD,
> 200ms with clang++ on a Core2 Mac Pro). I know this is comparing apples to oranges though, that's why I was comparing
> GDC vs G++ and LDC vs Clang++.
> 
> On Friday, 31 May 2013 at 02:19:40 UTC, Walter Bright wrote:
>> For max speed using dmd, use the flags:
>>
>>    -O -release -inline -noboundscheck
>>
>> The -inline is especially important.


Have you tried:

    dmd -profile

it compiles in trace generation, so that when you run the program you get a .log file which tells you the slowest
functions and other info.

Please note that the resulting code compiled with -profile is slower because it is instrumented.

--jm
May 31, 2013
Re: Slow performance compared to C++, ideas?
On 5/30/13 9:26 PM, finalpatch wrote:
> https://dl.dropboxusercontent.com/u/974356/raytracer.d
> https://dl.dropboxusercontent.com/u/974356/raytracer.cpp

Manu's gonna love this one: make all methods final.
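
A sketch of what that can look like, using hypothetical Shape, Square, and Ray classes rather than the ones in raytracer.d:

```d
class Shape
{
    // Virtual by default: subclasses may override it.
    double area() const { return 0; }
}

// A final class cannot be subclassed, which can let the compiler
// devirtualize calls made through a Square reference.
final class Square : Shape
{
    double side;
    this(double side) { this.side = side; }
    override double area() const { return side * side; }
}

class Ray
{
    double origin, dir;
    this(double origin, double dir) { this.origin = origin; this.dir = dir; }

final:  // every method declared below this label is non-virtual
    double at(double t) const { return origin + t * dir; }
}

void main()
{
    Shape s = new Square(2);
    assert(s.area() == 4);            // still dispatched virtually via Shape
    assert(new Ray(1, 2).at(3) == 7); // direct call, no vtable lookup needed
}
```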

Andrei