Why is this D code slower than C++? (page 5)

Try this version. In MSVC C++, float -> double, funcf -> func for the floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7 seconds on my computer. The same process makes the D version slower.

Bradley Smith wrote:
>
> Bradley Smith wrote:
>> As Bill Baxter pointed out, I missed an optimization on version 2. The pass by reference optimization using the inout on the Intersect's Ray argument. I had applied inout only to the Raytrace's Ray argument.
>>
>> The further optimization brings the following approx. timings:
>>
>>        time    factor
>> dmc    5 sec   1.0
>> dmd    9 sec   1.8
>> gdc   13 sec   2.6
> gdc   10 sec   2.0  <-- correction
>> msvc   5 sec   1.0
>> g++    doesn't compile
>>
>
> Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
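The float -> double switch described above can be made in one place rather than by renaming every call. The following is only a sketch in C++, not code from the attached raytracer: the names `Real`, `Vec3`, `length` and `falloff` are hypothetical. It relies on `std::sqrt` and `std::exp` being overloaded for `float` and `double`, so no `sqrtf`/`sqrt` renaming is needed when the alias changes.

```cpp
#include <cmath>

// Hypothetical scalar alias: change this one line to compare float and double.
using Real = double;

// Hypothetical 3-component vector, not the raytracer's own type.
struct Vec3 {
    Real x, y, z;
};

// std::sqrt picks the float or double overload from the argument type,
// so this function needs no edit when Real changes.
inline Real length(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Same idea for std::exp (the posts mention expf as well).
inline Real falloff(Real distance) {
    return std::exp(-distance);
}
```

With `Real` set to `float` the same source exercises the single-precision path, which makes the two timings directly comparable.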

%u wrote:
> Bradley Smith Wrote:
>> What technical documentation would be proper? What would it
>> contain?
>
> As always such depends on the requirements of the presumed readers.
>
> If you are able to change your position from the view of the porter to the view of a verifier or freshly introduced maintainer of the port, then you will have an impression of what you would want to look at first.
>
> It is a pity as it stands, that the question for the content of the technical documentation raises at all.

Dude, it's a toy raytracer ported from some free code someone posted to a website somewhere. Why should it come with gobs of documentation?

But anyway, the original code was part of a series of tutorials. I think the version Bradley posted was probably from this installment:
http://www.devmaster.net/articles/raytracing_series/part3.php

As the series goes on, the author adds more and more fancy features to the raytracer. Anyway, the tutorials are already far more documentation than you'll find for most free code out in the wild.

--bb

Yes, I see that behavior too. Using doubles, here is what I get.

dmc    6 sec
dmd   19 sec
gdc   17 sec
msvc   4 sec

It is also interesting that the msvc gets better where the dmc gets worse. I wouldn't stake too much on it though, since these are approximate timings.

Thanks,
Bradley

Daniel Giddings wrote:
> Try this version. In MSVC C++, float -> double, funcf -> func for the floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7 seconds on my computer. The same process makes the D version slower.
>
> Bradley Smith wrote:
>>
>> Bradley Smith wrote:
>>> As Bill Baxter pointed out, I missed an optimization on version 2. The pass by reference optimization using the inout on the Intersect's Ray argument. I had applied inout only to the Raytrace's Ray argument.
>>>
>>> The further optimization brings the following approx. timings:
>>>
>>>        time    factor
>>> dmc    5 sec   1.0
>>> dmd    9 sec   1.8
>>> gdc   13 sec   2.6
>> gdc   10 sec   2.0  <-- correction
>>> msvc   5 sec   1.0
>>> g++    doesn't compile
>>>
>>
>> Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
>
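Since the figures in this thread are approximate wall-clock timings, one way to make them a little steadier is to time the render several times and keep the fastest run. This is only a sketch in C++ and is not how the posted numbers were obtained. The helper names `seconds_taken` and `best_of` are hypothetical; `std::chrono::steady_clock` is the standard monotonic clock.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Fn>
double seconds_taken(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Runs `work` `runs` times (at least once) and returns the fastest time,
// which is less sensitive to background load than a single measurement.
template <typename Fn>
double best_of(int runs, Fn&& work) {
    double best = seconds_taken(work);
    for (int i = 1; i < runs; ++i) {
        const double t = seconds_taken(work);
        if (t < best) {
            best = t;
        }
    }
    return best;
}
```

A call such as `best_of(3, [] { render(); })` would time a hypothetical `render()` function three times and report the quickest run.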

You must have made a mistake somewhere, because the rendered images from D and C++ are not the same!

The image from the D exe has a lone white pixel (also present in the 'float' versions, both D and cpp), but that white pixel is gone in the cpp version (both dmc and msvc).

L.

Lionello Lunesu wrote:
> You must have made a mistake somewhere, because the rendered images from D and C++ are not the same!
>
> The image from the D exe has a lone white pixel (also present in the 'float' versions, both D and cpp), but that white pixel is gone in the cpp version (both dmc and msvc).
>
> L.

Sorry, I thought the .d files were also using 'double', but they're not. This explains the different outcome.

L.

January 19, 2007

Re: Why is this D code slower than C++?

Posted by Dave
in reply to Bill Baxter

Bill Baxter wrote:
> Dave wrote:
>> Bradley Smith wrote:
>>> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>>>
>>> Here are the changes I've made. Attached is the new code.
>>>
>>>   Call RegisterClass outside of assert. (Broken if -release used)
>>>   Apply -release option. (Increases speed in an unknown way)
>>>   Converted templates to regular functions. (Templates not being inlined)
>>
>> Are you sure? I know templates can be/are inlined and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
> 
> I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method.  It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference.
> 
> That's evil that inout disables inlining.
> Seems like inout params would be easier to inline than regular parameters, but I guess not.

I agree and have been wondering about that for some time - my guess is that it caused some type of bug early on and Walter didn't have the time to loop back and fix it.

> 
> --bb
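The "Call RegisterClass outside of assert" item quoted above has a direct analog in C++, where `assert` expands to nothing when `NDEBUG` is defined, so a call placed inside it is never made in such a build. The sketch below illustrates the pitfall and the fix in C++ rather than D. `register_window_class` and the counter are hypothetical stand-ins, not the Win32 `RegisterClass` call from the attached code.

```cpp
#include <cassert>

// Hypothetical stand-in for a call with a required side effect.
static int g_registered = 0;

bool register_window_class() {
    ++g_registered;
    return true;
}

// Fragile: when NDEBUG is defined, the whole assert expression is removed,
// so register_window_class() is never called.
void init_fragile() {
    assert(register_window_class());
}

// Robust: the call always happens; only the check disappears with NDEBUG.
void init_robust() {
    const bool ok = register_window_class();
    assert(ok);
    (void)ok;  // avoids an unused-variable warning when the assert is compiled out
}
```

Calling `init_robust()` increments the counter in every build, while `init_fragile()` does so only when assertions are enabled.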

nobody_ wrote:
> I think this thread is worth posting as a (D-performance) tutorial or something.
> A lot of interesting performance issues have come up, of which most were unknown to me :)
> Hopefully the need for a tutorial on performance will soon be deprecated by better optimizations and a faster GC <g>
> What do you think?
>
>

Dave wrote:
> Bradley Smith wrote:
>> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>>
>> Here are the changes I've made. Attached is the new code.
>>
>>   Call RegisterClass outside of assert. (Broken if -release used)
>>   Apply -release option. (Increases speed in an unknown way)
>>   Converted templates to regular functions. (Templates not being inlined)
>
> Are you sure? I know templates can be/are inlined and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
>

You are correct. I have confirmed that the templates and regular functions are inlined. However, the way they are inlined appears to perform much more moving of data around than manually inlining. Perhaps the extra data moving is the cause of the performance degradation by using the function or template.

I can also confirm that using inout on the function parameters will cause it to not be inlined.

Thanks,
Bradley
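For readers following along in C++: D's `inout` on a parameter passes the argument by reference, which roughly corresponds to a reference parameter in C++. The sketch below contrasts the by-value and by-reference forms of a ray/sphere test. It is an illustration only: `Vec3`, `Ray` and the functions here are hypothetical and are not taken from the attached raytracer, and it says nothing about how any particular compiler inlines either form.

```cpp
#include <cmath>

// Hypothetical minimal types, not the raytracer's own.
struct Vec3 {
    float x, y, z;
};

struct Ray {
    Vec3 origin;
    Vec3 direction;  // assumed to be normalized
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Shared math: nearest hit of a ray with a unit sphere centered at the origin.
inline bool solve_unit_sphere(const Vec3& o, const Vec3& d, float& t) {
    const float b = dot(o, d);
    const float c = dot(o, o) - 1.0f;
    const float disc = b * b - c;
    if (disc < 0.0f) {
        return false;  // the ray misses the sphere
    }
    t = -b - std::sqrt(disc);
    return t >= 0.0f;
}

// By value: the six floats of Ray are copied for the call
// unless the optimizer removes the copy.
inline bool hits_by_value(Ray ray, float& t) {
    return solve_unit_sphere(ray.origin, ray.direction, t);
}

// By reference: only the address of the caller's Ray is passed.
inline bool hits_by_ref(const Ray& ray, float& t) {
    return solve_unit_sphere(ray.origin, ray.direction, t);
}
```

Both functions return the same answer; the only difference is how the `Ray` argument reaches the callee, which is the kind of data movement discussed in the post above.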

The Java implementation is also faster.

        time     factor   memory
dmc      5 sec   1.0       5 MB
java     8 sec   1.6      72 MB   (Java 1.6.0 -server)
dmd      9 sec   1.8       5 MB
java    19 sec   3.8      19 MB   (Java 1.6.0 -client)

However, Java uses much more memory. All three implementations are in the attached zip.

Thanks,
Bradley
