January 17, 2007
nobody_ wrote:
> I really hope you'll get it faster than the C++ variant.
> 
> Might -profile shed some light?
> Or maybe I lurk here in learn for a reason :D
> 
> 
> 
>>Thanks for all the suggestions. It helps, but not enough to make the D
>>code faster than the C++. It is now 2.6 times slower. The render times
>>are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>>
>>Here are the changes I've made. Attached is the new code.
>>
>>  Call RegisterClass outside of assert. (Broken if -release used)
>>  Apply -release option. (Increases speed in an unknown way)
>>  Converted templates to regular functions. (Templates not being inlined)
>>  Manually inlined DOT function. (Function not being inlined)
>>
>>
>>Any other suggestions?
> 
> 
> 

I ran it with -profile and it takes about 25 min.

here's the log

http://www.webpages.uidaho.edu/~shro8822/trace.log
January 17, 2007
On Wed, 17 Jan 2007 22:34:31 +0000, Steve Horne <stephenwantshornenospam100@aol.com> wrote:

>On Wed, 17 Jan 2007 11:18:10 -0800, Bradley Smith <digitalmars-com@baysmith.com> wrote:
>
>>Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>
>...
>
>>
>>Any other suggestions?
>
>I haven't actually looked at the code, but I'll take a guess anyway.
>
>Raytracing is heavy on the floating point math. As Walter Bright acknowledges, the DMD compiler does not handle the optimisation of float arithmetic as well as some C++ compilers.

On second thoughts, if you're comparing with the DMC compiler for C++, floating point math performance seems a less likely issue. It seems odd that there's such a difference between the DMD and DMC compilers. You'd think the DMD compiler would use much the same back-end code generation that DMC does.

-- 
Remove 'wants' and 'nospam' from e-mail.
January 18, 2007
BCS Wrote:
> here's the log
> 
> http://www.webpages.uidaho.edu/~shro8822/trace.log

It looks like the use of foreach is what drags the performance down. Maybe it's due to the numerous delegate calls.
January 18, 2007
%u wrote:
> BCS Wrote:
>> here's the log
>>
>> http://www.webpages.uidaho.edu/~shro8822/trace.log
> 
> It looks like the use of foreach is what drags the performance down. Maybe it's due to the numerous delegate calls.

No, it shows foreach there because a lot of stuff got inlined and it's only seen by the profiler as the foreach's body. In my experience, more meaningful results can be obtained if -profile is used without -inline.


--
Tomasz Stachowiak
January 18, 2007
>
> I ran it with -profile and it takes about 25 min.

Talk about overhead :)
cpp took about 7 minutes
(log attached)


>
> here's the log
>
> http://www.webpages.uidaho.edu/~shro8822/trace.log



January 18, 2007
Bradley Smith wrote:
> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
> 
> Here are the changes I've made. Attached is the new code.
> 
>   Call RegisterClass outside of assert. (Broken if -release used)
>   Apply -release option. (Increases speed in an unknown way)
>   Converted templates to regular functions. (Templates not being inlined)
>   Manually inlined DOT function. (Function not being inlined)

You left out changing Intersect's Ray argument to inout. More generally, all Ray parameters (and possibly vector3 parameters) could be inout, to avoid the cost of copying them on the stack.

Also converting vector expressions like
      vector3 v = a_Ray.origin - m_Centre;
to
      vector3 v = a_Ray.origin;
      v -= m_Centre;

makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 14.3 to 12.2 sec.

Interestingly, the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references), whereas the D ones are just plain vector3. Changing all the operators in the D version to inout may help speed too.

With those changes, on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec for C++ vs 12.2 sec for D. D is still not as fast as the C++, but close.
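
For reference, here is a minimal sketch of the two forms being compared. It is written for a current D compiler, so it uses opBinary/opOpAssign and ref-style parameters where the 2007 code would have used opSub/opSubAssign and inout. The Vector3 struct and the two helper functions are stand-ins made up for illustration, not the raytracer's actual vector3 code, and the sketch only shows that both forms compute the same value. It says nothing by itself about which one a given compiler makes faster.

```d
import std.stdio;

// Hypothetical stand-in for the raytracer's vector3.
struct Vector3
{
    float x, y, z;

    // a - b: builds and returns a temporary. auto ref takes lvalue
    // arguments by reference instead of copying them.
    Vector3 opBinary(string op)(auto ref const Vector3 rhs) const
        if (op == "-")
    {
        return Vector3(x - rhs.x, y - rhs.y, z - rhs.z);
    }

    // a -= b: updates the left-hand side in place, no temporary returned.
    void opOpAssign(string op)(auto ref const Vector3 rhs)
        if (op == "-")
    {
        x -= rhs.x;
        y -= rhs.y;
        z -= rhs.z;
    }
}

// Form 1: a single expression that goes through opBinary.
Vector3 subtractWithTemporary(Vector3 a, Vector3 b)
{
    return a - b;
}

// Form 2: copy first, then subtract in place through opOpAssign.
Vector3 subtractInPlace(Vector3 a, Vector3 b)
{
    Vector3 v = a;
    v -= b;
    return v;
}

void main()
{
    auto origin = Vector3(1.0f, 2.0f, 3.0f);
    auto centre = Vector3(0.5f, 0.5f, 0.5f);

    // Both forms must agree on the result.
    assert(subtractWithTemporary(origin, centre) == subtractInPlace(origin, centre));
    writeln(subtractInPlace(origin, centre));
}
```

Timing each variant separately, as above, is the only way to tell whether the rewrite pays off for a particular compiler and set of flags.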

--bb
January 18, 2007
Bill Baxter wrote:
> Bradley Smith wrote:
>> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>>
>> Here are the changes I've made. Attached is the new code.
>>
>>   Call RegisterClass outside of assert. (Broken if -release used)
>>   Apply -release option. (Increases speed in an unknown way)
>>   Converted templates to regular functions. (Templates not being inlined)
>>   Manually inlined DOT function. (Function not being inlined)
> 
> You left out changing Intersect's Ray argument to inout. More generally, all Ray parameters (and possibly vector3 parameters) could be inout, to avoid the cost of copying them on the stack.
> 
> Also converting vector expressions like
>       vector3 v = a_Ray.origin - m_Centre;
> to
>       vector3 v = a_Ray.origin;
>       v -= m_Centre;
> 
> makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 14.3 to 12.2 sec.
> 
> Interestingly, the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references), whereas the D ones are just plain vector3. Changing all the operators in the D version to inout may help speed too.
> 
> With those changes, on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec for C++ vs 12.2 sec for D. D is still not as fast as the C++, but close.
> 
> --bb

One more thing to try (now that auto classes are allocated on the stack) is to convert the structs to classes and pass those around. Of course you can't return those from things like opSub(), so you'd have to always use opXxxAssign(), etc. I haven't gone over the code in detail, so maybe this is not really feasible but maybe worth a shot?

IIRC, one of the problems with using 'inout' as function params. is that those are excluded from consideration for in-lining with the current D compiler front-end.
January 18, 2007
Bradley Smith wrote:
> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
> 
> Here are the changes I've made. Attached is the new code.
> 
>   Call RegisterClass outside of assert. (Broken if -release used)
>   Apply -release option. (Increases speed in an unknown way)
>   Converted templates to regular functions. (Templates not being inlined)

Are you sure? I know templates can be (and are) inlined, and I haven't noticed any case where they aren't inlined but a regularly defined function would be.

>   Manually inlined DOT function. (Function not being inlined)
> 
> 
> Any other suggestions?
> 
> Thanks,
>   Bradley
> 
> Bradley Smith wrote:
>> Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++.
>>
>> The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows.
>>
>> How can the D code be made to run faster?
>>
>> Thanks,
>>   Bradley
>>
January 18, 2007
Dave wrote:
> Bradley Smith wrote:
>> Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
>>
>> Here are the changes I've made. Attached is the new code.
>>
>>   Call RegisterClass outside of assert. (Broken if -release used)
>>   Apply -release option. (Increases speed in an unknown way)
>>   Converted templates to regular functions. (Templates not being inlined)
> 
> Are you sure? I know templates can be (and are) inlined, and I haven't noticed any case where they aren't inlined but a regularly defined function would be.

I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method. It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference.

That's evil that inout disables inlining.
Seems like inout params would be easier to inline than regular parameters, but I guess not.

--bb
January 18, 2007
%u wrote:
> == Quote from Bill Baxter (dnewsgroup@billbaxter.com)'s article
>> I noticed that it doesn't work properly with -release added to the
>> compiler flags.
> That is because in testapp.d the call of RegisterClass is put into
> an assertion.
> 
> On my machine the -release flag brings another 25%.
> 
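
To illustrate why that breaks: dmd's -release switch compiles out assert expressions, so any call placed inside one is never made. A minimal sketch, with a hypothetical registerWindowClass standing in for the real RegisterClass call in testapp.d (the names below are invented for the example):

```d
import std.stdio;

// Records whether the (hypothetical) registration call actually ran.
bool registered = false;

// Hypothetical stand-in for the Win32 RegisterClass call in testapp.d.
bool registerWindowClass()
{
    registered = true; // the side effect the program depends on
    return true;
}

// Fragile: with -release the assert, and the call inside it, are compiled out.
void setupFragile()
{
    assert(registerWindowClass());
}

// Robust: the call always runs; only the check disappears under -release.
void setupRobust()
{
    bool ok = registerWindowClass();
    assert(ok);
}

void main()
{
    setupRobust();
    writeln("registered: ", registered);
}
```

Keeping the call outside the assert means the build behaves the same with and without -release, which is the change Bradley listed.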
>> The inout on the Ray parameter and the other changes to this
>> function alone change my D runtime from 22 sec to 15 sec.
> 
> The compiler should be smart enough to detect, that the Ray
> parameter is not used as an lvalue and thus can be replaced by a
> reference.

No, it can't.. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents.

I guess a new modifier like "byref" is the only option..
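
The aliasing concern can be shown even without threads. A rough sketch, assuming a current D compiler (where the by-reference storage class is spelled ref, which is what inout meant in 2007); Ray here is a cut-down made-up struct and the function names are invented for the example:

```d
import std.stdio;

// Hypothetical cut-down Ray: just one coordinate of the origin.
struct Ray
{
    float ox;
}

// A ray that other code may modify at any time.
Ray current;

// By value: r is a private copy, so later writes to `current` are not seen.
float readAfterUpdateByValue(Ray r)
{
    current.ox = 99; // simulates some other code changing the original
    return r.ox;
}

// By reference: r aliases the caller's variable, so the write is visible.
float readAfterUpdateByRef(ref const Ray r)
{
    current.ox = 99;
    return r.ox;
}

void main()
{
    current = Ray(1);
    assert(readAfterUpdateByValue(current) == 1);  // copy keeps the old value

    current = Ray(1);
    assert(readAfterUpdateByRef(current) == 99);   // reference sees the update

    writeln("by-value and by-ref differ once the original is modified");
}
```

This is why a compiler cannot silently swap a by-value struct parameter for a reference unless it can prove nothing else writes to the original during the call.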

L.