February 16, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs <default_357-line@yahoo.de> wrote: > For some reason, the bad case, although inlined, stores its values back into memory. The fast case keeps working with them. > > Here's the disassembly for ray_sphere for both cases: > > slow (opSub) > > http://paste.dprogramming.com/dpcds3p3 > > fast > > http://paste.dprogramming.com/dpd6pi8n > > So it comes down to a GDC FP "bug". I think changing to 4.2 or 4.3 might help. Does anybody have an up-to-date version of the 4.2.x patch? I'm trying to investigate this issue, too. I'm comparing the code generated by Visual C++ Express 2005 and by GDC 0.24 based on GCC 3.4.5 and DMD 1.020. Here's a commented comparison of the unitise() function: http://paste.dprogramming.com/dpl9p4pt As you can see, the code is very close. However, the static opCall() that initializes the struct returned by value is not inlined, and therefore not optimized out, so there is an additional call and extra copying of already calculated values. If not for that, the code would be nearly identical. -- SnakE | |||
February 16, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | Sergey Gromov <snake.scaly@gmail.com> wrote: > I'm trying to investigate this issue, too. I'm comparing the code generated by Visual C++ Express 2005 and by GDC 0.24 based on GCC 3.4.5 and DMD 1.020. Here's a commented comparison of the unitise() function: Continuing the investigation. Here are the raw results: >make-cpp-gcc.cmd gcc -c -O3 -fomit-frame-pointer -fweb -finline-functions ray-cpp.cpp gcc ray-cpp.o -o ray-cpp.exe -lstdc++ >test-cpp.cmd ray-cpp 1>ray-cpp.pbm 10968 >make-d.cmd gdc -c -O3 -fomit-frame-pointer -fweb -frelease -finline-functions ray-d.d gdc ray-d.o -o ray-d.exe >test-d.cmd ray-d 1>ray-d.pbm 10828 The numbers printed by the tests are milliseconds. As you can see, the D version is slightly faster. The outputs are identical. The C++ and D programs are here, respectively: http://paste.dprogramming.com/dpaftqa2 http://paste.dprogramming.com/dptiniar The only change in the C++ program is the time output at the end of main(). The D program is refactored so that all struct manipulations happen in place, without passing and returning by value. GDC has trouble inlining static opCalls for some reason. Microsoft's compiler produces FP/math code about 25% shorter than GCC/GDC on average, hence the results: >make-cpp.cmd cl -nologo -EHsc -Ox ray-cpp.cpp >test-cpp.cmd ray-cpp 1>ray-cpp.pbm 7656 -- SnakE | |||
February 16, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | Sergey Gromov: > The D program is refactored so that all struct manipulations happen in place, without passing and returning by value. GDC has trouble inlining static opCalls for some reason. Yep, you seem to have re-invented a fixed-size version of my TinyVector (I added static opCalls yesterday, but I may have to remove them again). > Microsoft's compiler produces FP/math code about 25% shorter than GCC/GDC on average Nice. Thank you for your experiments. Timings of your code (which has a bug; see downs for a fixed version) on Win, Pentium3, best of 3 runs, image 256x256: D DMD v.1.025: bud -clean -O -release -inline rayD.d 15.8 seconds (memory deallocation too) C++ MinGW based on GCC 4.2.1: g++ -O3 -s rayCpp.cpp -o rayCpp0 9.42 s (memory deallocation too) C++ MinGW (the same): g++ -pipe -O3 -s -ffast-math -fomit-frame-pointer rayCpp.cpp -o rayCpp1 8.89 s (memory deallocation too) C++ MinGW (the same): g++ -pipe -O3 -s -ffast-math -fomit-frame-pointer -fprofile-generate rayCpp.cpp -o rayCpp2 g++ -pipe -O3 -s -ffast-math -fomit-frame-pointer -fprofile-use rayCpp.cpp -o rayCpp2 8.72 s (memory deallocation too) I haven't tried GDC yet. Bye, bearophile | |||
February 16, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | bearophile <bearophileHUGS@lycos.com> wrote: > Sergey Gromov: > > The D program is refactored so that all struct manipulations happen in place, without passing and returning by value. GDC has trouble inlining static opCalls for some reason. > > Yep, you seem to have re-invented a fixed-size version of my TinyVector (I added static opCalls yesterday, but I may have to remove them again). One of a programmer's joys is to reinvent the wheel and pretend it's better than the others. ;) > Timings of your code (which has a bug; see downs for a fixed version) on The only bug I can see is printing characters through Windows text-mode stdout, which expands every 0xA into "\r\n". This doesn't have any impact on the benchmark. -- SnakE | |||
Copyright © 1999-2021 by the D Language Foundation