February 16, 2008
downs <default_357-line@yahoo.de> wrote:
> For some reason, the bad case, although inlined, stores its values back into memory. The fast case keeps working with them.
> 
> Here's the disassembly for ray_sphere for both cases:
> 
> slow (opSub)
> 
> http://paste.dprogramming.com/dpcds3p3
> 
> fast
> 
> http://paste.dprogramming.com/dpd6pi8n
> 
> So it comes down to a GDC FP "bug". I think changing to 4.2 or 4.3 might help. Does anybody have an up-to-date version of the 4.2.x patch?

I'm investigating this issue, too.  I'm comparing the code generated for the C++ version by Visual C Express 2005 with the code generated by GDC 0.24 (based on GCC 3.4.5 and DMD 1.020).  Here's a commented comparison of the unitise() function:

http://paste.dprogramming.com/dpl9p4pt

As you can see, the code is very close.  But the static opCall() which initializes the by-value return struct is not inlined, and therefore not optimized out.  So there is an additional call and extra copying of already calculated values.  If not for that, the code would be nearly identical.

-- 
SnakE
February 16, 2008
Sergey Gromov <snake.scaly@gmail.com> wrote:
> I'm trying to investigate this issue, too.  I'm comparing the C++ code generated by Visual C Express 2005, and GDC 0.24 based on GCC 3.4.5 and DMD 1.020.  Here's the commented out comparison of unitise() function:

Continuing investigation.  Here are raw results:

>make-cpp-gcc.cmd
gcc -c -O3 -fomit-frame-pointer -fweb -finline-functions ray-cpp.cpp
gcc ray-cpp.o -o ray-cpp.exe -lstdc++

>test-cpp.cmd
ray-cpp  1>ray-cpp.pbm
10968

>make-d.cmd
gdc -c -O3 -fomit-frame-pointer -fweb -frelease -finline-functions ray-d.d
gdc ray-d.o -o ray-d.exe

>test-d.cmd
ray-d  1>ray-d.pbm
10828

The numbers printed by the tests are run times in milliseconds.  As you can see, the D version is slightly faster.  The outputs are identical.
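
For reference, a minimal sketch of how such a millisecond figure could be produced.  This is not the code from the pastes: the names Stopwatch and report_elapsed are made up for illustration, and the use of std::clock() and of stderr (so that the redirected image file stays clean) is an assumption.

```cpp
#include <cstdio>
#include <ctime>

// Hypothetical helper: remembers the clock value at construction time.
struct Stopwatch {
    std::clock_t start;

    Stopwatch() : start(std::clock()) {}

    // Milliseconds since construction.  Note that std::clock() counts
    // processor time on most systems, which is close enough to wall time
    // for a single-threaded, CPU-bound benchmark.
    long elapsed_ms() const {
        return static_cast<long>((std::clock() - start) * 1000.0 / CLOCKS_PER_SEC);
    }
};

// Prints the elapsed time on its own line.  Passing stderr keeps the number
// out of a file that receives stdout, as in "ray-cpp 1>ray-cpp.pbm".
void report_elapsed(const Stopwatch& sw, std::FILE* out) {
    std::fprintf(out, "%ld\n", sw.elapsed_ms());
}
```

The idea would be to create a Stopwatch at the top of main() and call report_elapsed(sw, stderr) just before returning.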

The C++ and D programs are here, respectively:

http://paste.dprogramming.com/dpaftqa2

http://paste.dprogramming.com/dptiniar

The only change to the C++ program is the time output at the end of main().  The D program is refactored so that all struct manipulations happen in-place, without passing and returning by value.  GDC has trouble inlining static opCalls for some reason.
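
To illustrate the difference between the two styles, here is a rough C++ analogue.  It is only a sketch, not the refactored D code from the paste: Vec, sub, sub_into and unitise_inplace are names chosen for this example.

```cpp
#include <cmath>

struct Vec {
    double x, y, z;
};

// By-value style: every call builds a temporary Vec and returns it.
// If the compiler fails to inline this, the temporary is really copied.
inline Vec sub(const Vec& a, const Vec& b) {
    Vec r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

// In-place style: the result is written straight into an existing Vec,
// so there is no returned struct to copy even without inlining.
inline void sub_into(Vec& dst, const Vec& a, const Vec& b) {
    dst.x = a.x - b.x;
    dst.y = a.y - b.y;
    dst.z = a.z - b.z;
}

inline double dot(const Vec& a, const Vec& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Scales v to unit length in place.  Assumes v is not the zero vector.
inline void unitise_inplace(Vec& v) {
    const double inv = 1.0 / std::sqrt(dot(v, v));
    v.x *= inv;
    v.y *= inv;
    v.z *= inv;
}
```

With the in-place variants a caller declares one Vec up front and reuses it, e.g. sub_into(d, a, b) followed by unitise_inplace(d), instead of chaining functions that each return a new struct.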

Microsoft's compiler produces FP/math code about 25% shorter than GCC/GDC on average, hence the results:

>make-cpp.cmd
cl -nologo -EHsc -Ox ray-cpp.cpp

>test-cpp.cmd
ray-cpp  1>ray-cpp.pbm
7656

-- 
SnakE
February 16, 2008
Sergey Gromov:
> D program is refactored so that all struct manipulations happen in-place, without passing and returning by value.  GDC has troubles inlining static opCalls for some reason.

Yep, you seem to have re-invented a fixed-size version of my TinyVector (I added static opCalls yesterday, but I may have to remove them again).


> Microsoft's compiler produces FP/math code about 25% shorter than GCC/GDC in average

Nice.
Thank you for your experiments.

Timings of your code (that has a bug, see downs for a fixed version) on Win, Pentium3, best of 3 runs, image 256x256:

D DMD v.1.025:
bud -clean -O -release -inline rayD.d
15.8 seconds (memory deallocation too)

C++ MinGW based on GCC 4.2.1:
g++ -O3 -s rayCpp.cpp -o rayCpp0
9.42 s (memory deallocation too)

C++ MinGW (the same):
g++ -pipe -O3 -s -ffast-math -fomit-frame-pointer rayCpp.cpp -o rayCpp1
8.89 s (memory deallocation too)

C++ MinGW (the same):
g++ -pipe -O3 -s -ffast-math -fomit-frame-pointer -fprofile-generate rayCpp.cpp -o rayCpp2
g++ -pipe -O3 -s -ffast-math -fomit-frame-pointer -fprofile-use rayCpp.cpp -o rayCpp2
8.72 s (memory deallocation too)

I haven't tried GDC yet.

Bye,
bearophile
February 16, 2008
bearophile <bearophileHUGS@lycos.com> wrote:
> Sergey Gromov:
> > D program is refactored so that all struct manipulations happen in-place, without passing and returning by value.  GDC has troubles inlining static opCalls for some reason.
> 
> Yep, you seem to have re-invented a fixed-size version of my TinyVector (I have added static opCalls yesterday, but I may have to remove them again).

One of a programmer's joys is to reinvent the wheel and pretend it's better than the others. ;)

> Timings of your code (that has a bug, see downs for a fixed version) on

The only bug I can see is printing out characters through text-mode Windows stdout which expands every 0xA into "\r\n".  This doesn't have any impact on the benchmark.
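
If the image bytes matter, one possible way around the expansion is to switch stdout to binary mode before writing.  The sketch below assumes the Microsoft C runtime functions _setmode and _fileno (also available under MinGW); the helper name set_stdout_binary is made up for this example, and none of this is taken from the pasted programs.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>  // _O_BINARY
#include <io.h>     // _setmode
#endif

// Hypothetical helper: puts stdout into binary mode on Windows so that a
// 0x0A byte is written as-is instead of being expanded to "\r\n".
// On other systems there is no separate text mode, so it does nothing.
// Returns true on success.
bool set_stdout_binary() {
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_BINARY) != -1;
#else
    return true;
#endif
}
```

It would have to be called once at the start of main(), before the first byte of the image is written.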

-- 
SnakE