Thread overview | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
November 07, 2007 Raytracing speed again, this time with gdc. | ||||
---|---|---|---|---|
| ||||
I've tried to translate the attached C++ program to D (D program also attached). Even after explicitly using the __builtin_sqrt (which correctly generates a fsqrt instruction (shame on you, non-inlined std.math)), the D code is significantly slower (12.9s for D vs 9s for C++). Does anybody know why this is so? --downs, confused and saddened PS: benchmark gentoo-pc ~/d/RayBen $ gdc ray1.d -O3 -frelease -ffast-math -o ray1_d tools/base.d && time ./ray1_d >result_d.pnm; g++ ray1.cxx -O3 -ffast-math -o ray1_cpp && time ./ray1_cpp > result_cxx.pnm real 0m13.448s user 0m12.730s sys 0m0.090s real 0m10.128s user 0m9.810s sys 0m0.020s |
November 07, 2007 Re: Raytracing speed again, this time with gdc. | ||||
---|---|---|---|---|
| ||||
Posted in reply to downs | "downs" <default_357-line@yahoo.de> wrote in message news:fgskq5$1p5n$1@digitalmars.com... > writefln("P5\n", n, " ", n, "\n255"); > for (int y=n-1; y>=0; --y) > for (int x=0; x<n; ++x) { > double g=0; > for (int d=0; d<ss*ss; ++d) { > auto dir=Vec(x+(d%ss)*1.0/ss-n/2.0, y+(d/ss)*1.0/ss-n/2.0, > n).unitise(); > g += ray_trace(light, Ray(Vec(0, 0, -4), dir), s); > } > printf("%c", cast(ubyte)(0.5 + 255.0 * g / (ss*ss))); > } > } what kind of output is this? how do you view result? |
November 07, 2007 Re: Raytracing speed again, this time with gdc. | ||||
---|---|---|---|---|
| ||||
Posted in reply to dominik | dominik wrote: > "downs" <default_357-line@yahoo.de> wrote in message news:fgskq5$1p5n$1@digitalmars.com... >> writefln("P5\n", n, " ", n, "\n255"); >> for (int y=n-1; y>=0; --y) >> for (int x=0; x<n; ++x) { >> double g=0; >> for (int d=0; d<ss*ss; ++d) { >> auto dir=Vec(x+(d%ss)*1.0/ss-n/2.0, y+(d/ss)*1.0/ss-n/2.0, >> n).unitise(); >> g += ray_trace(light, Ray(Vec(0, 0, -4), dir), s); >> } >> printf("%c", cast(ubyte)(0.5 + 255.0 * g / (ss*ss))); >> } >> } > > what kind of output is this? how do you view result? > > PGM. Gwenview and Konqueror can display it, as well as any decent image viewer. http://en.wikipedia.org/wiki/Portable_pixmap --downs |
November 08, 2007 Re: Raytracing speed again, this time with gdc. | ||||
---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> I've tried to translate the attached C++ program to D (D program also
> attached).
>
> Even after explicitly using the __builtin_sqrt (which correctly
> generates a fsqrt instruction (shame on you, non-inlined std.math)), the
> D code is significantly slower (12.9s for D vs 9s for C++).
>
> Does anybody know why this is so?
> --downs, confused and saddened
>
> PS: benchmark
>
> gentoo-pc ~/d/RayBen $ gdc ray1.d -O3 -frelease -ffast-math -o ray1_d
> tools/base.d && time ./ray1_d >result_d.pnm; g++ ray1.cxx -O3
> -ffast-math -o ray1_cpp && time ./ray1_cpp > result_cxx.pnm
>
> real 0m13.448s
> user 0m12.730s
> sys 0m0.090s
>
> real 0m10.128s
> user 0m9.810s
> sys 0m0.020s
>
did you try to profile both versions?
also try benchmarking without the IO.
|
November 08, 2007 Re: Raytracing speed again, this time with gdc. | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jascha Wetzel | Jascha Wetzel wrote: >> gentoo-pc ~/d/RayBen $ gdc ray1.d -O3 -frelease -ffast-math -o ray1_d tools/base.d && time ./ray1_d >result_d.pnm; g++ ray1.cxx -O3 -ffast-math -o ray1_cpp && time ./ray1_cpp > result_cxx.pnm >> >> real 0m13.448s >> user 0m12.730s >> sys 0m0.090s >> >> real 0m10.128s >> user 0m9.810s >> sys 0m0.020s >> > > did you try to profile both versions? > also try benchmarking without the IO. Thanks for the input, but that's not it. The IO of 262K cannot possibly account for three seconds of difference. Just for the books, same result without the printf call. Here's the first three lines of each's profile run. D version: time seconds seconds calls s/call s/call name 77.27 5.10 5.10 6164146 0.00 0.00 void ray1.Group.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*) 15.15 6.10 1.00 38551270 0.00 0.00 void ray1.Sphere.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*) 4.55 6.40 0.30 1 0.30 6.56 _Dmain All: 5.25s C++ version: time seconds seconds calls ms/call ms/call name 75.18 4.12 4.12 6164146 0.00 0.00 Group::intersect(std::pair<double, Vec>&, Ray const&) const 19.34 5.18 1.06 38551282 0.00 0.00 Sphere::intersect(std::pair<double, Vec>&, Ray const&) const 4.74 5.44 0.26 main All: 6.57s --downs |
November 08, 2007 Re: Raytracing speed again, this time with gdc. -- "oops" | ||||
---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> D version:
> time seconds seconds calls s/call s/call name
> 77.27 5.10 5.10 6164146 0.00 0.00 void ray1.Group.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*)
> 15.15 6.10 1.00 38551270 0.00 0.00 void ray1.Sphere.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*)
> 4.55 6.40 0.30 1 0.30 6.56 _Dmain
> All: 5.25s
Okay, now that I look at it again, some of those values smack me as distinctly .. impossible. I'll redo the profiling.
--downs
|
November 08, 2007 Re: Raytracing speed again, this time with gdc. -- "oops" remedied | ||||
---|---|---|---|---|
| ||||
Posted in reply to downs | Here we go again. D version. Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 73.09 5.08 5.08 6164146 0.00 0.00 void ray1.Group.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*) 19.57 6.44 1.36 38551270 0.00 0.00 void ray1.Sphere.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*) 3.74 6.70 0.26 1 0.26 6.88 _Dmain 2.59 6.88 0.18 4194304 0.00 0.00 double ray1.ray_trace(inout struct ray1.Vec, inout struct ray1.Ray, class ray1.Scene) 0.43 6.91 0.03 struct gcx.Pool* gcx.Gcx.findPool(void*, void*) 0.14 6.92 0.01 void* gcx.GC.mallocNoSync(uint, void*) 0.14 6.93 0.01 uint gcbits.GCBits.test(uint, void*) 0.14 6.94 0.01 _d_callfinalizer 0.14 6.95 0.01 _d_newclass 0.00 6.95 0.00 1 0.00 0.00 void ray1._staticCtor1() 0.00 6.95 0.00 1 0.00 0.00 class ray1.Scene ray1.create(int, inout struct ray1.Vec, double) 0.00 6.95 0.00 1 0.00 0.00 void ray1.__modinit() C++ version. Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 76.88 4.19 4.19 6164146 0.00 0.00 Group::intersect(std::pair<double, Vec>&, Ray const&) const 18.35 5.19 1.00 38551282 0.00 0.00 Sphere::intersect(std::pair<double, Vec>&, Ray const&) const 3.30 5.37 0.18 main 1.28 5.44 0.07 4194304 0.00 0.00 ray_trace(Vec const&, Ray const&, Scene const&) 0.18 5.45 0.01 1 10.00 10.00 create(int, Vec const&, double) 0.00 5.45 0.00 87381 0.00 0.00 Sphere::~Sphere() 0.00 5.45 0.00 1 0.00 0.00 global constructors keyed to real 0.00 5.45 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 5.45 0.00 1 0.00 0.00 Group::~Group() |
November 08, 2007 Re: Raytracing speed again, this time with gdc. -- "oops" remedied | ||||
---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Here we go again.
>
> D version.
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls s/call s/call name
> 73.09 5.08 5.08 6164146 0.00 0.00 void ray1.Group.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*)
> 19.57 6.44 1.36 38551270 0.00 0.00 void ray1.Sphere.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, void*)
> 3.74 6.70 0.26 1 0.26 6.88 _Dmain
> 2.59 6.88 0.18 4194304 0.00 0.00 double ray1.ray_trace(inout struct ray1.Vec, inout struct ray1.Ray, class ray1.Scene)
> 0.43 6.91 0.03 struct gcx.Pool* gcx.Gcx.findPool(void*, void*)
> 0.14 6.92 0.01 void* gcx.GC.mallocNoSync(uint, void*)
> 0.14 6.93 0.01 uint gcbits.GCBits.test(uint, void*)
> 0.14 6.94 0.01 _d_callfinalizer
> 0.14 6.95 0.01 _d_newclass
> 0.00 6.95 0.00 1 0.00 0.00 void ray1._staticCtor1()
> 0.00 6.95 0.00 1 0.00 0.00 class ray1.Scene ray1.create(int, inout struct ray1.Vec, double)
> 0.00 6.95 0.00 1 0.00 0.00 void ray1.__modinit()
>
> C++ version.
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 76.88 4.19 4.19 6164146 0.00 0.00 Group::intersect(std::pair<double, Vec>&, Ray const&) const
> 18.35 5.19 1.00 38551282 0.00 0.00 Sphere::intersect(std::pair<double, Vec>&, Ray const&) const
> 3.30 5.37 0.18 main
> 1.28 5.44 0.07 4194304 0.00 0.00 ray_trace(Vec const&, Ray const&, Scene const&)
> 0.18 5.45 0.01 1 10.00 10.00 create(int, Vec const&, double)
> 0.00 5.45 0.00 87381 0.00 0.00 Sphere::~Sphere()
> 0.00 5.45 0.00 1 0.00 0.00 global constructors keyed to real
> 0.00 5.45 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
> 0.00 5.45 0.00 1 0.00 0.00 Group::~Group()
in Sphere.intersect, instead of
hit = Hit(lambda, (ray.orig + lambda*ray.dir - center).unitise);
try
hit.first = lamdba;
hit.second = ...
it might not optimize away the creation on stack + copy.
now i'd look at the asm code for ray_sphere in both versions, since that seems to where the time is burnt.
also curious: why does the C++ version call Sphere::intersect 12 times more often than the D version?
|
November 08, 2007 Re: Raytracing speed again, this time with gdc. -- "oops" remedied | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jascha Wetzel | Jascha Wetzel wrote: > downs wrote: >> Here we go again. >> >> D version. >> >> Each sample counts as 0.01 seconds. >> % cumulative self self total >> time seconds seconds calls s/call s/call name >> 73.09 5.08 5.08 6164146 0.00 0.00 void >> ray1.Group.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, >> void*) >> 19.57 6.44 1.36 38551270 0.00 0.00 void >> ray1.Sphere.intersect(inout typedef ray1.Hit, inout struct ray1.Ray, >> void*) >> 3.74 6.70 0.26 1 0.26 6.88 _Dmain >> 2.59 6.88 0.18 4194304 0.00 0.00 double >> ray1.ray_trace(inout struct ray1.Vec, inout struct ray1.Ray, class >> ray1.Scene) >> 0.43 6.91 0.03 struct gcx.Pool* >> gcx.Gcx.findPool(void*, void*) >> 0.14 6.92 0.01 void* >> gcx.GC.mallocNoSync(uint, void*) >> 0.14 6.93 0.01 uint >> gcbits.GCBits.test(uint, void*) >> 0.14 6.94 0.01 _d_callfinalizer >> 0.14 6.95 0.01 _d_newclass >> 0.00 6.95 0.00 1 0.00 0.00 void >> ray1._staticCtor1() >> 0.00 6.95 0.00 1 0.00 0.00 class ray1.Scene >> ray1.create(int, inout struct ray1.Vec, double) >> 0.00 6.95 0.00 1 0.00 0.00 void >> ray1.__modinit() >> >> C++ version. >> >> Each sample counts as 0.01 seconds. >> % cumulative self self total >> time seconds seconds calls ms/call ms/call name >> 76.88 4.19 4.19 6164146 0.00 0.00 >> Group::intersect(std::pair<double, Vec>&, Ray const&) const >> 18.35 5.19 1.00 38551282 0.00 0.00 >> Sphere::intersect(std::pair<double, Vec>&, Ray const&) const >> 3.30 5.37 0.18 main >> 1.28 5.44 0.07 4194304 0.00 0.00 ray_trace(Vec >> const&, Ray const&, Scene const&) >> 0.18 5.45 0.01 1 10.00 10.00 create(int, Vec >> const&, double) >> 0.00 5.45 0.00 87381 0.00 0.00 Sphere::~Sphere() >> 0.00 5.45 0.00 1 0.00 0.00 global >> constructors keyed to real >> 0.00 5.45 0.00 1 0.00 0.00 >> __static_initialization_and_destruction_0(int, int) >> 0.00 5.45 0.00 1 0.00 0.00 Group::~Group() > > in Sphere.intersect, instead of > hit = Hit(lambda, (ray.orig + lambda*ray.dir - center).unitise); > try > hit.first = lamdba; > hit.second = ... > it might not optimize away the creation on stack + copy. > I just tried that. Negligible change. > now i'd look at the asm code for ray_sphere in both versions, since that seems to where the time is burnt. > > also curious: why does the C++ version call Sphere::intersect 12 times more often than the D version? I'd like to know that as well. Far as I know, the algorithm's the same. I'll add debugging code to find out. --downs |
November 08, 2007 Re: Raytracing speed again, this time with gdc. -- "oops" remedied | ||||
---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Jascha Wetzel wrote:
>> also curious: why does the C++ version call Sphere::intersect 12 times more often than the D version?
>
> I'd like to know that as well. Far as I know, the algorithm's the same.
> I'll add debugging code to find out.
> --downs
Okay, it's some weird floating point effect around this line: "disc=b*b - v.dot(v) + radius*radius;".
It only happens in very specific conditions; x=430, y=379, d=11.
So whatever it is, responsible for three seconds it is not.
--downs
|
Copyright © 1999-2021 by the D Language Foundation