February 15, 2008 Re: Returning large structs == no difference | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> downs wrote:
>> I rewrote my version for freestanding functions .. 9.5s :confused: Why do struct members (which are inlined, I checked) take such a speed hit?
>>
>
> My version had a bug. x__X
>
> The correct version takes 11.2s again.
>
> --downs
If I fix the bug, the 'external function' version is exactly as fast as the opFoo version.
Sorry.
I think the 8s version posted earlier has a similar bug.
Look at the output. :)
-- downs
| |||
February 15, 2008 narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | I've been playing around with the 8-9s version posted earlier.
The problem seems to lie in ray_sphere.
Strangely, Vec v = void; Vec.sub(center, ray.orig, v); runs in 8.8s, producing a correct output once the printf at the bottom has been fixed,
but Vec v = center - ray.orig; runs in 11.1s.
Still investigating why this happens.
--downs
| |||
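For readers without the original source at hand, the two call forms being compared can be sketched roughly as follows. This is only an illustration, not the benchmark code: the `Vec` layout, the field names, and the body of `sub` are assumptions, and the operator is written as `opBinary!"-"` because current D compilers expect that spelling, whereas the 2008 code used `opSub`.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the raytracer's vector type.
struct Vec
{
    double x, y, z;

    // Value-returning form: `center - orig` builds and returns a new Vec.
    // (The thread's code spelled this `opSub`.)
    Vec opBinary(string op)(const Vec rhs) const
        if (op == "-")
    {
        return Vec(x - rhs.x, y - rhs.y, z - rhs.z);
    }

    // Out-parameter form: writes the difference into caller-provided storage.
    static void sub(const ref Vec a, const ref Vec b, ref Vec result)
    {
        result.x = a.x - b.x;
        result.y = a.y - b.y;
        result.z = a.z - b.z;
    }
}

void main()
{
    auto center = Vec(1.0, 2.0, 3.0);
    auto orig = Vec(0.5, 0.5, 0.5);

    // Operator form (the 11.1s case reported above).
    Vec a = center - orig;

    // Out-parameter form (the 8.8s case reported above).
    Vec b = void;              // skip default initialization
    Vec.sub(center, orig, b);  // every field is written before it is read

    assert(a == b);            // both forms compute the same vector
    writeln(a);
}
```

Both forms compute the same result; the timings quoted in the post are specific to the GDC build discussed in this thread and need not carry over to other compilers.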
February 15, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> I've been playing around with the 8-9s version posted earlier.
>
> The problem seems to lie in ray_sphere.
>
> Strangely, Vec v = void; Vec.sub(center, ray.orig, v); runs in 8.8s, producing a correct output once the printf at the bottom has been fixed,
> but Vec v = center - ray.orig; runs in 11.1s.
>
> Still investigating why this happens.
>
> --downs
Okay, found the cause, if not the reason, by looking at the assembler output.
For some reason, the bad case, although inlined, stores its values back into memory. The fast case keeps working with them.
Here's the disassembly for ray_sphere for both cases:
slow (opSub)
http://paste.dprogramming.com/dpcds3p3
fast
http://paste.dprogramming.com/dpd6pi8n
So it comes down to a GDC FP "bug". I think changing to 4.2 or 4.3 might help. Does anybody have an up-to-date version of the 4.2.x patch?
--downs
| |||
February 15, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
>> Strangely, Vec v = void; Vec.sub(center, ray.orig, v); runs in 8.8s, producing a correct output once the printf at the bottom has been fixed,
>> but Vec v = center - ray.orig; runs in 11.1s.
>
> For some reason, the bad case, although inlined, stores its values back into memory. The fast case keeps working with them.
>
> So it comes down to a GDC FP "bug". I think changing to 4.2 or 4.3 might help. Does anybody have an up-to-date version of the 4.2.x patch?
Hey good deal on figuring this out! It's good to know, especially for those of us using D for real-time simulation type stuff.
Is there really a GDC that compiles against gcc >= 4.2?!
| |||
February 15, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Tim Burrell | Tim Burrell wrote:
> downs wrote:
>>> Strangely, Vec v = void; Vec.sub(center, ray.orig, v); runs in 8.8s, producing a correct output once the printf at the bottom has been fixed,
>>> but Vec v = center - ray.orig; runs in 11.1s.
>> For some reason, the bad case, although inlined, stores its values back into memory. The fast case keeps working with them.
>>
>> So it comes down to a GDC FP "bug". I think changing to 4.2 or 4.3 might help. Does anybody have an up-to-date version of the 4.2.x patch?
>
> Hey good deal on figuring this out! It's good to know, especially for those of us using D for real-time simulation type stuff.
>
> Is there really a GDC that compiles against gcc >= 4.2?!
I'm not sure; I remember somebody saying he'd managed to build it. And there's a post on d.gnu from somebody saying he'd gotten it to work, although he couldn't build phobos.
Since GDC seems to be .. inert at the moment, it'd probably be up to some volunteer effort to upgrade it to 4.[23]. That, or get llvmdc up to speed.
Myself of course is mostly clueless about both compilers. :/
--downs
| |||
February 15, 2008 Re: D slower than C++ by a factor of _two_ for simple raytracer (gdc) | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | With a little bit of commenting, this could be an excellent tutorial. | |||
February 15, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Tim Burrell wrote:
>> downs wrote:
>>>> Strangely, Vec v = void; Vec.sub(center, ray.orig, v); runs in 8.8s, producing a correct output once the printf at the bottom has been fixed,
>>>> but Vec v = center - ray.orig; runs in 11.1s.
>>> For some reason, the bad case, although inlined, stores its values back into memory. The fast case keeps working with them.
>>>
>>> So it comes down to a GDC FP "bug". I think changing to 4.2 or 4.3 might help. Does anybody have an up-to-date version of the 4.2.x patch?
>>
>> Hey good deal on figuring this out! It's good to know, especially for those of us using D for real-time simulation type stuff.
>>
>> Is there really a GDC that compiles against gcc >= 4.2?!
>
> I'm not sure; I remember somebody saying he'd managed to build it. And there's a post on d.gnu from somebody saying he'd gotten it to work, although he couldn't build phobos.
>
> Since GDC seems to be .. inert at the moment, it'd probably be up to some volunteer effort to upgrade it to 4.[23]. That, or get llvmdc up to speed.
>
> Myself of course is mostly clueless about both compilers. :/
I notice that the Ubuntu team appears to have a working 4.2-based gdc that the changelog also says works with 4.3:
http://packages.ubuntu.com/hardy/devel/gdc-4.2
Changelog is here:
http://changelogs.ubuntu.com/changelogs/pool/universe/g/gdc-4.2/gdc-4.2_0.25-4.2.3-0ubuntu1/changelog
It'd be really nice to see a new gdc release! I wonder if David even knows about these patches!?
| |||
February 15, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Here's the disassembly for ray_sphere for both cases:
>
> slow (opSub)
>
> http://paste.dprogramming.com/dpcds3p3
>
> fast
>
> http://paste.dprogramming.com/dpd6pi8n
>
> So it comes down to a GDC FP "bug". I think changing to 4.2 or 4.3 might help. Does anybody have an up-to-date version of the 4.2.x patch?
>
> --downs
Especially interesting to note (slow case):
fstpl -24(%ebp)
[...]
movl -24(%ebp), %eax
movl %eax, -48(%ebp)
movl -20(%ebp), %eax
movl %eax, -44(%ebp)
Translation:
Store floating-point number to ebp[-24]. No, wait, move it to ebp[-48].
This indicates a pretty serious problem with optimization, since the whole thing is basically redundant.
The "fast" version doesn't have any memory writes at all during the computation.
--downs
| |||
February 15, 2008 Re: narrowed down the problem area | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Especially interesting to note (slow case):
>
> fstpl -24(%ebp)
> [...]
> movl -24(%ebp), %eax
> movl %eax, -48(%ebp)
> movl -20(%ebp), %eax
> movl %eax, -44(%ebp)
>
> Translation:
> Store floating-point number to ebp[-24]. No, wait, move it to ebp[-48].
I left something out.
fstpl -24(%ebp)
[...]
movl -24(%ebp), %eax
movl %eax, -48(%ebp)
movl -20(%ebp), %eax
movl %eax, -44(%ebp)
[...]
fldl -48(%ebp)
So, the whole thing comes down to "Store FP number to memory. No wait, move it somewhere else! No wait, read it back!"
No wonder it's slow.
| |||
February 15, 2008 Re: Returning large structs == bad | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> No difference. But then why the obvious speed difference? Color me confused ._.
Test to see if the stack is aligned, i.e. if the doubles start on 16-byte address boundaries.
| |||
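One way to try the alignment test suggested above is to print the addresses of a few stack variables and check them against the boundary. The sketch below is only an assumption about how such a check could look: `isAligned` is a hypothetical helper, and `Vec` is a stand-in for the raytracer's vector type.

```d
import std.stdio : writefln;

// Hypothetical stand-in for the raytracer's vector type.
struct Vec
{
    double x, y, z;
}

// True when `p` lies on a `boundary`-byte address boundary.
bool isAligned(const(void)* p, size_t boundary = 16)
{
    return (cast(size_t) p) % boundary == 0;
}

void main()
{
    Vec v = Vec(0, 0, 0);
    double d = 0;

    // Addresses and results depend on compiler, flags, and platform.
    writefln("v at %s, 16-byte aligned: %s", &v, isAligned(&v));
    writefln("d at %s, 16-byte aligned: %s", &d, isAligned(&d));
    writefln("d 8-byte aligned: %s", isAligned(&d, 8));
}
```

Taking a variable's address forces it into memory, so the layout seen here may differ from the one in the optimized inner loop; the generated assembly remains the more reliable place to look.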
Copyright © 1999-2021 by the D Language Foundation