Thread overview
December 23, 2010: DMD2 out parameters
Hi,

I'm not sure if this is already a widely known phenomenon, but I ran across a little gotcha yesterday regarding floating-point out parameters in DMD2.

A year or so ago I wrote a ray tracer using DMD1. A few months ago I tried compiling and running it with DMD2, and it was 50% slower. This disappointed me so much that I stopped using D2 until about a week ago. Yesterday I spent a few hours investigating why the D2 build was so much slower than the D1 build. After some head scratching and use of -profile and objconv, I eventually isolated the problem. It boiled down to this example:

float f;
func(f);

void func(out float ff)
{
    ff = 1;
}

This use of 'out' causes func to execute in around 250 ticks on DMD2. Change 'out' to 'ref' and it takes around 10 ticks (the same time the 'out' version takes on DMD1). If you initialise f to 0 before calling func, it all works quickly again, which makes me wonder whether this is some strange DMD2 NaN/FPU exception quirk that is documented somewhere?

When I looked at the generated assembly, I saw that DMD1 and DMD2 seem to generate the same thing (using -O -inline -release):

func LABEL NEAR
    push ebp
    mov ebp, esp
    push eax                    // eax = ptr to ff
    fld dword ptr [_nan]
    fstp dword ptr [eax]
    fld dword ptr [_one]
    fstp dword ptr [eax]
    mov esp, ebp
    pop ebp
    ret

This code looks OK if you ignore the fact that 'ff' is written twice, and the strange, seemingly redundant push of EAX.

Has anyone else come across this, and if so, is it a bug? I'm also interested in people's thoughts on the strange code gen.

My D2 version now runs faster than the old D1 version, by the way :)

Regards,
Pete.
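The difference between the two parameter kinds can be observed directly. A minimal D sketch, assuming the D rule that an out parameter is assigned its type's .init on function entry; the function names are hypothetical:

```d
import std.math : isNaN;

// Hypothetical helpers contrasting the two parameter kinds.

// `ref`: the callee sees whatever value the caller passed in.
float readViaRef(ref float x) { return x; }

// `out`: on entry x is assigned float.init (a NaN) before the body runs,
// so the caller's value is discarded. This implicit store corresponds to
// the extra `fld [_nan] / fstp [eax]` pair in the assembly above.
float readViaOut(out float x) { return x; }

unittest
{
    float a = 5.0f;
    assert(readViaRef(a) == 5.0f); // value passed through unchanged

    float b = 5.0f;
    assert(isNaN(readViaOut(b)));  // callee never saw 5.0f
    assert(isNaN(b));              // and b itself was overwritten with float.init
}
```

Build with `dmd -unittest -main -run file.d` to run the checks.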
December 23, 2010: Re: DMD2 out parameters

Posted in reply to Pete

> If you initialise f to 0 before calling func then it all works quickly again

Actually, I think this is a red herring. I don't think initialising f helps.
December 23, 2010: Re: DMD2 out parameters

Posted in reply to Pete

OK, I've done some more investigating, and it appears that in DMD2 a float NaN is 0x7FE00000 (in dword format), but when it initialises a float 'out' parameter it uses 0x7FA00000. This causes an FPU trap, which is where the time is going. This looks like a bug to me. Can anyone confirm?

Thanks.
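Whether a 32-bit pattern is a quiet or a signaling NaN can be read off its bits. A minimal sketch, assuming the x86 / IEEE 754-2008 convention that the top mantissa bit (0x00400000) is set for a quiet NaN and clear for a signaling one; the helper and constant names are hypothetical. Under that convention 0x7FE00000 is a quiet NaN and 0x7FA00000 is a signaling NaN:

```d
// Hypothetical helpers that classify raw IEEE 754 binary32 bit patterns.
// Assumes bit 22 (the top mantissa bit) is the "quiet" bit, as on x86.
enum uint expMask  = 0x7F80_0000; // all-ones exponent => infinity or NaN
enum uint fracMask = 0x007F_FFFF; // 23-bit mantissa
enum uint quietBit = 0x0040_0000; // set => quiet NaN, clear => signaling NaN

bool isNaNBits(uint bits)
{
    // NaN: exponent all ones and a non-zero mantissa (zero mantissa is infinity)
    return (bits & expMask) == expMask && (bits & fracMask) != 0;
}

bool isSignalingNaNBits(uint bits)
{
    return isNaNBits(bits) && (bits & quietBit) == 0;
}

unittest
{
    assert(isNaNBits(0x7FE0_0000) && !isSignalingNaNBits(0x7FE0_0000)); // quiet NaN
    assert(isSignalingNaNBits(0x7FA0_0000));                             // signaling NaN
    assert(!isNaNBits(0x7F80_0000));                                     // +infinity
    assert(!isNaNBits(0x3F80_0000));                                     // 1.0f
}
```

The helpers work on integers on purpose: copying a signaling NaN through the x87 stack can quiet it, so inspecting the bits of a float value after it has been loaded and stored may not show the original pattern.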
December 23, 2010: Re: DMD2 out parameters

Posted in reply to Pete

Pete wrote:
> Ok, i've done some more investigating and it appears that in DMD2 a float NaN is
> 0x7FE00000 (in dword format) but when it initialises a float 'out' parameter it
> initialises it with 0x7FA00000. This causes an FPU trap which is where the time
> is going. This looks like a bug to me. Can anyone confirm?
>
> Thanks.
Yes, it sounds like a NaN-related performance issue. Note, though, that the slowdown you're experiencing is processor-model specific: it's a penalty of ~250 cycles on a Pentium 4 with x87 instructions, but zero cycles on many other processors. (In fact, it's also zero cycles with SSE on a Pentium 4!)
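To check whether a particular CPU pays this penalty, a rough micro-benchmark along these lines could be used. It is only a sketch: the function names and iteration count are made up, and timings depend heavily on the processor, on compiler flags such as -O -inline (which may remove the implicit store entirely), and on whether floats go through x87 or SSE.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import core.time : Duration;

// Hypothetical callees mirroring the original report.
void setViaOut(out float ff) { ff = 1; } // entry implicitly assigns ff = float.init first
void setViaRef(ref float ff) { ff = 1; } // no implicit store on entry

// Calls `fn` n times on the same variable and returns the elapsed time.
// `sink` accumulates the results so the calls are not trivially dead code.
Duration timeCalls(alias fn)(size_t n, ref float sink)
{
    float f = 0;
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. n)
    {
        fn(f);
        sink += f;
    }
    sw.stop();
    return sw.peek();
}
```

For example, `float s = 0; writeln(timeCalls!setViaOut(10_000_000, s));` (with `import std.stdio : writeln;`) prints the elapsed time for the out version; comparing it with `timeCalls!setViaRef` shows whether the extra NaN store costs anything on that machine.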
December 23, 2010: Re: DMD2 out parameters

Posted in reply to Pete

On 12/23/2010 12:19 PM, Pete wrote:
> Ok, i've done some more investigating and it appears that in DMD2 a float NaN is
> 0x7FE00000 (in dword format) but when it initialises a float 'out' parameter it
> initialises it with 0x7FA00000. This causes an FPU trap which is where the time
> is going. This looks like a bug to me. Can anyone confirm?
>
> Thanks.
I just did a test with DMD 2.051 on Linux:
void F1(ref float a)
{
a++;
}
void F2(out float a)
{
a++;
}
void main()
{
float a;
float b;
F1(a);
F2(b);
}
And ASM:
080490e4 <_D3out2F1FKfZv>:
80490e4: 55 push ebp
80490e5: 8b ec mov ebp,esp
80490e7: 83 ec 04 sub esp,0x4
80490ea: d9 e8 fld1
80490ec: d8 00 fadd DWORD PTR [eax]
80490ee: d9 18 fstp DWORD PTR [eax]
80490f0: c9 leave
80490f1: c3 ret
80490f2: 90 nop
80490f3: 90 nop
080490f4 <_D3out2F2FJfZv>:
80490f4: 55 push ebp
80490f5: 8b ec mov ebp,esp
80490f7: 83 ec 04 sub esp,0x4
80490fa: d9 05 00 81 05 08 fld DWORD PTR ds:0x8058100
8049100: d9 18 fstp DWORD PTR [eax]
8049102: d9 e8 fld1
8049104: d8 00 fadd DWORD PTR [eax]
8049106: d9 18 fstp DWORD PTR [eax]
8049108: c9 leave
8049109: c3 ret
804910a: 90 nop
804910b: 90 nop
0804910c <_Dmain>:
804910c: 55 push ebp
804910d: 8b ec mov ebp,esp
804910f: 83 ec 08 sub esp,0x8
8049112: d9 05 00 81 05 08 fld DWORD PTR ds:0x8058100
8049118: d9 5d f8 fstp DWORD PTR [ebp-0x8]
804911b: d9 05 00 81 05 08 fld DWORD PTR ds:0x8058100
8049121: d9 5d fc fstp DWORD PTR [ebp-0x4]
8049124: 8d 45 f8 lea eax,[ebp-0x8]
8049127: e8 b8 ff ff ff call 80490e4 <_D3out2F1FKfZv>
804912c: 8d 45 fc lea eax,[ebp-0x4]
804912f: e8 c0 ff ff ff call 80490f4 <_D3out2F2FJfZv>
8049134: 31 c0 xor eax,eax
8049136: c9 leave
8049137: c3 ret
The value at 0x8058100 is 0x7FA00000, and main loads that same constant to default-initialize both a and b. As you can see, 'out' doesn't force the loading and storing of a different NaN value.
Of course, maybe the compiler should skip default-initializing a float whose first use is being passed to a function as an out parameter, in the same way that

float a;
a = 1.0;

wouldn't generate two separate assignments.
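Until the compiler does that, a similar effect can be had by hand. A minimal sketch with a hypothetical setOne function: take the parameter by ref, so the callee does no implicit float.init store, and, when the caller's variable is always written before it is read, declare it with a void initializer so the caller skips its default initialization too:

```d
// With `ref`, there is no implicit `ff = float.init` on entry,
// unlike an `out` parameter.
void setOne(ref float ff)
{
    ff = 1;
}

unittest
{
    // `= void` skips the default float.init store. Reading `f` before
    // setOne writes it would give a garbage value, so only do this when
    // the callee always assigns.
    float f = void;
    setOne(f);
    assert(f == 1.0f);
}
```

The trade-off is that ref no longer documents in the signature that the function only writes the value.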
Copyright © 1999-2021 by the D Language Foundation