December 11, 2009
Walter Bright wrote:
> Don wrote:
>> I had a further look at this. The compiler *is* creating doubles and floats as signalling NaNs. It turns out that processors differ slightly in how they treat signalling NaNs, especially Intel vs AMD. Intel Core2 triggers SNANs when loading floats and doubles, but *doesn't* trigger for 80-bit SNANs. The Pentium M that I did most of my testing on didn't trigger for any of them. AMD's docs say that it triggers for all of them.
>> Won't be too hard to fix.
> 
> 
> How do we fix the CPU? ;-)

Yeah. Actually the CPU problem is an accepts-invalid bug. It worked on my Pentium M, but it shouldn't have.
The problem is what DMD does to the "uninitialized assignments".

float x;

gets changed into

float x = double.snan;

and is implemented with
fld float.snan; fstp x;

The FLD is triggering the snan. They should be changed into mov EAX, reinterpret_cast<int>(float.snan); mov x, EAX;

There's another reason for doing this. On Pentium 4, x87 NaNs are incredibly slow. More than 250 cycles!!! On AMD and on Pentium 4 SSE2, they are the same as any other value (about 0.5 cycles). Yet another reason to hate the P4. But still, this is such a horrific performance killer that we ought to avoid it.
December 11, 2009
Hello Walter,

>>> How do we fix the CPU? ;-)
>>> 
> I was thinking 220VAC might help!
> 

That's one way to be totally sure what is wrong with your CPU.


December 11, 2009
Don wrote:
> Yeah. Actually the CPU problem is an accepts-invalid bug. It worked on my Pentium M, but it shouldn't have.
> The problem is what DMD does to the "uninitialized assignments".
> 
> float x;
> 
> gets changed into
> 
> float x = double.snan;
> 
> and is implemented with
> fld float.snan; fstp x;
> 
> The FLD is triggering the snan. They should be changed into mov EAX, reinterpret_cast<int>(float.snan); mov x, EAX;

Sounds like a good idea.

> There's another reason for doing this. On Pentium 4, x87 NaNs are incredibly slow. More than 250 cycles!!! On AMD and on Pentium 4 SSE2, they are the same as any other value (about 0.5 cycles). Yet another reason to hate the P4. But still, this is such a horrific performance killer that we ought to avoid it.

I had no idea that was the case!
December 12, 2009
Walter Bright wrote:
> Don wrote:
>> Yeah. Actually the CPU problem is an accepts-invalid bug. It worked on my Pentium M, but it shouldn't have.
>> The problem is what DMD does to the "uninitialized assignments".
>>
>> float x;
>>
>> gets changed into
>>
>> float x = double.snan;
>>
>> and is implemented with
>> fld float.snan; fstp x;
>>
>> The FLD is triggering the snan. They should be changed into mov EAX, reinterpret_cast<int>(float.snan); mov x, EAX;
> 
> Sounds like a good idea.
> 
>> There's another reason for doing this. On Pentium 4, x87 NaNs are incredibly slow. More than 250 cycles!!! On AMD and on Pentium 4 SSE2, they are the same as any other value (about 0.5 cycles). Yet another reason to hate the P4. But still, this is such a horrific performance killer that we ought to avoid it.
> 
> I had no idea that was the case!

I only just discovered it. All the documentation I've seen just said "These [cycle count] values are for normal operands. NaNs, infinities, and denormals may increase cycle counts considerably." I found a blog by someone who'd actually measured it.

December 12, 2009
On Fri, Dec 11, 2009 at 9:34 PM, Don <nospam@nospam.com> wrote:
> Walter Bright wrote:
>>
>> Don wrote:
>>>
>>> Yeah. Actually the CPU problem is an accepts-invalid bug. It worked on my
>>> Pentium M, but it shouldn't have.
>>> The problem is what DMD does to the "uninitialized assignments".
>>>
>>> float x;
>>>
>>> gets changed into
>>>
>>> float x = double.snan;
>>>
>>> and is implemented with
>>> fld float.snan; fstp x;
>>>
>>> The FLD is triggering the snan. They should be changed into mov EAX, reinterpret_cast<int>(float.snan); mov x, EAX;
>>
>> Sounds like a good idea.
>>
>>> There's another reason for doing this. On Pentium 4, x87 NaNs are incredibly slow. More than 250 cycles!!! On AMD and on Pentium 4 SSE2, they are the same as any other value (about 0.5 cycles). Yet another reason to hate the P4. But still, this is such a horrific performance killer that we ought to avoid it.
>>
>> I had no idea that was the case!
>
> I only just discovered it. Every documentation I've seen just said "These [cycle count] values are for normal operands. NaNs, infinities, and denormals may increase cycle counts considerably." I found a blog of someone who'd actually measured it.

I experienced it in a fluid sim I was working on in grad school.  NaNs were creeping in and performance was terrible.  I thought it was two problems till I got rid of the NaNs and suddenly performance was ok too.

--bb
December 12, 2009
Bill Baxter wrote:
> On Fri, Dec 11, 2009 at 9:34 PM, Don <nospam@nospam.com> wrote:
>> Walter Bright wrote:
>>> Don wrote:
>>>> Yeah. Actually the CPU problem is an accepts-invalid bug. It worked on my
>>>> Pentium M, but it shouldn't have.
>>>> The problem is what DMD does to the "uninitialized assignments".
>>>>
>>>> float x;
>>>>
>>>> gets changed into
>>>>
>>>> float x = double.snan;
>>>>
>>>> and is implemented with
>>>> fld float.snan; fstp x;
>>>>
>>>> The FLD is triggering the snan. They should be changed into mov EAX,
>>>> reinterpret_cast<int>(float.snan); mov x, EAX;
>>> Sounds like a good idea.
>>>
>>>> There's another reason for doing this. On Pentium 4, x87 NaNs are
>>>> incredibly slow. More than 250 cycles!!! On AMD and on Pentium 4 SSE2, they
>>>> are the same as any other value (about 0.5 cycles). Yet another reason to
>>>> hate the P4. But still, this is such a horrific performance killer that we
>>>> ought to avoid it.
>>> I had no idea that was the case!
>> I only just discovered it. Every documentation I've seen just said "These
>> [cycle count] values are for normal operands. NaNs, infinities, and
>> denormals may increase cycle counts considerably." I found a blog of someone
>> who'd actually measured it.
> 
> I experienced it in a fluid sim I was working on in grad school.  NaNs
> were creeping in and performance was terrible.  I thought it was two
> problems till I got rid of the NaNs and suddenly performance was ok
> too.
> 
> --bb

Almost same here. Program was amazingly slow until I figured there were NaNs involved. It would be great if we could eliminate that behavior.

Andrei