Nice document on IEEE 754 floating point arithmetic - D Programming Language Discussion Forum

January 10, 2005

Nice document on IEEE 754 floating point arithmetic

Posted by Norbert Nemec

Hi there,

I just found a really nice document at:

 http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

It really gives a lot of insight on the rationale behind IEEE754 design, also focusing a lot on language and compiler design.

One of the most relevant points with respect to D is probably the behavior of comparison operators. The D specs go a rather practical way by defining comparisons based on mathematical semantics. For floating points, though, this is often not correct.

There probably are a few more points to consider when evaluating D for numerical purposes. Some of the demands given in the document are probably unrealistic for a general purpose language, but quite a number seem perfectly reasonable to me.

Anyhow: instead of spending many words on the topic at this point, I would rather advise anyone interested in numerics to have a look at the document, if only to get an understanding of what the concerns might be.

Ciao,
Norbert

January 12, 2005

Re: Nice document on IEEE 754 floating point arithmetic

Posted by Walter
in reply to Norbert Nemec

"Norbert Nemec" <Norbert@Nemec-online.de> wrote in message news:crv1i3$2bn0$1@digitaldaemon.com...
> Hi there,
>
> I just found a really nice document at:
>
>  http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
>
> It really gives a lot of insight on the rationale behind IEEE754 design, also focusing a lot on language and compiler design.
>
> One of the most relevant points with respect to D is probably the behavior of comparison operators. The D specs go a rather practical way by defining comparisons based on mathematical semantics. For floating points, though, this is often not correct.
>
> There probably are a few more points to consider when evaluating D for numerical purposes. Some of the demands given in the document are probably unrealistic for a general purpose language, but quite a number seem perfectly reasonable to me.
>
> Anyhow: instead of spending many words on the topic at this point, I would rather advise anyone interested in numerics to have a look at the document, if only to get an understanding of what the concerns might be.

Thanks for the pointer. As far as I know, D (and Digital Mars C/C++) are the only languages that properly support NaN's in comparison operators. This is deliberate on my part, even at the slight cost of performance it entails. Digital Mars compilers have always leaned towards doing accurate and correct floating point as a priority over performance.

January 12, 2005

Re: Nice document on IEEE 754 floating point arithmetic

Posted by Norbert Nemec
in reply to Walter

Walter wrote:
> As far as I know, D (and Digital Mars C/C++) are
> the only languages that properly support NaN's in comparison operators.
> This is deliberate on my part, even at the slight cost of performance it
> entails. Digital Mars compilers have always leaned towards doing accurate
> and correct floating point as a priority over performance.

Great! "Full IEEE 754 conformance" definitely would be a tremendous argument for using D in the numerics field.

Anyhow, there are a few details I wonder about:

A) The chapter "Expressions", section "Equality Expressions" states

 "If either or both operands are NaN, then both the == and != comparisons
 return false."

which is in contrast to the table in "Relational Expressions" (and to the
IEEE754 standard as well)

B) The note 1. under the same table states that "For floating point comparison operators, (a !op b) is not the same as !(a op b)." It should be noted that this refers only to the question whether they signal on NaNs - otherwise, this sentence is hard to understand.

C) Raising Invalid exceptions should just be an option, not the default. IEEE754 states that a *flag* should be raised, which can then be checked and reset by the user later on. Raising an exception would be equivalent to what the document calls "trapping". Unless you are debugging your code, that is hardly ever what you want. The whole power of the NaN concept is that NaNs do not interrupt the calculation but are instead handled just like ordinary numbers. My output data from numerical calculations is usually a long block of floating-point values which may contain some NaNs. This simply tells me that I hit singularities or other special points in some cases, and I either drop or specially mark these points in the resulting plots.

D) Furthermore: even if the behavior for native floats is correct, operator overloading is still not capable of mimicking this behavior.

In any case, I think that operator overloading is not quite flexible enough in several respects. Matlab, for example, allows comparing two arrays of numbers, returning an array of bools, which can then be used in many ways. In D, this would not be possible, since the comparison operators are based on opCmp. Furthermore, ! is not overloadable at all, so even if I write an opEquals for two arrays returning an array of bools, I could never mimic the correct behavior for !=.

My suggestion would be:
* Introduce opNot (unlike && and ||, there is no compelling reason why it
should not be overloadable)
* Introduce opLess, opGreater, opLessOrEqual, opGreaterOrEqual. If these are
not defined, the compiler can still fall back to opCmp. In the same course,
even == could fall back to opCmp if opEquals does not exist.
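
For comparison only, and not D code: Python happens to expose each relational operator as its own hook (`__lt__`, `__ge__`, `__eq__`, `__ne__`, ...), which is roughly the shape of the opLess/opEquals idea above. The sketch below uses a hypothetical `Vec` class, invented purely for illustration, whose comparisons return one bool per element in the Matlab style, so that `!=` can be true exactly where an element is NaN.

```python
import math


class Vec:
    """Hypothetical element-wise vector, invented for this illustration."""

    def __init__(self, *values):
        self.values = [float(v) for v in values]

    def _pairs(self, other):
        if len(self.values) != len(other.values):
            raise ValueError("length mismatch")
        return zip(self.values, other.values)

    # Each operator is overloaded on its own and returns a list of bools,
    # so none of them has to be derived from a single three-way compare.
    def __lt__(self, other):
        return [a < b for a, b in self._pairs(other)]

    def __ge__(self, other):
        return [a >= b for a, b in self._pairs(other)]

    def __eq__(self, other):
        return [a == b for a, b in self._pairs(other)]

    def __ne__(self, other):
        return [a != b for a, b in self._pairs(other)]


a = Vec(1.0, math.nan, 3.0)
b = Vec(2.0, 2.0, 3.0)

less = a < b            # [True, False, False]
greater_equal = a >= b  # [False, False, True]
equal = a == b          # [False, False, True]
not_equal = a != b      # [True, True, False]
```

In the NaN slot both `<` and `>=` are false, which is the case a single opCmp result cannot express.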

January 13, 2005

Re: Nice document on IEEE 754 floating point arithmetic

Posted by Norbert Nemec
in reply to Norbert Nemec

Norbert Nemec wrote:

> My suggestion would be:
> * Introduce opNot (unlike && and ||, there is no compelling reason why it
> should not be overloadable)
> * Introduce opLess, opGreater, opLessOrEqual, opGreaterOrEqual. If these
> are not defined, the compiler can still fall back to opCmp. In the same
> course, even == could fall back to opCmp if opEquals does not exist.

On second thought: if this were done, one clearly would have to take care of .sort and other builtins that are based on opCmp.

How is .sort currently supposed to behave on hitting a NaN in an array of floats?

One way to deal with this would be to offer both: opLess etc. for exact comparisons that may follow IEEE754 or whatever semantics are needed, and opCmp, which might not be mathematically correct but guarantees a total ordering of the list. For float-like objects, opCmp would then just compare the binary representation, which is mostly accurate and good enough for sorting floats, but not IEEE754 conformant. Still, <, >, <=, etc. would do the correct thing, since they are mapped to opLess etc. instead of the over-simplistic opCmp.

January 13, 2005

Re: Nice document on IEEE 754 floating point arithmetic

Posted by Walter
in reply to Norbert Nemec

"Norbert Nemec" <Norbert@Nemec-online.de> wrote in message news:cs41vr$2vgl$1@digitaldaemon.com...
> Walter wrote:
> > As far as I know, D (and Digital Mars C/C++) are
> > the only languages that properly support NaN's in comparison operators.
> > This is deliberate on my part, even at the slight cost of performance it
> > entails. Digital Mars compilers have always leaned towards doing accurate
> > and correct floating point as a priority over performance.
>
> Great! "Full IEEE 754 conformance" definitely would be a tremendous argument
> for using D in the numerics field.
>
> Anyhow, there are a few details I wonder about:
>
> A) The chapter "Expressions", section "Equality Expressions" states
>
>  "If either or both operands are NaN, then both the == and != comparisons
>  return false."
>
> which is in contrast to the table in "Relational Expressions" (and to the
> IEEE754 standard as well)

I believe D has this correct. Note that there are two different equality operators: one returns false when an operand is NaN, the other returns true. IEEE 754 suggests "=" for the former (lining up with D's "=="), and "?=" for the latter (in D it is "!<>").
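
The two equality predicates described here can be sketched in Python, whose built-in float comparisons follow the IEEE 754 rules for NaN. The function names are made up for illustration; this is not D code, and it only models the result of each predicate, not any signalling behavior.

```python
import math


def ieee_equal(a: float, b: float) -> bool:
    """The "=" predicate (D's ==): false whenever an operand is NaN."""
    return a == b


def unordered_or_equal(a: float, b: float) -> bool:
    """The "?=" predicate (written !<> in the post): true if the operands
    are equal or if either one is NaN."""
    return not (a < b or a > b)


nan = math.nan
results = {
    "1 = 1": ieee_equal(1.0, 1.0),             # True
    "nan = nan": ieee_equal(nan, nan),         # False
    "1 ?= 1": unordered_or_equal(1.0, 1.0),    # True
    "1 ?= 2": unordered_or_equal(1.0, 2.0),    # False
    "nan ?= 2": unordered_or_equal(nan, 2.0),  # True
}
```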


> B) The note 1. under the same table states that "For floating point
> comparison operators, (a !op b) is not the same as !(a op b)." It should be
> noted that this refers only to the question whether they signal on NaNs - otherwise, this sentence is hard to understand.

NaNs in general are hard to get used to <g>. But it's worthwhile.


> C) Raising Invalid exceptions should just be an option, not the default. IEEE754 states that a *flag* should be raised, which can then be checked and reset by the user later on. Raising an exception would be equivalent to what the document calls "trapping". Unless you are debugging your code, that is hardly ever what you want. The whole power of the NaN concept is that NaNs do not interrupt the calculation but are instead handled just like ordinary numbers. My output data from numerical calculations is usually a long block of floating-point values which may contain some NaNs. This simply tells me that I hit singularities or other special points in some cases, and I either drop or specially mark these points in the resulting plots.

True, and this is exactly what D does - set the invalid operation flag, which is sticky, and can be tested/set/cleared under programmer control.
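
The difference between a sticky flag and a trap can be illustrated with Python's `decimal` module. This is an analogy only: `decimal` implements decimal rather than binary arithmetic and says nothing about how D exposes the hardware flags, but its contexts have the same split between flags that are recorded and traps that raise. The function name below is hypothetical.

```python
from decimal import Decimal, InvalidOperation, localcontext


def invalid_operation_demo():
    """Perform inf - inf with trapping disabled and watch the flag."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = False  # record the event, do not raise
        ctx.clear_flags()

        inf = Decimal("Infinity")
        result = inf - inf  # invalid operation: yields NaN, sets the flag

        flagged = bool(ctx.flags[InvalidOperation])
        _ = Decimal(1) + Decimal(2)  # a later valid operation
        still_flagged = bool(ctx.flags[InvalidOperation])  # flag is sticky

        ctx.clear_flags()  # cleared only under program control
        cleared = not ctx.flags[InvalidOperation]
    return result, flagged, still_flagged, cleared


result, flagged, still_flagged, cleared = invalid_operation_demo()
```

The calculation runs to completion with a NaN in the result, and the program can check afterwards whether anything invalid happened.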

> D) Furthermore: even if the behavior for native floats is correct, operator
> overloading is still not capable of mimicking this behavior.
>
> In any case, I think that operator overloading is not quite flexible enough
> in several respects. Matlab, for example, allows comparing two arrays of numbers, returning an array of bools, which can then be used in many ways. In D, this would not be possible, since the comparison operators are based on opCmp. Furthermore, ! is not overloadable at all, so even if I write an opEquals for two arrays returning an array of bools, I could never mimic the correct behavior for !=.
>
> My suggestion would be:
> * Introduce opNot (unlike && and ||, there is no compelling reason why it
> should not be overloadable)
> * Introduce opLess, opGreater, opLessOrEqual, opGreaterOrEqual. If these are
> not defined, the compiler can still fall back to opCmp. In the same course,
> even == could fall back to opCmp if opEquals does not exist.

You're right that the current opCmp overloading cannot handle NaN operands. Overloading opNot won't help, either, and I firmly believe that it is a mistake to overload opNot. The problem is that opCmp returns one of 3 states, but 4 states are needed. I think the correct approach is to have an opFCmp overload that returns one of 4 states (less than, greater than, equal, unordered) which the code generator can use to support each of the extended relational operators. The operator overloading mechanism would first look for opFCmp, only if that does not exist will it look for opCmp.
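
A rough sketch of the four-state idea, written in Python rather than D. The names `Order` and `fcmp` and the helper predicates are assumptions made for illustration; no opFCmp existed at the time of this post. The point is that every relational operator, including the NaN-aware ones, can be derived from a single four-valued result.

```python
import math
from enum import Enum


class Order(Enum):
    LESS = "less"
    EQUAL = "equal"
    GREATER = "greater"
    UNORDERED = "unordered"


def fcmp(a: float, b: float) -> Order:
    """Four-state comparison: the fourth state covers NaN operands."""
    if a < b:
        return Order.LESS
    if a > b:
        return Order.GREATER
    if a == b:
        return Order.EQUAL
    return Order.UNORDERED  # at least one operand is NaN


# Relational operators derived from the single four-state result.
def less(a, b):
    return fcmp(a, b) is Order.LESS


def greater_or_equal(a, b):
    return fcmp(a, b) in (Order.GREATER, Order.EQUAL)


def not_less(a, b):
    # Differs from greater_or_equal only in the unordered case.
    return fcmp(a, b) is not Order.LESS


def unordered_or_equal(a, b):
    return fcmp(a, b) in (Order.EQUAL, Order.UNORDERED)
```

With a three-state result, `greater_or_equal` and `not_less` would be forced to agree, which is wrong when an operand is NaN.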

January 13, 2005

Re: Nice document on IEEE 754 floating point arithmetic

Posted by Walter
in reply to Norbert Nemec

"Norbert Nemec" <Norbert@Nemec-online.de> wrote in message news:cs5bt7$1s3s$1@digitaldaemon.com...
> How is .sort currently supposed to behave on hitting a NaN in an array of floats?

Putting all the NaN's at the end is one reasonable solution.
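
One way to get that behavior, sketched in Python with a hypothetical helper name. It says nothing about how D's built-in .sort is implemented; it only shows that a sort key which separates NaNs from ordinary values gives a well-defined result.

```python
import math


def sort_nans_last(values):
    """Sort floats in ascending order, with every NaN moved to the end."""
    def key(x):
        is_nan = math.isnan(x)
        # NaNs get a fixed placeholder so they are never compared as floats.
        return (is_nan, 0.0 if is_nan else x)

    return sorted(values, key=key)


data = [3.0, math.nan, -1.0, math.nan, 0.5]
ordered = sort_nans_last(data)  # -1.0, 0.5, 3.0, then the two NaNs
```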

> One way to deal with this would be to offer both: opLess etc. for exact comparisons that may follow IEEE754 or whatever semantics are needed, and opCmp, which might not be mathematically correct but guarantees a total
> ordering of the list. For float-like objects, opCmp would then just compare the binary representation, which is mostly accurate and good enough for sorting floats, but not IEEE754 conformant. Still, <, >, <=, etc. would do the correct thing, since they are mapped to opLess etc. instead of the over-simplistic opCmp.

January 13, 2005

Re: Nice document on IEEE 754 floating point arithmetic

Posted by Russ Lewis
in reply to Walter

Walter wrote:
> "Norbert Nemec" <Norbert@Nemec-online.de> wrote in message
> news:cs5bt7$1s3s$1@digitaldaemon.com...
> 
>>How is .sort currently supposed to behave on hitting a NaN in an array of
>>floats?
> 
> Putting all the NaN's at the end is one reasonable solution.

I noticed in that document that the spec was designed such that you could sort floats using an ordinary sort (as though they were ints).  I don't remember any exception for NaN's, but maybe I missed or forgot it.

January 13, 2005

Re: Nice document on IEEE 754 floating point arithmetic

Posted by Norbert Nemec
in reply to Russ Lewis

Russ Lewis wrote:

> Walter wrote:
>> "Norbert Nemec" <Norbert@Nemec-online.de> wrote in message news:cs5bt7$1s3s$1@digitaldaemon.com...
>> 
>>>How is .sort currently supposed to behave on hitting a NaN in an array of floats?
>> 
>> Putting all the NaN's at the end is one reasonable solution.
> 
> I noticed in that document that the spec was designed such that you could sort floats using an ordinary sort (as though they were ints).  I don't remember any exception for NaN's, but maybe I missed or forgot it.

True, if you sort floats by their binary representation, you get the correct ordering for all finite and infinite values with the NaNs sorted to some special position.
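
A minimal sketch of sorting by binary representation, in Python and assuming 64-bit IEEE 754 doubles. The function name is made up. Because the format is sign-magnitude, the raw bits need one adjustment before they compare like unsigned integers: negative values have all bits flipped, non-negative values have the sign bit set.

```python
import math
import struct


def bit_order_key(x: float) -> int:
    """Map a double to an unsigned integer that sorts in numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits >> 63:  # sign bit set: reverse the order of negatives
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)  # non-negative: place above all negatives


# A quiet NaN with a clear sign bit, built from its bit pattern.
(pos_nan,) = struct.unpack(">d", struct.pack(">Q", 0x7FF8000000000000))

data = [3.0, pos_nan, -math.inf, -1.0, 0.0, math.inf, -0.0]
ordered = sorted(data, key=bit_order_key)
# -inf, -1.0, -0.0, 0.0, 3.0, inf, nan
```

Under this key a NaN with a clear sign bit lands after +infinity, one with the sign bit set lands before -infinity, and -0.0 sorts before +0.0, so the result is a total order even though it is not what the IEEE relational operators report.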

This would mean that opCmp for floats could just do a binary comparison, while the relational operators would have to be decoupled from it. That is exactly my proposal: relational operators should be overloadable individually (using opLess, opGreater, etc.) and only fall back to opCmp if those do not exist.
