Thread overview
Submission: Floating point near-equality
  Aug 15, 2005 - Don Clugston
  Aug 15, 2005 - BCS
  Aug 15, 2005 - Dave
  Aug 15, 2005 - Ben Hinkle
  Aug 15, 2005 - Ben Hinkle
  Aug 16, 2005 - Don Clugston
  Aug 16, 2005 - Ben Hinkle
  Aug 16, 2005 - Don Clugston
  Aug 16, 2005 - Ben Hinkle
  Aug 17, 2005 - Don Clugston
Re: Floating point near-equality: near zero and infinity
  Aug 17, 2005 - Don Clugston
  Aug 17, 2005 - Ben Hinkle
  Aug 17, 2005 - Shammah Chancellor
  Aug 18, 2005 - Don Clugston
  Aug 18, 2005 - Ben Hinkle
  Aug 15, 2005 - Walter
  Aug 16, 2005 - Don Clugston
  Aug 16, 2005 - Ben Hinkle
  Aug 16, 2005 - Don Clugston
August 15, 2005
Submission: probably for std.math
int precisionEquality(real x, real y)

"How equal" are two reals x and y?

Rationale:
The use of == to test floating point numbers for equality is somewhat dangerous,
because == is completely intolerant of round-off error.
For example, almost every unit test involving floating point numbers has an
assertion of floating-point equality; but operator == is usually far too strict.
Instead, some form of "close enough" test must be used. If not provided by the
language, programmers must write this test themselves; but unfortunately,
it's easy to write this test incorrectly.

In std.math and std.math2 there are two floating point closeness tests, feq()
and mfeq(). Ultimately these boil down to:
return fabs(x - y) <= precision;
where precision is (say) 0.000001.
This is incorrect: for very small numbers it is a very weak assertion,
e.g. mfeq(1e-30, 7.8e-300) returns true!
But for large numbers, e.g. x = 1e60, it is an *extremely* strong assertion.

The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).

An interesting idea by Bruce Dawson
(http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm)
is to calculate the number of representable IEEE reals which lie between x and
y.
Unfortunately..
* the answer depends on whether the numbers are float, double, or real.
* the resulting value gets _very_ large if x and y are far apart...
* ... and with an IEEE extended float, it won't even fit in a ulong!
* it's not very intuitive.
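For 64-bit doubles, Dawson's trick can be sketched in portable C: reinterpret each bit pattern as an integer whose ordering matches the number line, then subtract. The helper name is mine, and this sketch ignores NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Count the representable doubles lying between x and y (Dawson's idea).
   The IEEE bit pattern is sign-magnitude, so negative patterns are flipped
   to make the integer order match numeric order. NaNs are not handled. */
uint64_t ulp_distance(double x, double y)
{
    int64_t ix, iy;
    memcpy(&ix, &x, sizeof ix);
    memcpy(&iy, &y, sizeof iy);
    /* Map negative patterns so integers increase along the number line. */
    if (ix < 0) ix = INT64_MIN - ix;
    if (iy < 0) iy = INT64_MIN - iy;
    return ix >= iy ? (uint64_t)ix - (uint64_t)iy
                    : (uint64_t)iy - (uint64_t)ix;
}
```

Adjacent doubles are 1 apart, and +0.0 and -0.0 map to the same integer; as the bullet points note, the result grows enormous for widely separated inputs.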

IMHO, what you actually want is a function which tells you: "to how many significant figures (bits) is x==y?"

The function below works on 80-bit reals. It is very fast (with possibilities for further optimisation), and it provides sensible results for any input, including subnormals, infinities, and NaNs. It could easily be converted for reals of different precision (you'd just need to know where the exponent lies).

For example, this function ensures that there are at most 4 bits of round-off error:

int feq(real x, real y)
{
return precisionEquality(x, y, real.mant_dig-4);
}

As well as providing a 'closeness' test, the function allows you to evaluate the accuracy of a calculation. More on this in a future post.

writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI));

I believe that some equivalent to this function belongs in std.math. It would be used in almost every unit test involving floating point numbers!

Some design choices which are open to debate:
* the name. precisionEquality() is the best I've come up with so far.
* behaviour with NaNs. Like ==, the version below states that two NaNs are
completely unequal. But it would also be reasonable to state that they are equal
to 64 bits!
* My expansion of the notion of "significant figures" to negative values is
unconventional.

This is my first D contribution, so my style may be inadequate.
I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no
language has ever got this right.

-Don.

================================================

import std.math2; // for abs
import std.math; // for isnan

/*
int precisionEquality(real a, real b)

"How equal" are two reals a and b?
Returns the number of mantissa bits which are equal in a and b;
this is similar to the concept of "significant figures".
For example, 0x1.F8p+60 and 0x1.F1p+60 are equal to 5 bits of precision.
If a==b, then precisionEquality(a,b) == real.mant_dig.

If a and b differ by a factor of two or more, then they have no
bits in common, so this function returns a value which is <= 0.
Returns 0 if they differ by a factor of >=2 but <4,
returns -1 for factors >=4 but <8, etc.

If the numbers have opposite sign, the difference in orders
of magnitude is based on the IEEE binary encoding.
For example, -1 and +1 differ by half the number line.

Pretend the numbers were fixed point. In the IEEE extended format,
there are about 16384 binary digits in front of the point, and
the same number behind. But only the first 64 bits after the first '1'
bit are recorded.

If the exponents are different, then the return value
is negative, and gives the number of (binary) orders of
magnitude of difference between a and b.
*/
int precisionEquality(real a, real b)
{
    real diff = a-b;
    ushort *pa = cast(ushort *)(&a);
    ushort *pb = cast(ushort *)(&b);
    ushort *pd = cast(ushort *)(&diff);

    // The difference in exponent between 'a' and 'a-b'
    // is equal to the number of mantissa bits of a which are
    // equal to b. If the difference is negative, then a and b
    // have different exponents. If it is positive, then a and b
    // are equal to that number of binary places.
    // AND with 0x7FFF to form the absolute value.
    // Subtract 1 so that we round downwards.
    int bitsdiff = ( ((pa[4]&0x7FFF) + (pb[4]&0x7FFF) - 1) >> 1) - (pd[4]&0x7FFF);

    if ((pd[4]&0x7FFF) == 0) { // Difference is zero or denormal
        if (a==b) return real.mant_dig;
        // 'diff' is denormal, so the number of equal bits is higher;
        // we need to add the number of zeros that lie at the start of
        // its mantissa. We do this by multiplying by 2^63.
        diff *= 0x1p+63;
        return bitsdiff + real.mant_dig - (pd[4]&0x7FFF);
    }

    if (bitsdiff > 0) return bitsdiff + 1;

    // Check for NAN or Infinity.
    if ((pd[4]&0x7FFF) == 0x7FFF) { // Result is NAN or INF
        if (a==b) {              // both were +INF or both were -INF.
            return real.mant_dig;
        }
        if (isnan(diff)) {
            return -65535;  // at least one was a NAN.
        }
        // fall through for comparisons of INF with large reals.
    }

    // Return the negative of the absolute value of
    // the difference in exponents.
    if ((pa[4]^pb[4]) & 0x8000) { // do they have opposite signs?
        // Convert the 'offset' exponent format to twos-complement,
        // then do a normal subtraction.
        return 1 - abs(pa[4] - (0x8000 - pb[4]));
    }
    return 1 - abs(pa[4] - pb[4]);
}

unittest {
    // Exact equality
    assert(precisionEquality(real.max, real.max) == real.mant_dig);
    assert(precisionEquality(0, 0) == real.mant_dig);
    assert(precisionEquality(7.1824, 7.1824) == real.mant_dig);
    assert(precisionEquality(real.infinity, real.infinity) == real.mant_dig);

    // One bit off exact equality
    assert(precisionEquality(1+real.epsilon, 1) == real.mant_dig-1);
    assert(precisionEquality(1, 1+real.epsilon) == real.mant_dig-1);
    // One bit below. Must round off correctly when exponents are different.
    assert(precisionEquality(1, 1-real.epsilon) == real.mant_dig-1);
    assert(precisionEquality(1-real.epsilon, 1) == real.mant_dig-1);
    assert(precisionEquality(-1-real.epsilon, -1) == real.mant_dig-1);
    assert(precisionEquality(-1+real.epsilon, -1) == real.mant_dig-1);

    // Numbers that are close
    assert(precisionEquality(0x1.Bp+5, 0x1.B8p+5) == 5);
    assert(precisionEquality(0x1.8p+10, 0x1.Cp+10) == 2);
    assert(precisionEquality(1, 1.1249) == 4);
    assert(precisionEquality(1.5, 1) == 1);

    // Extreme inequality
    assert(precisionEquality(real.nan, 0) == -65535);
    assert(precisionEquality(0, -real.nan) == -65535);
    assert(precisionEquality(real.nan, real.infinity) == -65535);
    assert(precisionEquality(real.infinity, -real.infinity) == -65533);
    assert(precisionEquality(-real.max, real.infinity) == -65532);
    assert(precisionEquality(-real.max, real.max) == -65531);

    // Numbers that are half the number line apart
    // Note that (real.max_exp - real.min_exp) == 32765
    assert(precisionEquality(-real.max, 0) == -32765);
    assert(precisionEquality(-1, 1) == -32765);
    assert(precisionEquality(real.min, 1) == -16381);

    // Numbers that differ by one order of magnitude
    assert(precisionEquality(real.max, real.infinity) == 0);
    assert(precisionEquality(1, 2) == 0);
    assert(precisionEquality(2, 1) == 0);
    assert(precisionEquality(1.5*(1-real.epsilon), 1) == 2);
    assert(precisionEquality(4*(1-real.epsilon), 1) == 0);
    assert(precisionEquality(4, 1) == -1);
}
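For readers without 80-bit reals handy, the core exponent-difference trick can be sketched portably for doubles in C using frexp() instead of bit-twiddling. The helper name is mine, and this sketch deliberately leaves out the NaN, infinity, denormal, and opposite-sign handling that the real version above deals with.

```c
#include <float.h>
#include <math.h>
#include <stdlib.h>

/* How many mantissa bits of a and b agree? Simplified, same-sign,
   finite-input sketch of the technique: the exponent gap between a
   and (a - b) counts the matching leading bits. */
int matching_bits(double a, double b)
{
    if (a == b) return DBL_MANT_DIG;   /* 53 for IEEE double */
    int ea, eb, ed;
    frexp(a, &ea);
    frexp(b, &eb);
    frexp(a - b, &ed);
    /* Average the exponents, rounding down, as in the real version. */
    int bitsdiff = ((ea + eb - 1) >> 1) - ed;
    if (bitsdiff > 0) return bitsdiff + 1;
    /* No bits in common: report (negated) orders of magnitude apart. */
    return 1 - abs(ea - eb);
}
```

On the examples from the unittest above, matching_bits(1.0, 1.1249) gives 4 and matching_bits(4.0, 1.0) gives -1, mirroring precisionEquality.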


August 15, 2005
#$%@ you!!!   ;-)

I was going to post on this EXACT subject today!! :)

I was thinking of making it an operator, maybe "~~".


August 15, 2005
In article <ddp3eh$b3$1@digitaldaemon.com>, Don Clugston says...

I'm no mathematician, but your proposal seems very reasonable and would be nice to incorporate into an operator of some kind too.


August 15, 2005
Comments inline

"Don Clugston" <Don_member@pathlink.com> wrote in message news:ddp3eh$b3$1@digitaldaemon.com...
> Submission: probably for std.math
> int precisionEquality(real x, real y)
>
> "How equal" are two reals x and y?
>
> Rationale:
> The use of == to test floating point numbers for equality is somewhat
> dangerous,
> because == is completely intolerant of round-off error.
> For example, almost every unit test involving floating point numbers has
> an
> assertion of floating-point equality; but operator == is usually far too
> strict.
> Instead, some form of "close enough" test must be used. If not provided by
> the
> language, programmers must write this test themselves; but unfortunately,
> it's easy to write this test incorrectly.

Agreed - though the notion of 'close enough' depends on the situation. I think there was an earlier thread about this kind of thing, too. There's a really short thread in the archives with Walter's opinion: http://www.digitalmars.com/d/archives/digitalmars/D/6397.html

> In std.math and std.math2 there are two floating point closeness tests:
> feq()
> and mfeq(). Ultimately these are:
> return fabs(x - y) <= precision;
> where precision is (say) 0.000001
> This is incorrect: for very small numbers, this is a very weak assertion
> eg mfeq(1e-30, 7.8e-300) returns true!
> but for large numbers, eg x=1e60, it is an *extremely* strong assertion.
>
> The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).

It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality, absolute and relative, and have the test take two parameters (the relative and absolute tolerances) and go with those. I'm personally not worried about the performance of a division or multiplication when doing a floating-point test - if that's the performance hit you're worried about.

> An interesting idea by Bruce Dawson
> (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm)
> is to calculate the number of representable IEEE reals which lie between x
> and
> y.
> Unfortunately..
> * the answer depends on whether the numbers are float, double, or real.
> * the resulting value gets _very_ large if x and y are far apart...
> * ... and with an IEEE extended float, it won't even fit in a ulong!
> * it's not very intuitive.
>
> IMHO, what you actually want is a function which tells you: "to how many significant figures (bits) is x==y?"

With special treatment for 0? or around 0?

> The function below works on 80-bit reals. It is very fast (with
> possibilities
> for further optimisation), and it provides sensible results for any input,
> including subnormals, infinities, and NaNs. It could easily be converted
> for
> reals with different precision, (you'd just need to know where the
> exponent lies).

It's actually very useful to have equality tests that know about float and double. That way the promotion to real doesn't introduce extra "precision" in a number that never had it to start with, and it's easier to just type feq(x,y) when x and y are doubles instead of something like feq(x,y,double.mant_dig-4).

> For example, this function ensures that there are at most 4 bits of
> round-off
> error:
>
> int feq(real x, real y)
> {
> return precisionEquality(x, y, real.mant_dig-4);
> }

The precisionEquality function below doesn't take 3 inputs. Did you mean
something like
  return precisionEquality(x,y) >= real.mant_dig-4
Also glancing over the code does it assume little-endian architecture? I see
the ushort* used to grab different parts of the real.

> As well as providing a 'closeness' test, the function allows you to
> evaluate the
> accuracy of a calculation. More on this in a future post.
>
> writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI));
>
> I believe that some equivalent to this function belongs in std.math. It
> would
> be used in almost every unit test involving floating point numbers!
>
> Some design choices which are open to debate:
> * the name. precisionEquality() is the best I've come up with so far.

yeah - that name is pretty long. 'feq' would be the best IMHO.

> * behaviour with NaNs. Like ==,  the version below states that two NaNs
> are
> completely unequal. But it would also be reasonable to state that they are
> equal
> to 64 bits!

NaNs should always be unequal. A function 'fis' could be defined that returns true if both are NaN.

> * My expansion of the notion of "significant figures" to negative values is unconventional.

I'm not sure what you are referring to.

> This is my first D contribution, my style may be inadequate.
> I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK,
> no
> language has ever got this right.
>
> -Don.

It would be nice to combine that with an abstol parameter near 0. That way a
test like feq(0,double.epsilon) will be true. I could see something like feq
checking if one of the two values is exactly 0 and doing absolute checking
there - or just doing a straightforward abs/rel check.



August 15, 2005
>> In std.math and std.math2 there are two floating point closeness tests:
>> feq() and mfeq(). Ultimately these are:
>> return fabs(x - y) <= precision;
>> where precision is (say) 0.000001
>> This is incorrect: for very small numbers, this is a very weak assertion
>> eg mfeq(1e-30, 7.8e-300) returns true!
>> but for large numbers, eg x=1e60, it is an *extremely* strong assertion.
>>
>> The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).
>
> It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those. I'm personally not worried about the performance of a division or multiplication when doing a floating pt test - if that's the performance hit you're worried about.

To be more concrete here's what I was thinking of:

const float TinyFloat = 1e-4f;
const double TinyDouble = 1e-8;
const real TinyReal = 1e-10L; // depends on real.epsilon

int feq(real x, real y, real reltol = TinyReal, real abstol = TinyReal) {
    real diff = fabs(x-y);
    if (diff <= abstol)
        return 1;
    real fx = fabs(x);
    real fy = fabs(y);
    real scale = fx>fy ? fx : fy;
    return diff/scale <= reltol;
}

int feq(double x, double y, double reltol = TinyDouble, double abstol = TinyDouble) {
    return feq(cast(real)x, cast(real)y, cast(real)reltol, cast(real)abstol);
}

int feq(float x, float y, float reltol = TinyFloat, float abstol = TinyFloat) {
    return feq(cast(real)x, cast(real)y, cast(real)reltol, cast(real)abstol);
}

That way if you know you'll be dealing with very small numbers (less than TinyReal, for example) then one can pass 0 for the abstol (and let the optimizer remove any dead code) and only use the relative equality tests. If one wants a pure absolute test (suppose one knows the numbers will always be near 1) then pass 0 for reltol and let the optimizer remove that code. For "normal usage", though, numbers that are "close" either in absolute or relative terms will test as equal.


August 15, 2005
I think this is pretty cool. I've thought about building such a thing, but never got around to it. You've got the right idea, which is the number of significant bits that match.

I agree that if one or both of the operands are NaN, the result should be 0.

And you've got unit tests, too, which I love to see! Need to add a test for
(real.nan, real.nan).

I need to know what license you pick for it - public domain?



August 16, 2005
In article <ddr5og$1phb$1@digitaldaemon.com>, Walter says...
>
>I think this is pretty cool. I've thought about building such a thing, but never got around to it. You've got the right idea, which is the number of significant bits that match.
>
>I agree that if one or both of the operands are NaN, the result should be 0.
>
>And you've got unit tests, too, which I love to see! Need to add a test for
>(real.nan, real.nan).

I need to add a few more tests for denormals, too. There are so many corner cases!

>I need to know what license you pick for it - public domain?
Public domain.

Please note that I haven't finished with it. It works, but some of the comments
are wrong! And some unit tests are missing, and there's some further
optimisation/ cleanup to do. I just wanted to check whether people thought this
was the correct approach before I spent any more time on it.
I will post an update soon.




August 16, 2005
> I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK,
> no
> language has ever got this right.
>
> -Don.

Well Walter seems gung-ho with it but please add in a difference test with feq (at least near 0 where it matters) so that, for example, sin(PI) and sin(-PI) and 0 all feq with each other. I think using a pure relative test is too narrow.


August 16, 2005
From your first post:
>Also glancing over the code does it assume little-endian architecture? I see the ushort* used to grab different parts of the real.

Yes, it's just like the isfinite(), isnan() functions in std.math.
It's also only valid for 80-bit reals.
I'm just happy that I managed to do it without resorting to asm!

>> It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those.

The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance. And there's a region in between where it's really difficult to think in terms of absolute or relative tolerance, they kind of blur together.

Note that there are many uses beyond simple "equal-enough" tests.
This function allows you to say: At what value of x are f(x) and g(x) most
different? eg when is exp(ln(x)) most different to x?
You can just try a range of different x values, and find the one where
precisionEquality(exp(ln(x)), x) is smallest. You don't need to worry about
whether you should be using absolute or relative tolerance, that's already taken
care of. So, if writing a function, you can find where the precision is least
adequate. Once you've done the best you can, you can put that into a unit test.

This gives you a simple way of reporting the accuracy of library functions and
identities, and for performing more powerful "second-order" unit tests.
(eg, acos(cos(x)) must be close to x for all -PI/2 <= x <= PI/2).
Then, if you know that cos(x) is accurate, you've proved that acos(x) is also
accurate. My goal is to provide one-line unit tests for most of the math
functions.

You're right: for equality tests, it probably is necessary to have separate
implementations for double and float, because subnormals become normal when
moved to higher precision. It's only a problem if your intermediate calculations
were done in double or float precision, instead of reals.
What is true, though, is that
if precisionEquality(x, y) >= double.mant_dig
then cast(double)x == cast(double)y.
The converse does not apply because of the aforementioned subnormals. (But in
this case you kind of "got lucky").

I did the real case first because it is the most difficult. It's trivial to extend it to other precisions.

-Don.


August 16, 2005
In article <ddre2a$1vck$1@digitaldaemon.com>, Ben Hinkle says...
>
>> I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK,
>> no
>> language has ever got this right.
>>
>> -Don.
>
>Well Walter seems gung-ho with it but please add in a difference test with feq (at least near 0 where it matters) so that, for example, sin(PI) and sin(-PI) and 0 all feq with each other. I think using a pure relative test is too narrow.

It's not a pure relative test. Subnormals will compare equal to zero. So it
should work. But...
I just checked, and was surprised to find that sin(PI) = 0x1p-64 ~ 1e-20.
If you define
real reducedsin(real x) { return sin(x-PI); }
then reducedsin(PI) ==0.

I would have thought that sin(n*PI) == 0 was an important identity to preserve. Could there be an off-by-one bug in the argument reduction function used by the trig functions? This is exactly the sort of thing I'm hoping to uncover.
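The double-precision analogue can be seen in C: M_PI is only the nearest double to pi, so even a perfectly reduced sin() must return roughly the gap between M_PI and the true pi (about 1.22e-16) rather than 0. The wrapper name below is mine.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* sin at the nearest machine representation of pi: the result is
   (approximately) pi - M_PI, the representation error, not zero. */
double sin_of_machine_pi(void)
{
    return sin(M_PI);   /* small and positive, on the order of 1e-16 */
}
```

Whether the 80-bit real case additionally suffers a reduction bug is a separate question, but a nonzero sin(PI) on its own is consistent with PI simply not being pi.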

-Don.


