Exotic floor() function - D is different
Mar 29, 2005
Bob W
March 29, 2005
The floor() function in D does not produce equivalent
results compared to a bunch of other languages
tested. The other languages were:

  dmc
  djgpp
  dmdscript
  jscript
  assembler ('87 code)

The biggest surprise was that neither dmc nor
dmdscript were able to match the D results.

The sample program below gets an input
from the command line, converts it, multiplies
it by 1e6 and adds 0.5 before calling the
floor() function. The expected result, based on
an input of 0.0000195, would be 20.0, but
D thinks it should be 19.0.

Since 0.0000195 cannot be represented
accurately in any of the usual floating point
formats, the somewhat unique D result is
probably not even a bug. But it is a major
inconvenience when comparing numerical
outputs produced by different programs.

So far I have been unable to reproduce D's rounding
issue with any other language tested.
(I have even tried OpenOffice to check.)
Before someone tells me that D uses a
different floating point format, I'd like to
mention that I have used float, double and
long double in the equivalent C programs
without any changes.


//------------------------------

import std.stdio,std.string,std.math;

int main(char[][] av) {
  if (av.length!=2) {
    printf("\nEnter Val! (e.g. 0.0000195)\n");  return(0);
  }

  double x=atof(av[1]);                    // expecting 0.0000195;
  writef("          x*1e6:%12.6f\n",x*1e6);
  writef("     floor(x..):%12.6f\n",floor(1e6*x));
  writef("  floor(.5+x..):%12.6f\n",floor(.5 + 1e6*x));
  writef("  floor(.5+co.):%12.6f\n",floor(.5 + 1e6*0.0000195));

  return(0);
}



March 31, 2005
"Bob W" <nospam@aol.com> wrote in message news:d2aash$a4s$1@digitaldaemon.com...
> The sample program below gets an input
> from the command line, converts it, multiplies
> it with 1e6 and adds 0.5 before calling the
> floor() function. The expected result, based on
> an input of 0.0000195, would be 20.0, but
> D thinks it should be 19.0.

What you're seeing is the result of using 80 bit precision, which is what D uses in internal calculations. .0000195 is not represented exactly; it is only rounded when the number is printed. So, depending on how many bits of precision there are in the representation, it might be one bit, 63 bits to the right, under "5", so floor() will chop it down.

Few C compilers support 80 bit long doubles; most implement them as 64 bit ones. Very few programs use 80 bit reals.

The std.math.floor function uses 80 bit precision. If you want to use the C 64 bit one instead, add this declaration:

    extern (C) double floor(double);

Then the results are:

          x*1e6:   19.500000
     floor(x..):   19.000000
  floor(.5+x..):   20.000000
  floor(.5+co.):   20.000000

I suggest that while it's a reasonable thing to require a minimum number of floating point bits for a computation, it's probably not a good idea to require a maximum.
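
A minimal sketch of the effect described above, written in present-day D rather than the 2005 dialect used elsewhere in this thread. The helper names floorInReal and floorInDouble are made up for illustration. It assumes a target where real is the 80 bit x87 format; where real is the same as double, both paths should agree. D also permits intermediate results to be kept at higher precision, so the double path is not guaranteed to round at 64 bits on every compiler.

```d
import std.math : floor;
import std.stdio : writefln;

/// Hypothetical helper: evaluates .5 + 1e6*x entirely in `real`, then floors.
real floorInReal(real x)
{
    real sum = 0.5L + 1e6L * x;
    return floor(sum);
}

/// Hypothetical helper: stores the sum in a `double` before flooring.
double floorInDouble(double x)
{
    double sum = 0.5 + 1e6 * x;
    return floor(sum);
}

void main()
{
    double x = 0.0000195; // nearest double to 0.0000195, widened for the real path
    writefln("real   path: %12.6f", floorInReal(x));
    writefln("double path: %12.6f", floorInDouble(x));
}
```

On an x87 target the real path is expected to keep the sum just under 20 and floor it to 19, while a sum that has really been rounded to double becomes exactly 20.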


March 31, 2005
On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:

> What you're seeing is the result of using 80 bit precision, which is what D uses in internal calculations. .0000195 is not represented exactly, to print the number it is rounded. So, depending on how many bits of precision there are in the representation, it might be one bit, 63 bits to the right, under "5", so floor() will chop it down.
> 
> Few C compilers support 80 bit long doubles, they implement them as 64 bit ones. Very few programs use 80 bit reals.
> 
> The std.math.floor function uses 80 bit precision. If you want to use the C 64 bit one instead, add this declaration:
> 
>     extern (C) double floor(double);
> 
> Then the results are:
> 
>           x*1e6:   19.500000
>      floor(x..):   19.000000
>   floor(.5+x..):   20.000000
>   floor(.5+co.):   20.000000
> 
> I suggest that while it's a reasonable thing to require a minimum number of floating point bits for a computation, it's probably not a good idea to require a maximum.

I can follow what you say, but can you explain the output of the program below? There appears to be a difference in the way variables and literals are treated.

import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;


  x = 0.0000195;
  y = 0.0000195;
  z = 0.0000195;
  writefln("                          Raw            Floor");
  writefln("Using float  variable: %12.6f %12.6f",
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using double variable: %12.6f %12.6f",
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using real   variable: %12.6f %12.6f",
                    (.5 + 1e6*z), floor(.5 + 1e6*z));

  writefln("Using float   literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
  writefln("Using double  literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
  writefln("Using real    literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));


}

----------
I get the following output...
----------
                          Raw          Floor
Using float  variable:    19.999999    19.000000
Using double variable:    20.000000    19.000000
Using real   variable:    20.000000    19.000000
Using float   literal:    19.999999    20.000000
Using double  literal:    20.000000    20.000000
Using real    literal:    20.000000    20.000000

-- 
Derek
Melbourne, Australia
31/03/2005 6:43:48 PM
April 01, 2005
"Derek Parnell" <derek@psych.ward> wrote in message news:7di6xztjokyz.6vnxzcx1d7l8.dlg@40tude.net...
> I can follow what you say, but can you explain the output of the program below? There appears to be a difference in the way variables and literals are treated.
>
> import std.stdio;
> import std.math;
> import std.string;
>
> void main() {
>
>  float  x;
>  double y;
>  real   z;
>
>
>  x = 0.0000195;
>  y = 0.0000195;
>  z = 0.0000195;
>  writefln("                          Raw            Floor");
>  writefln("Using float  variable: %12.6f %12.6f",
>                    (.5 + 1e6*x), floor(.5 + 1e6*x));
>  writefln("Using double variable: %12.6f %12.6f",
>                    (.5 + 1e6*y), floor(.5 + 1e6*y));
>  writefln("Using real   variable: %12.6f %12.6f",
>                    (.5 + 1e6*z), floor(.5 + 1e6*z));
>
>  writefln("Using float   literal: %12.6f %12.6f",
>                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
>  writefln("Using double  literal: %12.6f %12.6f",
>                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
>  writefln("Using real    literal: %12.6f %12.6f",
>                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
>
>
> }
>
> ----------
> I get the following output...
> ----------
>                          Raw          Floor
> Using float  variable:    19.999999    19.000000
> Using double variable:    20.000000    19.000000
> Using real   variable:    20.000000    19.000000
> Using float   literal:    19.999999    20.000000
> Using double  literal:    20.000000    20.000000
> Using real    literal:    20.000000    20.000000
>
> -- 
> Derek
> Melbourne, Australia
> 31/03/2005 6:43:48 PM



Great job! I could not believe it at first:

    writefln("Using float   literal: %12.6f %12.6f",
        (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

producing the following output:

   Using float   literal:    19.999999    20.000000


Looks like floor() mutates to ceil() at times. To ensure
that this is not "down under" specific (Melbourne),
I have repeated your test in the northern hemisphere,
and, not surprisingly, it did the same thing. Now
I am pretty curious to know why this is happening.

We'll see if Walter comes up with an answer .....



April 01, 2005
"Walter" <newshound@digitalmars.com> wrote in message news:d2g9jj$8om$1@digitaldaemon.com...
> What you're seeing is the result of using 80 bit precision, which is what D uses in internal calculations. .0000195 is not represented exactly, to print the number it is rounded. So, depending on how many bits of precision there are in the representation, it might be one bit, 63 bits to the right, under "5", so floor() will chop it down.
>
> Few C compilers support 80 bit long doubles, they implement them as 64 bit ones. Very few programs use 80 bit reals.
>
> The std.math.floor function uses 80 bit precision. If you want to use the C 64 bit one instead, add this declaration:
>
>    extern (C) double floor(double);
>
> Then the results are:
>
>          x*1e6:   19.500000
>     floor(x..):   19.000000
>  floor(.5+x..):   20.000000
>  floor(.5+co.):   20.000000
>
> I suggest that while it's a reasonable thing to require a minimum number of floating point bits for a computation, it's probably not a good idea to require a maximum.
>




Thank you for your information, Walter.

However, I am not convinced that the culprit is the
80-bit floating point format. This is due to some tests
I have made programming the FPU directly.

Based on my example above, the 80 bit format
is perfectly capable of generating the 'mainstream result'
of 20 as opposed to the lone 19 which D is producing.


Some more info, which might lead to the real problem:

- D is not entirely 80-bit based as claimed.

- Literals are converted to 64 bit first (and from there
  to 80 bits) at compile time if no suffix is used, even
  if the target is of type 'real'.

- atof() for example is returning a 'real' value which is
  obviously derived from a 'double', thus missing some
  essential bits at the end.


Example:

The hex value for 0.0000195 in 'real' can be expressed as
       3fef a393ee5e edcc20d5
or
       3fef a393ee5e edcc20d6
(due to the non-decimal fraction).

The same value converted from a 'double' would be
       3fef a393ee5e edcc2000
and therefore misses several trailing bits. This could
cause the floor() function to misbehave.
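
A rough sketch of how such hex dumps can be produced, in present-day D. The helper name realBits is invented for this example, and it assumes a little-endian target where real is the 80 bit x87 format (real.mant_dig == 64); on other targets it only prints a notice.

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper: formats an x87 80 bit real as "eeee mmmmmmmm mmmmmmmm".
/// Assumes a little-endian target where real.mant_dig == 64.
string realBits(real r)
{
    static if (real.mant_dig == 64)
    {
        auto p = cast(const(ubyte)*) &r;
        ulong mantissa = 0;
        foreach_reverse (i; 0 .. 8)
            mantissa = (mantissa << 8) | p[i];     // bytes 0..7: 64 bit mantissa
        uint signExp = p[8] | (p[9] << 8);         // bytes 8..9: sign and exponent
        return format("%04x %08x %08x", signExp,
                      cast(uint)(mantissa >> 32), cast(uint) mantissa);
    }
    else
        return "real is not the 80 bit x87 format on this target";
}

void main()
{
    double d = 0.0000195;   // rounded to 64 bits first
    writeln("real literal:      ", realBits(0.0000195L));
    writeln("double, then real: ", realBits(d));
}
```

Comparing the two lines shows whether the trailing mantissa bits survive; a value that passed through a double has its low 11 mantissa bits set to zero.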


I hope this info was somewhat useful.

Cheers.



April 01, 2005
"Bob W" <nospam@aol.com> wrote in message news:d2i3et$27dg$1@digitaldaemon.com...
> Great job! I could not believe it first:
>
>     writefln("Using float   literal: %12.6f %12.6f",
>         (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
>
> producing the following output:
>
>    Using float   literal:    19.999999    20.000000
>
> We'll see if Walter comes up with an answer .....

In general, I suggest that the way to see how these things work (floating,
chopping, rounding, precision, etc.) is to print things using the %a format
(which prints out ALL the bits in hexadecimal format).

As to the specific case above, let's break down each (using suffix 'd' to
represent double):

(.5 + 1e6*0.0000195f) => (.5d + 1e6d * cast(double)0.0000195f), result is double
floor(.5 + 1e6*0.0000195f) => floor(cast(real)(.5d + 1e6d * cast(double)0.0000195f)), result is real

When writef prints a real, it adds ".5" to the last significant decimal digit and chops. This will give DIFFERENT results for a double and for a real. It's also DIFFERENT from the binary rounding that goes on in intermediate floating point calculations, which adds "half a bit" (not .5) and chops. Also, realize that internally to the FPU, a "guard bit" and a "sticky bit" are maintained for a floating point value; these influence rounding, and are discarded when a value leaves the FPU and is written to memory.

What is happening here is that you start with a value that is not exactly representable, put it through a series of precision changes and roundings, compare it with the result of a different series of precision changes and roundings, and expect the results to match bit for bit. There's no way to make that happen.
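
To illustrate the difference between decimal rounding on output and what floor() actually sees, here is a small sketch in present-day D. The helper name sixDecimals is made up for the example; nextDown is used only to obtain a double sitting one step below 20.

```d
import std.format : format;
import std.math : floor, nextDown;
import std.stdio : writeln;

/// Hypothetical helper: formats a value the way "%.6f" does.
string sixDecimals(double v)
{
    return format("%.6f", v);
}

void main()
{
    double v = nextDown(20.0);           // largest double below 20
    writeln("%f : ", sixDecimals(v));    // decimal rounding hides the gap
    writeln("%a : ", format("%a", v));   // hex format shows every bit
    writeln("floor: ", floor(v));        // floor sees the value below 20
}
```

The %f line reads 20.000000 even though the stored value is below 20, which is why a "Raw" column of 20.000000 can sit next to a "Floor" column of 19.000000.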


April 01, 2005
"Bob W" <nospam@aol.com> wrote in message news:d2ieh5$2ksl$1@digitaldaemon.com...
> - D is not entirely 80-bit based as claimed.

Not true, it fully supports 80 bits.

> - Literals are converted to 64 bit first (and from there
>   to 80 bits) at compile time if no suffix is used, even
>   if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".

> - atof() for example is returning a 'real' value which is
>   obviously derived from a 'double', thus missing some
>   essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.

> Example:
>
> The hex value for 0.0000195 in 'real' can be expressed as
>        3fef a393ee5e edcc20d5
> or
>        3fef a393ee5e edcc20d6
> (due to the non-decimal fraction).
>
> The same value converted from a 'double' would be
>        3fef a393ee5e edcc2000
> and therefore misses several trailing bits. This could
> cause the floor() function to misbehave.
>
>
> I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
    writefln("float  %a", 0.0000195F);
    writefln("double %a", 0.0000195);
    writefln("real   %a", 0.0000195L);

    writefln("cast(real)float  %a", cast(real)0.0000195F);
    writefln("cast(real)double %a", cast(real)0.0000195);
    writefln("cast(real)real   %a", cast(real)0.0000195L);

    writefln("float  %a", 0.0000195F * 7 - 195);
    writefln("double %a", 0.0000195  * 7 - 195);
    writefln("real   %a", 0.0000195L * 7 - 195);
}



float  0x1.4727dcp-16
double 0x1.4727dcbddb984p-16
real   0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float  -0x1.85ffeep+7
double -0x1.85ffee1bd1edap+7
real   -0x1.85ffee1bd1ed9dfep+7


April 02, 2005
On Fri, 1 Apr 2005 15:03:02 -0800, Walter wrote:

> Perhaps the following program will help:
> 
> import std.stdio;
> 
> void main()
> {
>     writefln("float  %a", 0.0000195F);
>     writefln("double %a", 0.0000195);
>     writefln("real   %a", 0.0000195L);
> 
>     writefln("cast(real)float  %a", cast(real)0.0000195F);
>     writefln("cast(real)double %a", cast(real)0.0000195);
>     writefln("cast(real)real   %a", cast(real)0.0000195L);
> 
>     writefln("float  %a", 0.0000195F * 7 - 195);
>     writefln("double %a", 0.0000195  * 7 - 195);
>     writefln("real   %a", 0.0000195L * 7 - 195);
> }
> 
> 
> 
> float  0x1.4727dcp-16
> double 0x1.4727dcbddb984p-16
> real   0x1.4727dcbddb9841acp-16
> cast(real)float  0x1.4727dcp-16
> cast(real)double 0x1.4727dcbddb984p-16
> cast(real)real   0x1.4727dcbddb9841acp-16
> float  -0x1.85ffeep+7
> double -0x1.85ffee1bd1edap+7
> real   -0x1.85ffee1bd1ed9dfep+7

I repeat, (I think) I understand what you are saying but can you explain
the output of this ...
<code>
import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;


  x = 0.0000195;
  y = 0.0000195;
  z = 0.0000195;
  writefln("                       %24s %24s","Raw","Floor");
  writefln("Using float  variable: %24a %24a",
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using double variable: %24a %24a",
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using real   variable: %24a %24a",
                    (.5 + 1e6*z), floor(.5 + 1e6*z));

  writefln("Using float   literal: %24a %24a",
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
  writefln("Using double  literal: %24a %24a",
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
  writefln("Using real    literal: %24a %24a",
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));


}
</code>

______________
Output is ...

                                            Raw                    Floor
Using float  variable:         0x1.3fffff4afp+4                 0x1.3p+4
Using double variable:                 0x1.4p+4                 0x1.3p+4
Using real   variable:  0x1.3ffffffffffffe68p+4                 0x1.3p+4
Using float   literal:         0x1.3fffff4afp+4                 0x1.4p+4
Using double  literal:                 0x1.4p+4                 0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4                 0x1.4p+4

There seems to be different treatment of literals and variables.

Even apart from that, given the values above, I can understand the floor
behaviour except for lines 2 (double variable) and 6 (real literal).

-- 
Derek Parnell
Melbourne, Australia
2/04/2005 10:19:43 AM
April 02, 2005
"Walter" <newshound@digitalmars.com> wrote in message news:d2kk71$1pnl$1@digitaldaemon.com...
>
> "Bob W" <nospam@aol.com> wrote in message news:d2ieh5$2ksl$1@digitaldaemon.com...
>> - D is not entirely 80-bit based as claimed.
>
> Not true, it fully supports 80 bits.
>

I still don't buy that.
Example: std.string.atof() as mentioned below.


>> - Literals are converted to 64 bit first (and from there
>>   to 80 bits) at compile time if no suffix is used, even
>>   if the target is of type 'real'.
>
> Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".
>

Maybe there is a misunderstanding:

I just wanted to mention that although it is claimed that
the default internal FP format is 80 bits, the default
floating point format for literals is double. The lexer
(at least to my understanding) seems to confirm this.
Therefore, if someone does not want to experience a
loss in precision, he ALWAYS needs to use the L suffix
for literals, otherwise he gets a real which was converted
from a double.

e.g.:

real r1=1.2L;  // this one is ok thanks to the suffix
real r2=1.2;   // loss in precision, double convt'd to real



>> - atof() for example is returning a 'real' value which is
>>   obviously derived from a 'double', thus missing some
>>   essential bits at the end.
>
> Check out std.math2.atof(). It's fully 80 bit.
>

This one yes, but not the official Phobos
version std.string.atof() which I have used.
Phobos docs suggest that atof() can be found in

  1) std.math (n/a)
  2) std.string

Since I have not found any atof() function in std.math
and std.math2 is not even mentioned in the Phobos docs,
I've got it from std.string AND THIS ONE IS 64 BIT!

--------- quote from "c.stdlib.d" ---------
double atof(char *);
--------------- unquote -------------------


--------- quote from "string.d" ----------
real atof(char[] s)
{
    // BUG: should implement atold()
    return std.c.stdlib.atof(toStringz(s));
}
--------------- unquote -------------------

Due to heavy workload this issue might have
been overlooked. Luckily I do not even have
to mention the word "BUG", this was apparently
already done in the author's comment line.  : )

After searching the archives it looks like
someone was already troubled by the multiple
appearance of atof() in Nov 2004:

http://www.digitalmars.com/d/archives/digitalmars/D/bugs/2196.html



> Perhaps the following program will help:
>
> import std.stdio;
>
> void main()
> {
>    writefln("float  %a", 0.0000195F);
>    writefln("double %a", 0.0000195);
>    writefln("real   %a", 0.0000195L);
>
>    writefln("cast(real)float  %a", cast(real)0.0000195F);
>    writefln("cast(real)double %a", cast(real)0.0000195);
>    writefln("cast(real)real   %a", cast(real)0.0000195L);
>
>    writefln("float  %a", 0.0000195F * 7 - 195);
>    writefln("double %a", 0.0000195  * 7 - 195);
>    writefln("real   %a", 0.0000195L * 7 - 195);
> }
>
>
> float  0x1.4727dcp-16
> double 0x1.4727dcbddb984p-16
> real   0x1.4727dcbddb9841acp-16
> cast(real)float  0x1.4727dcp-16
> cast(real)double 0x1.4727dcbddb984p-16
> cast(real)real   0x1.4727dcbddb9841acp-16
> float  -0x1.85ffeep+7
> double -0x1.85ffee1bd1edap+7
> real   -0x1.85ffee1bd1ed9dfep+7
>


In accordance with what I have mentioned before,
the following program demonstrates the
existence of "truncated" reals:


void main() {
  real r1=1.2L;  // converted directly to 80 bit value
  real r2=1.2;   // parsed to 64b, then convt'd to 80b

  writefln("Genuine  : %a",r1);
  writefln("Truncated: %a",r2);
}


Output (using %a):

Genuine  : 0x1.3333333333333334p+0
Truncated: 0x1.3333333333333p+0


Alternative Output:

Genuine:    1.20000000000000000 [3fff 99999999 9999999a]
Truncated:  1.19999999999999996 [3fff 99999999 99999800]



April 02, 2005
"Bob W" <nospam@aol.com> wrote in message news:d2kvcc$22qa$1@digitaldaemon.com...
> I just wanted to mention that although it is claimed that the default internal FP format is 80 bits,

Actually, what is happening is that if you write the expression:

    double a, b, c, d;
    a = b + c + d;

then the intermediate values generated by b+c+d are allowed (but not required) to be evaluated to the largest precision available. This means that it's allowed to evaluate it as:

    a = cast(double)(cast(real)b + cast(real)c + cast(real)d);

but it is not required to evaluate it in that way. This produces a slightly different result than:

    double t;
    t = b + c;
    a = t + d;

The latter is the way Java is specified to work, which turns out to be both numerically inferior and *slower* on the x86 FPU. The x86 FPU *wants* to evaluate things to 80 bits.

The D compiler's internal paths fully support 80 bit arithmetic, that means there are no surprising "choke points" where it gets truncated to 64 bits. If the type of a literal is specified to be 'double', which is the case for no suffix, then you get 64 bits of precision. I hope you'll agree that that is the least surprising thing to do.
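
A small sketch of the two evaluation orders discussed above, in present-day D. The helper names sumViaReal and sumViaDouble are invented for illustration, and the input values are chosen only to make a difference visible. Since the compiler is allowed but not required to use extra precision, the double version is a request rather than a guarantee, and on targets where real equals double both versions should agree.

```d
import std.stdio : writefln;

/// Hypothetical helper: forces every intermediate into `real`.
double sumViaReal(double b, double c, double d)
{
    return cast(double)(cast(real) b + cast(real) c + cast(real) d);
}

/// Hypothetical helper: asks for a `double` temporary after the first add.
/// The compiler may still carry extra precision, so this is only a request.
double sumViaDouble(double b, double c, double d)
{
    double t = b + c;
    return t + d;
}

void main()
{
    // 1e16 + 1 needs 54 significant bits: too many for double, fine for an 80 bit real.
    double b = 1e16, c = 1.0, d = -1e16;
    writefln("via real  : %s", sumViaReal(b, c, d));
    writefln("via double: %s", sumViaDouble(b, c, d));
}
```

Where the intermediate really is rounded to 64 bits, the added 1.0 is lost before the subtraction; with an 80 bit intermediate it survives.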

> > Check out std.math2.atof(). It's fully 80 bit.
> I've got it from std.string AND THIS ONE IS 64 BIT!

True, that's a bug, and I'll fix it.

