[Issue 3248] New: lossless floating point formatting
August 12, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248

           Summary: lossless floating point formatting
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: moi667@hotmail.com


Could an option be added to the formatting to elide trailing zeros for %f? That way it would be possible to create an optimal lossless formatting for which the following holds:

float f;
s = format(f);
float f2 = to!(float)(s);
assert(f==f2);

The formatting I'm trying to get can be seen here (decimal): http://www.h-schmidt.net/FloatApplet/IEEE754.html

%g fails to format like this because it uses %f for values as small as 10^-5, thus losing precision for floats with leading zeros, like 0.00001234567.

Fixing this by using %f for 10^-5..10^-1 fails because it doesn't elide trailing zeros, making it suboptimal space-wise.
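
For anyone who wants to check how %g treats such a value on their own setup, here is a minimal sketch. The helper name `survives` is made up for illustration and is not part of Phobos. It only assumes that %g without an explicit precision keeps 6 significant digits, as in C's printf. What gets printed depends on the implementation; the round-trip test is the part that matters.

```d
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper: does `f` survive formatting with `fmt`
/// and parsing the resulting text back into a float?
bool survives(string fmt, float f)
{
    return to!float(format(fmt, f)) == f;
}

void main()
{
    float f = 0.00001234567f;

    // Default %g: too few significant digits to identify the float again.
    writeln(format("%g", f), " -> round trip: ", survives("%g", f));

    // Nine significant digits: enough for any finite float, assuming both
    // conversions round correctly.
    writeln(format("%.9g", f), " -> round trip: ", survives("%.9g", f));
}
```

The design choice is to compare the parsed value, not the text, since that is the property the report asks for.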

It would be even nicer to have this lossless formatting added to std.format! I would even suggest making this the default formatting for floating point; floating point isn't as straightforward as integral types, and it is easy to think the current formatting holds all information.

Compared to the hex %a format, this new lossless format would be more readable
(less bug-prone) and generally shorter (0.1 will be 0.1).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 12, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Don <clugdbug@yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug@yahoo.com.au




--- Comment #1 from Don <clugdbug@yahoo.com.au>  2009-08-12 12:22:26 PDT ---
It's not that easy, actually. When should it print 0.09999999999999999, and
when should it print 0.1 ? The code to do it correctly is amazingly
complicated.
Just be aware that what you're asking for is much more difficult than you
probably imagine.

August 12, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248





--- Comment #2 from assorted <moi667@hotmail.com>  2009-08-12 15:40:10 PDT ---
(In reply to comment #1)
> It's not that easy, actually. When should it print 0.09999999999999999, and
> when should it print 0.1 ? The code to do it correctly is amazingly
> complicated.
> Just be aware that what you're asking for is much more difficult than you
> probably imagine.

It is less difficult than you imagine :)

Let's take floats:

A float has at most 24 bits of precision

2^-24 = 0.000000059604644775390625
2^-23 = 0.00000011920928955078125

to distinguish between these two you only need a precision of 8.

Thus %.8e will always be lossless but isn't always the nicest way of
representation. %g fixes this by using %f if the exponent for an e format is
greater than -5 and less than the precision. The less than precision part is
correct, but the greater than 10^-5 is bad as the precision specifies the
number of digits generated after the decimal point; not excluding leading
zeros.
If %g would be changed to use %f only between 10^-1 and precision that would
solve that problem, if %f were to elide trailing zeros.

Back to the 0.1 question. 0.1 is actually stored as 0.1000000012... Eliding trailing zeros from %.8f would be sufficient to get 0.1
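
The %.8e claim can be spot-checked by walking consecutive floats with nextUp. This is only a sketch: `roundTripsWith8e` is a hypothetical helper, and the starting point and count are arbitrary choices, so a passing run is evidence for the sampled range and not a proof for all floats.

```d
import std.conv : to;
import std.format : format;
import std.math : nextUp;
import std.stdio : writeln;

/// Hypothetical helper: returns true if each of `count` consecutive floats,
/// starting at `start`, comes back unchanged after format("%.8e") + to!float.
bool roundTripsWith8e(float start, size_t count)
{
    float f = start;
    foreach (_; 0 .. count)
    {
        string s = format("%.8e", f);   // 1 digit before the point + 8 after
        if (to!float(s) != f)
            return false;
        f = nextUp(f);                  // step to the next representable float
    }
    return true;
}

void main()
{
    writeln(roundTripsWith8e(0.1f, 10_000));
    writeln(roundTripsWith8e(2.0f ^^ -24, 10_000));
}
```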

August 13, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Stewart Gordon <smjg@iname.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg@iname.com




--- Comment #3 from Stewart Gordon <smjg@iname.com>  2009-08-12 18:21:21 PDT ---
I can see a few possible approaches to lossless floating point formatting:

(a) decimal with infinite precision, minus trailing zeros
(b) minimum number of significant figures guaranteed to be unique, minus
trailing zeros
(c) the shortest possible string that, when parsed as a floating point, is
exactly this number

(a) clearly isn't what the reporter is asking for.

(b) seems straightforward.  (Is the number of s.f. in question just the .dig
property?)

(c) is optimal, and could probably be implemented quite simply (not sure whether it would be most efficient though) with the aid of the nextUp and nextDown functions.  This would also address the question in comment 1, though I'm not sure how easy it would be to implement this efficiently.

But (b) and (c) are ambiguous: do we go by uniqueness/exactitude in the real type or in the actual floating point type being used?  I can see that sometimes the app'll know what type it will later be read into, and sometimes it won't.
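
A brute-force sketch of something close to (c), for the case where the target type is known to be float: try 1 to 9 significant digits and keep the first rendering that parses back to the same value. `shortestRoundTrip` is a hypothetical name, not an existing Phobos function. Note that this only considers the correctly rounded n-digit renderings, so near power-of-two boundaries it may not find the absolute shortest string, and it is certainly not the efficient approach.

```d
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper: first %g rendering (1..9 significant digits)
/// that parses back to exactly the same float.
string shortestRoundTrip(float f)
{
    foreach (digits; 1 .. 10)
    {
        string s = format("%.*g", digits, f);
        if (to!float(s) == f)
            return s;
    }
    // Reached for NaN (which never compares equal); for finite values
    // 9 digits should already have succeeded above.
    return format("%.9g", f);
}

void main()
{
    writeln(shortestRoundTrip(0.1f));
    writeln(shortestRoundTrip(1.0f / 3.0f));
}
```

Doing the same for double or real would need a higher digit limit and a matching to!double or to!real, which is exactly the ambiguity about the target type.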

August 13, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248





--- Comment #4 from assorted <moi667@hotmail.com>  2009-08-12 19:45:28 PDT ---
As far as I understand it, removing trailing zeros from .8 precision and (c)
are the same.
This is because the first (right to left) non-zero you encounter is there
because of 2^x.

I actually used nextUp to test a few ranges of floats :) (I have a not so fast
computer)

I remember .dig being 6 for all floats (could be wrong here, not close to any
dmd.exe)

August 13, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei@metalanguage.com




--- Comment #5 from Andrei Alexandrescu <andrei@metalanguage.com>  2009-08-12 22:43:37 PDT ---
I recommend anyone interested in the subject to peruse the papers:

"How to Read Floating Point Numbers Accurately" ftp://ftp.ccs.neu.edu/pub/people/will/howtoread.ps

and

"Printing Floating-Point Numbers Quickly and Accurately" www.cs.indiana.edu/~burger/FP-Printing-PLDI96.pdf

August 15, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com




--- Comment #6 from Walter Bright <bugzilla@digitalmars.com>  2009-08-14 22:47:28 PDT ---
Right, this problem is an old one, and there's no reason to reinvent the wheel. Also, the formatting for them works by simply forwarding the job to the underlying C library. Some C implementations of this are better than others.

August 15, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248





--- Comment #7 from assorted <moi667@hotmail.com>  2009-08-15 09:55:41 PDT ---
Does this mean I can forget about getting this in phobos?
Could then at least an option be added to remove those trailing zeros for %f?
I don't see why %g should be that privileged ;)
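
Until such an option exists, the zeros can be stripped after the fact. A minimal sketch, assuming plain %f output (digits, an optional sign and a decimal point); `elideTrailingZeros` is a made-up helper name:

```d
import std.format : format;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical helper: removes trailing zeros after the decimal point,
/// and the point itself if nothing is left behind it.
string elideTrailingZeros(string s)
{
    // Leave integers, "inf"/"nan" and exponent notation untouched.
    if (s.indexOf('.') < 0 || s.indexOf('e') >= 0 || s.indexOf('E') >= 0)
        return s;

    size_t end = s.length;
    while (end > 0 && s[end - 1] == '0')
        --end;
    if (end > 0 && s[end - 1] == '.')
        --end;
    return s[0 .. end];
}

void main()
{
    writeln(elideTrailingZeros(format("%.8f", 0.1f)));
    writeln(elideTrailingZeros(format("%.8f", 2.5f)));
}
```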

September 07, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248



--- Comment #8 from Stewart Gordon <smjg@iname.com> 2009-09-07 02:58:19 PDT ---
(In reply to comment #4)
> As far as I understand it, removing trailing zeros from .8 precision and (c)
> are the same.

I doubt it ... I think the optimal number of decimal s.f. would depend on the binary exponent.  But I'll experiment when I have time.

> I remember .dig being 6 for all floats (could be wrong here, not close to any
> dmd.exe)

The spec describes .dig as "number of decimal digits of precision", which seems ambiguous.  Is it a property of the type or the value?  If it's a type property, is it the maximum number of s.f. that may be required to express a number of the type unambiguously, or the number of s.f. to which numbers are guaranteed to be storable unambiguously?  If a value property, is it the number of s.f. according to one of the approaches I listed, or something else?

September 07, 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3248



--- Comment #9 from Don <clugdbug@yahoo.com.au> 2009-09-07 04:25:35 PDT ---
(In reply to comment #8)
> (In reply to comment #4)
> > As far as I understand it, removing trailing zeros from .8 precision and (c)
> > are the same.
> 
> I doubt it ... I think the optimal number of decimal s.f. would depend on the binary exponent.  But I'll experiment when I have time.

You are correct. Some numbers need an extra digit.

> > I remember .dig being 6 for all floats (could be wrong here, not close to any
> > dmd.exe)
> 
> The spec describes .dig as "number of decimal digits of precision", which seems ambiguous.  Is it a property of the type or the value?

It's a property of the type.

> If it's a type property, is it the maximum number of s.f. that may be required to express a number of the type unambiguously, or the number of s.f. to which numbers are guaranteed to be storeable unambiguously?

Neither. It's the number of sig figs which are accurate in the worst case. So it's the _minimum_ number of digits which are stored. To unambiguously define the number, more digits are almost always required.
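
A small sketch of what that means in practice: two adjacent floats that print identically at float.dig significant digits but differently at nine. `sameTextAt` is a hypothetical helper used only for this illustration.

```d
import std.format : format;
import std.math : nextUp;
import std.stdio : writeln;

/// Hypothetical helper: do `a` and `b` produce the same %g text
/// when printed with `digits` significant digits?
bool sameTextAt(float a, float b, int digits)
{
    return format("%.*g", digits, a) == format("%.*g", digits, b);
}

void main()
{
    float a = 1.0f;
    float b = nextUp(a);    // the very next float after 1.0

    writeln(float.dig);                     // type property, not per value
    writeln(sameTextAt(a, b, float.dig));   // different values, same text
    writeln(sameTextAt(a, b, 9));           // 9 digits tell them apart
}
```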
