May 24, 2019
On Friday, 24 May 2019 at 18:39:41 UTC, Walter Bright wrote:
> On 5/24/2019 8:35 AM, Jacob Carlborg wrote:
>> This is kind of nice, but I would prefer to have a complete implementation written in D (of sprintf) that is @nogc @safe nothrow and pure. To avoid having to add various hacks to apply these attributes.
>
> C's sprintf is already @nogc nothrow and pure. Doing our own is not that easy, in particular, the floating point formatting is a fair amount of tricky work.

It took me about an hour to port this "float to string" implementation to D:

https://github.com/ulfjack/ryu
https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

You can use `floatToString` to print a default-formatted float, or you can add your own formats by calling `f2d` which gives you the exponent and mantissa.

I only added support for 32-bit floats though.  Will add support for more when I need it.

>
> Besides, this is a few lines of code, and would fit in fine with betterC.

True

>
>
>> I can also add that there was this guy at DConf that said that if a D string should be passed to a C library it should manually pass the pointer and length separately without any magic ;)
>
> That wouldn't work with %.*s because the .length argument must be cast to int.

Not sure if you'll find it helpful, but I wrote my own "print" framework in my library that's meant to be usable in -betterC and with/without druntime/phobos.

https://github.com/dragon-lang/mar/blob/master/Print.md
https://github.com/dragon-lang/mar/tree/master/src/mar/print

It doesn't use format strings. Instead, formatting helpers return a struct with a "print" function, and your own types can define one too, e.g.:

import mar.print;

int a = 42;
sprint("a is: ", a);
sprint("a in hex is: 0x", a.formatHex);

struct Point
{
    int x;
    int y;
    auto print(P)(P printer) const
    {
        return printArgs(printer, x, ',', y);
    }
}

sprint("point is ", Point(1, 2));
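
For readers wondering how a printer without format strings can work at all, here is a minimal sketch. It is not mar's implementation: `BufferPrinter`, `put` and `printAll` are hypothetical names made up for illustration, and only the idea of a member `print` hook mirrors the example above.

```d
/// Hypothetical sink that appends into a caller-supplied buffer (no GC, no libc).
/// Writing past the end of the buffer trips D's normal array bounds check.
struct BufferPrinter
{
    char[] buffer;
    size_t used;

    void put(const(char)[] text)
    {
        buffer[used .. used + text.length] = text[];
        used += text.length;
    }

    void put(char c)
    {
        buffer[used++] = c;
    }

    void put(int value)
    {
        char[11] digits; // enough for "-2147483648"
        size_t pos = digits.length;
        // Widen to long first so that negating int.min cannot overflow.
        long magnitude = value < 0 ? -cast(long) value : value;
        do
        {
            digits[--pos] = cast(char)('0' + magnitude % 10);
            magnitude /= 10;
        } while (magnitude != 0);
        if (value < 0)
            digits[--pos] = '-';
        put(digits[pos .. $]);
    }

    const(char)[] data() const
    {
        return buffer[0 .. used];
    }
}

/// Hypothetical driver: uses the argument's own `print` if it has one,
/// otherwise falls back to the printer's built-in `put` overloads.
void printAll(P, Args...)(ref P printer, Args args)
{
    foreach (arg; args)
    {
        static if (is(typeof(arg.print(printer))))
            arg.print(printer);
        else
            printer.put(arg);
    }
}

struct Point
{
    int x;
    int y;

    void print(P)(ref P printer) const
    {
        printAll(printer, x, ',', y);
    }
}

unittest
{
    char[64] storage;
    auto printer = BufferPrinter(storage[]);
    printAll(printer, "point is ", Point(1, 2), " and a is ", 42);
    assert(printer.data == "point is 1,2 and a is 42");
}
```

Only the `put` overloads that are actually called get instantiated into the binary, which is the "pay for what you use" property in miniature.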


May 24, 2019
On 5/24/2019 12:15 PM, Jacob Carlborg wrote:
> On 2019-05-24 20:39, Walter Bright wrote:
> 
>> C's sprintf is already @nogc nothrow and pure.
> 
> Technically it's not pure because it accesses `errno`, that's what I meant with "various hacks".

The C standard doesn't say printf can set errno. Be that as it may, I did find one printf that did:

"If a multibyte character encoding error occurs while writing wide characters, errno is set to EILSEQ and a negative number is returned."

http://www.cplusplus.com/reference/cstdio/printf/

It's pure if not sending it malformed UTF.
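
To make the "various hacks" concrete: one way to get these attributes in D is to redeclare the C function locally with the attributes you want. This is a minimal sketch, assuming a libc that exports `snprintf`; `formatInt` is a hypothetical helper. The compiler takes the `pure` on trust and verifies nothing, so the errno caveat above still applies.

```d
// Assumption: a local redeclaration of C's snprintf with stronger attributes.
// Nothing checks that the real function honors them.
extern (C) private pure nothrow @nogc
int snprintf(scope char* buffer, size_t size, scope const(char)* format, ...);

/// Hypothetical helper: formats `value` into `buffer` and returns the
/// written slice, or null if the buffer was too small or snprintf failed.
char[] formatInt(char[] buffer, int value) pure nothrow @nogc @trusted
{
    const written = snprintf(buffer.ptr, buffer.length, "%d", value);
    if (written < 0 || cast(size_t) written >= buffer.length)
        return null;
    return buffer[0 .. written];
}

unittest
{
    char[16] storage;
    assert(formatInt(storage[], 42) == "42");
    assert(formatInt(storage[], -7) == "-7");
}
```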


>> Doing our own is not that easy, in particular, the floating point formatting is a fair amount of tricky work.
> Stefan Koch has an implementation for that [3], even works at CTFE. Not sure if it's compatible with the C implementation though.

I have one, too, the DMC++ one, though it doesn't do the fp formatting exactly right. I infer Stefan's doesn't, either, simply because his test suite spans lines 574-583 and is completely inadequate.

You can get an idea of what is required by reading:

https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf


>> That wouldn't work with %.*s because the .length argument must be cast to int.
> Of course it works. The DMD code base is littered with calls to printf with D strings the manual way of passing the pointer and length separately, including the casting.

The compiler doesn't know to do the cast when passing `string` arguments by .ptr/.length.
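
For reference, the manual pointer-and-length style under discussion looks roughly like this. It is a minimal sketch and `formatName` is a hypothetical helper. `%.*s` consumes an `int` precision followed by a `char*`, while `.length` is a `size_t`, hence the explicit cast.

```d
import core.stdc.stdio : snprintf;

/// Hypothetical helper: writes "name: <name>" into `buffer` and returns
/// snprintf's result (the length the full output needs, or negative on error).
int formatName(char[] buffer, const(char)[] name)
{
    // D slices are not NUL-terminated, so the length must be passed explicitly.
    // The cast truncates silently if name.length exceeds int.max.
    return snprintf(buffer.ptr, buffer.length, "name: %.*s",
                    cast(int) name.length, name.ptr);
}

unittest
{
    char[32] storage;
    // A slice of a longer string: only the first 5 characters may be printed.
    const written = formatName(storage[], "hello world"[0 .. 5]);
    assert(written == 11);
    assert(storage[0 .. written] == "name: hello");
}
```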
May 24, 2019
On 5/24/2019 2:07 PM, Jonathan Marler wrote:
> It took me about an hour to port this "float to string" implementation to D:
> 
> https://github.com/ulfjack/ryu
> https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

https://github.com/ulfjack/ryu says: "The Java implementation differs from the output of Double.toString in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output" which I find fairly concerning. Please review the paper I linked to in my reply to Jacob.

Floating point formatting is not something that can be knocked out in an hour. You can get a "mostly working" implementation that way, but not a serious, robust, correct implementation with the expected flexibility. (And the test cases to prove it correct.)

The fact that people write academic papers about it should be good evidence.

C's printf has been hammered on by literally generations of programmers over 3 decades. While the interface to it is old-fashioned and unsafe, the guts of it are rock solid, fast, and correct.
May 24, 2019
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
> On 5/24/2019 2:07 PM, Jonathan Marler wrote:
>> It took me about an hour to port this "float to string" implementation to D:
>> 
>> https://github.com/ulfjack/ryu
>> https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d
>
> https://github.com/ulfjack/ryu says: "The Java implementation differs from the output of Double.toString in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output" which I find fairly concerning. Please review the paper I linked to in my reply to Jacob.
>
> Floating point formatting is not something that can be knocked out in an hour. You can get a "mostly working" implementation that way, but not a serious, robust, correct implementation with the expected flexibility. (And the test cases to prove it correct.)
>
> The fact that people write academic papers about it should be good evidence.
>
> C's printf has been hammered on by literally generations of programmers over 3 decades. While the interface to it is old-fashioned and unsafe, the guts of it are rock solid, fast, and correct.

I didn't design an implementation in an hour, I just ported one :)

Ulf's algorithm can be implemented in only a few hundred lines and apparently is the fastest implementation to date that maintains a 100% robust algorithm. At least that's what I remember from watching his video.

https://pldi18.sigplan.org/details/pldi-2018-papers/20/Ry-Fast-Float-to-String-Conversion

He explains in the video why this is a hard problem and tries to explain his paper/algorithm.  But it's very new, only a year old I think.  Cool innovation.
May 24, 2019
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:

> C's printf has been hammered on by literally generations of programmers over 3 decades. While the interface to it is old-fashioned and unsafe, the guts of it are rock solid, fast, and correct.

That may be true, but one problem with `printf` is it is much too large and inefficient for some problem domains [1].

Rust has a more efficient `printf` alternative which is not dependent on a runtime or libc [2].

D could offer a *much* more efficient, pay-for-what-you-use implementation that doesn't require libc, a runtime, etc., like Rust's implementation.  It wouldn't be easy (especially wrt floating point types), but it would be a great benefit to D and its users.  Maybe I'll add it to dlang/projects [3].

There seems to be a perception about C that because it's old and proven, it's magical.  There's nothing `printf` is doing that D can't do better, if someone would just be willing to do the hard work.

Mike

[1] - Minimizing memory use in embedded systems Tip #3 – Don’t use printf() - https://embeddedgurus.com/stack-overflow/tag/printf/
[2] - std.fmt : https://doc.rust-lang.org/std/fmt/
[3] - dlang/projects - https://github.com/dlang/projects


May 25, 2019
On Friday, 24 May 2019 at 23:58:46 UTC, Mike Franklin wrote:
> On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
>
>> [...]
>
> That may be true, but one problem with `printf` is it is much too large and inefficient for some problem domains [1].
>
> [...]

My implementation is "pay for what you use".  A pure D implementation that's also extensible.
May 25, 2019
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
> https://github.com/ulfjack/ryu says: "The Java implementation differs from the output of Double.toString in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output" which I find fairly concerning. Please review the paper I linked to in my reply to Jacob.

AFAIK, Ulf Adams is stating that the Java implementation is sloppy (my word).

He states that other implementations provide more digits than is necessary to get an accurate representation.

https://dl.acm.org/citation.cfm?id=3192369

> C's printf has been hammered on by literally generations of programmers over 3 decades. While the interface to it is old-fashioned and unsafe, the guts of it are rock solid, fast, and correct.

Not really. He argues that the C spec isn't clear, so he follows more stringent criteria than C printf.


Burger and Dybvig found errors in implementations from DEC, HP and SGI:
https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf


Others claim to have found roundoff errors in common printf implementations, e.g.: «The implementation that ships with Microsoft Visual C++ 2010 Express sometimes has an off-by-one error when rounding to the closest value.»

http://www.ryanjuckett.com/programming/printing-floating-point-numbers/

 (I haven't checked the claim, but it would not surprise me).


It is clear that not using the C standard lib will bring more consistent and portable results across platforms, even if the C version is correct, as the C standard leaves wiggle room.  This can be important in scientific computing when comparing results from various platforms.

May 25, 2019
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
> https://github.com/ulfjack/ryu says: "The Java implementation differs from the output of Double.toString in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output" which I find fairly concerning. Please review the paper I linked to in my reply to Jacob.

AFAIK, Ulf Adams is stating that the Java specification is unclear, so it is up for debate as to whether the Java implementation is wrong or whether the spec should be reviewed.

He also states that other implementations provide more digits than is necessary to get an accurate representation.

https://dl.acm.org/citation.cfm?id=3192369
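
To illustrate what "more digits than is necessary" means, here is a brute-force sketch of the shortest round-trip criterion. It is not Ryu and not how any real implementation works: it simply asks libc for 1 to 9 significant digits and stops at the first output that parses back to the same float. `shortestDigits` is a hypothetical name, and the sketch assumes a finite input, the default "C" locale, and a libc whose `snprintf` and `strtof` round correctly.

```d
import core.stdc.stdio : snprintf;
import core.stdc.stdlib : strtof;

/// Hypothetical helper: the smallest number of significant decimal digits
/// (1 to 9) that still parses back to exactly `value`.
int shortestDigits(float value)
{
    char[32] text;
    foreach (precision; 1 .. 10)
    {
        // "%.*g" takes the precision as an int argument; floats are passed as double.
        snprintf(text.ptr, text.length, "%.*g", precision, cast(double) value);
        if (strtof(text.ptr, null) == value)
            return precision;
    }
    return 9; // 9 significant digits always round-trip an IEEE 32-bit float
}

unittest
{
    assert(shortestDigits(0.1f) == 1);        // "0.1" is already enough
    assert(shortestDigits(16777216.0f) == 8); // 2^24 needs all 8 of its digits
}
```

An implementation that always prints 9 digits is still accurate, just longer than it needs to be, which is the distinction being drawn here.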

> C's printf has been hammered on by literally generations of programmers over 3 decades. While the interface to it is old-fashioned and unsafe, the guts of it are rock solid, fast, and correct.

Not really. He argues that the C spec isn't clear, so he follows more stringent criteria than C printf.


Burger and Dybvig found errors in implementations from DEC, HP and SGI:
https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf


Others claim to have found roundoff errors in common printf implementations, e.g.: «The implementation that ships with Microsoft Visual C++ 2010 Express sometimes has an off-by-one error when rounding to the closest value.»

http://www.ryanjuckett.com/programming/printing-floating-point-numbers/

 (I haven't checked the claim, but it would not surprise me).


It is clear that not using the C standard lib will bring more consistent and portable results across platforms, even if the C version is correct, as the C standard leaves wiggle room.  This can be important in scientific computing when comparing results from various platforms.

May 25, 2019
On Friday, 24 May 2019 at 23:55:13 UTC, Jonathan Marler wrote:
> Ulf's algorithm can be implemented in only a few hundred lines and apparently is the fastest implementation to-date that maintains a 100% robust algorithm.

It is quite interesting that you get that performance without bloat.

I wonder if it is faster than the special cased float implementations. (using an estimator that chooses a faster floating point version where it works).

> But it's very new, only a year old I think.  Cool innovation.

Yes:

ACM SIGPLAN Notices - PLDI '18
Volume 53 Issue 4, April 2018
Pages 270-282

May 25, 2019
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
> https://github.com/ulfjack/ryu says: "The Java implementation differs from the output of Double.toString in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output" which I find fairly concerning. Please review the paper I linked to in my reply to Jacob.

FWIW the Ryu algorithm looks like a serious piece of work. See this paper, which references the paper you linked to and compares against it in detail:
https://dl.acm.org/citation.cfm?id=3192369

It covers in some detail the rationale for the differences you note.

> Floating point formatting is not something that can be knocked out in an hour. You can get a "mostly working" implementation that way, but not a serious, robust, correct implementation with the expected flexibility. (And the test cases to prove it correct.)

One interesting remark in the paper on the Ryu algorithm: "We did not compare our implementation against the C standard library function printf, as its specification does not include the correctness criteria set forth by Steele and White [15], and, accordingly, neither the glibc nor the MacOS implementation does."