January 22, 2021
On Friday, 22 January 2021 at 18:01:41 UTC, Paul Backus wrote:
> On Friday, 22 January 2021 at 16:42:20 UTC, Imperatorn wrote:
>>
>> Interesting, doesn't seem to be ryu though
>
> From the linked PDF [1]:
>
>> Ryu and other fast algorithms have been rejected, because they cannot be used for printf-like functions due to a different design goal.
>
> According to the Ryu README on Github [2], "Ryu generates the shortest decimal representation of a floating point number that maintains round-trip safety." For printf, the user is allowed to specify things like the precision, the field width, and whether leading zeros should be used for padding, so a more flexible algorithm is needed.
>
> [1] https://raw.githubusercontent.com/berni44/printFloat/master/printFloat.pdf
> [2] https://github.com/ulfjack/ryu

People have been using variations of this for printf. But yes, it is not sufficient by itself.
January 23, 2021
On Friday, 22 January 2021 at 21:37:08 UTC, deadalnix wrote:
> People have been using variations of this for printf. But yes, it is not sufficient by itself.

Related paper: https://dl.acm.org/doi/pdf/10.1145/3360595
January 24, 2021
On 1/22/2021 1:35 PM, deadalnix wrote:
> I looked. I can tell you it is not ryu. I would need to dive much deeper into it to be confident to include this.

One way is to randomly generate floating point bit patterns and compare the output with that of print formatting on C.
January 24, 2021
On 1/22/2021 5:11 AM, deadalnix wrote:
> Ryu pushes things forward significantly, and proved it was correct to do so. I can't stress enough the importance of that last part, because there is no way you can check all possible double values by yourself and see if the output is correct.

True, but you *can* do it for float values, and the double algorithm is just more bits.
January 25, 2021
On Monday, 25 January 2021 at 04:27:49 UTC, Walter Bright wrote:
> On 1/22/2021 1:35 PM, deadalnix wrote:
>> I looked. I can tell you it is not ryu. I would need to dive much deeper into it to be confident to include this.
>
> One way is to randomly generate floating point bit patterns and compare the output with that of print formatting on C.

You only need to test the boundary cases, so one can do this, no problem.

January 25, 2021
On Monday, 25 January 2021 at 09:17:25 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 25 January 2021 at 04:27:49 UTC, Walter Bright wrote:
>> On 1/22/2021 1:35 PM, deadalnix wrote:
>>> I looked. I can tell you it is not ryu. I would need to dive much deeper into it to be confident to include this.
>>
>> One way is to randomly generate floating point bit patterns and compare the output with that of print formatting on C.
>
> You only need to test the boundary cases, so one can do this, no problem.

My guess is that knowing what the actual boundary cases are is equivalent to having solved the original problem.  If I'm wrong, if it really is as you say "no problem", then the RYU author sure wasted a lot of time on the proof in his PLDI 2018 paper.

I also note that RYU's printf incompatibilities mentioned earlier in this thread seem to have been addressed.  A recent variant of RYU reportedly accommodates printf.

January 26, 2021
Being the author of the PR mentioned here several times, I feel somewhat obliged to write something too.

On Saturday, 23 January 2021 at 00:13:16 UTC, deadalnix wrote:
> Related paper: https://dl.acm.org/doi/pdf/10.1145/3360595

That's great. I'll take a deeper look into it to see whether there is something that can be used for our implementation of printf.

At first glance: it overcomes some of the speed problems I encountered and for which I am currently trying to find something better. The drawback of RYU printf is the large amount of memory needed for the tables involved. The paper states 104KB for doubles (which can be compressed at the expense of slowing everything down). For reals it will be even more. We should make sure that we do not blow up the executables more than needed. We will probably eventually have to decide between speed and code size - or find some sort of compromise.

On Monday, 25 January 2021 at 04:27:49 UTC, Walter Bright wrote:
> One way is to randomly generate floating point bit patterns and compare the output with that of print formatting on C.

The last commit of my PR adds a test that does exactly this. I would be happy if people could run that test and report differences. (There will be some, because snprintf implementations are known to contain bugs - see my comments above the test.)

To do so, check out the PR, uncomment the line "version = printFloatTest;" and run the unittests of std.format. The test runs for 30 minutes (you can configure this by changing the value of "duration") and then prints a summary. (Feel free to change the test to meet your needs.)

On Monday, 25 January 2021 at 09:17:25 UTC, Ola Fosheim Grøstad wrote:
> You only need to test the boundary cases, so one can do this, no problem.

I fear with this approach you'll have a hard time: there are so many boundary cases that it's easy to miss some. You need to test all combinations of the five flags (-, #, +, space, 0), width, precision, rounding mode, different algorithms depending on the size of the exponent, NaNs, infinities, subnormals, zero, float/double/real(s), and probably more.

On Monday, 25 January 2021 at 21:40:01 UTC, Bruce Carneal wrote:
> If I'm wrong, if it really is as you say "no problem", then the RYU author sure wasted a lot of time on the proof in his PLDI 2018 paper.

There is a subtle but important difference between proving correctness for RYU and for printf implementations: for RYU, the theorem to prove is simple to state: find a shortest representation of a number that round-trips.

But what would be the equivalent theorem for printf? Produce the same result that snprintf produces? But which implementation of snprintf? And what exactly does snprintf produce? Can you write that down? And wouldn't it make more sense to have an implementation that fixes the known bugs in snprintf instead of copying them? (For example, I read about an implementation where snprintf("%.2f", double.inf) produces "#J" instead of the expected "inf" - not only adding "#", but also rounding the "I" of inf up to "J"... *lol* - or printing different results for identical numbers, just because one is a float and the other a double.)

I'm not completely sure - I have not found the time to read the RYU printf paper carefully, so I might be wrong - but at first glance I think they only prove that the algorithm produces the correct digits. It ignores all the flags and other details, which make up about 90% of the PR.

The heart of the PR (which could be replaced by RYU printf) is this (the version for numbers that have no fractional part - see below for fractions):

> int msu = 0;
> while (msu < count - 1 || mybig[$-1] != 0)
> {
>     ulong mod = 0;
>     foreach (i;msu .. count)
>     {
>         mybig[i] |= mod << 60;  // (a)
>         mod = mybig[i] % 10;    // (b)
>         mybig[i] /= 10;         // (c)
>     }
>     if (mybig[msu] == 0)
>         ++msu;
>
>     buffer[--left] = cast(byte) ('0'+mod);
> }

It's basically: divide by 10 and print the remainder until you reach zero (printing from right to left). The algorithm is a little more complicated, because the numbers involved are up to 1000 digits long and can no longer be stored in a single ulong.

Going into a little more detail: the number to be printed is given in mybig[] as an integer (similar to the implementation of BigInt). The outer loop produces one decimal digit each time it is executed - the digit is saved in buffer (last line).

The inner loop is a division by 10 for big numbers. It is quite similar to long division as we learned it in school (here, division by 3):

736 : 3 = 245 remainder 1.
6
-
13
12
--
 16
 15
 --
  1

First, we divide (7 / 3 = 2 remainder 1) and append another digit to the remainder (1 => 13), and so on. The same is done in the algorithm: (a) appends another digit, (b) calculates the remainder and (c) divides by 10. (If you worry about the reordering of the steps: it has the benefit that the loop is branch-free and therefore faster; in the first iteration a zero is added, which does no harm.) Finally, the last remainder is the next digit. (And the if below the inner loop is there for speed reasons.)

As for fractional digits: a similar algorithm is used, but this time the number is multiplied by 10 each round, and the overflow yields the next digit.


I don't know if this will comfort you or not, but: this algorithm *is* already part of Phobos since 7 February 2020 and hence in the stable version since v2.091. It's already used for printing floats with %e. The PR is about adding it for %f too.
January 26, 2021
On Monday, 25 January 2021 at 21:40:01 UTC, Bruce Carneal wrote:
> If I'm wrong, if it really is as you say "no problem", then the RYU author sure wasted a lot of time on the proof in his PLDI 2018 paper.

Why is that? If the proof is for the algorithm, then it has nothing to do with proving your implementation correct.

January 26, 2021
On Tuesday, 26 January 2021 at 16:31:35 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 25 January 2021 at 21:40:01 UTC, Bruce Carneal wrote:
>> If I'm wrong, if it really is as you say "no problem", then the RYU author sure wasted a lot of time on the proof in his PLDI 2018 paper.
>
> Why is that? If the proof is for the algorithm then it has nothing to do with proving your implementation to be correct.

If you can construct a tractable exhaustive proof of an implementation, which is what I believe you asserted in this case, then you don't need any other proof.

I'm very happy to see the 32-bit floats done exhaustively, BTW. It is very suggestive, but not definitive, if I understand the problem correctly.

The mapping from the domain to the co-domain is the entire problem IIUC so asserting that certain ranges within the domain can be skipped is tantamount to saying that you know the implementation performs correctly in those regions.  What am I missing?

January 26, 2021
On Tuesday, 26 January 2021 at 19:02:53 UTC, Bruce Carneal wrote:
> On Tuesday, 26 January 2021 at 16:31:35 UTC, Ola Fosheim Grøstad wrote:
>> On Monday, 25 January 2021 at 21:40:01 UTC, Bruce Carneal wrote:
>>> If I'm wrong, if it really is as you say "no problem", then the RYU author sure wasted a lot of time on the proof in his PLDI 2018 paper.
>>
>> Why is that? If the proof is for the algorithm then it has nothing to do with proving your implementation to be correct.
>
> If you can construct a tractable exhaustive proof of an implementation, which is what I believe you asserted in this case, then you don't need any other proof.
>
> I'm very happy to see the 32 bit floats done exhaustively BTW.  It is very suggestive, but not definitive if I understand the problem correctly.
>
> The mapping from the domain to the co-domain is the entire problem IIUC so asserting that certain ranges within the domain can be skipped is tantamount to saying that you know the implementation performs correctly in those regions.  What am I missing?

You can probably inductively prove - or at least become quite certain - that your implementation is identical to the original algorithm within a given set of inputs, given one good result within that set. What those sets look like depends on the algorithm, of course.

E.g. given an algorithm for a function of type integer -> integer, if you can prove your implementation is linear w.r.t. its input and you can demonstrate it's correct for 1, you can then infer it will be correct for any other integer input. I think? Off the top of my head.

Of course <insert interesting algorithm here> isn't linear w.r.t. its inputs, but maybe it is within some fixed subset of its inputs, so you only have to test one of those to cover the whole set. Also, linearity isn't the only property one can consider.