October 31, 2019
On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d wrote:
> On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> > Replacing snprintf for floating point is very challenging, because:
> > 
> > 1. people have been improving snprintf for decades
> > 2. people expect precision and performance
> > 3. the standard is snprintf, any credible implementation must be the
> > same or better
> 
> Moreover, actual printf implementations seem to depend upon the locale.  This creates bugs (say "1,4" instead of "1.4"), so the behaviour you want depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.

*Is* it a bug, though?  Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program.

But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4".

Which leads me to think that these two should be separate format specifiers. Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.


T

-- 
I'm still trying to find a pun for "punishment"...
October 31, 2019
On 10/31/2019 1:27 AM, drug wrote:
> In some cases it is much more productive to have a text representation of data than a binary one. Initially I too believed that a binary representation was more suitable, but afterwards I was forced to use a text format, and that gave me good results.

To get round-trip 100% accuracy, print the floats in hex using the %A format.
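
A minimal sketch of such a hex round trip, written in Python rather than D or C purely for illustration. It assumes Python's `float.hex()` and `float.fromhex()`, which produce and parse the same hexadecimal notation as C's `%a` (lowercase, where `%A` is uppercase); the helper names are made up for this example.

```python
import math

def to_hex_text(x: float) -> str:
    """Render a double as hexadecimal floating-point text (like C's %a)."""
    return x.hex()

def from_hex_text(s: str) -> float:
    """Parse hexadecimal floating-point text back into a double."""
    return float.fromhex(s)

# Includes a subnormal, the largest finite double and negative zero.
samples = [0.1, 1.4, 1e-310, 1.7976931348623157e308, -0.0]

for x in samples:
    text = to_hex_text(x)
    back = from_hex_text(text)
    # Same value and same sign bit, so the round trip is lossless.
    assert back == x
    assert math.copysign(1.0, back) == math.copysign(1.0, x)
    print(text)
```

Every hex digit maps to exactly four mantissa bits, so no rounding is involved in either direction.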
October 31, 2019
On Thursday, 31 October 2019 at 20:20:24 UTC, Walter Bright wrote:
> On 10/31/2019 1:27 AM, drug wrote:
>> In some cases it is much more productive to have a text representation of data than a binary one. Initially I too believed that a binary representation was more suitable, but afterwards I was forced to use a text format, and that gave me good results.
>
> To get round-trip 100% accuracy, print the floats in hex using the %A format.

DtoA is also supposed to have 100% accuracy, when it comes to value, not necessarily to binary representation though.

I'd still prefer grisu2 over ryu, since it is easier to understand and I already have a ctfeable version of it. (It can't be safe though, since it casts double* to ulong*.)
October 31, 2019
On Wednesday, 30 October 2019 at 20:46:07 UTC, Stefan Koch wrote:
> If you could post that so I can have a look over the WIP that'd be nice.

See https://github.com/berni44/phobos/tree/printf

The function can be found at the end of std/format.d. I had to comment out some unittests, because the e and g qualifiers are not yet supported. I put several comments in the code, so I hope it's always clear what happens. If not, feel free to ask. (I'll be offline during the weekend.)

I also added a diagram for speed comparison. See https://github.com/berni44/phobos/blob/printf/diagram.png

Blue and green use "%.10f" while black and red use "%.100f". Blue and red are my function, while green and black are snprintf. The x-axis gives the value of the exponent from 0 to 255; the y-axis gives the average time in nanoseconds. The green bottom line at the left is at approx. 600ns. For each exponent, approx. 217886 numbers were checked (the same set for both functions).

As you can see, on the left side snprintf is faster, with an almost constant time, while the time of mine increases slightly as exponents get smaller. I scanned the snprintf implementation to find out what they do; see my comment in the implementation for details.

October 31, 2019
On Wednesday, 30 October 2019 at 20:29:34 UTC, Rumbu wrote:
> In this case rounding becomes a question of how do you interpret the remainder of a division by a power of ten.

Unfortunately not. Think of 0.1500000000000001 rounded to one digit. It's clear that a remainder of 0-4 is rounded down and one of 6-9 is rounded up. But to decide in the case of a 5, you might need to look at the following digits if the rounding mode tells you to round down in the case of 0.5...
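
To make the tie case concrete, here is a small sketch in Python (not D, purely for illustration) using the standard `decimal` module, so the inputs are exact decimal values rather than doubles. The helper name is hypothetical. With a "round half down" mode, the digit 5 alone does not decide the result; the digits after it do.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP

def round_to_one_digit(text: str, mode: str) -> Decimal:
    """Round an exact decimal string to one digit after the point."""
    return Decimal(text).quantize(Decimal("0.1"), rounding=mode)

exact_tie = "0.15"                 # remainder is exactly one half
just_above = "0.1500000000000001"  # remainder is slightly more than one half

# Half-down: an exact tie goes down, but anything above the tie goes up,
# so the digits following the 5 have to be inspected.
print(round_to_one_digit(exact_tie, ROUND_HALF_DOWN))   # 0.1
print(round_to_one_digit(just_above, ROUND_HALF_DOWN))  # 0.2

# Half-up: both go up, so here the 5 alone is enough to decide.
print(round_to_one_digit(exact_tie, ROUND_HALF_UP))     # 0.2
print(round_to_one_digit(just_above, ROUND_HALF_UP))    # 0.2
```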

October 31, 2019
On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> To that end, you'll need to be familiar with the following:

Thanks for that list. I'll have a look when I find the time to do so.

> 754-2019 IEEE Standard for Floating Point Arithmetic
> https://ieeexplore.ieee.org/document/8766229

Unfortunately I cannot download this file. I've got no company listed there and I'm not willing to pay for it...

> Ryu Fast Float To String Conversion
> https://dl.acm.org/citation.cfm?id=3192369
>
> https://github.com/ulfjack/ryu
> [...]
>
> Jonathan Marler's D implementation of ryu:
> https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

I already read the paper about ryu. IMHO it's of no use here, because the speed advantage comes from being more "inaccurate" than snprintf. Ryu is designed for round-tripping, while snprintf prints as many digits as the user wants (even when they carry no more information). The same holds for the grisu variants.
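
The difference between the two goals can be sketched in Python (again only for illustration, not D). This assumes CPython, whose `repr()` yields the shortest string that round-trips (computed with a different algorithm than ryu or grisu, but with the same goal), while `format(x, ".Nf")` prints exactly N digits after the decimal point like `%.Nf`. The helper names are made up.

```python
def shortest(x: float) -> str:
    """Shortest decimal text that parses back to the same double."""
    return repr(x)

def fixed(x: float, digits: int) -> str:
    """Fixed-precision text with exactly `digits` digits after the point."""
    return format(x, f".{digits}f")

x = 0.1

# Round-trip goal: stop as soon as the text identifies the double uniquely.
print(shortest(x))         # 0.1
assert float(shortest(x)) == x

# printf goal: emit exactly as many digits as requested, taken from the
# exact binary value, even though they add no information for a round trip.
print(fixed(x, 20))        # 0.10000000000000000555
print(len(fixed(x, 100)))  # 102 characters: "0." plus 100 digits
```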

October 31, 2019
On Thu, Oct 31, 2019 at 09:04:49PM +0000, Stefan Koch via Digitalmars-d wrote: [...]
> I'd still prefer grisu2 over ryu, since it is easier to understand and I already have a ctfeable version of it.

Maybe we should be using your implementation then?  No need to duplicate work if it's already been done.


> (it can't be safe though since it casts double* to ulong*)

But surely it can be @trusted?


T

-- 
Your inconsistency is the only consistent thing about you! -- KD
October 31, 2019
On Thursday, 31 October 2019 09:58:08 MDT H. S. Teoh via Digitalmars-d wrote:
> On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d wrote:
> > On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> > > Replacing snprintf for floating point is very challenging, because:
> > >
> > > 1. people have been improving snprintf for decades
> > > 2. people expect precision and performance
> > > 3. the standard is snprintf, any credible implementation must be the
> > > same or better
> >
> > Moreover, actual printf implementations seem to depend upon the locale.  This creates bugs (say "1,4" instead of "1.4"), so the behaviour you want depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
>
> *Is* it a bug, though?  Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program.
>
> But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4".
>
> Which leads me to think that these two should be separate format specifiers. Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.

The version of format that takes the format specifier as a compile-time argument shouldn't have that problem, but the one that takes it as a runtime argument certainly would.

- Jonathan M Davis



November 01, 2019
On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:

> Which leads me to think that these two should be separate format specifiers.

I would put the localization in a completely different function.

> Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.

You could pass in the locale to the function, then it can be pure. Even more reason to have it as a separate function. I would say that should be best practice because you might want to run a program in a different locale than the global configured one.
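
A rough sketch of that idea, in Python only for illustration (it is not D and not a proposal for the Phobos API). The function `format_fixed` and its `decimal_point` parameter are hypothetical. Because the separator is passed in explicitly, the result depends only on the arguments and never on global locale state.

```python
def format_fixed(value: float, precision: int, decimal_point: str = ".") -> str:
    """Format `value` with `precision` digits after the decimal point.

    The separator is an explicit argument, so the function reads no
    global locale state and always gives the same output for the same
    input.
    """
    # Python's "f" format type always uses "." regardless of the locale.
    text = format(value, f".{precision}f")
    return text.replace(".", decimal_point)

# Machine-readable output (e.g. JSON) uses the default separator.
print(format_fixed(1.4, 1))       # 1.4

# Human-readable output for a locale that writes a decimal comma.
print(format_fixed(1.4, 1, ","))  # 1,4
```

The caller decides where the separator comes from: a fixed "." for data formats, or a value looked up from whichever locale the output is meant for, which need not be the globally configured one.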

I'm not sure if it's enough to look at the locale. On my computer (a Mac) I have configured it to use English for the language but Swedish for the date, time, number and currency formats.

--
/Jacob Carlborg
November 01, 2019
On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via Digitalmars-d wrote:
> On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
> 
> > Which leads me to think that these two should be separate format specifiers.
> 
> I would put the localization in a completely different function.

That would be a better solution. It would be different from snprintf, though, and we'd have to document it well so that people can find it.


> > Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.
> 
> You could pass in the locale to the function, then it can be pure. Even more reason to have it as a separate function. I would say that should be best practice because you might want to run a program in a different locale than the global configured one.

+1.


> I'm not sure if it's enough to look at the locale. On my computer (a Mac) I have configured it to use English for the language but Swedish for the date, time, number and currency formats.
[...]

I think it has to do with the LC_* environment variables, at least on a *nix system. You can set LC_ALL to get the same settings across all categories, or you can separately set one or more of the LC_* to get different settings in each category. (Caveat: I've never actually done this myself before, so I could be misunderstanding how it works.)
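
As a small illustration of the per-category settings, the following Python sketch (Python only for illustration) queries the current setting of several categories through the standard `locale` module. Calling `locale.setlocale(category)` without a second argument only reads the setting. The helper name is made up, and the values printed depend entirely on the process and system it runs on.

```python
import locale

# Categories that can be configured independently of one another.
CATEGORY_NAMES = ["LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_MONETARY", "LC_COLLATE"]

def current_categories() -> dict:
    """Return the current locale setting of each category as a string."""
    return {
        name: locale.setlocale(getattr(locale, name))  # query only, no change
        for name in CATEGORY_NAMES
    }

for name, setting in current_categories().items():
    print(f"{name:12} {setting}")
```

Number formatting is governed by LC_NUMERIC and currency formatting by LC_MONETARY, so a program that cares about the decimal separator would look at those categories rather than at the language setting.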


T

-- 
Famous last words: I wonder what will happen if I do *this*...