Replacement for snprintf (page 3) - D Programming Language Discussion Forum

October 31, 2019

Re: Replacement for snprintf

Posted by H. S. Teoh
in reply to Guillaume Piolat

On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d wrote:
> On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> > Replacing snprintf for floating point is very challenging, because:
> > 
> > 1. people have been improving snprintf for decades
> > 2. people expect precision and performance
> > 3. the standard is snprintf, any credible implementation must be the
> > same or better
> 
> Moreover, actual printf implementations seem to depend on the locale. This creates bugs (say "1,4" instead of "1.4"), so this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.

*Is* it a bug, though?  Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program.

But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4".

Which leads me to think that these two should be separate format specifiers. Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.
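For the JSON case, a common C workaround (a minimal sketch, not something proposed in the thread) is to format with snprintf and then swap whatever decimal separator the current locale uses for '.'. The helper name `json_format_double` is hypothetical. It relies on `%g` never inserting thousands separators, and it does not handle infinities or NaN, which JSON cannot represent anyway:

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format x for JSON with '.' as the decimal separator,
   whatever LC_NUMERIC currently says. %.17g gives enough digits to
   round-trip any IEEE double. Returns 0 on success, -1 if buf is too small. */
int json_format_double(double x, char *buf, size_t size)
{
    int len = snprintf(buf, size, "%.17g", x);
    if (len < 0 || (size_t)len >= size)
        return -1;

    const char *dp = localeconv()->decimal_point;  /* may be multibyte */
    size_t dp_len = strlen(dp);
    if (dp_len == 0 || strcmp(dp, ".") == 0)
        return 0;                                   /* nothing to fix */

    char *p = strstr(buf, dp);
    if (p != NULL) {
        *p = '.';
        /* Close the gap left by a multibyte separator (no-op for 1 byte). */
        memmove(p + 1, p + dp_len, strlen(p + dp_len) + 1);
    }
    return 0;
}
```

With LC_NUMERIC set to a German or Russian locale, plain `snprintf(buf, n, "%.1f", 1.5)` yields "1,5", while this helper still yields "1.5".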

T

-- 
I'm still trying to find a pun for "punishment"...

October 31, 2019

Re: Replacement for snprintf

Posted by Walter Bright
in reply to drug

On 10/31/2019 1:27 AM, drug wrote:
> In some cases it is much more productive to have a text representation of data than a binary one. Initially I too believed that a binary representation was more suitable, but later I was forced to use a text format, and that gave me good results.

To get round-trip 100% accuracy, print the floats in hex using the %A format.
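In C, `%a`/`%A` (C99) print the significand in hexadecimal, so no decimal rounding takes place, and `strtod` parses that form back exactly. A minimal sketch with a hypothetical helper name; with glibc, 0.1 prints as `0X1.999999999999AP-4`:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print x as a hexadecimal floating-point literal
   ("%A": hex significand, decimal power-of-two exponent) and parse it back.
   For finite values the result compares equal to x. */
double hex_round_trip(double x, char *buf, size_t size)
{
    snprintf(buf, size, "%A", x);
    return strtod(buf, NULL);
}
```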

October 31, 2019

Re: Replacement for snprintf

Posted by Stefan Koch
in reply to Walter Bright

On Thursday, 31 October 2019 at 20:20:24 UTC, Walter Bright wrote:
> On 10/31/2019 1:27 AM, drug wrote:
>> In some cases it is much more productive to have a text representation of data than a binary one. Initially I too believed that a binary representation was more suitable, but later I was forced to use a text format, and that gave me good results.
>
> To get round-trip 100% accuracy, print the floats in hex using the %A format.

DtoA is also supposed to be 100% accurate with respect to the value, though not necessarily with respect to the binary representation.
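A related, easily checked fact for IEEE double: 17 significant decimal digits always identify a double uniquely, so printing with `%.17g` and reading back with `strtod` recovers the exact value (C11 names this constant `DBL_DECIMAL_DIG`), whereas 15 digits can lose it. A small C check, with a hypothetical helper name, assuming a correctly rounding printf/strtod such as glibc's:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if printing x with `digits` significant
   decimal digits and reading it back with strtod yields exactly x. */
int decimal_round_trips(double x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

For example, `0.1 + 0.2` prints as "0.3" with 15 digits, which reads back as a different double, but as "0.30000000000000004" with 17.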

I'd still prefer Grisu2 over Ryu, since it is easier to understand and I already have a CTFE-able version of it. (It can't be safe, though, since it casts double* to ulong*.)
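C has the same issue: reading a double through a `uint64_t *` breaks strict aliasing, and the portable idiom is `memcpy`, which compilers typically reduce to a single register move. A sketch (names hypothetical) that extracts the fields Grisu- and Ryu-style algorithms start from, assuming IEEE 754 binary64:

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 binary64 value. */
struct double_parts {
    unsigned sign;        /* 1 bit */
    unsigned exponent;    /* 11-bit biased exponent (bias 1023) */
    uint64_t significand; /* 52 stored fraction bits, implicit 1 not included */
};

struct double_parts decompose(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type pun */

    struct double_parts p;
    p.sign = (unsigned)(bits >> 63);
    p.exponent = (unsigned)((bits >> 52) & 0x7FF);
    p.significand = bits & ((UINT64_C(1) << 52) - 1);
    return p;
}
```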

October 31, 2019

Re: Replacement for snprintf

Posted by berni44
in reply to Stefan Koch

On Wednesday, 30 October 2019 at 20:46:07 UTC, Stefan Koch wrote:
> If you could post that so I can have a look over the WIP that'd be nice.

See https://github.com/berni44/phobos/tree/printf

The function is at the end of std/format.d. I had to comment out some unittests, because the %e and %g specifiers are not supported yet. I added several comments to the code, so I hope it's clear what happens at each step. If not, feel free to ask. (I'll be offline over the weekend.)

I also added a diagram for speed comparison. See https://github.com/berni44/phobos/blob/printf/diagram.png

Blue and green use "%.10f", while black and red use "%.100f". Blue and red are my function; green and black are snprintf. The x-axis gives the value of the exponent, from 0 to 255; the y-axis gives the average time in nanoseconds. The green line at the bottom left is at approximately 600 ns. For each exponent, approximately 217886 numbers were checked (the same set for both functions).

As you can see, on the left side snprintf is faster, with an almost constant time, while mine gets slightly slower as the exponents get smaller. I looked through the snprintf implementation to find out what it does; see my comment in my implementation for details.
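A rough C harness for this kind of per-exponent measurement might look like the following. It is a sketch under assumptions: the thread doesn't say exactly how the exponent axis or the number sets were defined, so here all inputs share one biased binary64 exponent and get pseudo-random significands from a fixed-seed xorshift64 generator. Timing uses POSIX `clock_gettime` and includes the (small) cost of generating each input. All names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical input generator: a binary64 value with the given biased
   exponent (1..2046 for normal numbers) and a pseudo-random 52-bit
   significand from one xorshift64 step. A fixed starting state therefore
   reproduces the same set of numbers on every run. */
double sample_value(unsigned biased_exp, uint64_t *state)
{
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    uint64_t bits = ((uint64_t)biased_exp << 52) | (*state >> 12);
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Average nanoseconds per snprintf call with format `fmt` (which must
   consume exactly one double) over `n` inputs sharing one exponent. */
double avg_ns_snprintf(const char *fmt, unsigned biased_exp, int n)
{
    char buf[512];                                  /* fits "%.100f" of any double */
    uint64_t state = UINT64_C(0x9E3779B97F4A7C15);  /* fixed, nonzero seed */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++)
        snprintf(buf, sizeof buf, fmt, sample_value(biased_exp, &state));
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / n;
}
```

Calling `avg_ns_snprintf("%.10f", e, 217886)` and `avg_ns_snprintf("%.100f", e, 217886)` for each exponent `e` in a range gives curves comparable in spirit to the green and black lines; absolute numbers depend on the machine and the libc.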

October 31, 2019

Re: Replacement for snprintf

Posted by berni44
in reply to Rumbu

On Wednesday, 30 October 2019 at 20:29:34 UTC, Rumbu wrote:
> In this case rounding becomes a question of how do you interpret the remainder of a division by a power of ten.

Unfortunately not. Think of 0.1500000000000001 rounded to one digit. Clearly a remainder digit of 0-4 is rounded down and one of 6-9 is rounded up. But to decide in the case of a 5, you might need to look at the following digits, e.g. if the rounding mode tells you to round down in the case of exactly 0.5...
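The point can be sketched on a plain decimal digit string. This C function (hypothetical, not taken from berni44's branch) uses round-half-to-even, IEEE 754's default tie-breaking rule, as an assumed mode; with any tie-breaking rule, a first dropped digit of 5 forces a scan of all the remaining digits:

```c
#include <stdbool.h>
#include <string.h>

/* Round the decimal digit string `digits` (e.g. "1500000000000001" for
   0.1500000000000001) to its first `keep` digits, keep >= 1, using
   round-half-to-even. `out` needs room for keep + 2 chars, because a carry
   can add a digit ("95" -> "10"). */
void round_half_even(const char *digits, size_t keep, char *out)
{
    size_t len = strlen(digits);
    if (keep >= len) {
        strcpy(out, digits);
        return;
    }

    char first_dropped = digits[keep];
    bool rest_nonzero = false;             /* any nonzero digit after it? */
    for (size_t i = keep + 1; i < len; i++) {
        if (digits[i] != '0') {
            rest_nonzero = true;
            break;
        }
    }

    bool round_up;
    if (first_dropped != '5')
        round_up = first_dropped > '5';
    else if (rest_nonzero)
        round_up = true;                   /* strictly above the halfway point */
    else
        round_up = (digits[keep - 1] - '0') % 2 == 1;  /* exact tie: to even */

    out[0] = '0';                          /* slot for a possible carry */
    memcpy(out + 1, digits, keep);
    out[keep + 1] = '\0';
    if (round_up) {
        size_t i = keep;                   /* index of the last kept digit */
        while (out[i] == '9')
            out[i--] = '0';
        out[i]++;
    }
    if (out[0] == '0')                     /* no carry: drop the slot */
        memmove(out, out + 1, keep + 1);
}
```

For example, "25" kept to one digit gives "2" (a true tie, rounded to even), while "2500000000000001" gives "3", which can only be decided by looking past the 5.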

October 31, 2019

Re: Replacement for snprintf

Posted by berni44
in reply to Walter Bright

On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> To that end, you'll need to be familiar with the following:

Thanks for that list. I'll have a look when I find the time.

> 754-2019 IEEE Standard for Floating Point Arithmetic
> https://ieeexplore.ieee.org/document/8766229

Unfortunately, I cannot download this document. I have no company affiliation listed there, and I'm not willing to pay for it...

> Ryu Fast Float To String Conversion
> https://dl.acm.org/citation.cfm?id=3192369
>
> https://github.com/ulfjack/ryu
> [...]
>
> Jonathan Marler's D implementation of ryu:
> https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

I already read the paper about Ryu. IMHO it's of no use here, because its speed advantage comes from being more "inaccurate" than snprintf. Ryu is designed for round-tripping, i.e. printing the shortest digit string that reads back as the same value, while snprintf prints as many digits as the user asks for (even when they carry no further information). The same holds for the Grisu variants.
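The difference is easy to see with 0.1. This sketch contrasts snprintf's fixed-precision output with a brute-force search for the shortest `%g` string that still round-trips. The search is only a slow stand-in for what Ryu and Grisu compute directly, it assumes a correctly rounding printf/strtod such as glibc's, and `shortest_g` is a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write into buf the shortest "%.*g" output
   (1..17 significant digits) that reads back as exactly x, and return the
   digit count. Intended for finite x; 17 digits always suffice. */
int shortest_g(double x, char *buf, size_t size)
{
    for (int digits = 1; digits <= 17; digits++) {
        snprintf(buf, size, "%.*g", digits, x);
        if (strtod(buf, NULL) == x)
            return digits;
    }
    return 17;
}
```

With glibc, `snprintf(buf, sizeof buf, "%.20f", 0.1)` gives "0.10000000000000000555": digits of the exact binary value that add nothing for round-tripping, whereas `shortest_g(0.1, ...)` stops at "0.1".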

October 31, 2019

Re: Replacement for snprintf

Posted by H. S. Teoh
in reply to Stefan Koch

On Thu, Oct 31, 2019 at 09:04:49PM +0000, Stefan Koch via Digitalmars-d wrote: [...]
> I'd still prefer grisu2 over ryu, since it is easier to understand and I already have a ctfeable version of it.

Maybe we should be using your implementation, then? No need to duplicate work if it's already been done.


> (it can't be safe though since it casts double* to ulong*)

But surely it can be @trusted?


T

-- 
Your inconsistency is the only consistent thing about you! -- KD

October 31, 2019

Re: Replacement for snprintf

Posted by Jonathan M Davis

On Thursday, 31 October 2019 09:58:08 MDT H. S. Teoh via Digitalmars-d wrote:
> On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d wrote:
> > On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> > > Replacing snprintf for floating point is very challenging, because:
> > >
> > > 1. people have been improving snprintf for decades
> > > 2. people expect precision and performance
> > > 3. the standard is snprintf, any credible implementation must be the
> > > same or better
> >
> > Moreover, actual printf implementations seem to depend on the locale. This creates bugs (say "1,4" instead of "1.4"), so this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
>
> *Is* it a bug, though?  Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program.
>
> But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4".
>
> Which leads me to think that these two should be separate format specifiers. Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.

The version of format that takes the format string as a compile-time argument shouldn't have that problem, but the one that takes it as a runtime argument certainly would.

- Jonathan M Davis

November 01, 2019

Re: Replacement for snprintf

Posted by Jacob Carlborg
in reply to H. S. Teoh

On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:

> Which leads me to think that these two should be separate format specifiers.

I would put the localization in a completely different function.

> Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.

You could pass the locale in to the function; then it can be pure. That's even more reason to have it as a separate function. I would say that should be best practice, because you might want to run a program in a different locale than the globally configured one.
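C functions can't be pure in D's sense, but POSIX.1-2008 does let a caller pass a locale around as a value: `newlocale` builds a `locale_t`, and `uselocale` makes it current for the calling thread only. A sketch of the "pass the locale in" idea along those lines; `format_in_locale` is a hypothetical name, and the code assumes a POSIX system:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format one double in the locale the caller passes in,
   rather than the process-wide one. uselocale() changes only the calling
   thread's locale, and the previous one is restored before returning. */
int format_in_locale(locale_t loc, char *buf, size_t size,
                     const char *fmt, double x)
{
    locale_t previous = uselocale(loc);
    int len = snprintf(buf, size, fmt, x);
    uselocale(previous);
    return len;
}
```

A caller would create the locale once, e.g. `newlocale(LC_NUMERIC_MASK, "sv_SE.UTF-8", (locale_t)0)`, pass it to every call, and release it with `freelocale`. If that locale isn't installed, `newlocale` returns `(locale_t)0`.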

I'm not sure if it's enough to look at the locale. On my computer (a Mac) I have configured the language as English but the date, time, number and currency formats as Swedish.

--
/Jacob Carlborg

November 01, 2019

Re: Replacement for snprintf

Posted by H. S. Teoh
in reply to Jacob Carlborg

On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via Digitalmars-d wrote:
> On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
> 
> > Which leads me to think that these two should be separate format specifiers.
> 
> I would put the localization in a completely different function.

That would be a better solution. It would be different from snprintf, though, and we'd have to document it well so that people can find it.

> > Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.
> 
> You could pass in the locale to the function, then it can be pure. Even more reason to have it as a separate function. I would say that should be best practice because you might want to run a program in a different locale than the global configured one.

+1.

> I'm not sure if it's enough to look at the locale. On my computer (a Mac) I have configured it to have the language in English but the date, time, number and currency format to Swedish.
[...]

I think it has to do with the LC_* environment variables, at least on *nix systems. You can set LC_ALL to get the same settings across all categories, or you can separately set one or more of the individual LC_* variables to get different settings per category. (Caveat: I've never actually done this myself, so I could be misunderstanding how it works.)
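On POSIX systems this is what `setlocale(LC_ALL, "")` does: for each category it uses LC_ALL if that is set, otherwise the category's own variable (LC_NUMERIC, LC_TIME, ...), otherwise LANG. A small sketch to inspect the result; `report_locale_categories` is a hypothetical name, and it only queries the ISO C categories:

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: print the locale currently selected for each ISO C
   category, one "NAME=value" line per category, and return how many lines
   were written. It only queries (setlocale with a NULL locale argument);
   call setlocale(LC_ALL, "") first to adopt the environment's settings. */
int report_locale_categories(FILE *out)
{
    static const struct {
        int category;
        const char *name;
    } cats[] = {
        { LC_CTYPE, "LC_CTYPE" },   { LC_NUMERIC, "LC_NUMERIC" },
        { LC_TIME, "LC_TIME" },     { LC_MONETARY, "LC_MONETARY" },
        { LC_COLLATE, "LC_COLLATE" },
    };
    int n = 0;
    for (size_t i = 0; i < sizeof cats / sizeof cats[0]; i++) {
        const char *value = setlocale(cats[i].category, NULL);
        fprintf(out, "%s=%s\n", cats[i].name, value ? value : "(unknown)");
        n++;
    }
    return n;
}
```

Running it after `setlocale(LC_ALL, "")` with, say, `LANG=en_US.UTF-8 LC_NUMERIC=sv_SE.UTF-8` in the environment (and LC_ALL unset) should show a mixed configuration, provided both locales are installed.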

T

-- 
Famous last words: I wonder what will happen if I do *this*...

Copyright © 1999-2021 by the D Language Foundation