Replacement for snprintf (page 5)

On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
>
> Then programs that want to support locales can just do this:
>
> 	writefln("%.2?d", curLocale.separator, 3.141592);

For %f, the decimal separator is not the only locale specific info. Full list:

-decimal separator
-negative pattern
-positive pattern
-infinity symbol
-nan symbol
-digit shapes, especially for Arabic and Thai

For %d and %g there are more like digit grouping/group separator.

On 2019-11-06 17:17, Petar Kirov [ZombineDev] wrote:
> I think the best way to go is to make it locale-independent and simply provide a way for user to specify the decimal separator (and other related locale details, if any).

In my experience, I think it's best to leave the locale support to a separate API. The "snprintf" API is never going to be flexible enough. No one is using "snprintf" for serious localization.

It's not just the decimal point that needs to be localized. There are various other number related things that need localization.

Just have a look at the number formatter in Apple's API [1]. It's pretty big. Then they have separate formatters for currency, length, mass, interval and more.

[1] https://developer.apple.com/documentation/foundation/nsnumberformatter?language=objc

-- 
/Jacob Carlborg

November 06, 2019

Re: Replacement for snprintf

Posted by H. S. Teoh
in reply to Rumbu

On Wed, Nov 06, 2019 at 06:21:43PM +0000, Rumbu via Digitalmars-d wrote:
> On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
> > 
> > Then programs that want to support locales can just do this:
> > 
> > 	writefln("%.2?d", curLocale.separator, 3.141592);
[...]
> For %f, the decimal separator is not the only locale specific info. Full list:
> 
> -decimal separator
> -negative pattern
> -positive pattern
> -infinity symbol
> -nan symbol
> -digit shapes, especially for Arabic and Thai
> 
> For %d and %g there are more like digit grouping/group separator.
[...]

Haha, wonderful. Don't you just love it when i18n consistently throws a monkey wrench into any simplistic scheme?  Almost makes me want to suggest that we need std.i18n before we can implement anything sane i18n-wise.

But since that's not gonna happen in the foreseeable future, and I'm sick and tired of the trend around these parts of letting the perfect be the enemy of the good, I'm going to propose that we just forget about i18n and just implement formatting for an English-specific locale. If users *really* want to support locales, just use %s with a wrapper struct with a toString method that does whatever it takes to get the right output. I've used this pattern for various problems with formatting complex objects, and it works fairly well:

	struct i18nFmt {
		float f; // or double, real, whatever
		int precision;
		... // any other params here, like decimal point format, etc.

		void toString(S)(S sink)
			if (isOutputRange!(S, char))
		{
			... // do whatever you need to do here to
			    // produce the right output
		}
	}

	...
	float myData = ...;

	// just use %s instead of some incomprehensible over-engineered
	// crap like %1:3,$13&.*^_7?f
	output = format("%s", myData.i18nFmt);

	// or:
	output2 = format("%s", myData.i18nFmt(curLocale.precision, ...
				/* whatever else */));

This way you lift the complexity out of std.format where it really doesn't belong, and make it possible to plug in different locale handling modules in its place. This even opens the door for a future std.i18n that simply exports a bunch of these locale-dependent proxy formatters that you could just append to your data items. Much more extensible and flexible than trying to shoehorn everything into std.format, which will inevitably turn it into a nasty hairball of intractable dependencies that's impossible to make pure, nothrow, etc. (Oh wait, it's already such a hairball. :-D  Let's not make it worse!) And it makes std.format more pay-as-you-go; if you never need to use std.i18n it won't pull it in as a dependency just because it needs to support an obscure format specifier that you don't actually use.
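For reference, a minimal runnable sketch of the wrapper pattern described above. LocaleFmt, localeFmt and their fields are made-up names for illustration, not Phobos API, and only the decimal separator is handled. The sketch assumes it is good enough to format locale-independently with %f and swap the '.' afterwards; a real implementation would have to cover much more.

```d
import std.array : replace;
import std.format : format;
import std.range.primitives : isOutputRange, put;
import std.stdio : writeln;

/// Hypothetical wrapper (not a Phobos type): a value plus the
/// locale details needed to print it.
struct LocaleFmt
{
    double value;
    int precision = 2;
    string decimalSep = ".";

    /// Picked up by std.format when the wrapper is formatted with %s.
    /// The sink is taken by ref, as std.format expects for this overload.
    void toString(S)(ref S sink)
        if (isOutputRange!(S, char))
    {
        // Format without any locale first, then swap the separator.
        auto plain = format("%.*f", precision, value);
        put(sink, plain.replace(".", decimalSep));
    }
}

/// Hypothetical helper so that UFCS works: myData.localeFmt(...)
LocaleFmt localeFmt(double value, int precision = 2, string decimalSep = ".")
{
    return LocaleFmt(value, precision, decimalSep);
}

void main()
{
    double myData = 3.141592;

    writeln(format("%s", myData.localeFmt));          // 3.14
    writeln(format("%s", myData.localeFmt(3, ",")));  // 3,142

    // Two different separators in one string, no global locale involved.
    writeln(format("%s / %s", myData.localeFmt(2, ","),
                              myData.localeFmt(2, ".")));  // 3,14 / 3.14
}
```

The format string itself stays a plain %s; everything locale-related lives in the wrapper, so swapping in a different locale module only means swapping the wrapper.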

T

-- 
Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it. -- G. Steele

On Wed, Nov 06, 2019 at 07:43:06PM +0100, Jacob Carlborg via Digitalmars-d wrote:
[...]
> In my experience, I think it's best to leave the locale support to a separate API. The "snprintf" API is never going to be flexible enough. No one is using "snprintf" for serious localization.
> 
> It's not just the decimal point that needs to be localized. There are various other number related things that need localization.
> 
> Just have a look at the number formatter in Apple's API [1]. It's pretty big. Then they have separate formatters for currency, length, mass, interval and more.
[...]

Yeah, after thinking about this more, I've come to the same conclusion. Just use %s for anything that depends on complex locale-dependent configuration, and wrap your data item in a proxy object that does whatever it takes to make it work.

	float myQuantity = ...;
	auto output = format("%s", myQuantity.localeFmt(...));

where localeFmt is some function or wrapper struct overloading toString that does whatever it takes to format the data in a locale-specific way.

T

-- 
"Hi." "'Lo."

On Wednesday, 6 November 2019 at 16:54:25 UTC, Andre Pany wrote:
> This question comes late, but did you consider just doing a 1-to-1 translation of snprintf from C to D?

I scanned through the implementation of snprintf several times while I wrote the replacement. I think the main algorithm is quite similar, apart from some speed improvement for numbers close to zero, which turned out to be quite nasty in detail (and which I have therefore skipped for now).

By the way: a 1-to-1 translation is not something I could do, because I know very little C and the algorithm contains lots of calls to functions that I would not know where to look up or how to replace with D functions.

On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
> Yes, I think in the long run this will be the more viable approach. Depending on locale as a global state is problematic because it forces formatting to be impure, and also forces users to implement hacks when they need to temporarily change the locale. E.g., in a system like snprintf, if you need to format German text with snippets of English quotations, you will have to temporarily override LC_* somehow in order to print a number with two different separators, or hack it with string postprocessing, etc.

My current approach is a pure and @safe function that does the formatting but ignores the locale completely. This function is called from formatValueImpl and could be modified there, if desired.

Currently (I want to make small steps), the function can only be used for the f (and F) specifier, and only for float and double. For all other specifiers and types snprintf is still called. That might result in different behaviour depending on the specifier and the type. I'd prefer to make it behave identically.

Having said this, I completely agree that it would be better if format ignored the locale and let the user handle it in a wrapper, if desired.
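To illustrate what a pure, @safe, locale-free step buys, here is a small sketch. withDecimalSep is a hypothetical helper, not the function from the PR and not part of Phobos; it only post-processes a string that was already formatted without any locale. Because it touches no global state, the compiler accepts the attributes and can even evaluate the call at compile time.

```d
/// Hypothetical helper (illustration only): replaces the '.' in an
/// already formatted, locale-independent number with another separator.
string withDecimalSep(string formatted, char sep) pure @safe nothrow
{
    auto buf = formatted.dup;   // private mutable copy
    foreach (ref c; buf)
        if (c == '.')
            c = sep;
    return buf.idup;
}

// No global locale is read, so the call can run at compile time.
static assert(withDecimalSep("3.14", ',') == "3,14");

void main()
{
    // German text quoting an English figure: two separators side by
    // side, without overriding LC_* in between.
    immutable text = "Der Wert ist " ~ withDecimalSep("3.14", ',')
        ~ " (English: \"" ~ withDecimalSep("3.14", '.') ~ "\").";
    assert(text == `Der Wert ist 3,14 (English: "3.14").`);
}
```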

On Wednesday, 6 November 2019 at 18:21:43 UTC, Rumbu wrote:
> For %f, the decimal separator is not the only locale specific info. Full list:
>
> -decimal separator
> -negative pattern
> -positive pattern
> -infinity symbol
> -nan symbol
> -digit shapes, especially for Arabic and Thai
>
> For %d and %g there are more like digit grouping/group separator.

snprintf only uses the decimal separator (and grouping, but that is not used inside format; grouping is done separately there). Everything else is ignored by snprintf.
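The items from that list could travel as plain data inside a wrapper instead of coming from the C locale. A rough sketch, assuming made-up names (NumberSymbols, GroupedInt) and modelling only the group separator and the minus sign for integers; long.min is not handled here.

```d
import std.conv : to;
import std.format : format;
import std.range.primitives : isOutputRange, put;
import std.stdio : writeln;

/// Hypothetical bundle of locale data; the fields are illustrative only.
struct NumberSymbols
{
    string groupSep = ",";
    string minusSign = "-";
}

/// Hypothetical wrapper that prints an integer with digit grouping.
struct GroupedInt
{
    long value;
    NumberSymbols sym;

    void toString(S)(ref S sink)
        if (isOutputRange!(S, char))
    {
        // Work on the magnitude so the sign never ends up in a group.
        string digits = (value < 0 ? -value : value).to!string;
        if (value < 0)
            put(sink, sym.minusSign);
        foreach (i, char c; digits)
        {
            // A separator goes before every complete group of three.
            if (i != 0 && (digits.length - i) % 3 == 0)
                put(sink, sym.groupSep);
            put(sink, c);
        }
    }
}

void main()
{
    auto de = NumberSymbols(".", "-");
    writeln(format("%s", GroupedInt(-1234567, de)));  // -1.234.567
    writeln(format("%s", GroupedInt(1000)));          // 1,000
}
```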

On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
> In PR 7222 [1] Robert Schadek suggested replacing the call to snprintf in std.format with an own method written in D.
[...]

Meanwhile I filed a first PR: https://github.com/dlang/phobos/pull/7264 - only part of a complete replacement is achieved with that: Only the 'f' qualifier is replaced and that only for float and double. But it's a start and I want to make small steps.

Many thanks to all of you who answered in this thread or gave hints elsewhere. This helped a lot. :-)

On Friday, 8 November 2019 at 14:42:29 UTC, berni44 wrote:
> Meanwhile I filed a first PR: https://github.com/dlang/phobos/pull/7264 - only part of a complete replacement is achieved with that: Only the 'f' qualifier is replaced and that only for float and double. But it's a start and I want to make small steps.

Update: While this first PR is still waiting to be reviewed, a second PR (the same for the '%a' qualifier) was merged last week. Today I filed a third PR (for the '%e' qualifier). The '%g' qualifier has to wait until these two PRs are merged, because it depends strongly on both of them.

With the help of Petar Kirov [ZombineDev] I have meanwhile also managed to make the whole thing CTFEable. But this also has to wait for the two PRs mentioned above.

Next steps will be some speed optimization for small exponents (it already works on paper, but I haven't implemented and tested it yet) and for large exponents (only a vague idea so far).
