March 02, 2009
Georg Wrede wrote:
> Andrei Alexandrescu wrote:
>> Sooner or later that will need to be defined. I know next to nothing about locales. (I know I dislike the design C++ uses.)
> 
> 
> D uses Utf-8, and that is *good enough*!
> 
> This lets my programs "understand" Finnish, and doesn't give me undue headaches.
> 
> 
> Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read:
> 
> http://www.manpagez.com/man/1/perllocale/
> 
> It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.
> 
> 
> An excellent string hierarchy without the entire rest of i18n, is only going to look like a Ferrari with a Trabant engine. Which is worse than nothing at all.
> 
> Besides, there's more to this than just designing the perfect, or even a good locale system in a language. *Somebody should actually use it*.
> 
> Now, the non-English programmer, what does he really want? He wants to be able to type stuff into his program in his native character set. D already does that, by way of Utf-8.
> 
> What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there was a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.
> 
> The hypothetical Ambitious Programmer might want to use locale. He could then have the dates and times (and currencies, etc.) follow the country. Now, that might sound commendable, but in practice it *crumbles*.
> He can't possibly know how to deal with languages that are written backwards, languages where several characters make one letter, exotic ways of writing dates, etc.
> 
> So, his fancy i18n project is doomed to be, at most, as usable as the "normal" D program. Probably less, since his decisions will actually worsen the user experience -- for users in another culture.
> 
> 
> And, any project big enough to tackle this, will implement its own locale handling anyway. I'm sorry to say.
> 
> ----
> 
> Yes, locales are nice and all.
> For D 3.5 that is.
> Honestly.

If you don't use it, you don't use it; but please don't ruin it for the sake of those of us who will.

I will use it (go Andrei!)
people who have to muck with spreadsheet libraries might use it
people who write spreadsheet libraries might use it

Wish I had some good ideas for Andrei, but I can't say that I do.
March 02, 2009
Walter Bright wrote:
> I've attempted to use locales, but the reason I'd always wind up doing it by hand is because the existing libraries to do it are obtuse, impenetrable, execrable, and pretty much unusable.
> 
> So it may be that it's an insoluble problem, or maybe nobody has come up with the right abstraction yet. I don't have nearly enough experience with it to know the answer.

Sounds like it’s not yet suitable for D2, then, at least not in std. Perhaps put an experimental interface in ext?

—Joel Salomon
March 02, 2009
Andrei Alexandrescu wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> Walter Bright wrote:
>>>> Andrei Alexandrescu wrote:
>>>>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
>>>>
>>>> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
>>>>
>>>> Any function that is locale aware should be parameterized with a locale parameter. (Not only is that better design, it self-documents the dependency.)
>>>
>>> I don't understand this. That means there's no more default locale. Here's what I had in mind:
>>>
>>> class Locale { ... }
>>>
>>> // function parameterized with an optional locale
>>> void foo(Data d, Locale loc = null);
>>>
>>> So there's no more default locale. If you pass in null, that's the default locale.
>>
>> That's fine, I was thrown off by your reference to a "global reference".
> 
> Well I was thinking a global reference might be handy for people who e.g. want to set the locale once and then be done with it.

That's what I was objecting to!

> I think only a few apps actually manipulate multiple locales simultaneously. Most would just want to load the locale present on the user's computer and then use it.

User settable global state is eeevil.
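
The parameterized style argued for above (every locale-aware function takes the locale as an argument, with null standing for the C locale) could look roughly like the sketch below. The thread never spells out what Locale contains, so the decimalSeparator field and the formatNumber function are hypothetical names invented for illustration only.

```d
import std.array : replace;
import std.conv : to;
import std.format : format;

// Hypothetical Locale: the thread leaves its members unspecified.
class Locale
{
    char decimalSeparator;

    this(char decimalSeparator)
    {
        this.decimalSeparator = decimalSeparator;
    }
}

// Locale-aware function parameterized with an optional locale.
// Passing null (the default) means "C locale" behaviour.
string formatNumber(double x, const Locale loc = null)
{
    string s = format("%.2f", x);
    if (loc is null)
        return s;
    // Swap the C-locale decimal point for the locale's separator.
    return s.replace(".", to!string(loc.decimalSeparator));
}
```

With this shape the dependency on a locale shows up in the signature, and nothing is read from mutable global state: formatNumber(3.14159) uses the C convention, while formatNumber(3.14159, new Locale(',')) uses a comma.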
March 02, 2009
Walter Bright wrote:
> Andrei Alexandrescu wrote:
>> Well I was thinking a global reference might be handy for people who e.g. want to set the locale once and then be done with it.
> 
> That's what I was objecting to!
> 
>> I think only a few apps actually manipulate multiple locales simultaneously. Most would just want to load the locale present on the user's computer and then use it.
> 
> User settable global state is eeevil.

I am thinking of a better form using scope-based locale usage. Consider:

class Locale { ... }
struct LocaleContext {
    this(Locale value);
    ~this();
    private Locale value();
    alias value this;
    ...
}

People wouldn't have access to a global Locale object. They can, however, create LocaleContext objects. Such objects set the current locale to the user's locale in the constructor and restore the previous locale in the destructor.

That way use of locales follows use of scopes and the long-distance dependency created by globals is largely diminished.

An application that just needs to set a locale upon loading and be done with it can create a single LocaleContext inside e.g. main(). A more sophisticated app may manage multiple locale contexts and put them in action as it needs. It's really flexible, and it doesn't promote bad programming styles.


Andrei
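
A minimal sketch of the scope-based idea in the post above, under stated assumptions: the post elides the bodies with "...", so the module-private currentLocale variable, the name field, and the currentLocaleName helper are all hypothetical, and the `alias value this` forwarding from the original outline is left out for brevity.

```d
// Hypothetical Locale: contents are not specified in the thread.
class Locale
{
    string name;

    this(string name)
    {
        this.name = name;
    }
}

// Assumed storage for the active locale; not directly reachable from
// other modules. null stands for the C locale.
private Locale currentLocale = null;

struct LocaleContext
{
    private Locale previous;

    @disable this();      // force use of the constructor below
    @disable this(this);  // a copy would restore the old locale twice

    // Install the given locale and remember what was active before.
    this(Locale value)
    {
        previous = currentLocale;
        currentLocale = value;
    }

    // Restore the previous locale when the context leaves scope.
    ~this()
    {
        currentLocale = previous;
    }
}

// Read-only view of the active locale, for illustration.
string currentLocaleName()
{
    return currentLocale is null ? "C" : currentLocale.name;
}
```

Declaring `auto ctx = LocaleContext(new Locale("fi_FI"));` inside a block makes currentLocaleName() return "fi_FI" until the block ends, after which the earlier value is back. The state is still global underneath; what the scope adds is a guaranteed restore.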
March 02, 2009
Joel C. Salomon wrote:
> Walter Bright wrote:
>> I've attempted to use locales, but the reason I'd always wind up doing
>> it by hand is because the existing libraries to do it are obtuse,
>> impenetrable, execrable, and pretty much unusable.
>>
>> So it may be that it's an insoluble problem, or maybe nobody has come up
>> with the right abstraction yet. I don't have nearly enough experience
>> with it to know the answer.
> 
> Sounds like it’s not yet suitable for D2, then, at least not in std.
> Perhaps put an experimental interface in ext?

Good idea. But before we do so, I was hoping I'd pick the brains of people who have used locales in other languages and understand the burning points. Somehow, however, I'm doing a lousy job at eliciting contributions from people on this newsgroup (guess I'd be a lousy salesman). I tried a couple of times and all I got was a few new keyword proposals and a few new syntax proposals :o). What am I doing wrong?

Andrei
March 02, 2009
Hello Walter,
>
> User settable global state is eeevil.
> 

User *alterable* global state is eeevil. I can see a good argument for immutable WORM variables that can be assigned to exactly once very early in the program load process.


March 02, 2009
Walter Bright wrote:
> Georg Wrede wrote:
>> What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there was a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.
> 
> I've attempted to use locales, but the reason I'd always wind up doing it by hand is because the existing libraries to do it are obtuse, impenetrable, execrable, and pretty much unusable.

I'd venture to say, it's not only the libraries -- the stuff itself is obtuse. In most countries there's no *real* consensus on what and how folks want their settings, and often the Official Settings (as dictated by either a real or imagined authority) are less than practical.

A case in point: in Finland, what I get when trying to type a dollar sign is a ¤, which is a circle with four spokes. This sign is not used for anything, anywhere. Ever. (And I've been at this for more than 25 years.)

> So it may be that it's an insoluble problem, or maybe nobody has come up with the right abstraction yet. I don't have nearly enough experience with it to know the answer.

National pride, anti-imperialism, you name it. The numeric keyboard around here has a comma instead of the decimal point. Just guess if it's nice to try to do spread sheets, where you have to use a decimal point just because this spread sheet goes to company correspondence overseas.

Folks are all eager about locales, until they get their hands dirty.

IMHO, it actually is an insoluble problem -- at least as far as a *programming language* is concerned.
March 02, 2009
BCS wrote:
> Hello Walter,
>>
>> User settable global state is eeevil.
>>
> 
> User *alterable* global state is eeevil. I can see a good argument for immutable WORM variables that can be assigned to exactly once very early in the program load process.

Sure, I meant global state once initialized.
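
The write-once idea discussed above can be approximated in D with an immutable module-level variable that is assigned in a module constructor, which runs before main(). This is only a sketch: the variable name startupLocaleName and the choice to read the LANG environment variable (falling back to "C") are assumptions made for illustration, not something proposed in the thread.

```d
import std.process : environment;

// Assigned exactly once during program start-up, read-only afterwards.
immutable string startupLocaleName;

shared static this()
{
    // Assumption: take the locale name from LANG, defaulting to "C".
    startupLocaleName = environment.get("LANG", "C");
}
```

Any later attempt to assign to startupLocaleName is rejected at compile time, so functions that read it do not depend on state that can change under them.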
March 02, 2009
Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
>>>
>>> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
>>
>> The two programs that are most "locale aware" are usually spread sheets and word processors.
>>
>> It is usual that the user needs to write, say, in Swedish or in Russian, while in a Finnish setting. Or that one wants to use a decimal separator other than what is "proper" for the country.
>>
>> For example, a lot of people use "." instead of the official "," in Finland, and many use time as "18:23" instead of "18.23".
>>
>> For this purpose, these programs let the users define these any way they want.
> 
> That's exactly what my proposal is doing. People can start with the defaults of the Finnish locale and then overwrite whichever parts they want.

From java.util.Locale (j2se/1.4.2): "A Locale object represents a specific geographical, political, or cultural region."

Nice. If those three were orthogonal, then you'd choose each once and be done with it. Unfortunately, they blend. And they blend in a different way in every area. That creates "continuums" of needs for settings, and these can't really be predicted easily.

A GUI user can rely on the settings having been made at OS install by himself or the local vendor. But the console is different. (See below.)

>> I think the notion of locales is, slowly but steadily, going away.
> 
> Do you have any data backing this up?

For instance, in the old days, the operating system used to define the variable LC_LOCAL for the user. It signified the locale, usually the user's country.

Today, I see no such thing. The only variables related to such are for the GUI:

LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8

One is the console input language and the other is the GUI input language. No locale stuff anywhere.

>> It was a nice idea at the time, but with two problems: users don't use it, and programmers don't use it.
> 
> Is it because it hasn't been properly packaged?

No. Imagine for a moment that we had a Perfect Locale Implementation (which I say is not even possible, but still).

If a programmer wanted to use locale dependent printing, then he'd have to get familiar with all the possible ways his string may get printed if someone uses his program in a far away country. And there are a few different ways, believe me.

Would you imagine anybody actually bothering to do that? Would you?? So what the programmer does, is, he prints things the way he wants, and caters only to the specific things he feels he needs to. And creates a solution that behaves *predictably*, from his point of view.

He may want folks in France and Finland to use his program. And since he doesn't write the UI strings in any other language, the program will be unusable to folks in Afghanistan anyway.

Or he writes an English UI, whereupon people accept that it may not cater for all kinds of exotic needs.

>> Of course, eventually we will want to "do something" about this. But that should be left to the day when real issues are all sorted out in D. This is a non-urgent, low-priority thing.

Had there been any need for locales, believe me, the "foreigners" in this NG would have asked for it.

> I guess. Now please tell me how I print arrays in D.

Think about it for a moment. We have two kinds of programs, those written for the console, and those written for a GUI. It's natural for the GUI programs to be locale aware, but with the console apps, it simply is not possible to do properly. I'll explain, but first:

Let's split this into two separate issues, the console and the GUI.

The GUI is aware of your preferences.
You don't use writefln with the GUI.
You use the GUI API for any I/O, right?

Now, wouldn't it be natural to assume that the GUI API takes care of all of this? Print a date, and it prints it with the user's preferred format. *The same with your array*.


And then let's look at the console.

A proper internationalisation would mean that the Chinese could use the console, and all character mode apps in Chinese. Problem is, there simply aren't enough pixels on many consoles to render the Chinese character set.

So we're off track already. And with the ubiquitous GUIs around, people are increasingly accepting that a GUI is for nationalised stuff, and the console is for "technical" stuff.

Haven't you noticed: in the last decade it has become all the more evident that the reason to write a non-GUI app, is very specifically just to get rid of all kinds of hassles, and simply concentrate on what the program is supposed to do!

(You know, a few years ago we had a major conversation here about whether non-ASCII variable names should be accepted in D. The end result is, yes. (I just tried it.) Now, how can an international team cowork on a project where variable names are written so the other folks can't even type them with their keyboards??? -- All very nice, but no cigar. That's about as smart as letting people define *unlimited* length variable names!)


*** How to print arrays ***

You print arrays in a predictable and expected way.

D array printing is for non-GUI stuff. Hence, you use the C locale, period.

A mathematician seriously doesn't want his arrays to have commas instead of decimal points. He sure as heck doesn't want the numbers to all of a sudden turn to Klingon-like hieroglyphs just because he is showing his results in an overseas seminar, on the local computer!!!!!


And what about the programmer who wants his array to go into another program? What do you think happens to parsing when the decimal point is suddenly a comma??

We've had Walter make nice features to D that were laborious to create, only to see nobody use them. It's happened, ask him. *Now* is not the time to do that again.
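
The parsing concern raised above is easy to reproduce. The sketch below uses Phobos' std.conv.to, which expects a decimal point; parsesAsDouble is a hypothetical helper name introduced only for this illustration.

```d
import std.conv : to, ConvException;

// Returns true when the whole string converts to a double.
bool parsesAsDouble(string s)
{
    try
    {
        s.to!double;
        return true;
    }
    catch (ConvException e)
    {
        // The input was not a number in the form std.conv expects.
        return false;
    }
}
```

Here parsesAsDouble("3.14") succeeds while parsesAsDouble("3,14") does not, so output that switches to a comma separator would no longer round-trip through a consumer written this way.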
March 02, 2009
Georg Wrede wrote:
> Let's split this into two separate issues, the console and the GUI.
> 
> The GUI is aware of your preferences.
> You don't use writefln with the GUI.
> You use the GUI API for any I/O, right?

There's a third faction: graphical apps that don't use the underlying GUI API.  Most games fall in this category.

When writing cross-platform apps (whether gui, non-gui-but-graphical, or console), you need some layer of abstraction over the underlying platform localization API.  This abstraction can be provided by the programming language, or a third-party library.

> A proper internationalisation would mean that the Chinese could use the console, and all character mode apps in Chinese. Problem is, there simply aren't enough pixels on many consoles to render the Chinese character set.

I have Windows configured to use a Japanese text encoding for command windows.  I can and do run Japanese console applications, but console applications that assume CP437 or Latin-1 don't work for me.


-- 
Rainer Deyke - rainerd@eldwood.com