March 01, 2009
Sooner or later that will need to be defined. I know next to nothing about locales. (I know I dislike the design C++ uses.)

I was thinking of a design along the following lines. There are standards dedicated to locale nomenclature, namely an RFC for language names and the Unicode CLDR for locale data:

http://tools.ietf.org/html/rfc4646 for language names
http://www.unicode.org/cldr/ for various locale names

So we know the basic names we want to follow, which is one less burden. Then what I want to do is to define a hierarchical string table that fills the appropriate names.

This is in opposition to defining an actual class hierarchy that mimics the localization table. I think a hierarchical string table is better because it allows simple extensibility.

The type stored by each slot of a locale is:

Algebraic!(
    int,
    string,
    Variant delegate(Variant),
    This[string]);

meaning that a locale could store one of these types. (What else should go in there?)

The access pattern goes like:

// Get the date display pattern
auto pat = myLocale.get("calendars", "calendar=default",
    "dateFormats", "dateFormatLength=medium", "pattern");

This will return an Algebraic with a string in it. The string looks like e.g. "yyyy-MM-dd".

The access is rather verbose because the corresponding locale names tree is equally (actually more) verbose; see http://unicode.org/Public/cldr/1.6.1/core.zip. But the flexibility and the standards compliance are there. We may later add some convenience functions for frequently used stuff such as dates, times, and numbers.

Extension is obvious:

myLocale.put("my-category", "my-slot", "whatever");

Later, getting the stuff in "my-category", "my-slot" will return an Algebraic containing the string "whatever".
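
The get/put access pattern described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the Node class and the internals of Locale are hypothetical, leaves hold plain strings rather than the full Algebraic slot type, and the last argument to put is treated as the value.

```d
import std.stdio : writeln;

// Hypothetical node of the hierarchical string table (an assumption,
// not part of the proposal): a leaf value plus a table of children.
final class Node
{
    string value;          // meaningful for leaves
    Node[string] children; // meaningful for interior nodes
}

// Hypothetical Locale holding the root of the table.
final class Locale
{
    private Node root;

    this()
    {
        root = new Node;
    }

    // The last argument is the value; the preceding ones form the path.
    // Intermediate tables are created as needed.
    void put(string[] pathThenValue...)
    {
        assert(pathThenValue.length >= 2, "need at least one key and a value");
        Node node = root;
        foreach (key; pathThenValue[0 .. $ - 1])
        {
            if (auto child = key in node.children)
                node = *child;
            else
            {
                auto fresh = new Node;
                node.children[key] = fresh;
                node = fresh;
            }
        }
        node.value = pathThenValue[$ - 1];
    }

    // Returns the string stored at the path, or null if any key is missing.
    string get(string[] path...)
    {
        Node node = root;
        foreach (key; path)
        {
            auto child = key in node.children;
            if (child is null)
                return null;
            node = *child;
        }
        return node.value;
    }
}

void main()
{
    auto myLocale = new Locale;
    myLocale.put("calendars", "calendar=default", "dateFormats",
        "dateFormatLength=medium", "pattern", "yyyy-MM-dd");
    myLocale.put("my-category", "my-slot", "whatever");

    writeln(myLocale.get("calendars", "calendar=default", "dateFormats",
        "dateFormatLength=medium", "pattern"));
    writeln(myLocale.get("my-category", "my-slot"));
}
```

A real implementation would presumably return the Algebraic slot type instead of a bare string, so that integers, delegates, and nested tables could be stored as well.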

There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.

So I wanted to gather some good ideas about locale design. Is a string-and-Algebraic design good for all uses? What kind of locale functionality does it not capture? I must have missed a ton of details, so if you don't understand what I mean by the above, it must be me.



Andrei
March 02, 2009
Andrei Alexandrescu wrote:
> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.

I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.

Any function that is locale aware should be parameterized with a locale parameter. (Not only is that better design, it self-documents the dependency.)
March 02, 2009
Andrei Alexandrescu wrote:
> Sooner or later that will need to be defined. I know next to nothing about locales. (I know I dislike the design C++ uses.)


D uses UTF-8, and that is *good enough*!

This lets my programs "understand" Finnish, and doesn't give me undue headaches.


Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read:

http://www.manpagez.com/man/1/perllocale/

It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.


An excellent string hierarchy without the entire rest of i18n, is only going to look like a Ferrari with a Trabant engine. Which is worse than nothing at all.

Besides, there's more to this than just designing the perfect, or even a good locale system in a language. *Somebody should actually use it*.

Now, the non-English programmer, what does he really want? He wants to be able to type stuff into his program in his native character set. D already does that, by way of UTF-8.

What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there was a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.

The hypothetical Ambitious Programmer might want to use locale. He could then have the dates and times (and currencies, etc.) follow the country. Now, that might sound commendable, but in practice it *crumbles*.
He can't possibly know how to deal with languages that are written backwards, languages where several characters make one letter, exotic ways of writing dates, etc.

So, his fancy i18n project is doomed to be, at most, as usable as the "normal" D program. Probably less, since his decisions will actually worsen the user experience -- for users in another culture.


And, any project big enough to tackle this, will implement its own locale handling anyway. I'm sorry to say.

----

Yes, locales are nice and all.
For D 3.5 that is.
Honestly.
March 02, 2009
Walter Bright wrote:
> Andrei Alexandrescu wrote:
>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
> 
> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
> 
> Any function that is locale aware should be parameterized with a locale parameter. (Not only is that better design, it self-documents the dependency.)

I don't understand this. That means there's no more default locale. Here's what I had in mind:

class Locale { ... }

// function parameterized with an optional locale
void foo(Data d, Locale loc = null);

So there's no more default locale. If you pass in null, that's the default locale.
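
A minimal sketch of a function parameterized with an optional locale, where null stands for the C locale. The Locale class with a single decimalSeparator field and the formatAmount function are hypothetical, made up for illustration. Because the function reads no global state, it can be marked pure.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical locale with one setting, fixed at construction.
final class Locale
{
    immutable string decimalSeparator;

    this(string decimalSeparator) pure
    {
        this.decimalSeparator = decimalSeparator;
    }
}

// Formats an amount given as whole units and cents (assumed 0 .. 99).
// A null locale means the C locale, which uses "." as the separator.
string formatAmount(int whole, int cents, const Locale loc = null) pure
{
    immutable sep = loc is null ? "." : loc.decimalSeparator;
    immutable pad = cents < 10 ? "0" : "";
    return whole.to!string ~ sep ~ pad ~ cents.to!string;
}

void main()
{
    auto finnish = new Locale(",");
    writeln(formatAmount(18, 5));          // falls back to the C locale
    writeln(formatAmount(18, 5, finnish)); // uses the supplied locale
}
```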


Andrei
March 02, 2009
Walter Bright wrote:
> Andrei Alexandrescu wrote:
>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
> 
> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.

The two programs that are most "locale aware" are usually spreadsheets and word processors.

It is usual that the user needs to write, say, in Swedish or in Russian, while in a Finnish setting. Or that one wants to use a decimal separator other than what is "proper" for the country.

For example, a lot of people use "." instead of the official "," in Finland, and many use time as "18:23" instead of "18.23".


For this purpose, these programs let the users define these any way they want.

I think the notion of locales is, slowly but steadily, going away.

It was a nice idea at the time, but with two problems: users don't use it, and programmers don't use it.


Of course, eventually we will want to "do something" about this. But that should be left to the day when real issues are all sorted out in D. This is a non-urgent, low-priority thing.
March 02, 2009
Georg Wrede wrote:
> Andrei Alexandrescu wrote:
>> Sooner or later that will need to be defined. I know next to nothing about locales. (I know I dislike the design C++ uses.)
> 
> 
> D uses UTF-8, and that is *good enough*!
> 
> This lets my programs "understand" Finnish, and doesn't give me undue headaches.
> 
> 
> Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read:
> 
> http://www.manpagez.com/man/1/perllocale/
> 
> It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.

I don't find that scary at all. It's quite what I expected. We should phase it in, after we do a good design. Also I don't plan to sit down and write locale definition files, I want to parse the XML in that locale repository I referred to.

> An excellent string hierarchy without the entire rest of i18n, is only going to look like a Ferrari with a Trabant engine. Which is worse than nothing at all.

I don't understand this. What is the rest of i18n?

> Besides, there's more to this than just designing the perfect, or even a good locale system in a language. *Somebody should actually use it*.
> 
> Now, the non-English programmer, what does he really want? He wants to be able to type stuff into his program in his native character set. D already does that, by way of UTF-8.
> 
> What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there was a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.

If we come up with a good design, then they will be compelled to use it. Applications meant to be used across multiple countries have fumbled with locale support because there's no good support in most languages. So why not offer compelling support in D?

> The hypothetical Ambitious Programmer might want to use locale. He could then have the dates and times (and currencies, etc.) follow the country. Now, that might sound commendable, but in practice it *crumbles*.
> He can't possibly know how to deal with languages that are written backwards, languages where several characters make one letter, exotic ways of writing dates, etc.

Well my understanding is that the guys who wrote those RFCs and whatnot spent time figuring out the right abstractions. Why not use them?

> So, his fancy i18n project is doomed to be, at most, as usable as the "normal" D program. Probably less, since his decisions will actually worsen the user experience -- for users in another culture.
> 
> 
> And, any project big enough to tackle this, will implement its own locale handling anyway. I'm sorry to say.

They will implement their own because the language doesn't offer an extensible framework that they can build on.

> Yes, locales are nice and all.
> For D 3.5 that is.
> Honestly.

I just don't see where the big problem is. I'm talking about a blessed hierarchical hashtable to begin with. My initial desire is to be able to customize the array separators in writeln.


Andrei
March 02, 2009
Georg Wrede wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
>>
>> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
> 
> The two programs that are most "locale aware" are usually spreadsheets and word processors.
> 
> It is usual that the user needs to write, say, in Swedish or in Russian, while in a Finnish setting. Or that one wants to use a decimal separator other than what is "proper" for the country.
> 
> For example, a lot of people use "." instead of the official "," in Finland, and many use time as "18:23" instead of "18.23".
> 
> 
> For this purpose, these programs let the users define these any way they want.

That's exactly what my proposal is doing. People can start with the defaults of the Finnish locale and then overwrite whichever parts they want.
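
Starting from a locale's defaults and overwriting selected parts could look roughly like this. The sketch assumes a flat table of settings rather than the hierarchical one, and the withOverrides function and the key names are hypothetical.

```d
import std.stdio : writeln;

// Returns a new table holding the defaults with the user's overrides
// applied on top. Neither input table is modified.
string[string] withOverrides(string[string] defaults, string[string] overrides)
{
    string[string] result;
    foreach (key, value; defaults)
        result[key] = value;
    foreach (key, value; overrides)
        result[key] = value;
    return result;
}

void main()
{
    // Hypothetical Finnish defaults: "," for decimals, "." for times.
    string[string] finnishDefaults =
        ["decimalSeparator": ",", "timeSeparator": "."];

    // A user who prefers "18:23" over "18.23" overrides one entry only.
    auto userLocale = withOverrides(finnishDefaults, ["timeSeparator": ":"]);

    writeln(userLocale["decimalSeparator"]);
    writeln(userLocale["timeSeparator"]);
}
```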

> I think the notion of locales is, slowly but steadily, going away.

Do you have any data backing this up?

> It was a nice idea at the time, but with two problems: users don't use it, and programmers don't use it.

Is it because it hasn't been properly packaged?

> Of course, eventually we will want to "do something" about this. But that should be left to the day when real issues are all sorted out in D. This is a non-urgent, low-priority thing.

I guess. Now please tell me how I print arrays in D.


Andrei
March 02, 2009
Andrei Alexandrescu wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
>>
>> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
>>
>> Any function that is locale aware should be parameterized with a locale parameter. (Not only is that better design, it self-documents the dependency.)
> 
> I don't understand this. That means there's no more default locale. Here's what I had in mind:
> 
> class Locale { ... }
> 
> // function parameterized with an optional locale
> void foo(Data d, Locale loc = null);
> 
> So there's no more default locale. If you pass in null, that's the default locale.

That's fine, I was thrown off by your reference to a "global reference".
March 02, 2009
Georg Wrede wrote:
> What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there was a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.

I've attempted to use locales, but the reason I'd always wind up doing it by hand is because the existing libraries to do it are obtuse, impenetrable, execrable, and pretty much unusable.

So it may be that it's an insoluble problem, or maybe nobody has come up with the right abstraction yet. I don't have nearly enough experience with it to know the answer.
March 02, 2009
Walter Bright wrote:
> Andrei Alexandrescu wrote:
>> Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
>>>
>>> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
>>>
>>> Any function that is locale aware should be parameterized with a locale parameter. (Not only is that better design, it self-documents the dependency.)
>>
>> I don't understand this. That means there's no more default locale. Here's what I had in mind:
>>
>> class Locale { ... }
>>
>> // function parameterized with an optional locale
>> void foo(Data d, Locale loc = null);
>>
>> So there's no more default locale. If you pass in null, that's the default locale.
> 
> That's fine, I was thrown off by your reference to a "global reference".

Well I was thinking a global reference might be handy for people who e.g. want to set the locale once and then be done with it. I think only a few apps actually manipulate multiple locales simultaneously. Most would just want to load the locale present on the user's computer and then use it.

Andrei