March 02, 2009
Georg Wrede wrote:
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> There will be a global reference to a Locale class, e.g. defaultLocale. By default the reference will be null, implying the C locale should be in effect. Applications can assign to it as they find fit, and also pass around multiple locale variables.
>>
>> I disagree with being able to assign to the global defaultLocale. This is going to cause endless problems. Just one is that any function that uses locale can no longer be pure. defaultLocale should be immutable.
> 
> The two programs that are most "locale aware" are usually spreadsheets and word processors.

And Microsoft products do "locale awareness" so badly, I'm pretty sure there's no simple solution. (<gripe> They could at least recognize that outside the US, everyone uses A4-size paper, not that bizarro letter/legal stuff </gripe>).

> It is usual that the user needs to write, say, in Swedish or in Russian, while in a Finnish setting. Or that one wants to use a decimal separator other than what is "proper" for the country.
> 
> For example, a lot of people use "." instead of the official "," in Finland, and many use time as "18:23" instead of "18.23".

This is my experience as well. There's an awful lot of expats in the world.

> For this purpose, these programs let the users define these any way they want.
> 
> I think the notion of locales is, slowly but steadily, going away.
> 
> It was a nice idea at the time, but with two problems: users don't use it, and programmers don't use it.

I think the whole idea is based on a fallacy: that there IS a locale.
The idea that you can choose which currency symbol to use, based on where the computer is, is utterly absurd. Surely these days, nearly everyone has to deal with the Euro, the US dollar, the Pound, and the Yen, as well as their local currency.

The world is international now, not local.

I nearly always end up setting the locale to "Antarctica"; it turns off most of the locale logic <g>. There are so many programs that try to be too clever.

> Of course, eventually we will want to "do something" about this. But that should be left to the day when real issues are all sorted out in D. This is a non-urgent, low-priority thing.
March 02, 2009
Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> Andrei Alexandrescu wrote:
>>> Sooner or later that will need to be defined. I know next to nothing about locales. (I know I dislike the design C++ uses.)
>>
>> D uses UTF-8, and that is *good enough*!
>>
>> This lets my programs "understand" Finnish, and doesn't give me undue headaches.
>>
>> Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read:
>>
>> http://www.manpagez.com/man/1/perllocale/
>>
>> It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.
> 
> I don't find that scary at all.

Maybe a quick skim doesn't let the issues sink in. :-)

> It's quite what I expected. We should phase it in, after we do a good design. Also I don't plan to sit down and write locale definition files, I want to parse the XML in that locale repository I referred to.

My ex-wife has this GPS thing in her car. Very nice. But once on the road, it's too much hassle to type in a street address. And you're always in a hurry, so you don't have time to type it in before driving, while you're stuffing the kids in the car.

>> An excellent string hierarchy without the entire rest of i18n, is only going to look like a Ferrari with a Trabant engine. Which is worse than nothing at all.
> 
> I don't understand this. What is the rest of i18n?

i18n stands for internationalisation. The word was too long to type.

Ah, or did you mean the rest? That is, if there is this shiny repository right inside the language for storing these i18n preferences, then that obliges us to have writefln, regexp, sort, and other stuff recognise those values, right? Otherwise people will ask how come we have a car but no engine. And that is a bigger job than it looks. But not doing it fully will make people feel D is worse off than if we never had the repository at all!

Oh, and who wants writefln, regexp, sort, and the others to become slower? Hands up.

>> Besides, there's more to this than just designing the perfect, or even a good locale system in a language. *Somebody should actually use it*.
>>
>> Now, the non-English programmer, what does he really want? He wants to be able to type stuff into his program in his native character set. D already does that, by way of UTF-8.
>>
>> What else? Well, it is conceivable that he wants his program to print dates and times the way it's done over there. He simply writes the program "by hand" so it does dates and times like he wants. Even if there was a locale thing in the language, he wouldn't bother with the hassle. And he couldn't care less about Urdu.
> 
> If we come up with a good design, then they will be compelled to use it. 

Gnome and KDE are both GUIs designed by "foreigners". i18n has been a *top priority* from the outset. Start a default project, and you have i18n "inbuilt" in your app.

And still, my default clock applet only lets me choose between 12 and 24 hour clock, but the date is always "Mon Mar 2", and I can't get it to "020309", which I want. Or change it at all.

And while there are simply excellent provisions for having all your app strings in the local language, hardly any application actually has more than a couple of language choices.

> Applications meant to be used across multiple countries have fumbled with locale support because there's no good support in most languages. So then why not offer a compelling support in D?

Nobody will use it. (People buy all these expensive workout machines they see on TV, and they never use them after two weeks.) i18n support is more than having your arrays print in peculiar ways overseas.

Ideally, you would translate the UI into several languages, take some cultural differences into consideration, and then have the library muck your strings and variables into the "local" representation.

Won't happen in a non-GUI program.

>> The hypothetical Ambitious Programmer might want to use locale. He could then have the dates and times (and currencies, etc.) follow the country. Now, that might sound commendable, but in practice it *crumbles*.
>>
>> He can't possibly know how to deal with languages that are written backwards, languages where several characters make one letter, exotic ways of writing dates, etc.
> 
> Well my understanding is that the guys who wrote those RFCs and whatnot spent time figuring out the right abstractions. Why not use them?

Because we don't have infinite time. Urgent, much asked for, technologically imperative, and other stuff should be done instead. There are both mundane and interesting tasks. Nice-to-haves come later.

>> So, his fancy i18n project is doomed to be, at most, as usable as the "normal" D program. Probably less, since his decisions will actually worsen the user experience -- for users in another culture.
>>
>>
>> And, any project big enough to tackle this, will implement its own locale handling anyway. I'm sorry to say.
> 
> They will implement their own because the language doesn't offer an extensible framework that they can build on.

No, it's because they will only implement the parts that they're interested in. That's pretty easy to do for a big project. (If there will be one for a non-GUI purpose.)

>> Yes, locales are nice and all.
>> For D 3.5 that is.
>> Honestly.
> 
> I just don't see where the big problem is. I'm talking about a blessed hierarchical hashtable to begin with. 

The big problem is, SOMEONE will have to tell your XML table what values the user wants. Where is this knowledge stored in a way that every D app can get to it? And how do you force the user to populate the XML table with his choices to begin with?

What I'm saying is, it's debatable whether this stuff belongs in "the programming language itself" at all. Rather, it should be an external library, provided by someone other than us. It belongs on SourceForge or Dsource, not here.

And definitely all this should be deferred, not to 2.0, but to 2.5 or preferably 3.0. If by that time we have seen that there actually is any use for such a thing, then we can decide whether to outsource it to anybody interested, or to actually try to make it part of the language.


I'm not saying it's impossible to do, or to do well. But I am saying it is *way* too insignificant to deserve any attention at this time.

> My initial desire is to be able to customize the array separators in
> writeln.

One might want to print arrays in different ways, even in the same program. Why not let the programmer customise the array printing the same as he does with integers and floats? Just a little addition to the syntax?

Or why not just have a print function that takes an array and a format? Arrays are different enough to not comfortably fit into writefln semantics anyway. Clean and practical, in a practical language.
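
A minimal sketch of what such a print function could look like, assuming present-day D and Phobos (`std.algorithm`, `std.array`, `std.conv`). The name `formatArray` and its parameters are hypothetical, chosen for illustration only; nothing here is an existing library function.

```d
import std.algorithm : map;
import std.array : join;
import std.conv : to;

/// Hypothetical helper: renders an array with caller-chosen delimiters.
string formatArray(T)(T[] items, string open = "[",
                      string separator = ", ", string close = "]")
{
    // Convert each element to text, then join the pieces with the separator.
    return open ~ items.map!(x => x.to!string).join(separator) ~ close;
}

void main()
{
    import std.stdio : writeln;

    int[] values = [1, 2, 3];
    writeln(formatArray(values));                // [1, 2, 3]
    writeln(formatArray(values, "", "; ", ""));  // 1; 2; 3
}
```

The delimiters are ordinary arguments, so the same program can print arrays in several ways without any global setting being involved.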

Whatever you do, don't mix this with any internationalisation, please.

March 02, 2009
Georg Wrede wrote:
> We've had Walter make nice features to D that were laborious to create, only to see nobody use them. It's happened, ask him.

Sure. Often the only way to see if a feature is useful is to actually implement it and see what happens. Some features have succeeded and found uses far beyond my expectations (CTFE, string mixins) while others have pretty much languished (design by contract, complex numbers).


> *Now* is not the time to do that again.

To some extent, we can't predict that. But I did find your arguments pretty strong.
March 02, 2009
Walter Bright wrote:
> Georg Wrede wrote:
>> We've had Walter make nice features to D that were laborious to create, only to see nobody use them. It's happened, ask him.
> 
> Sure. Often the only way to see if a feature is useful is to actually implement it and see what happens. Some features have succeeded and found uses far beyond my expectations (CTFE, string mixins) while others have pretty much languished (design by contract, complex numbers).
> 
> 
>> *Now* is not the time to do that again.
> 
> To some extent, we can't predict that. But I did find your arguments pretty strong.

LOL :-)
March 02, 2009
Georg Wrede wrote:
> (You know, a few years ago we had a major conversation here about whether non-ASCII variable names should be accepted in D. The end result is, yes. (I just tried it.) Now, how can an international team cowork on a project where variable names are written so the other folks can't even type them with their keyboards???

On the other hand, if you have a Chinese development team, why should they be limited to ASCII variable names? It doesn't make sense for them.

> -- All very nice, but no cigar. That's about as smart as letting people define *unlimited* length variable names!)

I recently dealt with a programming language that specified a limit of 63 characters for identifier names. This wouldn't have been a significant problem, except that I was generating code automatically, and some of my identifiers were over 90 characters. Identifier length limits are evil, unless they're ridiculously large (C#, I think, limits identifiers to 4096 characters).
March 02, 2009
Georg Wrede wrote:
> The hypothetical Ambitious Programmer might want to use locale. He could then have the dates and times (and currencies, etc.) follow the country. Now, that might sound commendable, but in practice it *crumbles*.
> He can't possibly know how to deal with languages that are written backwards, languages where several characters make one letter, exotic ways of writing dates, etc.

*cough*tango.time*cough*
March 02, 2009
On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> Good idea. But before we do so, I was hoping I'd pick the brains of people who have used locales in other languages and understand the burning points. Somehow, however, I'm doing a lousy job at eliciting contributions from people on this newsgroup (guess I'd be a lousy salesman). I tried a couple of times and all I got was a few new keyword proposals and a few new syntax proposals :o). What am I doing wrong?

I think there are three aspects to localization. One is date and number formatting. Another is offering a facility for translating all the messages an application can give. And the last one is the configuration part, where you know which format to use.

The only problem I've seen addressed by you right now is the configuration part; I believe it's the wrong end to start with.

We should start by defining how to perform the tasks I enumerated above: translating date and number formats, selecting strings for a given language. After that we can figure out how to pass the proper default configuration around. And then you're done.

For date and number formatting, I like very much the NSDateFormatter and NSNumberFormatter approach in Cocoa for instance: you have a base class to format dates, another for numbers; you can easily create your own subclass if you want, and there's a way to get the default formatter instance.

This is extensible, because if you wanted to go further, you could add formatter classes for various units (length, mass...), or anything else.

Translating strings is a little harder because 1) strings are application-defined, 2) strings are often not available in the user's preferred language, adding the need for a fallback mechanism, and 3) different applications will want to store those strings in different ways. Perhaps we could define a base class for getting translated strings, then allow the program to use whatever subclass it wants.
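
As a rough sketch of that idea, assuming present-day D: a base interface for string lookup, plus one possible subclass that keeps its strings in an in-memory table and defers to a fallback when a string is missing. All names here (`Translator`, `TableTranslator`, `tr`) are hypothetical and only illustrate the shape; an application could just as well back the interface with files or a platform API.

```d
/// Hypothetical base interface for looking up translated strings.
interface Translator
{
    /// Returns the translation for `key`, or null when none is available.
    string lookup(string key);
}

/// One possible subclass: an in-memory table with an optional fallback.
class TableTranslator : Translator
{
    private string[string] table;
    private Translator fallback;

    this(string[string] table, Translator fallback = null)
    {
        this.table = table;
        this.fallback = fallback;
    }

    string lookup(string key)
    {
        if (auto hit = key in table)
            return *hit;
        // Not translated in this language: defer to the fallback, if any.
        return fallback is null ? null : fallback.lookup(key);
    }
}

/// Returns the translation, or the key itself as the last resort.
string tr(Translator t, string key)
{
    auto s = t.lookup(key);
    return s is null ? key : s;
}

void main()
{
    import std.stdio : writeln;

    auto english = new TableTranslator(["greeting": "Hello", "farewell": "Goodbye"]);
    auto finnish = new TableTranslator(["greeting": "Hei"], english);

    writeln(tr(finnish, "greeting"));  // Hei (found in the Finnish table)
    writeln(tr(finnish, "farewell"));  // Goodbye (falls back to English)
    writeln(tr(finnish, "unknown"));   // unknown (no translation anywhere)
}
```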

Notice how I'm not using the word "locale" to talk about these things. "Locale" is a concept too abstract to be able to do something good with it. Since you could only define it using an Algebraic type and a loosely defined tree of strings, that seems to confirm my view. Call the module std.locale if you want, but keep in mind that the most important task at hand is facilitating localization, not defining what constitutes a locale; that can wait.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

March 02, 2009
Don wrote:
> there's no simple solution. (<gripe> They could at least recognize that outside the US, everyone uses A4-size paper, not that bizarro letter/legal stuff </gripe>).

Amen!

>> It is usual that the user needs to write, say, in Swedish or in Russian, while in a Finnish setting. Or that one wants to use a decimal separator other than what is "proper" for the country.
>>
>> For example, a lot of people use "." instead of the official "," in Finland, and many use time as "18:23" instead of "18.23".
> 
> This is my experience as well. There's an awful lot of expats in the world.

Not just expats.
For example: I was born & raised in the Netherlands, but even though officially we use a decimal comma here, I almost always use a decimal point instead. This may have been caused by use of the US keyboard layout (and its numeric keypad in particular), but I now even catch myself using it when writing with a pen...

> I nearly always end up setting the locale to "Antarctica"; it turns off most of the locale logic <g>. There are so many programs that try to be too clever.

lol :)
March 02, 2009
Andrei Alexandrescu, on March 1 at 19:40 you wrote:
> Georg Wrede wrote:
> >Andrei Alexandrescu wrote:
> >>Sooner or later that will need to be defined. I know next to nothing about locales. (I know I dislike the design C++ uses.)
> >D uses UTF-8, and that is *good enough*!
> >This lets my programs "understand" Finnish, and doesn't give me undue headaches.
> >Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read:
> >http://www.manpagez.com/man/1/perllocale/
> >It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.
> 
> I don't find that scary at all. It's quite what I expected. We should phase it in, after we do a good design. Also I don't plan to sit down and write locale definition files, I want to parse the XML in that locale repository I referred to.

I'm not following this thread carefully and I don't know if this is what
you are implying, but: please don't even think of duplicating the
locale data. At least on Unix there is a very nice database, and it sometimes
needs to be updated very often (due to stupid presidents like the one
I have now, who changes the daylight saving time all the time).

PHP, for example, maintains its own copy of this locale data, and it is a real PITA.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
The average person laughs 13 times a day
March 02, 2009
On 2009-03-02 08:32:40 -0500, Leandro Lucarella <llucax@gmail.com> said:

> I'm not following this thread carefully and I don't know if this is what
> you are implying, but: please don't even think of duplicating the
> locale data. At least on Unix there is a very nice database, and it sometimes
> needs to be updated very often (due to stupid presidents like the one
> I have now, who changes the daylight saving time all the time).
> 
> PHP, for example, maintains its own copy of this locale data, and it is a real PITA.

I do agree.

In another post I proposed we create formatter classes for numbers and dates. This way, you can use a formatter bound to the UNIX database and APIs, or the Windows APIs, or Cocoa, etc., or you can build your own. All you need is a generic front-end formatter interface you can bind to anything (and a common internal representation for dates), something like:

	interface DateFormatter
	{
		string timestampToString(int timestamp);
		int stringToTimestamp(string date);
	}

	DateFormatter defaultDateFormatter();
	DateFormatter dateFormatterForLocale(string localeName);

	interface NumberFormatter
	{
		string intToString(int number);
		int stringToInt(string number);
	}

	NumberFormatter defaultNumberFormatter();
	NumberFormatter numberFormatterForLocale(string localeName);
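
To illustrate the "build your own" case, here is a sketch of one hand-written implementation of the NumberFormatter interface above, assuming present-day D and Phobos. The class `GroupingNumberFormatter` is hypothetical; it only groups digits in threes with a caller-chosen separator and does not consult any system locale database.

```d
import std.array : replace;
import std.conv : to;

interface NumberFormatter
{
    string intToString(int number);
    int stringToInt(string number);
}

/// Hypothetical implementation: groups digits in threes with a chosen separator.
class GroupingNumberFormatter : NumberFormatter
{
    private string groupSeparator;

    this(string groupSeparator)
    {
        this.groupSeparator = groupSeparator;
    }

    string intToString(int number)
    {
        // Work in long so that negating int.min cannot overflow.
        long value = number;
        string digits = (value < 0 ? -value : value).to!string;
        string result;
        foreach (i, c; digits)
        {
            // Insert the separator before every complete group of three digits.
            if (i > 0 && (digits.length - i) % 3 == 0)
                result ~= groupSeparator;
            result ~= c;
        }
        return (number < 0 ? "-" : "") ~ result;
    }

    int stringToInt(string number)
    {
        // Drop the separators, then parse what is left as a plain integer.
        return number.replace(groupSeparator, "").to!int;
    }
}

void main()
{
    import std.stdio : writeln;

    NumberFormatter f = new GroupingNumberFormatter(" ");
    writeln(f.intToString(1234567));      // 1 234 567
    writeln(f.stringToInt("1 234 567"));  // 1234567
}
```

Code that only sees the NumberFormatter interface does not need to know whether the instance is hand-written like this one or a binding to a platform API.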

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/