March 02, 2009
Georg Wrede wrote:
> *** How to print arrays ***
> 
> You print arrays in a predictable and expected way.
> 
> D array printing is for non-GUI stuff. Hence, you use the C locale, period.

I think the C locale (or any predefined locale) tells what left bracket I should use for array, what separator, and what right bracket. For now the left and right brackets were eliminated because the user can easily add them on the caller side. The separator is a space simply because it looks the least harmful. But for example I don't have a good solution for what to print as the separator between a hash key and a hash value. A simple, extensible locale support would have allowed me to stop worrying about that.
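
To make the choices in the paragraph above concrete, here is a minimal sketch in present-day D. It is not code from Phobos or from this thread: `ArrayStyle` and `formatArray` are hypothetical names, and the defaults (no brackets, a single space as separator) simply mirror what is described above.

```d
import std.algorithm.iteration : map;
import std.array : join;
import std.conv : to;

// Hypothetical bundle of the three choices discussed above.
struct ArrayStyle
{
    string left = "";       // brackets omitted by default; callers can add them
    string separator = " "; // a space, "the least harmful" choice
    string right = "";
}

// Stringizes each element with to!string and joins the pieces using the style.
string formatArray(T)(const(T)[] items, ArrayStyle style = ArrayStyle.init)
{
    return style.left
        ~ items.map!(item => item.to!string).join(style.separator)
        ~ style.right;
}
```

Calling `formatArray([1, 2, 3])` uses the defaults, while `formatArray([1, 2, 3], ArrayStyle("[", ", ", "]"))` shows how a locale-like table could supply the brackets and separator instead of hardcoding them.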

Also, D array printing is not only for the console: a GUI may use to!string with arrays.

But overall I guess I'll let myself be bludgeoned into complacency...


Andrei
March 02, 2009
What is language specific about how an array is formatted? I think you're abusing the locale stuff as some kind of user customization mechanism for format().
March 02, 2009
Michel Fortin, on March 2 at 07:30, you wrote to me:
> On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
> >Good idea. But before we do so, I was hoping I'd pick the brains of people who have used locales in other languages and understand the burning points. Somehow, however, I'm doing a lousy job at eliciting contributions from people on this newsgroup (guess I'd be a lousy salesman). I tried a couple of times and all I got was a few new keyword proposals and a few new syntax proposals :o). What am I doing wrong?
> 
> I think there are three aspects to localization. One is date and number formatting. Another is offering a facility for translating all the messages an application can give. And the last one is the configuration part, where you know which format to use.

I think you are confusing localization (l10n) with internationalization
(i18n)[1]. Locales are about l10n: number and date formats, time zones,
etc. i18n is about translations.

I've used the standard C API for localization and I found it quite simple and good. What's wrong with it?

I've used gettext[2] too (which is almost a de facto standard on Unix), and even though it could be improved, I think it does a pretty good job and has solved a lot of very subtle problems.

I think l10n and i18n should be handled with a lot of care, because they
are very hard to get right (like concurrency ;). There are a lot of rough
edges and exceptions to things that at first sight look so universal that
it is very easy to come up with a bad design (plural forms[3], for
example). The gettext manual[4] is a great source for seeing how big this
is. Gettext is supported in most major programming languages, so I think
D could greatly benefit from using it too.
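
As an illustration of why plural forms are less universal than they look, here is a small sketch in D. The two selector functions transcribe the English and Russian rules given in the gettext manual's plural-forms section; the function names and the `pluralize` helper are hypothetical, and nothing here is gettext's actual API.

```d
// English has two forms: singular for exactly 1, plural otherwise.
size_t pluralIndexEnglish(ulong n)
{
    return n != 1 ? 1 : 0;
}

// Russian has three forms, selected by the last one or two digits.
size_t pluralIndexRussian(ulong n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0; // 1, 21, 31, ... but not 11
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1; // 2-4, 22-24, ... but not 12-14
    return 2;     // everything else: 0, 5-20, 25-30, ...
}

// Picks the word form for a count, given the forms in index order.
string pluralize(string[] forms, size_t index)
{
    return forms[index];
}
```

A naive `n == 1 ? singular : plural` check is therefore not enough: with the forms `["файл", "файла", "файлов"]`, the counts 1, 2, 5, and 21 select the first, second, third, and first form respectively.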

[1] http://en.wikipedia.org/wiki/Internationalization_and_localization
[2] http://www.gnu.org/software/gettext/
[3] http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
[4] http://www.gnu.org/software/gettext/manual/gettext.html

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
March 02, 2009
Georg Wrede wrote:
> Andrei Alexandrescu wrote:
>>> An excellent string hierarchy without the entire rest of i18n, is only going to look like a Ferrari with a Trabant engine. Which is worse than nothing at all.
>>
>> I don't understand this. What is the rest of i18n?
> 
> i18n stands for internationalisation. The word was too long to type.
> 
> Ah, or you meant the rest? That is, if there is this shiny repository right inside the language for storing these i18n preferences, then that does oblige us to have writefln, regexp, sort, and other stuff to recognise those values, right? Otherwise people will ask how come we have a car but no engine. And that is a job bigger than it looks like. But not doing it fully will have people feel D is less good than if we never had the repository at all!
> 
> Oh, and who wants writefln, regexp, sort, and the others to become slower? Hands up.

They will only be slower, by necessity, for people who want them localized, not for anyone else.

>> Well my understanding is that the guys who wrote those RFCs and whatnot spent time figuring out the right abstractions. Why not use them?
> 
> Because we don't have infinite time. Urgent, much asked for, technologically imperative, and other stuff should be done instead. There are both mundane and interesting tasks. Nice-to-haves come later.

This is a misunderstanding. I am talking about a few dozen lines of code that capitalize on Algebraic to structure the locale space. For starters I just want to e.g. allow people to configure how they stringize and print stuff from D. Hardcoding that kind of stuff, or the strings thrown in exceptions, does not sound too good.

>> I just don't see where the big problem is. I'm talking about a blessed hierarchical hashtable to begin with. 
> 
> The big problem is, SOMEONE will have to tell your XML table what values the user wants. Where is this knowledge stored in a way that every D app can get to it? And how do you force the user to populate the XML table with his choices to begin with?

You see, we're not communicating. I sent this link:

http://www.unicode.org/cldr/

Did you look at it? It is essentially a database of locale information in a highly structured format. All I want is to define a structure expressive enough to gobble the part of that database that is of interest. The Phobos documentation will say, we just adopt their schema. If users don't want to load any, then fine - everything is just like today.

> What I'm saying is, it's debatable whether this stuff belongs to "the programming language itself" at all. Rather, it should be an external library, provided by someone else than us. It belongs to SourceForge or Dsource, not here.

http://www.unicode.org/cldr/

We just need to load it if there is such a need.

> And definitely all this should be deferred to not 2.0, but to 2.5 or preferably 3.0. If by that time we have seen that there actually is any use for such a thing, then we can decide whether to outsource it to anybody interested, or to actually try to make it part of the language.
> 
> 
> I'm not saying it's impossible to do, or to do well. But I am saying it is *way* too insignificant to deserve any attention at this time.

You and I have completely different understandings of the level of effort needed. It's not like I don't have anything to do. :o)

Let me try again: I don't want to define locale support. I want to provide the basics for people to roll it out themselves.



Andrei
March 02, 2009
Leandro Lucarella wrote:
> Andrei Alexandrescu, on March 1 at 19:40, you wrote to me:
>> Georg Wrede wrote:
>>> Andrei Alexandrescu wrote:
>>>> Sooner or later that will need to be defined. I know next to nothing about locales. (I know I dislike the design C++ uses.)
>>> D uses Utf-8, and that is *good enough*!
>>> This lets my programs "understand" Finnish, and doesn't give me undue headaches.
>>> Seriously tending to locale issues would be an *endless swamp*. Just for this, I looked up something suitable to read:
>>> http://www.manpagez.com/man/1/perllocale/
>>> It may even be that you would find the time, but think about Walter and us, please. There *really are* other things to do.
>> I don't find that scary at all. It's quite what I expected. We should phase it in, after we do a good design. Also I don't plan to sit down and write locale definition files, I want to parse the XML in that locale repository I referred to.
> 
> I'm not following this thread carefully and I don't know if this is what
> you are implying, but: please don't even think of duplicating the
> locale stuff. At least on Unix there is a very nice database that
> sometimes needs to be updated very often (due to stupid presidents like
> the one I have now, who changes the daylight saving time all the time).
> 
> PHP, for example, maintains a copy of this locale data and it is a real PITA.
> 

You're right, we won't engage in the business of maintaining locale databases. We provide mechanism, not policy.

Andrei
March 02, 2009
Michel Fortin wrote:
> On 2009-03-02 08:32:40 -0500, Leandro Lucarella <llucax@gmail.com> said:
> 
>> I'm not following this thread carefully and I don't know if this is what
>> you are implying, but: please don't even think of duplicating the
>> locale stuff. At least on Unix there is a very nice database that
>> sometimes needs to be updated very often (due to stupid presidents like
>> the one I have now, who changes the daylight saving time all the time).
>>
>> PHP, for example, maintains a copy of this locale data and it is a real PITA.
> 
> I do agree.
> 
> In another post I proposed we create formatter classes for numbers and dates. This way, you can use a formatter binding to the UNIX database and APIs, or the Windows APIs, or Cocoa, etc., or you can build your own. All you need is a generic front end formatter interface you can bind to anything (and a common internal representation for dates) something like:
> 
>     interface DateFormatter
>     {
>         string timestampToString(int timestamp);
>         int stringToTimestamp(string date);
>     }
> 
>     DateFormatter defaultDateFormatter();
>     DateFormatter dateFormatterForLocale(string localeName);
> 
>     interface NumberFormatter
>     {
>         string intToString(int number);
>         int stringToInt(string number);
>     }
> 
>     NumberFormatter defaultNumberFormatter();
>     NumberFormatter numberFormatterForLocale(string localeName);
> 

This is exactly one thing I want to avoid for Phobos: defining class hierarchies for locales.

No.

If you want to provide a specific date formatter, you plant a delegate in the locale table. The code in Phobos doing formatting will detect that and call your delegate passing in the date. You do whatever you want on your side (format on the spot, use your own class hierarchy etc.)

Again: mechanism only. Not policy.
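
A minimal sketch of the "plant a delegate in the locale table" idea described above, written in present-day D. Every name here (`DateFormatter`, `localeDelegates`, `formatDate`, the `"date"` key) is hypothetical and chosen only for illustration; it is not an actual or proposed Phobos API.

```d
import std.conv : to;

// Hypothetical signature for a user-supplied date formatter.
alias DateFormatter = string delegate(long unixTime);

// Hypothetical slot of the locale table that holds delegates, keyed by name.
// It is empty unless user code stores something in it.
DateFormatter[string] localeDelegates;

// Library-side formatting: use the planted delegate if there is one,
// otherwise fall back to a fixed, locale-independent default.
string formatDate(long unixTime)
{
    if (auto hook = "date" in localeDelegates)
        return (*hook)(unixTime);
    return unixTime.to!string; // default: the raw timestamp as text
}
```

The library code only checks for the delegate and calls it. Whether the user's side formats on the spot or forwards to its own class hierarchy is invisible to the library, which is the mechanism-versus-policy split argued for above.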


Andrei
March 02, 2009

Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> *** How to print arrays ***
>>
>> You print arrays in a predictable and expected way.
>>
>> D array printing is for non-GUI stuff. Hence, you use the C locale, period.
> 
> I think the C locale (or any predefined locale) tells what left bracket I should use for array, what separator, and what right bracket. For now the left and right brackets were eliminated because the user can easily add them on the caller side. The separator is a space simply because it looks the least harmful. But for example I don't have a good solution for what to print as the separator between a hash key and a hash value. A simple, extensible locale support would have allowed me to stop worrying about that.
> 
> Also, D array printing is not only for the console: a GUI may use to!string with arrays.
> 
> But overall I guess I'll let myself be bludgeoned into complacency...
> 
> 
> Andrei

As far as I'm concerned, an array should be printed as close to how it would be represented in the language as possible.  If the user needs to format the array, then they need to format the array, not the runtime.
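
For reference, a short sketch of what "as close to the language representation as possible" looks like in present-day Phobos, which postdates this thread. The wrapper names `asLiteral` and `spaceSeparated` are hypothetical; `std.conv.to` and the `%( ... %)` compound specifier of `std.format.format` are existing library features.

```d
import std.conv : to;
import std.format : format;

// Default stringization: brackets, comma-separated, like an array literal.
string asLiteral(T)(T[] values)
{
    return values.to!string;
}

// Caller-side formatting: the user picks the separator explicitly.
string spaceSeparated(T)(T[] values)
{
    return format("%(%s %)", values);
}
```

With these, `asLiteral([1, 2, 3])` yields `[1, 2, 3]`, and `spaceSeparated([1, 2, 3])` yields `1 2 3`, so the runtime default stays predictable and any other layout is the caller's explicit choice.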

  -- Daniel
March 02, 2009
Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:

>>> Of course, eventually we will want to "do something" about this. But that should be left to the day when real issues are all sorted out in D. This is a non-urgent, low-priority thing.
> 
> Had there been any need for locales, believe me, the "foreigners" in this NG would have asked for it.

I'm Russian.  For me, encoding problems are a PITA of such epic proportions that little format inconsistencies simply fade away.  Yes, it's sometimes hard to decipher what 02/03/08 means, since our custom is to put the day first and separate with dots.  But compare this to the Adobe Flex SDK, which prints half of its compiler error messages in Russian (thank you Adobe!) using the system default code page, 1251, while the default /console/ code page is actually the so-called IBM 866.  Whenever I use the MXML compiler from the console I get rubbish for error messages.  And there is no way to disable translation--I've found none.  Phobos is no better.  Any exception resulting from an invalid OS call dumps UTF-8 garbage instead of an error message.  std.file.read("non-existent") for instance.

I think games are not an issue.  I've worked for a company producing cell phone games for a long time.  I've localized my game for the Chinese market, too.  The thing is, game interfaces are always custom, always ad hoc.  They *never* work in untested locales.  Well, with some experience you can make them work most of the time in languages you are familiar with from a localization perspective.  Anyway, all you need to know is the ID of a supported locale so that you can replace text and locale-specific images accordingly.  Then you have proofreaders and native testing to make sure the localization works.
March 02, 2009
Michel Fortin wrote:
> On 2009-03-02 01:04:47 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
> 
>> Good idea. But before we do so, I was hoping I'd pick the brains of people who have used locales in other languages and understand the burning points. Somehow, however, I'm doing a lousy job at eliciting contributions from people on this newsgroup (guess I'd be a lousy salesman). I tried a couple of times and all I got was a few new keyword proposals and a few new syntax proposals :o). What am I doing wrong?
> 
> I think there are three aspects to localization. One is date and number formatting. Another is offering a facility for translating all the messages an application can give. And the last one is the configuration part, where you know which format to use.

Sounds like a good start.

> The only problem I've seen addressed by you right now is the configuration part; I believe it's the wrong end to start with.
> 
> We should start by defining how to perform the tasks I enumerated above: translating date and number formats, selecting strings for a given language. After that we can figure out how to pass the proper default configuration around. And then you're done.
> 
> For date and number formatting, I like very much the NSDateFormatter and NSNumberFormatter approach in Cocoa for instance: you have a base class to format dates, another for numbers; you can easily create your own subclass if you want, and there's a way to get the default formatter instance.

Well I was thinking of passing the buck around. Instead of std.locale defining a hierarchy for formatting numbers and dates, it provides a means for user code to plant a routine in the locale object that knows how to format numbers and dates. Of course, with time default localized routine implementations will show up (hopefully contributed to by people), but the basic mechanism is simple - there exists a locale table that allows you to store a delegate in it.

> This is extensible, because if you wanted to go further, you could add formatter classes for various units (length, mass...), or anything else.

This I want to avoid, at least for the time being. I want to define a table that can contain strings, integers, delegates, and other sub-tables. This is it. The path to extensibility will not be Phobos defining new classes to format various things. This could go on forever. Phobos will use the table consistently, and users who do want to format various things will simply plant their delegates in the table.
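
A rough sketch of such a table in present-day D. The post has an Algebraic-based structure in mind; to keep the example short and certain to compile, this version swaps in a plain class with one associative array per kind of value. `LocaleTable` and `lookup` are hypothetical names, not Phobos APIs.

```d
// One node of the hierarchical table: strings, integers, delegates,
// and named sub-tables, and nothing else.
final class LocaleTable
{
    string[string] strings;
    long[string] integers;
    string delegate(long)[string] delegates;
    LocaleTable[string] subTables;
}

// Follows all but the last path element through sub-tables, then looks up
// the last element among the strings. Returns fallback if anything is
// missing, so code with no locale loaded behaves exactly as before.
string lookup(LocaleTable root, string[] path, string fallback)
{
    if (root is null || path.length == 0)
        return fallback;
    auto node = root;
    foreach (name; path[0 .. $ - 1])
    {
        auto next = name in node.subTables;
        if (next is null)
            return fallback;
        node = *next;
    }
    auto value = path[$ - 1] in node.strings;
    return value ? *value : fallback;
}
```

For example, after storing `","` under `strings["decimal"]` in a sub-table registered as `"numbers"`, `lookup(root, ["numbers", "decimal"], ".")` returns `","`, while any path that is not present returns the supplied fallback.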

> Translating strings is a little harder because 1) strings are application-defined, 2) strings are often not available in the user's preferred language, adding the need for a fallback mechanism, and 3) different applications will want to store those strings in different ways. Perhaps we could define a base class for getting translated strings, then allow the program to use whatever subclass it wants.

There's no need for classes and subclasses. It's all data. Why should we replace data with code? Data is easier.

Consider some code in Phobos that must throw an exception:

throw new Exception(format("File `%s' not found, system error is %s.",
    filename, errnomsg));

The localized version will look like this:

// The default format string doubles as the lookup key.
auto fmt = "File `%s' not found, system error is %s.";
auto localFmt = currentLocale ? currentLocale.peek(fmt) : null;
if (!localFmt) localFmt = fmt;
throw new Exception(format(localFmt, filename, errnomsg));

What happens is that the default format string _is_ the key for looking up the localized strings. If there's no value for that string, the default format string remains in effect. Note that on the default path, currentLocale is null so there is hardly any inefficiency.
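
A self-contained version of this lookup, as a sketch in present-day D. It assumes the locale is just a `string[string]` associative array named `currentLocale` (null by default); that variable, `localize`, and `fileNotFound` are hypothetical names used only for illustration.

```d
import std.format : format;

// Hypothetical current locale: maps default format strings to translations.
// A null associative array means "no locale loaded".
string[string] currentLocale;

// The default format string is the key; it is also the fallback value.
string localize(string defaultFormat)
{
    if (auto translated = defaultFormat in currentLocale)
        return *translated;
    return defaultFormat;
}

// Builds the exception with whichever format string the lookup produced.
Exception fileNotFound(string filename, string errnomsg)
{
    auto fmt = localize("File `%s' not found, system error is %s.");
    return new Exception(format(fmt, filename, errnomsg));
}
```

With nothing loaded, the lookup misses and the message comes out in English. Once a translation is stored under the English format string, the same call produces the translated message without any change to the throwing code.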

> Notice how I'm not using the word "locale" to talk about these things. "Locale" is a concept too abstract to be able to do something good with it. Since you could only define it using Algebraic type and a loosely defined tree of strings, that seems to confirm my view. Call the module std.locale if you want, but keep in mind that the most important task at hand is facilitating localization, not defining what constitutes a locale, that can wait.
> 

What should I call it?


Andrei
March 02, 2009
Sergey Gromov wrote:
> Mon, 02 Mar 2009 09:34:32 +0200, Georg Wrede wrote:
> 
>>>> Of course, eventually we will want to "do something" about this. But that should be left to the day when real issues are all sorted out in D. This is a non-urgent, low-priority thing.
>> Had there been any need for locales, believe me, the "foreigners" in this NG would have asked for it.
> 
> Phobos is no better.  Any
> exception resulting from an invalid OS call dumps UTF-8 garbage instead
> of an error message.  std.file.read("non-existent") for instance.

This is serendipitous. I just posted an example involving throwing a localized "File not found" exception. Please let me know whether that would help.

Andrei