June 25, 2004
In article <cbesja$1for$1@digitaldaemon.com>, Sean Kelly says...
>
>In article <cbe04d$6ri$1@digitaldaemon.com>, Arcane Jill says...
>>
>>In a way, Java does this quite well - at least in the sense that it is sufficiently powerful. Where it falls down is in ease of use. It is not easy to do even simple localization in Java. I'm wondering if that's something else we're stuck with - is it even possible to make it BOTH easy AND powerful?
>
>Good question.  C++ has a global locale setting and then IIRC you can associate a stream instance with a different locale.  Formatting information is stored in a class or set of classes that defines the various separator characters and such.  This is quite easy to use but fairly complicated to extend.  But the basic idea does seem to work pretty well.  How does Java work?

There are basically three big problems to solve in localization. Java solves all three. It doesn't necessarily have the BEST solutions, so we don't necessarily have to rip off the way Java does it, but we do need to solve the same three problems. These are:

PROBLEM 1 - HOW TO DEFINE A LOCALE
PROBLEM 2 - RESOURCE FILES
PROBLEM 3 - INDEXED VARIADIC FUNCTIONS


PROBLEM 1 - HOW TO DEFINE A LOCALE

The problem is how to define PRECISELY what you mean by "locale". All other locale stuff will depend on this definition.

Conceptually, a locale is something like "en-us", a string - but it's convenient not to have to deal with strings directly, because strings are slow, and you have to deal with issues of case, punctuation (hyphen versus underscore), and so on. Java solves this by having a class, Locale, which contains such strings, but it converts them to some standard internal format so it only has to do all that casing stuff ONCE.
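
To make that concrete, here is a rough sketch of the "normalize once" idea using java.util.Locale. The class name and the parse() helper are mine, purely for illustration, and Locale.forLanguageTag() is a newer Java API than the one I was describing, so treat this as a sketch of the principle rather than the canonical way to do it.

```java
import java.util.Locale;

class LocaleNormalization {
    // Hypothetical helper: accept "en-us", "EN_US", etc. and return one canonical Locale.
    // forLanguageTag() only understands hyphens, so underscores are mapped first.
    static Locale parse(String tag) {
        return Locale.forLanguageTag(tag.replace('_', '-'));
    }

    public static void main(String[] args) {
        Locale a = parse("en-us");
        Locale b = parse("EN_US");
        // Case and punctuation were dealt with once, at construction time;
        // after that, comparison is a plain equals().
        System.out.println(a.equals(b));
        System.out.println(a.getLanguage() + " / " + a.getCountry());
    }
}
```

After parsing, the language comes back lowercased and the country uppercased, so the rest of the program never has to think about casing again.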

My own view is that this is too complicated. Especially when you have to deal with the three-letter ISO-639 / ISO-3166 codes (as opposed to the two-letter codes). It strikes me that a simple enum would suffice. Then you could do:

#    import std.locale; // Just pretend
#
#    Language locale_lang = Language.EN;
#    Country locale_ctry = Country.US;

Bingo. Problem solved. Now whenever you want to pass locale information you only have to pass one or two enum values. In the case of ISO-3166 (countries) the actual enum VALUES are even defined for us by the standard. ISO-639 (languages) defines only the names, so we'd have to make up the values.
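
Since std.locale is only pretend, here is the same idea sketched in Java so that it can actually be compiled. The class and field names are my own invention. The country numbers are the ISO-3166 numeric codes; the language values are simply whatever the enum hands out, since ISO-639 gives us nothing to copy.

```java
class IsoCodes {
    // A handful of countries, carrying their ISO-3166 numeric codes.
    enum Country {
        FR(250), DE(276), GB(826), US(840);

        final int numeric;

        Country(int numeric) {
            this.numeric = numeric;
        }
    }

    // ISO-639 defines names only, so the values here are made up (the ordinals).
    enum Language { EN, FR, DE }

    public static void main(String[] args) {
        Language lang = Language.EN;
        Country ctry = Country.US;
        System.out.println(lang + "-" + ctry + " (" + ctry.numeric + ")");
    }
}
```

Passing locale information around is then just a matter of passing one or two small values.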

(And since enums will auto-cast into integers, you could templatize on them. I haven't thought through the implications of this.)

C++, of course, defines locales by its own internal means, without regard to any ISO standard. I don't recommend we copy this. Way too messy.


PROBLEM 2 - RESOURCE FILES

This is a very simple problem to define. Given a locale, open a file containing information relevant to that locale.

For instance, you have an application, and in the course of execution, it prints stuff. You want it to print in English if the locale-language is Language.EN, French if the locale-language is Language.FR, and so on. So you open the "right" file (there will be one for each supported locale), read in all the text strings into an array, and then spit out the appropriate one at the appropriate time.
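
A minimal sketch of that "pick the right table, then look strings up" step, in Java. To keep it self-contained, the file contents are faked as in-memory text; the keys, the strings and the StringTable name are all made up, and falling back to English is just one possible policy.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

class StringTable {
    // Stand-ins for the contents of one resource file per supported language.
    static final Map<String, String> FILES = Map.of(
        "en", "greeting=Hello\nfarewell=Goodbye\n",
        "fr", "greeting=Bonjour\nfarewell=Au revoir\n");

    // Load the table for a language, falling back to English if unsupported.
    static Properties load(String language) {
        String text = FILES.getOrDefault(language, FILES.get("en"));
        Properties table = new Properties();
        try {
            table.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(load("fr").getProperty("greeting"));
        System.out.println(load("de").getProperty("greeting")); // falls back to English
    }
}
```

In a real application the text would of course come from a file, which is exactly where the trouble starts.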

The problem? Where do you find the file?

See - your app could have been installed anywhere on the user's filesystem. Same goes for your libraries. The current working directory could be anywhere. There is no "obvious" place to look.

Traditionally, the Linux answer is to get the "right place to look" from an environment variable. But then you have the problem of environment variable name clashes.

Traditionally, the Windows answer is to get the "right place to look" from the registry. No problem with name clashes here, but now you have a different problem - it takes about fifteen thousand lines of code to do the equivalent of getenv().

Enter Java's magnificent solution. Java has a class, ResourceBundle, which does the job. There are two forms, if memory serves. In one version, it parses a human-readable text file (a .properties file) into a class. In the other version, it loads a compiled class which supplies the resources directly. But either way, it knows where to look by SEARCHING THE JAVA CLASSPATH. What this means, in simple terms, is that source code is able to locate the file RELATIVE TO THE SOURCE FILE. And this alone is staggeringly powerful.
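
To give a feel for the search, this sketch asks ResourceBundle.Control (part of the standard library, though added after the API I was describing) which bundle names it would try for a given locale. The base name "Messages" and the BundleNames class are hypothetical. As far as I know, each name is tried as a class and then as a .properties file on the classpath, and the real lookup may also try the default locale before giving up, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleNames {
    // List the bundle names ResourceBundle would look for, most specific first.
    static List<String> candidates(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [Messages_fr_FR, Messages_fr, Messages]
        System.out.println(candidates("Messages", Locale.FRANCE));
    }
}
```

The point is that the program only ever names "Messages"; where the matching file actually lives is the classpath's problem.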

In D terms - you could put a file in the same directory as a source file, and just OPEN IT. Or, if it were in a different directory (in an associated package, say) which could be located RELATIVE TO THE SOURCE FILE then, again, that's problem solved.

However, this is a difficult thing to do in D, because there is no classpath. We may have to come up with other solutions to this one.



PROBLEM 3 - INDEXED VARIADIC FUNCTIONS

Once you've got your strings and your formatting classes, you're very nearly sorted. Just one problem remains - word order. Consider - this will work in standard English:

#    char[] a = "the server";
#    char[] b = "running";
#    char[] message = "%.*s is %.*s";
#    printf(&message[0], a, b);

Prints: the server is running

Assume that the first three lines could be substituted with fetching the values from a resource file or some other means, but the fourth line is actually part of your program. Trouble is, it won't work with all languages, because of word order. On planet Dagobah (where Yoda comes from), you'd want to say "running, the server is" (rather than "the server is running"), so that printf just won't do. It can't change the word order.

Java has SOMETHING LIKE printf for this purpose: java.text.MessageFormat. It takes a format string containing numbered placeholders, and the numbers tell you WHICH argument you're referring to. (Java actually spells them {0}, {1}, {2} and counts from zero; I'll write them printf-style as %1, %2, %3 here. It's also cumbersome in Java, because you have to pass the arguments in an array.) But, basically, it lets you do:

#    char[] message = "%1 is %2"; // Standard English
#    char[] message = "%2, %1 is"; // Yoda-speak

The printf-like function can now plug the RIGHT values into the right places. Bingo - one string, correctly formatted for an arbitrary locale.
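
For comparison, here is roughly what that looks like with Java's java.text.MessageFormat, which writes its placeholders as {0}, {1} and counts from zero. The WordOrder class and render() helper are just scaffolding for the example; in practice the pattern would come out of a resource file.

```java
import java.text.MessageFormat;

class WordOrder {
    // Plug the arguments into whichever positions the pattern asks for.
    static String render(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        String subject = "the server";
        String state = "running";
        // Same call, same arguments; only the pattern differs per locale.
        System.out.println(render("{0} is {1}", subject, state));  // the server is running
        System.out.println(render("{1}, {0} is", subject, state)); // running, the server is
    }
}
```

The calling code never changes; only the pattern string does, and that lives with the rest of the locale's strings.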

I understand that Walter is working on a D-printf() right now. If so, Walter, if you could give us some means of accessing the /nth/ argument (instead of always the /next/ argument), then that problem would be solved immediately.

Arcane Jill

