Internationalization - an open discussion (page 3) - D Programming Language Discussion Forum

Juanjo Álvarez wrote:
> Thomas Kuehne wrote:
>
>>Juanjo Álvarez:
>>
>>>Anyway if we all can discuss the matter and come with a better solution than
>>>gettext (which I'm sure it's possible) I doubt many will be opposed.
>>
>>Just to point out some other - not necessarily better - localization libs:
>>
>>qt/kde: http://doc.trolltech.com/3.0/linguist-manual.html
>>java: http://java.sun.com/j2se/1.5.0/docs/api/java/util/Formattable.html
>
> I don't know about the Java implementation but Qt tr() is very similar to
> gettext (the format of the translation files is different.) I don't know if
> KDE uses gettext internally but they use po/mo files just like gettext (and
> with the same format.)

With the Java MessageFormat solution, things work fairly well. Consider for instance:

MessageFormat.format("There was a problem in {0}, where {2} parse errors encountered at line {1}", location, lineNum, numErrs);

There is a way to map which argument goes to which location in the line. Also, MessageFormat has a shorthand for a kind of if:then construct, so that the same message can be rendered differently for plurals etc. That makes it convenient to handle those issues in I18N.

However, I would not say that MessageFormat is super easy to use. It could have a better interface, but the concepts are pretty decent.
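For readers who have not used MessageFormat, the following is a minimal sketch of the two features mentioned in the post above: numbered placeholders that can appear in any order, and the "choice" sub-format that serves as the if:then shorthand for plurals. The class name, the method names and the sample arguments are made up for illustration; only `java.text.MessageFormat` and its pattern syntax come from the JDK. The locale is pinned to English so that number formatting does not depend on the machine.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class; not part of any library discussed in the thread.
class MessageFormatSketch {
    // Numbered placeholders let the pattern use the arguments in any order:
    // {0} = location, {1} = lineNum, {2} = numErrs.
    static String parseProblem(String location, int lineNum, int numErrs) {
        String pattern =
            "There was a problem in {0}, where {2} parse errors encountered at line {1}";
        return new MessageFormat(pattern, Locale.ENGLISH)
                .format(new Object[] {location, lineNum, numErrs});
    }

    // The "choice" sub-format selects a variant by numeric range:
    // 0, exactly 1, and anything greater than 1.
    static String errorCount(int numErrs) {
        String pattern =
            "{0,choice,0#no parse errors|1#one parse error|1<{0,number,integer} parse errors}";
        return new MessageFormat(pattern, Locale.ENGLISH)
                .format(new Object[] {numErrs});
    }

    public static void main(String[] args) {
        System.out.println(parseProblem("config.d", 12, 3));
        System.out.println(errorCount(0));
        System.out.println(errorCount(1));
        System.out.println(errorCount(5));
    }
}
```

A translator would only ever edit the pattern strings, so moving `{2}` ahead of `{0}` or rewording the plural branches needs no code change.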

In article <cdgq1p$gqi$1@digitaldaemon.com>, Arcane Jill says...
>
>In article <cdgiqm$dua$1@digitaldaemon.com>, Juanjo Álvarez says...
>>
>>"There are %1$d %2$s %3$s"
>>
>>So translator can change the numbers thus changing the word order.
>
>Is this a feature of printf()? If so, is a Linux thing or an all-platform thing?
>And (probably a silly question, but someone might know the answer) is this
>functionality available in the new writef()?

It's not a feature of printf and AFAIK it's not in the new writef either.

Semi-related: I'm recoding my scanf implementation as unFormat (to match doFormat) and changing the calling syntax to readf. So with any luck there will be both input and output routines written in D.

Sean

In article <cdgiqm$dua$1@digitaldaemon.com>, Juanjo Álvarez says...
>Mmm, not the GNU gettext, you can put:
>
>printf(_("There are %d %s %s\n"), count, _(color), _(name));
>
>And the output po file will be:
>
>"There are %1$d %2$s %3$s"
>
>So translator can change the numbers thus changing the word order.

Well, that /sounds/ like the kind of thing we need, but your above example is a little unclear to those of us who have not used gettext() before. As I read the above, and assuming that _() is the text-localizing function, that wouldn't change the word order. But you say it does, so I must have misunderstood something. Can you break that down into steps?

Berin mentioned Java's MessageFormat class. This does the job of word order switching. It's cumbersome to use in practice, but we could still borrow the technique if we so needed.

We will certainly find a way to do word reordering in D. The question is where is the right place for that? Does gettext do it? Should we petition Walter to get writef() to do it? Would Hauke's string class be the right place? We need more information....

Jill
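One way to picture the word-order question is the sketch below. It is not gettext itself: the `tr` function and the one-entry Spanish catalog are hypothetical stand-ins for `_()` and a .po file. It relies on the fact that Java's `String.format` accepts the same `%n$` argument-index syntax as the quoted po entry, so the translated format string alone decides the order in which the arguments appear. The sample words are passed in already translated, which is an assumption made to keep the example short.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class; tr() stands in for gettext's _() lookup.
class PositionalArgsSketch {
    // Stand-in for a translation catalog: source format -> translated format.
    // The translation swaps %2$s and %3$s so the noun comes before the adjective.
    static final Map<String, String> SPANISH = Map.of(
        "There are %1$d %2$s %3$s", "Hay %1$d %3$s %2$s");

    // Falls back to the source text when no translation exists.
    static String tr(Map<String, String> catalog, String msgid) {
        return catalog.getOrDefault(msgid, msgid);
    }

    // The call site always passes (count, color, name) in the same order.
    static String describe(Map<String, String> catalog, int count, String color, String name) {
        String format = tr(catalog, "There are %1$d %2$s %3$s");
        return String.format(Locale.ROOT, format, count, color, name);
    }

    public static void main(String[] args) {
        System.out.println(describe(Map.of(), 3, "red", "cars"));
        System.out.println(describe(SPANISH, 3, "rojos", "coches"));
    }
}
```

The program never changes the order of its arguments; only the numbers inside the translated string move.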

July 19, 2004

Re: Internationalization - an open discussion

Posted by Hauke Duden
in reply to Arcane Jill

Arcane Jill wrote:
> In article <cdgiqm$dua$1@digitaldaemon.com>, Juanjo Álvarez says...
> 
> 
>>Mmm, not the GNU gettext, you can put:
>>
>>printf(_("There are %d %s %s\n"), count, _(color), _(name));
>>
>>And the output po file will be:
>>
>>"There are %1$d %2$s %3$s"
>>
>>So translator can change the numbers thus changing the word order.
> 
> 
> Well, that /sounds/ like the kind of thing we need, but your above example is a
> little unclear to those of us who have not used gettext() before. As I read the
> above, and assuming that _() is the text-localizing function, that wouldn't
> change the word order. But you say it does, so I must have misunderstood
> something. Can you break that down into steps?
> 
> Berin mentioned Java's MessageFormat class. This does the job of word order
> switching. It's cumbersome to use in practice, but we could still borrow the
> technique if we so needed.

I've been using a pretty simple but effective technique for quite some time. The translatable string can contain placeholders of the form %NAME%, and the translation function can take a map parameter that supplies the values to insert.

This also has the advantage of better documentation. It is pretty hard to deduce the intended meaning of a string like "There are %d %ss in the %s". It gets easier if you have something like "There are %NUM% %OBJ%s in the %CONTAINER%". Less room for error.
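A minimal sketch of the %NAME% idea, assuming placeholder names consist of upper-case letters and underscores (the posts do not specify the exact syntax). The class and method names are hypothetical, and a real translation function would first look up the translated template before filling it in.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating named placeholders.
class NamedPlaceholderSketch {
    // Assumed placeholder syntax: %NAME% with upper-case letters and underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("%([A-Z_]+)%");

    // Replaces each %NAME% with values.get("NAME").
    // Names missing from the map are left in place so the gap stays visible.
    static String fill(String template, Map<String, ?> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> values =
            Map.<String, Object>of("NUM", 3, "OBJ", "apple", "CONTAINER", "basket");
        System.out.println(fill("There are %NUM% %OBJ%s in the %CONTAINER%", values));
        // A translated template is free to use the names in a different order.
        System.out.println(fill("%CONTAINER%: %NUM% x %OBJ%", values));
    }
}
```

Because values are looked up by name, the translator can reorder or even omit placeholders without the call site knowing.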

I have also found that it can sometimes be helpful to be able to include some kind of comment for the translator that describes the intended use or any constraints of the string. For example "keep this as short as possible" or "context is file I/O". I implemented this by adding an optional parameter that can be passed to the translation function. It is ignored at runtime, but the "harvester" tool that extracts the strings from the code files includes it in the translatable files.
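The translator-comment idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the name `tr`, the note format in the harvested output, and the regular expression, which only handles plain string literals without escaped quotes. It shows the two halves described above: a runtime function that ignores the note, and a harvester that picks it up from source text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; neither tr() nor harvest() is an existing API.
class TranslatorNoteSketch {
    // Runtime side: the note is accepted and ignored.
    // A real implementation would also look up the translation of text.
    static String tr(String text, String translatorNote) {
        return text;
    }

    static String tr(String text) {
        return tr(text, null);
    }

    // Harvester side: matches tr("text") and tr("text", "note").
    // Simplification: literals containing escaped quotes are not handled.
    private static final Pattern TR_CALL = Pattern.compile(
        "\\btr\\(\\s*\"([^\"]*)\"\\s*(?:,\\s*\"([^\"]*)\")?\\s*\\)");

    // Returns one entry per call, with the note appended after " # " if present.
    static List<String> harvest(String source) {
        List<String> entries = new ArrayList<>();
        Matcher m = TR_CALL.matcher(source);
        while (m.find()) {
            String note = m.group(2);
            entries.add(note == null ? m.group(1) : m.group(1) + " # " + note);
        }
        return entries;
    }

    public static void main(String[] args) {
        String source = "open(tr(\"Open\", \"context is file I/O\")); quit(tr(\"Close\"));";
        harvest(source).forEach(System.out::println);
    }
}
```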

And last but not least, I think translatable strings should have an ID (a string ID, not a number). Not all strings that are the same in one language are the same in other languages. So if the translation is bound to the original text then you can have situations where you need to specify two different texts for two different contexts, but you are not able to do so, because the original text serves as ID/key.

A good example that I encountered a few years ago:
At the time I was playing the German version of the game Baldur's Gate. It contained some horrible text bugs that obviously originated from a translation system where the original text served as the ID.
One particular case was the text "XXX attacks YYY" that was displayed whenever one character attacked another. "attacks" in English can mean the plural of the noun "attack" or it can be a form of the verb "to attack". In this case it is the verb form. Unfortunately it was translated with the German plural of the noun, which is different from the verb (probably because it was also used in a different context where it meant the noun), so the translation made no sense at all.
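The "attacks" problem can be sketched with a catalog keyed by string ID instead of by the source text. The IDs, the catalog layout and the class are invented for this example, and the German entries are only illustrative. The point is that two IDs may share one English text and still receive different translations.

```java
import java.util.Map;

// Hypothetical demo class; the IDs and catalogs are made up for illustration.
class StringIdCatalogSketch {
    // Both IDs happen to have the same English text.
    static final Map<String, String> ENGLISH = Map.of(
        "combat.attacks.verb", "attacks",
        "stats.attacks.noun", "attacks");

    // In German the verb form and the noun plural differ.
    static final Map<String, String> GERMAN = Map.of(
        "combat.attacks.verb", "greift an",
        "stats.attacks.noun", "Angriffe");

    // Falls back to the ID itself so that missing entries are easy to spot.
    static String lookup(Map<String, String> catalog, String id) {
        return catalog.getOrDefault(id, id);
    }

    public static void main(String[] args) {
        System.out.println(lookup(ENGLISH, "combat.attacks.verb"));
        System.out.println(lookup(GERMAN, "combat.attacks.verb"));
        System.out.println(lookup(GERMAN, "stats.attacks.noun"));
    }
}
```

With the English text as the key, both uses would collapse into a single catalog entry and the translator could supply only one of the two German forms.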

Hauke

Arcane Jill wrote:
> In article <cdgiqm$dua$1@digitaldaemon.com>, Juanjo Álvarez says...
>
>>"There are %1$d %2$s %3$s"
>>
>>So translator can change the numbers thus changing the word order.
>
> Is this a feature of printf()? If so, is a Linux thing or an all-platform thing?
> And (probably a silly question, but someone might know the answer) is this
> functionality available in the new writef()?

It depends on whose printf() you're looking at. Standard C - no. POSIX - yes. See:
http://www.opengroup.org/onlinepubs/009695399/functions/fprintf.html

I discussed this once before in this news group, a few weeks after the thread had gone stale (mainly because I only just started to pay attention to D).
...dig...dig...dig...Friday 9th July 2004...
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/5662

--
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
