Two years ago, H. S. Teoh presented a proof of concept for automatic extraction of gettext-style translation strings. I recently combined that idea with the existing mofile package for reading translation tables in GNU gettext format, and the result is a feature-rich solution for supporting multiple natural languages in D applications: https://code.dlang.org/packages/gettext. Perhaps not surprisingly, it can do more than GNU gettext itself.
I'd like to thank Steven Schveighoffer and Adam Ruppe for valuable forum assistance, and SARC B.V. for sponsoring. Some extracts from the readme are included below:
Features
- Concise translation markers that can be aliased to your preference.
- All marked strings that are seen by the compiler are extracted automatically.
- All (current and future) D string literal formats are supported.
- Static initializers of fields, constants, immutables, manifest constants and anonymous enums can be marked as translatable (a D specialty).
- Concatenations of translatable strings, untranslated strings and single chars are supported, even in initializers.
- Arrays of translatable strings are supported, also when statically initialized.
- Plural forms are language dependent, and play nice with format strings.
- Multiple identical strings are translated once, unless they are given different contexts.
- Notes to the translator can be attached to individual translatable strings.
- Code occurrences of strings are communicated to the translator.
- Available languages are discovered and selected at run-time.
- Platform independent, not linked with C libraries.
- Automated generation of the translation table template.
- Automated merging into existing translations (requires GNU gettext utilities).
- Automated generation of binary translation tables (requires GNU gettext utilities).
- Includes utility for listing unmarked strings in the project.
Usage
Marking strings
Prepend tr! to every string literal that needs to be translated. For instance:
writeln(tr!"This string is to be translated");
writeln("This string will remain untranslated.");
Plural forms
Sentences whose wording depends on a number should supply both the singular and the plural form, together with the number, like this:
// Before:
writefln("%d green bottle(s) hanging on the wall", n);
// After:
writeln(tr!("one green bottle hanging on the wall",
"%d green bottles hanging on the wall")(n));
Note that the format specifier (%d, or %s, etc.) is optional in the singular form.
Many languages have not just two forms like the English language does, and translations in those languages can supply all the forms that the particular language requires. This is handled by the translator, and is demonstrated in the example below.
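To make the idea of more than two forms concrete, here is a minimal, self-contained sketch of how a form could be chosen for Ukrainian, which has three. It is not the package's implementation: the function name ukrainianPluralIndex is made up for illustration, and the rule is the standard Ukrainian plural expression from the GNU gettext manual. In practice the translator puts this rule in the translation file and the right form is selected for you.

```d
import std.stdio : writeln;

/// Hypothetical helper: index of the plural form for count `n`, following
/// the Ukrainian rule from the GNU gettext manual (nplurals=3).
size_t ukrainianPluralIndex(ulong n) pure nothrow @nogc @safe
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0; // 1, 21, 31, ...
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1; // 2-4, 22-24, ...
    return 2;     // 0, 5-20, 25-30, ...
}

void main()
{
    // The three forms of "apple" that a translator would supply.
    immutable string[3] forms = ["яблуко", "яблука", "яблук"];

    foreach (n; [1, 3, 5, 7])
        writeln(n, " ", forms[ukrainianPluralIndex(n)]);
}
```

The application code never contains such a rule; it only passes the number to tr.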
Custom markers
If tr is too verbose for you, you can change it to whatever you want:
import gettext : _ = tr;
writeln(_!"No green bottles...");
Marking format strings
Translatable strings can be format strings, used with std.format and std.stdio.writefln etc. These format strings do support plural forms, but the argument that determines the form must be supplied to tr and not to format. The corresponding format specifier will not be seen by format, as it will have been replaced with a string by tr. Example:
format(tr!("Welcome %s, you may make a wish",
"Welcome %s, you may make %d wishes")(n), name);
The format specifier that selects the form is the last specifier in the format string (here %d). In many sentences, however, the specifier that should select the form cannot be the last. In these cases, format specifiers must be given a position argument, where the highest position determines the form:
foreach (i, where; [tr!"hand", tr!"bush"])
format(tr!("One bird in the %1$s", "%2$d birds in the %1$s")(i + 1), where);
Again, the specifier with the highest position argument will never be seen by format. On a side note, some translations may need a reordering of words, so translators may need to use position arguments in their translated format strings anyway.
Note: Specifiers with and without a position argument must not be mixed.
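The two-stage substitution described above can be imitated with the standard library alone. The sketch below is only a model of the behaviour, not how the package works internally: pickForm is a hypothetical stand-in that uses a naive English-only rule and a plain text replacement of the count specifier. The second stage is the ordinary std.format.format, which then sees only the remaining %1$s.

```d
import std.array : replace;
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical stand-in for the first stage: choose a form (English rule
/// only) and replace the count specifier with the number as text.
string pickForm(string singular, string plural, string countSpec, long n)
{
    return n == 1 ? singular : plural.replace(countSpec, n.to!string);
}

void main()
{
    foreach (i, where; ["hand", "bush"])
    {
        // Stage 1: the specifier with the highest position is consumed here.
        auto stage1 = pickForm("One bird in the %1$s",
                               "%2$d birds in the %1$s", "%2$d", i + 1);
        // Stage 2: format only has %1$s left to fill in.
        writeln(format(stage1, where));
    }
}
```

Because the remaining specifier carries an explicit position, a translator is free to move it elsewhere in the sentence without changing the argument order in the code.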
Concatenations
Translators will be able to produce the best translations if they get to work with full sentences, like
auto message = format(tr!`Could not open the file "%s" for reading.`, file);
However, in support of legacy code, concatenations of strings do work:
auto message = tr!`Could not open the file "` ~ file ~ tr!`" for reading.`;
Passing attributes
Optionally, two kinds of attributes can be passed to tr, in the form of an associative array initializer. These are for passing notes to the translator and for disambiguating identical sentences with different meanings.
Passing notes to the translator
Sometimes a sentence can be interpreted to mean different things, and then it is important to be able to clarify things for the translator. Here is an example of how to do this:
auto name = tr!("Walter Bright", [Tr.note: "Proper name. Phonetically: ˈwɔltər braɪt"]);
The GNU gettext manual has a section about the translation of proper names.
Disambiguate identical sentences
Multiple occurrences of the same sentence are combined into one translation by default. In some cases, that may not work well. Some languages, for example, may need to translate identical menu items in different menus differently. These can be disambiguated by adding a context like so:
auto labelOpenFile = tr!("Open", [Tr.context: "Menu|File"]);
auto labelOpenPrinter = tr!("Open", [Tr.context: "Menu|File|Printer"]);
Notes and contexts can be combined in any order:
auto message1 = tr!("Review the draft.", [Tr.context: "document"]);
auto message2 = tr!("Review the draft.", [Tr.context: "nautical",
Tr.note: `Nautical term! "Draft" = how deep ` ~
`the bottom of the ship is below ` ~
`the water level.`]);
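Why a context keeps identical sentences apart can be shown with a small model of a translation table. In GNU gettext's binary format, a context is joined to the message with an EOT character (U+0004) to form the lookup key; whether this package handles it the same way internally is an assumption here. The names key and lookup, and the placeholder translations, are made up for illustration.

```d
import std.stdio : writeln;

/// Hypothetical helper: builds the lookup key. With a context, the key is
/// context ~ EOT ~ message, as in GNU gettext's binary tables.
string key(string msgid, string context = null) pure @safe
{
    return context.length ? context ~ "\x04" ~ msgid : msgid;
}

/// Hypothetical helper: returns the translation, or the hard-coded string
/// when the table has no entry for it.
string lookup(string[string] table, string msgid, string context = null) @safe
{
    if (auto p = key(msgid, context) in table)
        return *p;
    return msgid;
}

void main()
{
    // Placeholder entries standing in for real translations.
    string[string] table = [
        key("Open", "Menu|File"): "<translation for the File menu>",
        key("Open", "Menu|File|Printer"): "<translation for the Printer submenu>"
    ];

    writeln(lookup(table, "Open", "Menu|File"));
    writeln(lookup(table, "Open", "Menu|File|Printer"));
    writeln(lookup(table, "Open")); // no entry without context: prints "Open"
}
```

The same message with two different contexts yields two different keys, so the translator gets two separate entries to fill in.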
Selecting a translation
Use the following functions to discover translation tables, get the language code for a table and activate a translation:
string[] availableLanguages(string moPath = null)
string languageCode() @safe
string languageCode(string moFile) @safe
void selectLanguage(string moFile) @safe
Note that any translation performed before a language is selected yields the hard-coded string.
There's more
See the full readme for adding and updating translations, impact on footprint and performance, limitations, and lots more.
Here's the result of two runs of one of the examples:
Hello! My name is Joe.
I'm counting one apple.
Hello! My name is Schmoe.
I'm counting 3 apples.
Hello! My name is Jane.
I'm counting 5 apples.
Hello! My name is Doe.
I'm counting 7 apples.
Привіт! Мене звати Joe.
Я рахую 1 яблуко.
Привіт! Мене звати Schmoe.
Я рахую 3 яблука.
Привіт! Мене звати Jane.
Я рахую 5 яблук.
Привіт! Мене звати Doe.
Я рахую 7 яблук.
Notice how, in the second run, the translation of "apple" takes three different endings depending on the number of apples.
-- Bastiaan.