Thread overview
Proof of concept: automatic extraction of gettext-style translation strings
Apr 02, 2020
H. S. Teoh
Apr 02, 2020
Adam D. Ruppe
Apr 02, 2020
Sönke Ludwig
Apr 02, 2020
Adam D. Ruppe
April 02, 2020
This morning a neat idea occurred to me for a gettext-like system in D that allows automatic and reliable extraction of all translation strings from a program, that doesn't need an external parser to run over the program source code.

Traditionally, gettext requires an external tool to parse the source
code and extract translatable strings.  In D, however, we can take
advantage of (1) passing the format string at compile-time to gettext(),
which then allows (2) using static this() to register all format strings
at runtime to a central dictionary of format strings, regardless of
whether the corresponding gettext() call actually got called at runtime.
(3) Wrap that in a version() condition, and you can have the compiler do
the string extraction for you without needing an external source code
parser.

Here's a proof of concept:

	// ------------------------------------------------------------------
	// File: lang.d
	version(extractStr) {
		int[string] allStrings;
		void main() {
			import std.algorithm;
			import std.stdio;
			auto s = allStrings.keys;
			s.sort();
			writefln("string[string] dict = [\n%(\t%s: \"\",\n%|%)];", s);
		}
	}

	template gettext(string fmt, Args...)
	{
		version(extractStr)
		static this() {
			allStrings[fmt]++;
		}
		string gettext(Args args) {
			import std.format;
			return format(fmt, args);
		}
	}

	// ------------------------------------------------------------------
	// File: main.d
	import mod1, mod2;

	version(extractStr) {} else
	void main() {
		auto names = [ "Joe", "Schmoe", "Jane", "Doe" ];
		foreach (i; 0 .. names.length) {
			fun1(names[i]);
			fun2(5 + cast(int)i*10);
		}
	}

	// ------------------------------------------------------------------
	// File: mod1.d
	import std.stdio;
	import lang;

	void fun1(string name) {
		writeln(gettext!"Hello! My name is %s."(name));
	}

	// ------------------------------------------------------------------
	// File: mod2.d
	import std.stdio;
	import lang;

	void fun2(int num) {
		writeln(gettext!"I'm counting %d apples."(num));
	}

	void fun3() {
		writeln(gettext!"Never called, but nevertheless registered!");
	}


Running the program normally with `dmd -i -run main.d` produces the output:

	Hello! My name is Joe.
	I'm counting 5 apples.
	Hello! My name is Schmoe.
	I'm counting 15 apples.
	Hello! My name is Jane.
	I'm counting 25 apples.
	Hello! My name is Doe.
	I'm counting 35 apples.


Format strings can be extracted by compiling with -version=extractStr:

	dmd -i -version=extractStr -run main.d

which produces a template for translating the format strings into another language:

	string[string] dict = [
		"Hello! My name is %s.": "",
		"I'm counting %d apples.": "",
		"Never called, but nevertheless registered!": "",
	];


The idea is that in a real implementation, gettext() would look up the format string in the l10n file containing a filled-out instance of the above dictionary and map it to the target language. There could also be a fancier extractStr that merges new format strings into an existing translated file, so that l10n files can be continually updated as development proceeds.
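A minimal sketch of what that runtime lookup might look like, assuming the filled-out dictionary has been loaded into a module-level `dict` (how it gets loaded is left open here, and the German text is only sample data). This version falls back to the original string when no translation is present:

```d
import std.format : format;
import std.stdio : writeln;

// Assumption: the filled-out dictionary from the l10n file ends up here.
string[string] dict;

// Same call syntax as the proof of concept: gettext!"..."(args)
string gettext(string fmt, Args...)(Args args)
{
	// Use the translation if there is a non-empty one, else the original.
	auto p = fmt in dict;
	string f = (p !is null && (*p).length > 0) ? *p : fmt;
	return format(f, args);
}

void main()
{
	dict = ["Hello! My name is %s.": "Hallo! Mein Name ist %s."];
	writeln(gettext!"Hello! My name is %s."("Joe")); // translated
	writeln(gettext!"I'm counting %d apples."(5));   // no entry: original
}
```

Since the translated format string is only known at runtime here, a translation whose format specifiers don't match the arguments would only fail when the call is actually made.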

The best thing about this is that no additional tooling is required: string extraction is done entirely within D, by the compiler itself, so it is 100% reliable and not prone to bugs in an external parser.
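The merging variant of extractStr mentioned earlier could be sketched roughly like this. `mergeStrings` is a made-up helper name, and the policy shown (keep every existing entry, add new strings with an empty translation) is just one possible choice:

```d
import std.algorithm.sorting : sort;
import std.stdio : writefln;

// Hypothetical helper: merge freshly extracted strings into an
// existing translation table without losing finished translations.
string[string] mergeStrings(string[string] existing, string[] extracted)
{
	string[string] merged;
	foreach (key, value; existing)
		merged[key] = value;
	foreach (s; extracted)
		if (s !in merged)
			merged[s] = ""; // new string, not yet translated
	return merged;
}

void main()
{
	auto existing = ["Hello! My name is %s.": "Hallo! Mein Name ist %s."];
	auto merged = mergeStrings(existing,
		["Hello! My name is %s.", "I'm counting %d apples."]);

	auto keys = merged.keys;
	keys.sort();
	// %(%s%) over a one-element array prints the string quoted and escaped.
	foreach (k; keys)
		writefln("\t%(%s%): %(%s%),", [k], [merged[k]]);
}
```

Entries whose source string has disappeared from the program are kept in this sketch; a real tool might want to flag or drop them instead.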


T

-- 
Computerese Irregular Verb Conjugation: I have preferences.  You have biases.  He/She has prejudices. -- Gene Wirchenko
April 02, 2020
On Thursday, 2 April 2020 at 13:01:09 UTC, H. S. Teoh wrote:
> This morning a neat idea occurred to me for a gettext-like system in D that allows automatic and reliable extraction of all translation strings from a program, that doesn't need an external parser to run over the program source code.

Indeed, I have played with this before, it is really cool.

I almost wrote it as an example of my string interpolation proposal, since with my proposal, it would be possible to run this over i"" strings passed to a particular function too.

D rox.
April 02, 2020
On 02.04.2020 at 15:04, Adam D. Ruppe wrote:
> On Thursday, 2 April 2020 at 13:01:09 UTC, H. S. Teoh wrote:
>> This morning a neat idea occurred to me for a gettext-like system in D that allows automatic and reliable extraction of all translation strings from a program, that doesn't need an external parser to run over the program source code.
> 
> Indeed, I have played with this before, it is really cool.
> 
> I almost wrote it as an example of my string interpolation proposal, since with my proposal, it would be possible to run this over i"" strings passed to a particular function too.
> 
> D rox.

I'm doing the same in my UI framework. In addition to being able to collect all strings at compile-time, it is also possible to translate and verify the existence of strings at compile-time by loading and parsing the PO-files with CTFE.
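A rough sketch of the compile-time part, not the actual code from the UI framework: `parsePo` below only understands single-line `msgid`/`msgstr` pairs (no escapes, plurals or multi-line entries), and `poText`, `lookup` and `tr` are names made up for the example. A real file could be pulled in with `import("de.po")` and the `-J` switch instead of the inline string.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;
import std.string : splitLines, strip;

// Stand-in for a PO file; sample data only.
enum poText = `
msgid "Hello! My name is %s."
msgstr "Hallo! Mein Name ist %s."
`;

// Parses a tiny subset of the PO format; usable in CTFE.
string[string] parsePo(string text)
{
	string[string] result;
	string currentId;
	foreach (line; text.splitLines)
	{
		auto l = line.strip;
		if (l.startsWith("msgid \""))
			currentId = l["msgid \"".length .. $ - 1];
		else if (l.startsWith("msgstr \""))
			result[currentId] = l["msgstr \"".length .. $ - 1];
	}
	return result;
}

string lookup(string[string] table, string msgid)
{
	if (auto p = msgid in table)
		return *p;
	return "";
}

// Translation is resolved, and its existence checked, at compile time.
template tr(string msgid)
{
	enum string translated = lookup(parsePo(poText), msgid);
	static assert(translated.length > 0, "no translation for: " ~ msgid);
	enum string tr = translated;
}

void main()
{
	writeln(tr!"Hello! My name is %s.");
	// A string missing from the PO data is rejected by the compiler.
	static assert(!__traits(compiles, tr!"No such string"));
}
```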

BTW, I never got around to commenting on the string interpolation topic, but the inability to translate i"" strings was my biggest practical concern. I never really understood the reluctance to lower it to a template instantiation, though.

(PS: never mind the e-mail, I can't handle my e-mail client properly)
April 02, 2020
On Thursday, 2 April 2020 at 15:53:58 UTC, Sönke Ludwig wrote:
> BTW, I never got around to commenting on the string interpolation topic, but the inability to translate i"" strings was my biggest practical concern. Never really understood the reluctance against lowering to a template instantiation, though.

Yeah, I don't wanna derail too much but my version here:
https://github.com/dlang/DIPs/pull/186

could be used. Here, I kept the part I cut out of the file. I just wasn't happy that the details were all right, but the thrust of it is:

-----

##### Internationalization

The string must be processed in whole, with as much context as possible for the translator to do a good job. If the string was
broken up into a tuple, it would be very difficult for a translator to make sense of it. With `_d_interpolated_string`, however,
the static components are clearly separated from, while still being clearly associated with, their companion runtime arguments, and are indeed available at compile time.

Moreover, it may be necessary to reorder words and act on factors like plurality. With the templated version together with a helper function (e.g. `translate(i"I have $count apples")`), you can get a compile-time list of strings needing translation and write runtime functions to handle localization details as required for individual strings.

```
string translate(d_interpolated_string!("I have ", spec(null), " apples") spec, int count) {
    if (count == 1)
        return "I have 1 apple";
    else
        return format(spec.ToFormatString!"%d", count);
}
```

------


I never finished that section, I just wasn't happy with my examples and arguments, but it is one of the things on my mind. You could do complicated logic in D itself all verified at compile time.
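To make the reordering and plurality points concrete without any interpolation syntax, here is a small sketch that works with plain `std.format` today. The `Plural` struct and `ownerApples` are invented for the example, the "reordered" strings are made-up sample data rather than a real translation, and a two-form singular/plural split is an assumption that does not hold for every language:

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical per-message entry with a singular and a plural form.
struct Plural
{
	string one;
	string many;
}

string ownerApples(string owner, int count, Plural forms)
{
	// Positional specifiers (%1$s, %2$d) let a translation change
	// the word order without touching the call site.
	return format(count == 1 ? forms.one : forms.many, owner, count);
}

void main()
{
	auto en = Plural("%1$s has %2$d apple.", "%1$s has %2$d apples.");
	auto reordered = Plural("%2$d apple belongs to %1$s.",
		"%2$d apples belong to %1$s.");

	writeln(ownerApples("Joe", 1, en));        // Joe has 1 apple.
	writeln(ownerApples("Joe", 3, reordered)); // 3 apples belong to Joe.
}
```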

Or just pass it to a runtime thing like GNU gettext, possibly with helper templates.


D has a LOT of potential in this area that we have barely even scratched the surface of.