Template wizardry and its cost
June 20

Two years ago [1], [2], H. S. Teoh presented an ingenious proof of concept for a gettext-like system that automatically extracts translatable strings for i18n purposes:

	class Language { ... }
	Language curLang = ...;

	version(extractStrings) {
		private int[string] translatableStrings;
		string[] getTranslatableStrings() {
			return translatableStrings.keys;
		}
	}

	string gettext(string str)() {
		version(extractStrings) {
			static struct StrInjector {
				static this() {
					translatableStrings[str]++;
				}
			}
		}
		return curLang.translate(str);
	}

	...
	auto myFunc() {
		...
		writeln(gettext!"Some translatable message");
		...
	}
> The gettext function uses a static struct to inject a static ctor into the program that inserts all translatable strings into a global AA. Then, when compiled with -version=extractStrings, this will expose the function getTranslatableStrings that returns a list of all translatable strings. Voila! No need for a separate utility to parse source code to discover translatable strings; this does it for you automatically. :-)
>
> It could be made more fancy, of course, like having a function that parses the current l10n files and doing a diff between strings that got deleted / added / changed, and generating a report to inform the translator which strings need to be updated. This is guaranteed to be 100% reliable since the extracted strings are obtained directly from actual calls to gettext, rather than a 3rd party parser that may choke over uncommon syntax / unexpected formatting.
>
> D is just this awesome.
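
The "diff against the current l10n files" idea from the quoted post could look roughly like the sketch below. It is only an illustration: `CatalogDiff` and `diffCatalog` are hypothetical names that do not appear in the thread, `extracted` stands for whatever getTranslatableStrings returns, and reading the existing catalog from disk is left out entirely.

```d
import std.algorithm.sorting : sort;
import std.stdio : writeln;

// Hypothetical result type: what a translator would need to look at.
struct CatalogDiff
{
    string[] added;   // used in the source, missing from the catalog
    string[] removed; // present in the catalog, no longer used in the source
}

// Hypothetical helper. `extracted` would come from getTranslatableStrings(),
// `existing` from whatever parses the current l10n file (not shown here).
CatalogDiff diffCatalog(string[] extracted, string[] existing)
{
    // Build lookup sets so each membership test is a single AA lookup.
    bool[string] inCode, inCatalog;
    foreach (s; extracted) inCode[s] = true;
    foreach (s; existing) inCatalog[s] = true;

    CatalogDiff d;
    foreach (s; inCode.byKey)
        if (s !in inCatalog) d.added ~= s;
    foreach (s; inCatalog.byKey)
        if (s !in inCode) d.removed ~= s;

    // AA iteration order is unspecified, so sort for a stable report.
    sort(d.added);
    sort(d.removed);
    return d;
}

void main()
{
    auto d = diffCatalog(["Open", "Save", "Quit"], ["Save", "Exit"]);
    writeln("added:   ", d.added);   // ["Open", "Quit"]
    writeln("removed: ", d.removed); // ["Exit"]
}
```

A string whose wording changed shows up here as one removed entry plus one added entry; matching those up is a job for the translation tooling.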

I find this marvellous, and played with it over the weekend. It works brilliantly. I got it to integrate neatly in Dub projects and made it compatible with GNU gettext, so that existing translation services and editors can be used (Poedit is awesome, thank you mofile).

But I don't think I'll go through with it this way. My problem is the template instantiation for every individual translatable string literal. I'd like to think the consequences are insignificant, but in large code bases I fear they won't be. And the issue is that only for version(extractStrings) the string needs to be a template argument, otherwise you'd want it to be an ordinary function argument. Maybe this is possible to achieve with string mixins, but probably not without getting much more verbose.

I am ready to be amazed with more wizardry, or to be convinced not to worry because inlining or something (it doesn't inline). Until then, I am thinking an external tool based on libdparse or dmd-as-a-library is probably the better approach; however awesome D is :-)

-- Bastiaan.

June 20
On Monday, 20 June 2022 at 08:37:13 UTC, Bastiaan Veelo wrote:
> It works brilliantly.
> [...]
> but in large code bases I fear they won't be.

*fear*

so what you're saying is you have no evidence there is an actual problem here, but have literally fallen prey to FUD.

In theory, generating hundreds of thousands of functions can indeed be a problem, even if they are small (though the biggest problems in dmd come when an individual function is large, more so than with many small functions), but how many unique user-visible strings do you have, even in a large project?
6 days ago

On 6/20/22 4:37 AM, Bastiaan Veelo wrote:

> But I don't think I'll go through with it this way. My problem is the template instantiation for every individual translatable string literal. I'd like to think the consequences are insignificant, but in large code bases I fear they won't be. And the issue is that only for version(extractStrings) the string needs to be a template argument, otherwise you'd want it to be an ordinary function argument. Maybe this is possible to achieve with string mixins, but probably not without getting much more verbose.
>
> I am ready to be amazed with more wizardry, or to be convinced not to worry because inlining or something (it doesn't inline). Until then, I am thinking an external tool based on libdparse or dmd-as-a-library is probably the better approach; however awesome D is :-)

Let me dust off my wand ;)

struct TranslatedString {
    private string _str;
    string get() {
        return curLang.translate(_str);
    }
    alias get this;
}
template gettext(string str) {
    version(extractStrings) {
        shared static this() {
            ++translatableStrings.require(str); // use require here, even though the ++ works without it.
        }
    }
    enum gettext = TranslatedString(str);
}

What does this do? It still instantiates the template, but the key difference is that the TranslatedString type is not a template. An enum exists only in the compiler; it's as if you pasted the resulting code at the call site. So it should not take up any space beyond maybe 2 words for the string reference, plus a single TypeInfo (if that's even needed, I'm not sure) and a minor CTFE call for the construction.

It will take up space in the symbol table, but that goes away once compilation is done.
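
For anyone who wants to try this, here is a minimal, self-contained sketch built around the snippet above. The `Language` struct with its lookup table is a hypothetical stand-in for the elided `class Language { ... }` in the first post, and the strings are made up; only `TranslatedString` and the `gettext` template follow the code in this post.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the elided Language class: a plain lookup table.
struct Language
{
    string[string] table;

    string translate(string s)
    {
        if (auto p = s in table)
            return *p;
        return s; // fall back to the untranslated string
    }
}

Language curLang;

version (extractStrings)
{
    private int[string] translatableStrings;

    string[] getTranslatableStrings()
    {
        return translatableStrings.keys;
    }
}

// One ordinary struct type, shared by every translatable string.
struct TranslatedString
{
    private string _str;

    string get()
    {
        return curLang.translate(_str);
    }

    alias get this; // converts to string by translating on demand
}

template gettext(string str)
{
    version (extractStrings)
    {
        // One static constructor per distinct string, only in extraction builds.
        shared static this()
        {
            ++translatableStrings.require(str);
        }
    }
    enum gettext = TranslatedString(str);
}

void main()
{
    curLang.table["Hello"] = "Hallo";

    string a = gettext!"Hello";   // implicit conversion through alias this
    string b = gettext!"Goodbye"; // no table entry, stays untranslated
    writeln(a);
    writeln(b);

    version (extractStrings)
        writeln(getTranslatableStrings()); // order of the keys is unspecified
}
```

Building with `dmd -version=extractStrings` should additionally print the collected strings; without that flag the static constructors are not compiled in at all.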

But in general, one should not be afraid of writing templates in D. I think there may be some room for improvement for performance with compiler hints, or improvements without them.

-Steve

6 days ago
On Monday, 20 June 2022 at 11:16:29 UTC, Adam D Ruppe wrote:
> how many unique user-visible strings do you have, even in a large project?

In total, we currently have 18997 individual translated strings. These are spread over 45 executables (unevenly).

-- Bastiaan.
6 days ago
On Monday, 20 June 2022 at 13:52:12 UTC, Bastiaan Veelo wrote:
> In total, we currently have 18997 individual translated strings. These are spread over 45 executables (unevenly).

That's nothing. Consider this test:

static foreach(i; 0 .. 20000)
    mixin("string a", i, " = gettext!(i.stringof);");

void main() {}

$ /usr/bin/time dmd templatespam.d
0.32user 0.07system 0:00.40elapsed 98%CPU (0avgtext+0avgdata 135648maxresident)k
0inputs+1640outputs (0major+40899minor)pagefaults 0swaps


About 3x the memory and such of a basic hello world, but as you can see, 0.3 s and 135 MB of RAM is nothing to worry about.

What about 200,000 strings?

$ /usr/bin/time dmd templatespam.d
Command terminated by signal 11
1.93user 0.40system 0:02.33elapsed 99%CPU (0avgtext+0avgdata 1415580maxresident)k
0inputs+0outputs (0major+353926minor)pagefaults 0swaps


Now it is adding up: 2 s of build time and 1.4 GB of RAM (although, as the output shows, this particular run was terminated by signal 11). That is still inside the realm of acceptable cost, and it seems unlikely that you'd ever have 200,000 user-visible strings in a single build unit anyway; that is well more than 10x what you have in your actual application right now.

Please note that adding -version=extractStrings has no significant impact on these numbers.

And this is with zero effort to optimize it.
6 days ago

On Monday, 20 June 2022 at 12:59:28 UTC, Steven Schveighoffer wrote:

> Let me dust off my wand ;)
>
> struct TranslatedString {
>     private string _str;
>     string get() {
>         return curLang.translate(_str);
>     }
>     alias get this;
> }
> template gettext(string str) {
>     version(extractStrings) {
>         shared static this() {
>             ++translatableStrings.require(str); // use require here, even though the ++ works without it.
>         }
>     }
>     enum gettext = TranslatedString(str);
> }

Wow. Man, this is interesting. My hat is off.

> What does this do? It still instantiates the template, but the key difference is that the TranslatedString type is not a template. An enum exists only in the compiler; it's as if you pasted the resulting code at the call site. So it should not take up any space beyond maybe 2 words for the string reference, plus a single TypeInfo (if that's even needed, I'm not sure) and a minor CTFE call for the construction.
>
> It will take up space in the symbol table, but that goes away once compilation is done.

You have put a big grin on my face, I like your potion!

-- Bastiaan.

6 days ago
On Monday, 20 June 2022 at 14:08:30 UTC, Adam D Ruppe wrote:
> About 3x the memory and such of a basic hello world, but as you can see, 0.3 s and 135 MB of RAM is nothing to worry about.

Thanks for bringing my heart rate down :-) I was looking at all the generated functions in the assembly, and Steven's trick eliminates those.

-- Bastiaan.
6 days ago

As you and Adam pointed out, this may not be worth the trouble, but just to see if I can, I tried to extend your trick to a function taking an argument. I didn't find a way without using a delegate, and that gives a deprecation warning for escaping a reference. Can it be fixed? The code below leaves out the string extraction version, but is otherwise complete and can be pasted into run.dlang.io.

-- Bastiaan.

--- app.d
import gettext;
import std.stdio;

void main()
{
    foreach (n; 0 .. 3)
        writeln(tr!("one goose.", "%d geese.")(n));
}

--- gettext.d
import std;

private @safe struct TranslatableString
{
    string _str;
    string get()
    {
        return gettext(_str);
    }
    alias get this;
}
private @safe struct TranslatableStringPlural
{
    string _str, _strpl;
    string callFormat(int n)
    {
        auto fmt = ngettext(_str, _strpl, n);
        if (countFormatSpecifiers(fmt) == 0)
            // Hack to prevent orphan format arguments if "%d" is replaced by "one" in the singular form:
            return ()@trusted{ return fromStringz(&(format(fmt~"\0%s", n)[0])); }();
        return format(fmt, n);
    }
    string delegate(int) get()
    {
        return &callFormat;
    }
    alias get this;
}
template tr(string singular, string plural = null)
{
    static if (plural == null)
        enum tr = TranslatableString(singular);
    else
        enum tr = TranslatableStringPlural(singular, plural);
}

@safe: private:
int countFormatSpecifiers(string fmt) pure
{
    int count = 0;
    auto f = FormatSpec!char(fmt);
    if (!__ctfe)
    {
        while (f.writeUpToNextSpec(nullSink))
            count++;
    } else {
        auto a = appender!string; // std.range.nullSink does not work at CT.
        while (f.writeUpToNextSpec(a))
            count++;
    }
    return count;
}
// Translation happens here:
string gettext(string str)
{
    return str;
}
string ngettext(string singular, string plural, int n)
{
    return n == 1 ? singular : plural;
}
6 days ago

On 6/20/22 7:01 PM, Bastiaan Veelo wrote:

> As you and Adam pointed out, this may not be worth the trouble, but just to see if I can, I tried to extend your trick to a function taking an argument. I didn't find a way without using a delegate, and that gives a deprecation warning for escaping a reference. Can it be fixed? The code below leaves out the string extraction version, but is otherwise complete and can be pasted into run.dlang.io.

structs can be functors:

private @safe struct TranslatableStringPlural
{
    string _str, _strpl;
    this(string s1, string s2) { // this is unfortunately necessary
        _str = s1;
        _strpl = s2;
    }
    string opCall(int n)
    {
        auto fmt = ngettext(_str, _strpl, n);
        if (countFormatSpecifiers(fmt) == 0)
            // Hack to prevent orphan format arguments if "%d" is replaced by "one" in the singular form:
            return ()@trusted{ return fromStringz(&(format(fmt~"\0%s", n)[0])); }();
        return format(fmt, n);
    }
}
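
Stripped of the translation details, the functor idea can be sketched as below. `Plural` and `trn` are hypothetical names chosen for this illustration, and the explanation of why the constructor is needed is my reading of the language spec, which says that struct literal syntax is not available once a struct defines a member named opCall.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical, translation-free version of TranslatableStringPlural.
struct Plural
{
    string singular, pluralFmt;

    // Because opCall is defined below, Plural(a, b) no longer works as a
    // struct literal, so an explicit constructor is spelled out.
    this(string s, string p)
    {
        singular = s;
        pluralFmt = p;
    }

    // Makes an instance callable like a function: instance(n).
    string opCall(int n) const
    {
        return n == 1 ? singular : format(pluralFmt, n);
    }
}

// The eponymous enum yields a value, and that value can be called directly.
template trn(string singular, string plural)
{
    enum trn = Plural(singular, plural);
}

void main()
{
    foreach (n; 0 .. 3)
        writeln(trn!("one goose.", "%d geese.")(n));
    // prints:
    // 0 geese.
    // one goose.
    // 2 geese.
}
```

No delegate is created here, so nothing refers back to a temporary, which is what the deprecation warning in the earlier attempt was about.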

I will say, I find this line... reprehensible ;)

return ()@trusted{ return fromStringz(&(format(fmt~"\0%s", n)[0])); }();

Actually, the whole function (and the format specifier counter, etc) is very obtuse.

How about:

private @safe struct Strplusarg {
    this(string s) {
        fmt = s;
        auto fs = countFormatSpecifiers(fmt);
        assert(fs == 0 || fs == 1, "Invalid number of specifiers"); // bonus sanity check
        hasArg = fs == 1;
    }
    string fmt;
    bool hasArg;
}

private @safe struct TranslatableStringPlural
{
    Strplusarg _str, _strpl;
    this(string s1, string s2) { // this is unfortunately necessary
        _str = s1;
        _strpl = s2;
    }
    string opCall(int n)
    {
        auto f = n == 1 ? _str : _strpl;
        return f.hasArg ? format(f.fmt, n) : f.fmt;
    }
}

And we can fix your countFormatSpecifiers function so it doesn't need the __ctfe branch:

@safe: private:
int countFormatSpecifiers(string fmt) pure
{
    static void ns(const(char)[] arr) {} // the simplest output range
    auto nullSink = &ns;
    int count = 0;
    auto f = FormatSpec!char(fmt);
    while (f.writeUpToNextSpec(nullSink))
        count++;
    return count;
}

But.... I don't see any actual translation happening in the plural/singular form? Is that expected? If it's supposed to happen in ngettext, that translation surely has to be done inside the opCall, and if you can vary the parameter count based on language, then so does the count for the argument specifiers.

In any case, lots to ingest and figure out how it fits your needs.

-Steve

6 days ago

On Tuesday, 21 June 2022 at 02:35:28 UTC, Steven Schveighoffer wrote:

> structs can be functors:

That was one of the things I tried, but I missed this bit:

>     this(string s1, string s2) { // this is unfortunately necessary
>         _str = s1;
>         _strpl = s2;
>     }

[...]

> I will say, I find this line... reprehensible ;)

I left my dirty laundry out in the hopes it would trigger you ;-) Thanks for the rinse, it looks great!

> But.... I don't see any actual translation happening in the plural/singular form?

That's because the counting needs to be done on the translated string. Fixed below.

Thanks Steve, looks like I'll be releasing my first Dub package soonish.

-- Bastiaan.

--- app.d
import gettext;
import std.stdio;

void main()
{
    foreach (n; 0 .. 3)
        writeln(tr!("one goose.", "%d geese.")(n));
}

--- gettext.d
import std;

private @safe struct TranslatableString
{
    string _str;
    string get()
    {
        return gettext(_str);
    }
    alias get this;
}
private @safe struct Strplusarg {
    this(string s) {
        fmt = s;
        auto fs = countFormatSpecifiers(fmt);
        assert(fs == 0 || fs == 1, "Invalid number of specifiers"); // bonus sanity check
        hasArg = fs == 1;
    }
    string fmt;
    bool hasArg;
}
private @safe struct TranslatableStringPlural
{
    string _str, _strpl;
    this(string s1, string s2) { // this is unfortunately necessary
        _str = s1;
        _strpl = s2;
    }
    string opCall(int n)
    {
        auto f = Strplusarg(ngettext(_str, _strpl, n));
        return f.hasArg ? format(f.fmt, n) : f.fmt;
    }
}
template tr(string singular, string plural = null)
{
    static if (plural == null)
        enum tr = TranslatableString(singular);
    else
        enum tr = TranslatableStringPlural(singular, plural);
}

@safe: private:
int countFormatSpecifiers(string fmt) pure
{
    static void ns(const(char)[] arr) {} // the simplest output range
    auto nullSink = &ns;
    int count = 0;
    auto f = FormatSpec!char(fmt);
    while (f.writeUpToNextSpec(nullSink))
        count++;
    return count;
}
// Translation happens here:
string gettext(string str)
{
    return str;
}
string ngettext(string singular, string plural, int n)
{
    return n == 1 ? singular : plural;
}

0 geese.
one goose.
2 geese.
