Thread overview
The cost of doing compile time introspection
May 10, 2017
Moritz Maxeiner
May 10, 2017
Biotronic
May 10, 2017
Stefan Koch
May 11, 2017
Timon Gehr
May 11, 2017
Stefan Koch
May 10, 2017
Moritz Maxeiner
May 11, 2017
Moritz Maxeiner
May 11, 2017
Timon Gehr
May 12, 2017
Moritz Maxeiner
May 10, 2017
So, after being asked to support dynamic loading in llvm-d[1] at DConf 2017 I essentially had the following two options:
- have two separate declarations of the bindings, one with the function signatures (linking) and one with function pointers (loading)
- Have one declaration and derive the other at compile time

Since this is D and I was reminded of Andrei's keynote I thought to myself "use the Compiler, Dummy" and went about introspecting[2] the function signatures[3]. The relevant code looks like this:

---
import functions = llvm.functions.link;

template isCFunction(alias scope_, string member)
{
    static if (isFunction!(__traits(getMember, scope_, member)) &&
               (functionLinkage!(__traits(getMember, scope_, member)) == "C" ||
                functionLinkage!(__traits(getMember, scope_, member)) == "Windows")) {
        enum isCFunction = true;
    } else {
        enum isCFunction = false;
    }
}

template CFunctions(alias mod)
{
    alias isCFunction(string member) = .isCFunction!(mod, member);
    alias CFunctions = Filter!(isCFunction, __traits(allMembers, mod));
}

string declareStubs()
{
    import std.array : appender;
    auto code = appender!string;
    foreach (fn; CFunctions!functions) {
        code.put("typeof(functions."); code.put(fn);
        code.put(")* "); code.put(fn); code.put(";\n");
    }
    return code.data;
}

mixin (declareStubs);
---

Now, the above code essentially searches through all symbols in the llvm.functions.link module, filters out anything that's not an extern(System) function and then generates code declaring a function pointer for each of them, and then finally mixes that code in.

This increases compilation time on my Broadwell i5-5200U from faster than my terminal emulator can print to 4 seconds. Yikes. From experience I knew that it wouldn't be cheap, but that's a hefty cost for a conceptually simple use case (from my PoV).

You might ask why I need to use CTFE here (as the interpreter is not cheap): It's because I want the mixed-in function pointers to be at module scope and I'm not aware of any other way to do that currently; if the new static foreach works the way I suspect, then that part could essentially be replaced (at module scope) with something like

---
static foreach (fn; CFunctions!functions) {
    typeof(functions.fn)* __traits(identifier, fn);
}
---

Looks much nicer, but will it be fast? I don't know, but I hope so, since it shouldn't need to spin up CTFE.

TLDR: Compile time introspection still increases compile time from below human response time (i.e. without encouraging context switching in your brain) to breaking your concentration even for simple use cases.

[1] https://github.com/Calrama/llvm-d
[2] https://github.com/Calrama/llvm-d/blob/master/source/llvm/functions/load.d
[3] https://github.com/Calrama/llvm-d/blob/master/source/llvm/functions/link.d
May 10, 2017
On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner wrote:
[CTFE slow]

First, as you may know, Stefan Koch is working on an improved CTFE engine that will hopefully make things a lot better.

As for making the code faster right now, could this be done with mixin templates instead?

Something like:

import functions = llvm.functions.link;
import std.meta, std.traits;

template isCFunction(alias member)
{
    static if (isFunction!(member) &&
               (functionLinkage!(member) == "C" ||
                functionLinkage!(member) == "Windows")) {
        enum isCFunction = true;
    } else {
        enum isCFunction = false;
    }
}

template CFunctions(alias scope_)
{
    alias GetSymbol(string member) = AliasSeq!(__traits(getMember, scope_, member));
    alias CFunctions = Filter!(isCFunction, staticMap!(GetSymbol, __traits(allMembers, scope_)));
}

mixin template declareStubsImpl(T...)
{
    static if (T.length == 0) {
    } else {
        mixin("extern (C) typeof(T[0])* "~__traits(identifier, T[0])~";");
        mixin declareStubsImpl!(T[1..$]);
    }
}

mixin template declareStubs(alias scope_)
{
    mixin declareStubsImpl!(CFunctions!scope_);
}

mixin declareStubs!functions;

(tests indicate this compiles but doesn't run, and I'm helping you on company time, so I can't fix everything right now :p)


> if the new static foreach works the way I suspect, then that part could essentially be replaced (at module scope) with something like
>
> ---
> static foreach (fn; CFunctions!functions) {
>     typeof(functions.fn)* __traits(identifier, fn);
> }

A few things here - functions.fn would not do what you want, and neither would __traits(identifier).
functions.fn would treat "fn" like a part of name, not a string value, so this will make the poor compiler barf.
__traits(identifier, fn) expects fn to be a symbol, while here it's a string. In fact, it's exactly the string you want __traits to return.
Lastly, you'll still need a mixin, whether it's for __traits(identifier, fn) or just fn - they're just strings. Something like this:

static foreach (fn; CFunctions!functions) {
    mixin("typeof(__traits(getMember, functions, fn))* "~fn~";");
}


May 10, 2017
On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
> On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner wrote:
> [CTFE slow]
>
> First, as you may know, Stefan Koch is working on an improved CTFE engine that will hopefully make things a lot better.
>
It will not; This is issue is caused by templates, and not by CTFE.

May 10, 2017
On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
> On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner wrote:
> [CTFE slow]
>
> First, as you may know, Stefan Koch is working on an improved CTFE engine that will hopefully make things a lot better.

As he already pointed out, it won't help with the introspection template costs, though.

>
> As for making the code faster right now, could this be done with mixin templates instead?
>
> Something like:
>
> [...]

This will move from CTFE to templates, sure, but templates aren't that fast, either.
I'll experiment with your suggestion regardless to see what it'll result in, but I'm not very optimistic about that solving the issue.

>
> (tests indicate this compiles but doesn't run, and I'm helping you on company time, so I can't fix everything right now :p)
>

Thank you, I'll gladly take any advice and try to optimize things, though my primary purpose with this post was not to ask for help (for once :p ), but to publicize my experience with compile time introspection costs as a reference for others (and because a certain someone asked me too - you know who you are).

>
>> [...]
>
> A few things here - functions.fn would not do what you want, and neither would __traits(identifier).
> functions.fn would treat "fn" like a part of name, not a string value, so this will make the poor compiler barf.
> __traits(identifier, fn) expects fn to be a symbol, while here it's a string. In fact, it's exactly the string you want __traits to return.
> Lastly, you'll still need a mixin, whether it's for __traits(identifier, fn) or just fn - they're just strings.

Right, but it'll still be one fairly short, readable line in the end (that should be faster than CTFE).

May 11, 2017
On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
>
> As for making the code faster right now, could this be done with mixin templates instead?
>
> Something like:
>
> import functions = llvm.functions.link;
> import std.meta, std.traits;
>
> template isCFunction(alias member)
> {
>     static if (isFunction!(member) &&
>                (functionLinkage!(member) == "C" ||
>                 functionLinkage!(member) == "Windows")) {
>         enum isCFunction = true;
>     } else {
>         enum isCFunction = false;
>     }
> }
>
> template CFunctions(alias scope_)
> {
>     alias GetSymbol(string member) = AliasSeq!(__traits(getMember, scope_, member));
>     alias CFunctions = Filter!(isCFunction, staticMap!(GetSymbol, __traits(allMembers, scope_)));
> }
>
> mixin template declareStubsImpl(T...)
> {
>     static if (T.length == 0) {
>     } else {
>         mixin("extern (C) typeof(T[0])* "~__traits(identifier, T[0])~";");
>         mixin declareStubsImpl!(T[1..$]);
>     }
> }
>
> mixin template declareStubs(alias scope_)
> {
>     mixin declareStubsImpl!(CFunctions!scope_);
> }
>
> mixin declareStubs!functions;

After testing this approach out, I couldn't even time it. Why? Because the compiler pretty much immediately hits the (I think fixed) recursive template expansion limit. The LLVM C API has too many functions for this :/
May 11, 2017
On 10.05.2017 16:03, Biotronic wrote:
>>
>
> A few things here - functions.fn would not do what you want, and neither
> would __traits(identifier).
> functions.fn would treat "fn" like a part of name, not a string value,
> so this will make the poor compiler barf.
> __traits(identifier, fn) expects fn to be a symbol, while here it's a
> string. In fact, it's exactly the string you want __traits to return.
> Lastly, you'll still need a mixin, whether it's for __traits(identifier,
> fn) or just fn - they're just strings. Something like this:
>
> static foreach (fn; CFunctions!functions) {
>     mixin("typeof(__traits(getMember, functions, fn))* "~fn~";");
> }

Yes, this works and is a few times faster.
It's slightly faster when inlining the condition:

static foreach(fn;__traits(allMembers, functions)){
    static if (isFunction!(__traits(getMember, functions, fn)) &&
               (functionLinkage!(__traits(getMember, functions, fn)) == "C" ||
                functionLinkage!(__traits(getMember, functions, fn)) == "Windows")){
        mixin("typeof(functions."~fn~")* "~fn~";");
    }
}

With the DMD debug build, I measured the following times on my machine:

Baselines:

just imports:
0m0.318s

copy-pasted generated code after printing it with pragma(msg, ...):
0m0.341s

Compile-time code generation:

old version:
0m2.569s

static foreach, uninlined:
0m0.704s

static foreach inlined:
0m0.610s

Still not great, but a notable improvement.


isFunction and functionLinkage are slow, so I got rid of them (as well as the dependency on std.traits):

static foreach(fn;__traits(allMembers, functions)){
    static if(fn != "object" && fn != "llvm" && fn != "orEmpty"):
    mixin("typeof(functions."~fn~")* "~fn~";");
}

timing:
0m0.350s

(This is not perfect as you'll need to edit the list in case you are adding more non-c-function members to that module, but I guess it is a good trade-off.)



You can achieve essentially the same using a string mixin:

mixin({
    string r;
    foreach(fn;__traits(allMembers, functions))
        if(fn != "object" && fn != "llvm" && fn != "orEmpty")
            r~="typeof(functions."~fn~")* "~fn~";";
    return r;
}());

timing:
0m0.370s



In case the original semantics should be preserved, I think this is the best option:

mixin({
    string r;
    foreach(fn;CFunctions!functions)
        r~="typeof(functions."~fn~")* "~fn~";";
    return r;
}());

timing:
0m0.740s
May 11, 2017
On 10.05.2017 16:28, Stefan Koch wrote:
> On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
>> On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner wrote:
>> [CTFE slow]
>>
>> First, as you may know, Stefan Koch is working on an improved CTFE
>> engine that will hopefully make things a lot better.
>>
> It will not; This is issue is caused by templates, and not by CTFE.
>

I think my measurements show that the main bottleneck is actually Appender in CTFE, while templates contribute a smaller, yet significant, amount.
May 11, 2017
On Thursday, 11 May 2017 at 21:57:06 UTC, Timon Gehr wrote:
> On 10.05.2017 16:28, Stefan Koch wrote:
>> On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
>>> On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner wrote:
>>> [CTFE slow]
>>>
>>> First, as you may know, Stefan Koch is working on an improved CTFE
>>> engine that will hopefully make things a lot better.
>>>
>> It will not; This is issue is caused by templates, and not by CTFE.
>>
>
> I think my measurements show that the main bottleneck is actually Appender in CTFE, while templates contribute a smaller, yet significant, amount.

You are correct.

I should not have made this statement without actually measuring.

Still templates produce the enormous amounts of code that ctfe has to wade trough.

So while they are not the bottleneck in this case, they are still the cause.

May 12, 2017
On Thursday, 11 May 2017 at 21:09:05 UTC, Timon Gehr wrote:
> [...]
>
> Yes, this works and is a few times faster.
> It's slightly faster when inlining the condition:
>
> static foreach(fn;__traits(allMembers, functions)){
>     static if (isFunction!(__traits(getMember, functions, fn)) &&
>                (functionLinkage!(__traits(getMember, functions, fn)) == "C" ||
>                 functionLinkage!(__traits(getMember, functions, fn)) == "Windows")){
>         mixin("typeof(functions."~fn~")* "~fn~";");
>     }
> }
>
> With the DMD debug build, I measured the following times on my machine:
>
> Baselines:
>
> just imports:
> 0m0.318s
>
> copy-pasted generated code after printing it with pragma(msg, ...):
> 0m0.341s
>
> Compile-time code generation:
>
> old version:
> 0m2.569s
>
> static foreach, uninlined:
> 0m0.704s
>
> static foreach inlined:
> 0m0.610s
>
> Still not great, but a notable improvement.
>
>
> isFunction and functionLinkage are slow, so I got rid of them (as well as the dependency on std.traits):
>
> static foreach(fn;__traits(allMembers, functions)){
>     static if(fn != "object" && fn != "llvm" && fn != "orEmpty"):
>     mixin("typeof(functions."~fn~")* "~fn~";");
> }
>
> timing:
> 0m0.350s
>
> (This is not perfect as you'll need to edit the list in case you are adding more non-c-function members to that module, but I guess it is a good trade-off.)
>
>
>
> You can achieve essentially the same using a string mixin:
>
> mixin({
>     string r;
>     foreach(fn;__traits(allMembers, functions))
>         if(fn != "object" && fn != "llvm" && fn != "orEmpty")
>             r~="typeof(functions."~fn~")* "~fn~";";
>     return r;
> }());
>
> timing:
> 0m0.370s
>
>
>
> In case the original semantics should be preserved, I think this is the best option:
>
> mixin({
>     string r;
>     foreach(fn;CFunctions!functions)
>         r~="typeof(functions."~fn~")* "~fn~";";
>     return r;
> }());
>
> timing:
> 0m0.740s

Thank you for the detailed comparison. I have applied your optimizations (with minor refactoring that did not impact compile time for me) and ended up with this (sorry for some name changes, wasn't happy with my original ones):

---
import link = llvm.functions.link;

bool isSym(string m) { return m != "object" && m != "llvm" && m != "orEmpty"; }

string declareSymPtr(string m) { return "typeof(link." ~ m ~ ")* " ~ m ~ ";"; }
string getSymPtr(string m) { return m ~ " = library.getSymbol!(typeof(" ~ m ~ "))(\"" ~ m ~ "\");"; }

mixin ({
        string code;
        foreach (m; __traits(allMembers, link)) if (m.isSym) {
            code ~= m.declareSymPtr;
        }
        return code;
}());

public struct LLVM
{
    static void getSymbols()
    {
        foreach (m; __traits(allMembers, link)) static if (m.isSym) {
            mixin (m.getSymPtr);
        }
    }
}
---

I am not particularly happy about (isSym) having to do a name based blacklist approach instead of a type based whitelist approach, though.
With this I'm at least out of the "OMG why is it still compiling" range and thank you to everyone for that. It's still not in the ideal range of < 100 milliseconds, but I'll take what I can get.