Thread overview
CTFE and Static If Question
May 07, 2020
jmh530
May 07, 2020
H. S. Teoh
May 07, 2020
jmh530
May 07, 2020
ag0aep6g
May 07, 2020
jmh530
May 07, 2020
H. S. Teoh
May 07, 2020
Paul Backus
May 07, 2020
jmh530
May 07, 2020
I am curious how ctfe and static ifs interact. In particular, if an enum bool passed as a template parameter or run-time one will turn an if statement into something like a static if statement (perhaps after the compiler optimizes other code away). In the code below, I am a function that takes a bool as a template parameter and another that has it as a run-time parameter. In the first, I just pass an compile-time bool (y0) into it. In the second I have run-time (x0) and compile-time (y0).

Does foo!y0(rt) generate the same code as foo(rt, y0)?

How is the code generated by foo(rt, x0) different from foo(rt,y0)?

auto foo(bool rtct)(int rt) {
    static if (rtct)
        return rt + 1;
    else
        return rt;
}

auto foo(int rt, bool rtct) {
    if (rtct == true)
	    return rt + 1;
    else
        return rt;
}

void main() {
    int rt = 3;
    bool x0 = true;
    bool x1 = false;
    assert(foo(rt, x0) == 4);
    assert(foo(rt, x1) == 3);

    enum y0 = true;
    enum y1 = false;
    assert(foo!y0(rt) == 4);
    assert(foo!y1(rt) == 3);
    assert(foo(rt, y0) == 4);
    assert(foo(rt, y1) == 3);
}


May 07, 2020
On Thu, May 07, 2020 at 03:00:18PM +0000, jmh530 via Digitalmars-d-learn wrote:
> I am curious how ctfe and static ifs interact.

Then you should find this article helpful:

	https://wiki.dlang.org/User:Quickfur/Compile-time_vs._compile-time


> In particular, if an enum bool passed as a template parameter or run-time one will turn an if statement into something like a static if statement (perhaps after the compiler optimizes other code away). In the code below, I am a function that takes a bool as a template parameter and another that has it as a run-time parameter. In the first, I just pass an compile-time bool (y0) into it. In the second I have run-time (x0) and compile-time (y0).

There are multiple things here that should not be confused:

1) Template instantiation (includes static if): a template takes some
   number of compile-time arguments, and emits multiple copies of the
   template definition, with said arguments substituted for its
   parameters. Templates are processed early on in the compilation
   process, and can change the structure of the AST that the compiler
   sees.

2) CTFE: runs code that has already been compiled into a form that's
   ready to be emitted as runtime code, and feeds the return value back
   to the compiler so that it can be used as a compile-time value
   somewhere else.  Note that CTFE can only run code that has *passed*
   the template instantiation stage (even though the resulting value
   *can* be used for instantiation of other templates, but it cannot
   change the CTFE'd code itself).

3) Code optimization: this takes place independently of (1) and (2), and
   happens as part of code generation. It basically tries to transform
   the machine code so that it runs more efficiently at runtime.  It may
   simplify statements or eliminate redundant / dead code, but this has
   nothing to do with templates, static if, or CTFE; it's a
   transformation applied toward the end of the compilation process and
   it's universally applied to all runtime code (unless you disable
   optimizations).


> Does foo!y0(rt) generate the same code as foo(rt, y0)?

It generates a different AST structure.  But at the end, the optimizer may reduce them into the same code, depending on the circumstances.


> How is the code generated by foo(rt, x0) different from foo(rt,y0)?

There is no difference, except that in the latter case, the optimizer may realize that the argument is a compile-time constant, and inline the function with its return value instead of emitting a runtime call. This has nothing to do with templates or CTFE, it's purely an optimizer issue (and different optimizers may do things differently -- it does not change the meaning of the code).


> auto foo(bool rtct)(int rt) {
>     static if (rtct)
>         return rt + 1;
>     else
>         return rt;
> }
> 
> auto foo(int rt, bool rtct) {
>     if (rtct == true)
> 	    return rt + 1;
>     else
>         return rt;
> }
> 
> void main() {
>     int rt = 3;
>     bool x0 = true;
>     bool x1 = false;
>     assert(foo(rt, x0) == 4);
>     assert(foo(rt, x1) == 3);
> 
>     enum y0 = true;
>     enum y1 = false;
>     assert(foo!y0(rt) == 4);
>     assert(foo!y1(rt) == 3);
>     assert(foo(rt, y0) == 4);
>     assert(foo(rt, y1) == 3);
> }
[...]


T

-- 
IBM = I Blame Microsoft
May 07, 2020
On 07.05.20 17:00, jmh530 wrote:
> Does foo!y0(rt) generate the same code as foo(rt, y0)?
> 
> How is the code generated by foo(rt, x0) different from foo(rt,y0)?
> 
> auto foo(bool rtct)(int rt) {
>      static if (rtct)
>          return rt + 1;
>      else
>          return rt;
> }
> 
> auto foo(int rt, bool rtct) {
>      if (rtct == true)
>          return rt + 1;
>      else
>          return rt;
> }
> 
> void main() {
>      int rt = 3;
>      bool x0 = true;
>      bool x1 = false;
>      assert(foo(rt, x0) == 4);
>      assert(foo(rt, x1) == 3);
> 
>      enum y0 = true;
>      enum y1 = false;
>      assert(foo!y0(rt) == 4);
>      assert(foo!y1(rt) == 3);
>      assert(foo(rt, y0) == 4);
>      assert(foo(rt, y1) == 3);
> }

The `static if` is guaranteed to be evaluated during compilation. That means, `foo!y0` effectively becomes this:

    auto foo(int rt) { return rt + 1; }

There is no such guarantee for `foo(rt, y0)`. It doesn't matter that y0 is an enum.

But a half-decent optimizer will have no problem replacing all your calls with their results. Compared with LDC and GDC, DMD has a poor optimizer, but even DMD turns this:

    int main() {
        int rt = 3;
        bool x0 = true;
        bool x1 = false;
        enum y0 = true;
        enum y1 = false;
        return
            foo(rt, x0) +
            foo(rt, x1) +
            foo!y0(rt) +
            foo!y1(rt) +
            foo(rt, y0) +
            foo(rt, y1);
    }

into this:

    int main() { return 21; }
May 07, 2020
On Thursday, 7 May 2020 at 15:29:01 UTC, H. S. Teoh wrote:
> [snip]

You explained things very well, thanks.
May 07, 2020
On Thursday, 7 May 2020 at 15:34:21 UTC, ag0aep6g wrote:
> [snip]
>
> The `static if` is guaranteed to be evaluated during compilation. That means, `foo!y0` effectively becomes this:
>
>     auto foo(int rt) { return rt + 1; }
>
> There is no such guarantee for `foo(rt, y0)`. It doesn't matter that y0 is an enum.
>
> But a half-decent optimizer will have no problem replacing all your calls with their results. Compared with LDC and GDC, DMD has a poor optimizer, but even DMD turns this:
>
>     int main() {
>         int rt = 3;
>         bool x0 = true;
>         bool x1 = false;
>         enum y0 = true;
>         enum y1 = false;
>         return
>             foo(rt, x0) +
>             foo(rt, x1) +
>             foo!y0(rt) +
>             foo!y1(rt) +
>             foo(rt, y0) +
>             foo(rt, y1);
>     }
>
> into this:
>
>     int main() { return 21; }

Thanks for the reply.

The particular use case I'm thinking of is more like below where some function bar (that returns a T) is called before the return. Not sure if that matters or not for inlining the results.

T foo(T[] x, bool y) {
    if (y)
        return bar(x) / x.length;
    else
        return bar(x) / (x.length - 1);
}
May 07, 2020
On Thursday, 7 May 2020 at 15:00:18 UTC, jmh530 wrote:
> Does foo!y0(rt) generate the same code as foo(rt, y0)?
>
> How is the code generated by foo(rt, x0) different from foo(rt,y0)?

You can look at the generated code using the Compiler Explorer at d.godbolt.org. Here's a link to your example, compiled with ldc, with optimizations enabled:

https://d.godbolt.org/z/x5K7P6

As you can see, the non-static-if version has a runtime comparison and a conditional jump, and the static-if version does not. However, it doesn't make a difference in the end, because the calls have been optimized out, leaving an empty main function.
May 07, 2020
On Thu, May 07, 2020 at 05:34:21PM +0200, ag0aep6g via Digitalmars-d-learn wrote: [...]
> Compared with LDC and GDC, DMD has a poor optimizer,
[...]

DMD's optimizer is the whipping boy around here, but to be fair, it is poor only when it comes to aggrssive inlining and optimizing loops. Other than those two areas, it's actually a pretty decent optimizer.

And even within those two areas, it actually does do some degree of optimization, it's not completely useless.  Its main fault is caused by what I call the domino-effect of optimization: optimal code is often arrived at by applying a chain of optimization steps, each of which depends on (some of) the previous step(s). You can think of previous optimization steps as a domino that knocks down subsequent dominoes (enables further optimizations).  Conversely, if you miss an optimization opportunity, it will also close the door to the subsequent optimization steps. If the previous domino didn't fall, the subsequent ones won't either.

One of the problems with DMD's optimizer is that the inliner gives up too easily, and too early in the process; as a result, it often closes the door to further optimizations.  Its loop optimizer is also rather, shall we say, timid, and so combined with the equally timid inliner, it all adds up to a lot of missed optimization opportunities.  And given that loops are usually where the bottlenecks are, these missed opportunities add up to a big performance hit at runtime.

Comparatively, LDC and GDC's optimizers are extremely aggressive with loop optimizations; as a result, they are able to reach past local minima in computation space and obtain sometimes pretty major performance gains. LDC also especially seems very aggressive with inlining, especially with -O2 and -O3; this ensures no rock is left unturned among the optimization dominoes, thus enabling it to simplify a lot of code that DMD gives up on too early in the process. Heavily range-based code seems to especially get vastly different performance characteristics here -- because range methods tend to be small functions that, if inlined, lead to further optimizations with the adjacent ranges in the UFCS pipeline. But if one method failed to be inlined, it blocks all further optimizations down the chain, causing a chain of missed opportunities and resulting in slow runtime code.


T

-- 
Дерево держится корнями, а человек - друзьями.
May 07, 2020
On Thursday, 7 May 2020 at 17:59:30 UTC, Paul Backus wrote:
> On Thursday, 7 May 2020 at 15:00:18 UTC, jmh530 wrote:
>> Does foo!y0(rt) generate the same code as foo(rt, y0)?
>>
>> How is the code generated by foo(rt, x0) different from foo(rt,y0)?
>
> You can look at the generated code using the Compiler Explorer at d.godbolt.org. Here's a link to your example, compiled with ldc, with optimizations enabled:
>
> https://d.godbolt.org/z/x5K7P6
>
> As you can see, the non-static-if version has a runtime comparison and a conditional jump, and the static-if version does not. However, it doesn't make a difference in the end, because the calls have been optimized out, leaving an empty main function.

Thanks for that. I forgot how much nicer godbolt is for looking assembly than run.dlang.org. Or maybe it's just that the optimized assembly for ldc looks a lot simpler than dmd?

I eventually played around with it for a bit and ended up with what's below. When compiled with ldc -O -release, some of the functions have code that are generated a little differently than the one above (the assembly looks a little prettier when using ints than doubles). It looks like there is a little bit of template bloat too, in that it generates something like four different versions of the run-time function that all basically do the same thing (some slight differences I don't really understand). Anyway, I think that's the first time I've ever used __traits compiles with a static if and I don't think I've ever used an template alias bool before, but I thought it was kind of cool.


int foo(alias bool rtct)(int[] x) {
    static if (__traits(compiles, {static if (rtct) { enum val = rtct;}})) {
        static if (rtct) {
            return bar(x) / cast(int) x.length;
        } else {
            return bar(x) / cast(int) (x.length - 1);
        }
    } else {
        if (rtct)
            return bar(x) / cast(int) x.length;
        else
            return bar(x) / cast(int) (x.length - 1);
    }
}

int foo(int[] x, bool rtct) {
    if (rtct)
	    return foo!true(x);
    else
        return foo!false(x);
}

int bar(int[] x) {
    return x[0] + x[1];
}


void main() {
    import std.stdio: writeln;

    int[] a = [1, 2, 3, 4, 5];
    bool x0 = true;
    bool x1 = false;
    int result0 = foo(a, x0);
    int result1 = foo(a, x1);
    int result2 = foo!x0(a);
    int result3 = foo!x1(a);

    enum y0 = true;
    enum y1 = false;
    int result0_ = foo(a, y0);
    int result1_ = foo(a, y1);
    int result2_ = foo!y0(a);
    int result3_ = foo!y1(a);
}