Thread overview
Lack of optimisation with returning address of inner functions
Sep 04, 2020
Cecil Ward
Sep 04, 2020
Cecil Ward
Sep 04, 2020
Stefan Koch
Sep 04, 2020
Jackel
Sep 04, 2020
Adam D. Ruppe
September 04, 2020
This question is about a peculiar lack of optimisation in a certain weird case only.

Example: see https://d.godbolt.org/z/54eaGd ; either LDC or GDC may be used, the results are the same here:

auto test2() {
    int a = 20;
    int foo() { return a + 5; } // inner function
    return &foo;  // other way to construct delegate
    }

auto bar()
    {
    return foo();
    }

With either LDC or GDC, inspecting the generated code shows that the code for foo is literally just { return 25; }. Yet when test2 is called, the code generated for the foo2 routine is not used; instead, the generated code is:

    call _d_allocmemory
    mov dword ptr [rax], 20
    mov rdx, foo
    ret

1. So why the lack of optimisation? - could simply have got rid of the delegate generation in test2a as implementations when it is inlined in bar (and which is done sanely [!] in the generated code for test2a).

2. Even weirder, if you delete the & from &foo leaving simply "return foo;" then this fixes the non-optimisation bug. Why?

3. What’s the difference between foo and &foo ?

4. Leaving aside the special case above where the inner function’s address is returned, surely in many cases an inner function can be converted into an ordinary function, or simply _inlined_ so there is no function at all, no? As is seen in the standalone code generated for foo.
September 04, 2020
For the LDC version, see https://d.godbolt.org/z/x4rhbe
September 04, 2020
On Friday, 4 September 2020 at 01:13:53 UTC, Cecil Ward wrote:
> For the LDC version, see https://d.godbolt.org/z/x4rhbe

Compile with -O3 and -Oz.
September 04, 2020
On Friday, 4 September 2020 at 01:10:48 UTC, Cecil Ward wrote:
> 1. So why the lack of optimisation? - could simply have got rid of the delegate generation in test2a as implementations when it is inlined in bar (and which is done sanely [!] in the generated code for test2a).

I think this is a frontend/backend thing.

That optimization is done by the back end, but the front end doesn't know that and still assumes there's a full-blown delegate required.

> 2. Even weirder, if you delete the & from &foo leaving simply "return foo;" then this fixes the non-optimisation bug. Why?

That just calls the function and returns its value, which obviously needs no delegate since the function doesn't outlive the surrounding context.

> 3. What’s the difference between foo and &foo ?

Huge, huge difference.

&foo returns a function pointer or delegate referring to the function. The function is not called here.

foo is just foo() without the optional parentheses; the function is actually called immediately.
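To make the difference concrete, here is a minimal sketch. The names `callsFoo` and `returnsDelegate` are hypothetical and only serve the illustration; the nested `foo` and the value 20 are taken from the original post.

```d
// Hypothetical helper names; only `foo` and the value 20 come from the thread.
auto callsFoo()
{
    int a = 20;
    int foo() { return a + 5; }
    return foo;   // optional parentheses: same as `return foo();`
}

auto returnsDelegate()
{
    int a = 20;
    int foo() { return a + 5; }
    return &foo;  // no call here; yields a delegate that captures `a`
}

void main()
{
    // The first function returns a plain int, the second a delegate.
    static assert(is(typeof(callsFoo()) == int));
    static assert(is(typeof(returnsDelegate()) == delegate));

    assert(callsFoo() == 25);
    assert(returnsDelegate()() == 25); // the delegate is only called here
}
```

In the first function nothing outlives the call, so no context has to be kept alive; in the second, `a` must remain reachable after the function returns.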

Whenever the compiler frontend sees a `return &some_nested_function` it assumes a longer lifetime is required and allocates the captured variables on the heap up front.

So by the time it gets to the optimizer in the back end, it sees all that allocation and pointer code already existing. With certain settings, it might be able to see through it and optimize anyway, but its job got a lot harder since it might not know what happens with that return value later in the program.
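As a rough illustration of why the captured variables need a longer lifetime, the sketch below returns a delegate that mutates its captured state. `makeCounter` and `next` are hypothetical names, not something from the original example.

```d
// Hypothetical example: each call to makeCounter needs its own
// context that survives after makeCounter has returned.
auto makeCounter()
{
    int count = 0;
    int next() { return ++count; }
    return &next;  // `count` must outlive this stack frame
}

void main()
{
    auto c1 = makeCounter();
    auto c2 = makeCounter();

    assert(c1() == 1);
    assert(c1() == 2); // state persists between calls
    assert(c2() == 1); // the second delegate has a separate context
}
```

Here the caller can observe the captured state through the returned delegate, so the context cannot simply live on the stack of `makeCounter`.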

I suspect the best you'd see in practice is that all usages get inlined and the linker can then discard the actual function that allocates as unused, but even that can be harder than it seems for the backend to figure out given the information it has. It doesn't really understand *why* it is calling this other function, it just knows it is.
September 04, 2020
On Friday, 4 September 2020 at 01:13:53 UTC, Cecil Ward wrote:
> For the LDC version, see https://d.godbolt.org/z/x4rhbe

Not sure what your intention here is, but you are returning a delegate. You aren't actually calling it. There could be side effects, e.g. it has to create the context for the delegate, and another function could then access the delegate's data. So it can't simply optimize it out in this case.

When you actually call the function though, it does just narrow down to returning 25.

https://d.godbolt.org/z/6f87aT

    auto bar() { return test2()(); }

    pure nothrow @safe int example.bar():
            mov     eax, 25
            ret
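The context discussed in this reply can be inspected from D code through the delegate's built-in `.ptr` (context pointer) and `.funcptr` (function pointer) properties. The following is only a sketch that reuses `test2` from the original post; the `main` function is added here for illustration.

```d
auto test2()
{
    int a = 20;
    int foo() { return a + 5; } // inner function
    return &foo;
}

void main()
{
    auto dg = test2();

    // A delegate is a pair: a pointer to the code and a pointer to
    // the context holding the captured variables.
    assert(dg.funcptr !is null);
    assert(dg.ptr !is null);

    // Calling through the delegate reads `a` from that context.
    assert(dg() == 25);
}
```

Since the caller receives both pointers, the code that builds the context in `test2` has to exist whenever `test2` itself is emitted as a standalone function.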