On Wed, 11 Sept 2024 at 06:21, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
Compile the following with -vasm -O:

```
void bar();

int foo(int i)
{
     if (i)
         return 0;
     bar();
     return 1;
}

int baz(int i)
{
     if (i)
         goto Lreturn0;
     bar();
     return 1;

Lreturn0:
     return 0;
}
```
and you get:
```
_D5test93fooFiZi:
0000:   55                       push      RBP
0001:   48 8B EC                 mov       RBP,RSP
0004:   85 FF                    test      EDI,EDI
0006:   74 04                    je        Lc
0008:   31 C0                    xor       EAX,EAX // hot path
000a:   5D                       pop       RBP
000b:   C3                       ret
000c:   E8 00 00 00 00           call      L0   // cold path
0011:   B8 01 00 00 00           mov       EAX,1
0016:   5D                       pop       RBP
0017:   C3                       ret
_D5test93bazFiZi:
0000:   55                       push      RBP
0001:   48 8B EC                 mov       RBP,RSP
0004:   85 FF                    test      EDI,EDI
0006:   75 0C                    jne       L14
0008:   E8 00 00 00 00           call      L0   // hot path
000d:   B8 01 00 00 00           mov       EAX,1
0012:   5D                       pop       RBP
0013:   C3                       ret
0014:   31 C0                    xor       EAX,EAX // cold path
0016:   5D                       pop       RBP
0017:   C3                       ret
```

Okay, I see. You're depending on the optimiser to specifically collapse the goto into the branch as a simplification.
Surely that's not even remotely reliable. There are several ways to optimise that function, and I see no reason an optimiser would reliably choose a construct like you show.

I'm actually a little surprised; a lifetime of experience with this sort of thing might have led me to predict that the optimiser would actually shift the `return 0` up into the place of the goto, effectively eliminating the goto... I'm sure I've seen optimisers do that transformation before, but I can't recall ever noting an instance of code generation that looks like what you pasted... I reckon I might have spotted that before.

... and it turns out I'm right. I was so surprised by the codegen you presented that I pulled out Compiler Explorer and ran some experiments.
I tested GCC and Clang for x86, MIPS, and PPC, all of which I am extremely familiar with, and all of them optimise the way I predicted. None of them showed a pattern like you presented here.
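
Expressed at the source level, the predicted transformation looks roughly like this. It is a sketch in C (the quoted snippet is also valid C), not compiler output; the name `baz_hoisted` and the stub body for `bar` are assumptions added only so the example runs on its own.

```c
/* Hypothetical stub: the quoted example only declares bar().
   A counter is added so the call can be observed. */
static int bar_calls = 0;
void bar(void) { bar_calls++; }

/* baz exactly as written in the quoted example. */
int baz(int i)
{
    if (i)
        goto Lreturn0;
    bar();
    return 1;

Lreturn0:
    return 0;
}

/* baz after hoisting `return 0` into the place of the goto.
   The goto and its label disappear, leaving the same source
   shape as foo in the quoted example. */
int baz_hoisted(int i)
{
    if (i)
        return 0;
    bar();
    return 1;
}
```

Both functions return the same values and call `bar` under the same condition, so once the goto is eliminated there is nothing left in the source to distinguish `baz` from `foo`.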

If I had to guess, I would actually imagine that GCC and Clang will very deliberately NOT make a transformation like the one you show, for the precise reason that such a transformation changes the nature of static branch prediction, which someone might have written code to rely on. It would be dangerous for the optimiser to transform the code in the way you show, and so it doesn't.