On Wed, 11 Sept 2024 at 06:21, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

Compile the following with -vasm -O:

```
void bar();

int foo(int i)
{
    if (i)
        return 0;
    bar();
    return 1;
}

int baz(int i)
{
    if (i)
        goto Lreturn0;
    bar();
    return 1;

Lreturn0:
    return 0;
}
```
and you get:
```
_D5test93fooFiZi:
0000: 55 push RBP
0001: 48 8B EC mov RBP,RSP
0004: 85 FF test EDI,EDI
0006: 74 04 je Lc
0008: 31 C0 xor EAX,EAX // hot path
000a: 5D pop RBP
000b: C3 ret
000c: E8 00 00 00 00 call L0 // cold path
0011: B8 01 00 00 00 mov EAX,1
0016: 5D pop RBP
0017: C3 ret
_D5test93bazFiZi:
0000: 55 push RBP
0001: 48 8B EC mov RBP,RSP
0004: 85 FF test EDI,EDI
0006: 75 0C jne L14
0008: E8 00 00 00 00 call L0 // hot path
000d: B8 01 00 00 00 mov EAX,1
0012: 5D pop RBP
0013: C3 ret
0014: 31 C0 xor EAX,EAX // cold path
0016: 5D pop RBP
0017: C3 ret
```

Okay, I see. You're depending on the optimiser to specifically collapse the goto into the branch as a simplification.

Surely that's not even remotely reliable. There are several ways to optimise that function, and I see no reason an optimiser would reliably choose a construct like you show.

I'm actually a little surprised; a lifetime of experience with this sort of thing might have led me to predict that the optimiser would actually shift the `return 0` up into the place of the goto, effectively eliminating the goto... I'm sure I've seen optimisers do that transformation before, but I can't recall ever noting an instance of code generation that looks like what you pasted... I reckon I might have spotted that before.

... and turns out, I'm right. I was so surprised with the codegen you present that I pulled out compiler explorer and ran some experiments.

I tested GCC and Clang for x86, MIPS, and PPC, all of which I am extremely familiar with, and all of them optimise the way I predicted. None of them showed a pattern like you presented here.
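To make the predicted transformation concrete, here is a minimal sketch of what it amounts to at source level. It is written as C, since the functions above are also valid C and that is what GCC and Clang would be given. `baz_hoisted`, `bar_calls`, and the counting body of `bar` are hypothetical names added for illustration only; the original post only declares `bar`.

```c
/* Hypothetical stand-in for the external bar(); it only counts calls. */
static int bar_calls = 0;
void bar(void) { bar_calls++; }

/* baz as posted: the early exit is written as a forward goto. */
int baz(int i)
{
    if (i)
        goto Lreturn0;
    bar();
    return 1;

Lreturn0:
    return 0;
}

/* Source-level equivalent of hoisting `return 0` into the place of the
 * goto. The result is the same function as foo in the original post. */
int baz_hoisted(int i)
{
    if (i)
        return 0;
    bar();
    return 1;
}
```

When inspecting codegen in Compiler Explorer, it is probably better to leave `bar` as a bare declaration, as in the original post, so the call cannot be inlined and the block layout stays visible.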

If I had to guess, I would imagine that GCC and Clang very deliberately do NOT make a transformation like the one you show, for the precise reason that such a transformation changes the nature of static branch prediction, which someone might have written code to rely on. It would be dangerous for the optimiser to transform the code in the way you show, and so it doesn't.
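For comparison, GCC and Clang also let the programmer state the expectation explicitly, rather than relying on how the source happens to be arranged. The sketch below uses the `__builtin_expect` builtin for that. `UNLIKELY`, `baz_hinted`, `bar_stub`, and `bar_stub_calls` are hypothetical names, and how the hint affects block layout depends on the compiler, target, and optimisation level, so treat it as an illustration rather than a guarantee about codegen.

```c
/* __builtin_expect(exp, c) evaluates to exp and tells the compiler that
 * exp is expected to equal c. It is specific to GCC-compatible compilers,
 * so fall back to the plain expression elsewhere. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical stand-in for bar(); it only counts calls. */
static int bar_stub_calls = 0;
static void bar_stub(void) { bar_stub_calls++; }

/* Same logic as foo/baz above, with the early return marked as the
 * unexpected case. */
int baz_hinted(int i)
{
    if (UNLIKELY(i))
        return 0;
    bar_stub();
    return 1;
}
```

The hint does not change what the function returns; it only gives the optimiser information it may use when ordering the two paths.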