On 9/11/2024 4:44 AM, Manu wrote:
> Okay, I see. You're depending on the optimiser to specifically collapse the goto
> into the branch as a simplification.
Actually, the same code is generated without optimization. All it's doing is
removing blocks that consist of nothing but "goto". It's a trivial optimization,
and was there in the earliest version of the compiler.
> Surely that's not even remotely reliable. There are several ways to optimise
> that function, and I see no reason an optimiser would reliably choose a
> construct like you show.
gcc -O does more or less the same thing.
> I'm actually a little surprised; a lifetime of experience with this sort of
> thing might have led me to predict that the optimiser would /actually/ shift
> the `return 0` up into the place of the goto, effectively eliminating the
> goto... I'm sure I've seen optimisers do that transformation before, but I can't
> recall ever noting an instance of code generation that looks like what you
> pasted... I reckon I might have spotted that before.
The goto remains in the gcc -O version.
> ... and turns out, I'm right. I was so surprised with the codegen you present
> that I pulled out compiler explorer and ran some experiments.
> I tested GCC and Clang for x86, MIPS, and PPC, all of which I am extremely
> familiar with, and all of them optimise the way I predicted. None of them showed
> a pattern like you presented here.
gcc -O produced:
```
foo:
mov EAX,0
test EDI,EDI
jne L1B
sub RSP,8
call bar@PC32
mov EAX,1
add RSP,8
L1B: rep
ret
baz:
mov EAX,0
test EDI,EDI
jne L38
sub RSP,8
call bar@PC32
mov EAX,1
add RSP,8
L38: rep
ret
```
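For reference, C source along the following lines would be expected to give listings of that shape. This is an assumption: the functions themselves are not shown in this message, so the bodies of `foo` and `baz`, the label name `Lret0`, and the counting stub for `bar` are reconstructed from the assembly and the discussion only.

```c
/* Stub so the example is self-contained. In the listings above, bar is an
   external function; defining it here may let a compiler inline it. */
static int bar_calls = 0;
void bar(void) { bar_calls++; }

/* Assumed early-return form. */
int foo(int i)
{
    if (i)
        return 0;
    bar();
    return 1;
}

/* Assumed equivalent form using an explicit goto to reach "return 0". */
int baz(int i)
{
    if (i)
        goto Lret0;
    bar();
    return 1;
Lret0:
    return 0;
}
```

Both functions return 0 for a nonzero argument, and otherwise call `bar` and return 1, which matches the identical `test`/`jne` sequences in the two listings.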
> If I had to guess; I would actually imagine that GCC and Clang will very
> deliberately NOT make a transformation like the one you show, for the precise
> reason that such a transformation changes the nature of static branch prediction
> which someone might have written code to rely on. It would be dangerous for the
> optimiser to transform the code in the way you show, and so it doesn't.
The transformation is (intermediate code):
```
if (i) goto L2; else goto L4;
L2:
goto L3;
L4:
bar();
return 1;
L3:
return 0;
```
becomes:
```
if (i) goto L3; else goto L4;
L4:
bar();
return 1;
L3:
return 0;
```
I.e. the goto->goto was replaced with a single goto.
It's not dangerous or weird at all, nor does it interfere with branch prediction.
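As a rough sketch of that goto->goto collapse: the `Block` struct, the function names, and the hop limit below are made up for illustration and are not taken from any compiler's source.

```c
#include <stddef.h>

/* Hypothetical, minimal basic-block representation for illustration only. */
typedef struct Block {
    int is_bare_goto;      /* nonzero if the block contains only "goto target" */
    struct Block *target;  /* destination of that goto (unused otherwise) */
} Block;

/* Follow a chain of goto-only blocks to the first block that does real work.
   The hop limit guards against a cycle of empty gotos. */
static Block *final_target(Block *b)
{
    int hops = 0;
    while (b != NULL && b->is_bare_goto && hops < 64) {
        b = b->target;
        hops++;
    }
    return b;
}

/* Retarget both arms of a conditional branch past any goto-only blocks. */
static void thread_branch(Block **if_true, Block **if_false)
{
    *if_true = final_target(*if_true);
    *if_false = final_target(*if_false);
}
```

With blocks modelled on the intermediate code above (L2 containing only `goto L3`), a branch whose true arm points at L2 ends up pointing directly at L3, and L2 is then unreferenced and can be dropped.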
It inverts the condition. In the case on trial, that inverts the branch prediction.
But that aside, I'm even more confused; I couldn't reproduce that in any of my tests.
Here's a bunch of my test compiles... they all turn out the same:
gcc:
```
baz(int):
test edi, edi
je .L10
xor eax, eax
ret
.L10:
sub rsp, 8
call bar()
mov eax, 1
add rsp, 8
ret
```
clang:
```
baz(int):
xor eax, eax
test edi, edi
je .LBB0_1
ret
.LBB0_1:
push rax
call bar()@PLT
mov eax, 1
add rsp, 8
ret
```
gcc-powerpc:
```
baz(int):
cmpwi 0,3,0
beq- 0,.L9
li 3,0
blr
.L9:
stwu 1,-16(1)
mflr 0
stw 0,20(1)
bl bar()
lwz 0,20(1)
li 3,1
addi 1,1,16
mtlr 0
blr
```
arm64:
```
baz(int):
cbz w0, .L9
mov w0, 0
ret
.L9:
stp x29, x30, [sp, -16]!
mov x29, sp
bl bar()
mov w0, 1
ldp x29, x30, [sp], 16
ret
```
clang-mips:
```
baz(int):
beqz $4, $BB0_2
addiu $2, $zero, 0
jr $ra
nop
$BB0_2:
addiu $sp, $sp, -24
sw $ra, 20($sp)
sw $fp, 16($sp)
move $fp, $sp
jal bar()
nop
addiu $2, $zero, 1
move $sp, $fp
lw $fp, 16($sp)
lw $ra, 20($sp)
jr $ra
addiu $sp, $sp, 24
```
Even if you can manage to convince a compiler to write the output you're alleging, I would never imagine for a second that's a reliable strategy. The optimiser could do all kinds of things... even though in all my experiments, it does exactly what I predicted it would.