September 11
On Wed, 11 Sept 2024 at 06:21, Walter Bright via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> Compile the following with -vasm -O:
>
> ```
> void bar();
>
> int foo(int i)
> {
>      if (i)
>          return 0;
>      bar();
>      return 1;
> }
>
> int baz(int i)
> {
>      if (i)
>          goto Lreturn0;
>      bar();
>      return 1;
>
> Lreturn0:
>      return 0;
> }
> ```
> and you get:
> ```
> _D5test93fooFiZi:
> 0000:   55                       push      RBP
> 0001:   48 8B EC                 mov       RBP,RSP
> 0004:   85 FF                    test      EDI,EDI
> 0006:   74 04                    je        Lc
> 0008:   31 C0                    xor       EAX,EAX // hot path
> 000a:   5D                       pop       RBP
> 000b:   C3                       ret
> 000c:   E8 00 00 00 00           call      L0   // cold path
> 0011:   B8 01 00 00 00           mov       EAX,1
> 0016:   5D                       pop       RBP
> 0017:   C3                       ret
> _D5test93bazFiZi:
> 0000:   55                       push      RBP
> 0001:   48 8B EC                 mov       RBP,RSP
> 0004:   85 FF                    test      EDI,EDI
> 0006:   75 0C                    jne       L14
> 0008:   E8 00 00 00 00           call      L0   // hot path
> 000d:   B8 01 00 00 00           mov       EAX,1
> 0012:   5D                       pop       RBP
> 0013:   C3                       ret
> 0014:   31 C0                    xor       EAX,EAX // cold path
> 0016:   5D                       pop       RBP
> 0017:   C3                       ret
> ```
>

Okay, I see. You're depending on the optimiser to specifically collapse the
goto into the branch as a simplification.
Surely that's not even remotely reliable. There are several ways to
optimise that function, and I see no reason an optimiser would reliably
choose a construct like you show.

I'm actually a little surprised; a lifetime of experience with this sort of thing might have lead me to predict that the optimiser would *actually* shift the `return 0` up into the place of the goto, effectively eliminating the goto... I'm sure I've seen optimisers do that transformation before, but I can't recall ever noting an instance of code generation that looks like what you pasted... I reckon I might have spotted that before.

... and turns out, I'm right. I was so surprised with the codegen you
present that I pulled out compiler explorer and ran some experiments.
I tested GCC and Clang for x86, MIPS, and PPC, all of which I am extremely
familiar with, and all of them optimise the way I predicted. None of them
showed a pattern like you presented here.

If I had to guess; I would actually imagine that GCC and Clang will very deliberately NOT make a transformation like the one you show, for the precise reason that such a transformation changes the nature of static branch prediction which someone might have written code to rely on. It would be dangerous for the optimiser to transform the code in the way you show, and so it doesn't.


September 11
On Wed, 11 Sept 2024 at 09:12, Walter Bright via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> On 9/9/2024 7:18 PM, Tejas wrote:
> > Hi, what are your views on the article linked below that discourages
> using
> > `[[likely]]` and `[[unlikely]]`?
> >
> >
> https://blog.aaronballman.com/2020/08/dont-use-the-likely-or-unlikely-attributes/
>
> Wow. The article eviscerates that design.
>

Just to be clear; nobody I'm aware of has proposed that design, so I hope that's not your take-away.

My proposal is to allow a hint attached strictly to control statements.
(ideally as a suffix)
It is easy to read, also easy to ignore (this is important), and extremely
low-impact when marking up existing code: no new lines, no rearranging of
code, purely additive; strictly appends to the end of existing
control statements... these are very nice properties for casually marking
up some code where it proves to be profitable, without interfering with
readability, or even interfering with historic diff's in any meaningful way
that might make it annoying to review.


September 11
On 11/09/2024 11:53 PM, Manu wrote:
> On Wed, 11 Sept 2024 at 09:12, Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com <mailto:digitalmars-d@puremagic.com>> wrote:
> 
>     On 9/9/2024 7:18 PM, Tejas wrote:
>      > Hi, what are your views on the article linked below that
>     discourages using
>      > `[[likely]]` and `[[unlikely]]`?
>      >
>      >
>     https://blog.aaronballman.com/2020/08/dont-use-the-likely-or-unlikely-attributes/ <https://blog.aaronballman.com/2020/08/dont-use-the-likely-or-unlikely-attributes/>
> 
>     Wow. The article eviscerates that design.
> 
> 
> Just to be clear; nobody I'm aware of has proposed that design, so I hope that's not your take-away.
> 
> My proposal is to allow a hint attached strictly to control statements. (ideally as a suffix)
> It is easy to read, also easy to ignore (this is important), and extremely low-impact when marking up existing code: no new lines, no rearranging of code, purely additive; strictly appends to the end of existing control statements... these are very nice properties for casually marking up some code where it proves to be profitable, without interfering with readability, or even interfering with historic diff's in any meaningful way that might make it annoying to review.

Ideas forum proposal I wrote a little bit ago, based upon this article for what it should not do.

https://forum.dlang.org/post/oezbkynwdhfqatsvufdm@forum.dlang.org
September 11
On 9/11/2024 4:44 AM, Manu wrote:
> Okay, I see. You're depending on the optimiser to specifically collapse the goto into the branch as a simplification.

Actually, the same code is generated without optimization. All it's doing is removing blocks that consist of nothing but "goto". It's a trivial optimization, and was there in the earliest version of the compiler.


> Surely that's not even remotely reliable. There are several ways to optimise that function, and I see no reason an optimiser would reliably choose a construct like you show.

gcc -O does more or less the same thing.


> I'm actually a little surprised; a lifetime of experience with this sort of thing might have lead me to predict that the optimiser would /actually/ shift the `return 0` up into the place of the goto, effectively eliminating the goto... I'm sure I've seen optimisers do that transformation before, but I can't recall ever noting an instance of code generation that looks like what you pasted... I reckon I might have spotted that before.

The goto remains in the gcc -O version.


> ... and turns out, I'm right. I was so surprised with the codegen you present that I pulled out compiler explorer and ran some experiments.
> I tested GCC and Clang for x86, MIPS, and PPC, all of which I am extremely familiar with, and all of them optimise the way I predicted. None of them showed a pattern like you presented here.

gcc -O produced:

```
foo:
    mov       EAX,0
    test      EDI,EDI
    jne       L1B
    sub       RSP,8
    call      bar@PC32
    mov       EAX,1
    add       RSP,8
L1B:    rep
    ret
baz:
    mov       EAX,0
    test      EDI,EDI
    jne       L38
    sub       RSP,8
    call      bar@PC32
    mov       EAX,1
    add       RSP,8
L38:    rep
    ret
```

> If I had to guess; I would actually imagine that GCC and Clang will very deliberately NOT make a transformation like the one you show, for the precise reason that such a transformation changes the nature of static branch prediction which someone might have written code to rely on. It would be dangerous for the optimiser to transform the code in the way you show, and so it doesn't.

The transformation is (intermediate code):
```
if (i) goto L2; else goto L4;
L2:
   goto L3;
L4:
   bar();
   return 1;
L3:
   return 0;
```
becomes:
```
if (!i) goto L3; else goto L4;
L4:
    bar();
    return 1;
L3:
    return 0;
```
I.e. the goto->goto was replaced with a single goto.

It's not dangerous or weird at all, nor does it interfere with branch prediction.

September 11
On 9/11/2024 4:18 AM, Manu wrote:
> Okay, I lost the signal somewhere... what I'm essentially saying though, is that it doesn't matter what the rule is or how it came about; the point is, it's a tool the architecture offers which is most useful at the language level. Laying out code to match this particular rule is not something that's appropriate, because some other arch with whatever other primitive strategy might come along.

The rule that the code is laid out in the order the programmer wrote it makes the most sense to me. It gives the programmer the control over how it gets executed. The same applies to switch statements - put the most visited case statements first.

> Obfuscating the contorting code is not the goal or a reasonable solution; we just want a mechanism in the language to take advantage of this general category of support in whatever architecture.

I tend to agree, but when micro-optimizing one's code, one accepts that its elegance is going to decline.

There are at least 3 ways to organize the code to get what you want. I won't claim they're beautiful, but they work.
September 11
On 9/11/2024 4:53 AM, Manu wrote:
> Just to be clear; nobody I'm aware of has proposed that design, so I hope that's not your take-away.

Indeed I thought you were proposing that, glad you're not!

> My proposal is to allow a hint attached strictly to control statements. (ideally as a suffix)
> It is easy to read, also easy to ignore (this is important), and extremely low-impact when marking up existing code: no new lines, no rearranging of code, purely additive; strictly appends to the end of existing control statements... these are very nice properties for casually marking up some code where it proves to be profitable, without interfering with readability, or even interfering with historic diff's in any meaningful way that might make it annoying to review.

How is that materially different from [[likely]] annotations?

September 11
On 9/11/24 20:55, Walter Bright wrote:
> 
>> My proposal is to allow a hint attached strictly to control statements. (ideally as a suffix)
>> It is easy to read, also easy to ignore (this is important), and extremely low-impact when marking up existing code: no new lines, no rearranging of code, purely additive; strictly appends to the end of existing control statements... these are very nice properties for casually marking up some code where it proves to be profitable, without interfering with readability, or even interfering with historic diff's in any meaningful way that might make it annoying to review.
> 
> How is that materially different from [[likely]] annotations?

It's associated with the branch and not with the program path.

(This is assuming the ideas in the blog post about C++ [[likely]] are factually correct, I have not sought independent confirmation.)
September 11
On Wed, 11 Sept 2024 at 18:26, Walter Bright via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> On 9/11/2024 4:44 AM, Manu wrote:
> > Okay, I see. You're depending on the optimiser to specifically collapse
> the goto
> > into the branch as a simplification.
>
> Actually, the same code is generated without optimization. All it's doing
> is
> removing blocks that consist of nothing but "goto". It's a trivial
> optimization,
> and was there in the earliest version of the compiler.
>
>
> > Surely that's not even remotely reliable. There are several ways to
> optimise
> > that function, and I see no reason an optimiser would reliably choose a construct like you show.
>
> gcc -O does more or less the same thing.
>
>
> > I'm actually a little surprised; a lifetime of experience with this sort
> of
> > thing might have lead me to predict that the optimiser would /actually/
> shift
> > the `return 0` up into the place of the goto, effectively eliminating
> the
> > goto... I'm sure I've seen optimisers do that transformation before, but
> I can't
> > recall ever noting an instance of code generation that looks like what
> you
> > pasted... I reckon I might have spotted that before.
>
> The goto remains in the gcc -O version.
>
>
> > ... and turns out, I'm right. I was so surprised with the codegen you
> present
> > that I pulled out compiler explorer and ran some experiments.
> > I tested GCC and Clang for x86, MIPS, and PPC, all of which I am
> extremely
> > familiar with, and all of them optimise the way I predicted. None of
> them showed
> > a pattern like you presented here.
>
> gcc -O produced:
>
> ```
> foo:
>      mov       EAX,0
>      test      EDI,EDI
>      jne       L1B
>      sub       RSP,8
>      call      bar@PC32
>      mov       EAX,1
>      add       RSP,8
> L1B:    rep
>      ret
> baz:
>      mov       EAX,0
>      test      EDI,EDI
>      jne       L38
>      sub       RSP,8
>      call      bar@PC32
>      mov       EAX,1
>      add       RSP,8
> L38:    rep
>      ret
> ```
>
> > If I had to guess; I would actually imagine that GCC and Clang will very deliberately NOT make a transformation like the one you show, for the
> precise
> > reason that such a transformation changes the nature of static branch
> prediction
> > which someone might have written code to rely on. It would be dangerous
> for the
> > optimiser to transform the code in the way you show, and so it doesn't.
>
> The transformation is (intermediate code):
> ```
> if (i) goto L2; else goto L4;
> L2:
>     goto L3;
> L4:
>     bar();
>     return 1;
> L3:
>     return 0;
> ```
> becomes:
> ```
> if (!i) goto L3; else goto L4;
> L4:
>      bar();
>      return 1;
> L3:
>      return 0;
> ```
> I.e. the goto->goto was replaced with a single goto.
>
> It's not dangerous or weird at all, nor does it interfere with branch prediction.
>

It inverts the condition. In the case on trial, that inverts the branch prediction.

But that aside, I'm even more confused; I couldn't reproduce that in any of
my tests.
Here's a bunch of my test copiles... they all turn out the same:

gcc:

baz(int):
        test    edi, edi
        je      .L10
        xor     eax, eax
        ret
.L10:
        sub     rsp, 8
        call    bar()
        mov     eax, 1
        add     rsp, 8
        ret

clang:

baz(int):
        xor     eax, eax
        test    edi, edi
        je      .LBB0_1
        ret
.LBB0_1:
        push    rax
        call    bar()@PLT
        mov     eax, 1
        add     rsp, 8
        ret

gcc-powerpc:

baz(int):
        cmpwi 0,3,0
        beq- 0,.L9
        li 3,0
        blr
.L9:
        stwu 1,-16(1)
        mflr 0
        stw 0,20(1)
        bl bar()
        lwz 0,20(1)
        li 3,1
        addi 1,1,16
        mtlr 0
        blr

arm64:

baz(int):
        cbz     w0, .L9
        mov     w0, 0
        ret
.L9:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      bar()
        mov     w0, 1
        ldp     x29, x30, [sp], 16
        ret

clang-mips:

baz(int):
        beqz    $4, $BB0_2
        addiu   $2, $zero, 0
        jr      $ra
        nop
$BB0_2:
        addiu   $sp, $sp, -24
        sw      $ra, 20($sp)
        sw      $fp, 16($sp)
        move    $fp, $sp
        jal     bar()
        nop
        addiu   $2, $zero, 1
        move    $sp, $fp
        lw      $fp, 16($sp)
        lw      $ra, 20($sp)
        jr      $ra
        addiu   $sp, $sp, 24

Even if you can manage to convince a compiler to write the output you're alleging, I would never imagine for a second that's a reliable strategy. The optimiser could do all kinds of things... even though in all my experiments, it does exactly what I predicted it would.


September 11
On Wed, 11 Sept 2024 at 19:56, Walter Bright via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> On 9/11/2024 4:18 AM, Manu wrote:
> > Okay, I lost the signal somewhere... what I'm essentially saying though,
> is that
> > it doesn't matter what the rule is or how it came about; the point is,
> it's a
> > tool the architecture offers which is most useful at the language level.
> Laying
> > out code to match this particular rule is not something that's
> appropriate,
> > because some other arch with whatever other primitive strategy might
> come along.
>
> The rule that the code is laid out in the order the programmer wrote it
> makes
> the most sense to me. It gives the programmer the control over how it gets
> executed. The same applies to switch statements - put the most visited
> case
> statements first.
>

This isn't a matter of opinion. The compilers do what the compilers do, and that's just the way it is.

> Obfuscating the contorting code is not the goal or a reasonable solution; we
> > just want a mechanism in the language to take advantage of this general
> category
> > of support in whatever architecture.
>
> I tend to agree, but when micro-optimizing one's code, one accepts that
> its
> elegance is going to decline.
>
> There are at least 3 ways to organize the code to get what you want. I
> won't
> claim they're beautiful, but they work.
>

I can't reproduce this claim of yours.
I couldn't reproduce a case where your hack produced the code you say, and
even if I could I would never accept it to be reliable.

...and even if it DID work reliably, I *still* wouldn't accept it, because mangling and contorting my code like that is just stupid.

...and it's not like we're even talking about a trade-off here! An otherwise benign and extremely low-impact hint attribute on a control statement just isn't an edgy or risky move.


September 11
On Wed, 11 Sept 2024 at 20:01, Walter Bright via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> On 9/11/2024 4:53 AM, Manu wrote:
> > Just to be clear; nobody I'm aware of has proposed that design, so I
> hope that's
> > not your take-away.
>
> Indeed I thought you were proposing that, glad you're not!
>
> > My proposal is to allow a hint attached strictly to control statements.
> (ideally
> > as a suffix)
> > It is easy to read, also easy to ignore (this is important), and
> extremely
> > low-impact when marking up existing code: no new lines, no rearranging
> of code,
> > purely additive; strictly appends to the end of existing
> control statements...
> > these are very nice properties for casually marking up some code where
> it proves
> > to be profitable, without interfering with readability, or even
> interfering with
> > historic diff's in any meaningful way that might make it annoying to
> review.
>
> How is that materially different from [[likely]] annotations?
>

The article given above shows why arbitrary hints given as stand-alone
statements in a flow causes nonsense when conflicting annotations appear
within a flow.
Attaching exactly one annotation specifically to a control statement that
describes a branch is not at risk of nonsense cases.