September 13
On 13/09/2024 10:20 PM, Quirin Schroll wrote:
> On Friday, 13 September 2024 at 10:02:25 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 13/09/2024 9:54 PM, Quirin Schroll wrote:
>>> The best thing about `pragma` is that it allows for additional arguments. For example, a `bool` to enable or disable it: `pragma(unlikely, false)` could be as if it’s not there. Great for meta-programming. For `pragma(likely)`, a numerical probability makes sense, too: `pragma(likely, 0)` is equivalent to `pragma(unlikely)` and a single `pragma(likely, value)` (with `value` > 0) is `pragma(likely)`.
>>
>> We can do this with a UDA.
>>
>> ```d
>> struct unlikely {
>>     bool activate=true;
>> }
>>
>> if (...) @unlikely(false) {
>>
>> }
>> ```
> 
> Compiler-recognized UDAs are actually a bad choice in this case. We’d need to change the grammar to allow them at this place in a very special and weird way and they’re harder to ignore.
> 
> Again, the advantage of a pragma is that it’s implementation-defined and may end up not having any semantics at all, and this is already specified. A compiler-recognized UDA is just the wrong tool for the job. The [spec about pragmas](https://dlang.org/spec/pragma) is pretty clear about that, and as a related feature, `inline` is a pragma as well for this exact reason.
> 
> I just don’t understand why some people are adamant that those annotations should be attributes. To me, it makes not the least bit of sense.

All the arguments you are making for a pragma equally apply to a UDA.

Except they have the benefit that you can override them, and redefine them if they are not supported.

You cannot do that with a pragma.

So there is a transition path for both old and new code with new compiler versions with UDAs that is not present with pragmas, and that is a major advantage for code maintenance and portability between compilers.

>>> Generally speaking, if there are more than two branches, with two or more of them tagged `likely`, they can be given weights, that may be derived from abstract reasoning or profiling. That’s essentially what GCC has with `__builtin_expect_with_probability`, except that it’s with weights and not probabilities.
>>
>> The way it works in D is if-else not if-elseif-else.
> 
> There is also `switch`.

Walter has stated elsewhere that the order of declaration determines the likelihood of cases.

https://forum.dlang.org/post/vbspbo$2845$1@digitalmars.com

>> So for something like this, you are swapping the assumption from one path to another.
>>
>> I don't think we need probability support, just because of how the IR will be laid out to the backend.
> 
> GCC supports them, so I thought at least GDC could make use of them, LDC probably, too. DMD can just consider weights > 0 as equal and likely.

So does LLVM.

What I am questioning is the need to offer this in the language, as I don't think we can drive it. To drive it you need if-elseif-else, rather than if-else.
September 13
On Friday, 13 September 2024 at 10:26:48 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 13/09/2024 10:20 PM, Quirin Schroll wrote:
>> On Friday, 13 September 2024 at 10:02:25 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> On 13/09/2024 9:54 PM, Quirin Schroll wrote:
>>>> The best thing about `pragma` is that it allows for additional arguments. For example, a `bool` to enable or disable it: `pragma(unlikely, false)` could be as if it’s not there. Great for meta-programming. For `pragma(likely)`, a numerical probability makes sense, too: `pragma(likely, 0)` is equivalent to `pragma(unlikely)` and a single `pragma(likely, value)` (with `value` > 0) is `pragma(likely)`.
>>>
>>> We can do this with a UDA.
>>>
>>> ```d
>>> struct unlikely {
>>>     bool activate = true;
>>> }
>>>
>>> if (...) @unlikely(false) {
>>>
>>> }
>>> ```
>>
>> Compiler-recognized UDAs are actually a bad choice in this case. We’d need to change the grammar to allow them at this place in a very special and weird way, and they’re harder to ignore.
>>
>> Again, the advantage of a pragma is that it’s implementation-defined and may end up not having any semantics at all, and this is already specified. A compiler-recognized UDA is just the wrong tool for the job. The spec about pragmas is pretty clear about that, and as a related feature, `inline` is a pragma as well for this exact reason.
>>
>> I just don’t understand why some people are adamant that those annotations should be attributes. To me, it makes not the least bit of sense.
>
> All the arguments you are making for a pragma equally apply to a UDA.

I don’t see why. Compilers aren’t free to ignore `@safe` and not issue an error if you violate its conditions, but every pragma is specified to be ignorable.

> Except they have the benefit that you can override them, and redefine them if they are not supported.
>
> You cannot do that with a pragma.
>
> So there is a transition path for both old and new code with new compiler versions with UDAs that is not present with pragmas, and that is a major advantage for code maintenance and portability between compilers.

I fail to see why this is the case, or even desirable. `pragma(likely)` isn’t really likely to be in code bases anyway.

>>>> Generally speaking, if there are more than two branches, with two or more of them tagged `likely`, they can be given weights, that may be derived from abstract reasoning or profiling. That’s essentially what GCC has with `__builtin_expect_with_probability`, except that it’s with weights and not probabilities.
>>>
>>> The way it works in D is if-else, not if-elseif-else.
>>
>> There is also `switch`.
>
> Walter has stated elsewhere that the order of declaration determines the likelihood of cases.
>
> https://forum.dlang.org/post/vbspbo$2845$1@digitalmars.com

Another case of relying on the choice of specific compilers. AFAICT, the whole point of likelihood annotations is that you can lay out the code as you think it’s best to read, but sprinkle in some annotations that don’t hurt reading so that the compiler emits a better binary.

>>> So for something like this, you are swapping the assumption from one path to another.
>>>
>>> I don't think we need probability support, just because of how the IR will be laid out to the backend.
>>
>> GCC supports them, so I thought at least GDC could make use of them, LDC probably, too. DMD can just consider weights > 0 as equal and likely.
>
> So does LLVM.
>
> What I am questioning is the need to offer this in the language, as I don't think we can drive it. To drive it you need if-elseif-else, rather than if-else.

Well, we’re going in circles here. I already mentioned `switch`, and again, the reason for the annotation is that code can be laid out to be readable and idiomatic. Let’s say you have a big `switch` in a hot loop. You profiled, and now the data tells you how likely each branch was. What would you prefer? Reordering the branches by likelihood, leading to a diff that’s basically impossible to understand or even vet as being just a reordering (and if there’s a fallthrough, you have to add a jump to the right case now), or the pure addition of likelihood annotations, for which it’s absolutely clear from the diff that nothing else changes?

You’re arguing as if the compiler couldn’t recognize `else if` as a pattern.

The reason a compiler optimizes non-annotated branch or switch cases by ordering (except if it has another clear indication of what’s a cold path, e.g. a thrown exception) is that, absent any information, it has to do something and be deterministic. For most cases, it’s fine. Optimization hints are an expert tool.

September 13
On Friday, 13 September 2024 at 10:57:56 UTC, Quirin Schroll wrote:
> On Friday, 13 September 2024 at 10:26:48 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Let’s say you have a big switch in a hot loop. You profiled and now the data tells you how likely each branch was. What would you prefer? Reordering the branches by likelihood, leading to a diff that’s basically impossible to understand or even vet that it’s just a reordering, or the pure addition of likelihood annotations, for which in the diff it’s absolutely clear nothing else changes. And if there’s a fallthrough, you have to jump to the right case now.

I'd prefer handing the compiler a profile log, and the compiler just optimizing based on that file without the need to do any annotations by hand.

September 13
On 9/13/24 10:19, Walter Bright wrote:
> On 9/11/2024 12:46 PM, Timon Gehr wrote:
>> On 9/11/24 20:55, Walter Bright wrote:
>>>
>>>> My proposal is to allow a hint attached strictly to control statements. (ideally as a suffix)
>>>> It is easy to read, also easy to ignore (this is important), and extremely low-impact when marking up existing code: no new lines, no rearranging of code, purely additive; strictly appends to the end of existing control statements... these are very nice properties for casually marking up some code where it proves to be profitable, without interfering with readability, or even interfering with historic diff's in any meaningful way that might make it annoying to review.
>>>
>>> How is that materially different from [[likely]] annotations?
>>
>> It's associated with the branch and not with the program path.
> 
> I have no idea what the difference is, as the branch determines the
> program path.
> 

Well, it is the attribute being associated with the program path being ill-defined that is being criticized in that blog post. The difference is that for path-associated, you are saying that a specific statement is likely or unlikely to be executed, for branch-associated, you are saying in which direction a specific branch is likely to go.
September 13
On Fri, 13 Sept 2024, 09:31 Walter Bright via Digitalmars-d, < digitalmars-d@puremagic.com> wrote:

> On 9/11/2024 3:44 PM, Manu wrote:
> > The article given above shows why arbitrary hints given as stand-alone statements in a flow causes nonsense when conflicting annotations appear
> within
> > a flow.
>
> It reminds me of the wretched
>
> __declspec
> __attribute__
> __pragma
> _Pragma
> #pragma
>
> additions to C that don't fit in the grammar in any sane manner
>

Yes, that's exactly why this thread exists. What you describe is the situation we have in D today...

September 13
On 9/13/2024 4:56 AM, Timon Gehr wrote:
> Well, it is the attribute being associated with the program path being ill-defined that is being criticized in that blog post. The difference is that for path-associated, you are saying that a specific statement is likely or unlikely to be executed, for branch-associated, you are saying in which direction a specific branch is likely to go.

Ok, thanks for the explanation. The branch predictor on CPUs defaults to a forward branch being considered unlikely, and a backwards branch being considered likely.

I'm sure the CPU designers collected statistics on this before burning this into the branch predictor hardware.

It's a simple and easily remembered rule, which is why dmd behaves as it does.

As for an attribute, I cannot see it being viable for anything other than `if`, as the blog post makes apparent the undocumented menace of it being applied to anything else.

Hence, if D were to support something along those lines, it would be a keyword such as:

```
ifrarely (i) return 0;
```

as the least ugly approach. But I've been unable to come up with an obviously good keyword for it. And we would need buy-in from Iain and Martin.
September 13
On Friday, 13 September 2024 at 18:53:11 UTC, Walter Bright wrote:
> Hence, if D were to support something along those lines, it would be a keyword such as:
>
> ```
> ifrarely (i) return 0;
> ```
>
> as the least ugly approach. But I've been unable to come up with an obviously good keyword for it. And we would need buy-in from Iain and Martin.

`ifrarely` is nice, but I prefer a Swift-style `guard`.

```swift
func ex(maybe: Int?)
{
    guard let val = maybe else {
        // implicitly unlikely
        print("early exit")
        return
    }

    // look ma - val is in scope here!
    print("val = ", val)
}

ex(maybe: 1)
ex(maybe: nil)
```
September 13
On Saturday, 24 August 2024 at 02:34:04 UTC, Walter Bright wrote:
> I recently learned a syntactical trick on how to do this.
>
> ```
> if (x) return;
> if (y) return;
> if (z) return;
> hotPath();
> ```
>
> Rewrite as:
>
> ```
> do
> {
>     if (x) break;
>     if (y) break;
>     if (z) break;
>     hotPath();
> } while (0);
> ```
>
> Of course, this will also work:
>
> ```
> if (x) goto Lreturn;
> if (y) goto Lreturn;
> if (z) goto Lreturn;
> hotPath();
> Lreturn:
> ```

I almost always do a similar thing as a general coding principle, i.e., get all the conditional checks out of the way first, before moving on to the main code path. In most cases, the "hot path" will be the main code path, but the main reason I do it this way (as a general rule) is to make the code easier to understand and manage.

As for branch prediction, my understanding is that it depends on the combination of the compiler and the processor the code is executed on. Some processors will attempt to execute more than one path simultaneously until the correct path is determined. As for optimizing your code, in my experience there will in general be much more effective methods to prioritize than trying to optimize branch prediction, but I suppose it depends entirely on the fine details of what is being attempted.

September 13
On Friday, 13 September 2024 at 18:53:11 UTC, Walter Bright wrote:
> On 9/13/2024 4:56 AM, Timon Gehr wrote:
>> Well, it is the attribute being associated with the program path being ill-defined that is being criticized in that blog post. The difference is that for path-associated, you are saying that a specific statement is likely or unlikely to be executed, for branch-associated, you are saying in which direction a specific branch is likely to go.
>
> Ok, thanks for the explanation. The branch predictor on CPUs defaults to a forward branch being considered unlikely, and a backwards branch being considered likely.

That was pretty much only the Pentiums; older AMDs just assumed branch-not-taken if the branch wasn't in the BTB already. For newer CPUs, Core 2 onwards and Zen, nobody seems to know for sure what they do, but the Intel SDMs do state that the Core architecture doesn't use static prediction. I think Agner Fog says it's essentially random.

September 14
On 14/09/2024 8:50 AM, claptrap wrote:
> On Friday, 13 September 2024 at 18:53:11 UTC, Walter Bright wrote:
>> On 9/13/2024 4:56 AM, Timon Gehr wrote:
>>> Well, it is the attribute being associated with the program path being ill-defined that is being criticized in that blog post. The difference is that for path-associated, you are saying that a specific statement is likely or unlikely to be executed, for branch-associated, you are saying in which direction a specific branch is likely to go.
>>
>> Ok, thanks for the explanation. The branch predictor on CPUs defaults to a forward branch being considered unlikely, and a backwards branch being considered likely.
> 
> That was pretty much only the Pentiums, older AMDs just assumed branch not taken if wasn't in the BTB already. Newer CPUs, Core2 onwards, Zen, nobody seems to know for sure what they do, but the Intel SDMs do state that the Core architecture doesn't use static prediction. I think Agner Fog says it's essentially random.

https://www.agner.org/optimize/microarchitecture.pdf

Not quite random, but certainly has changed to a significantly more complicated design since the 90's.

"
3.8 Branch prediction in Intel Haswell, Broadwell, Skylake, and other Lakes
The branch predictor appears to have been redesigned in the Haswell and later Intel processors, but the design is undocumented.
Reverse engineering has revealed that the branch prediction is using several tables of local and global histories of taken branches [Yavarzadeh, 2023].
The measured throughput for jumps and branches varies between one branch per clock cycle and one branch per two clock cycles for jumps and predicted taken branches.
Predicted not taken branches have an even higher throughput of up to two  branches per clock cycle.
The high throughput for taken branches of one per clock was observed for up to 128 branches with no more than one branch per 16 bytes of code.
The throughput is reduced to one jump per two clock cycles if there is more than one branch instruction per 16 bytes of code. If there are more than 128 branches in the critical part of the code, and if they are spaced by at least 16 bytes, then apparently the first 128 branches
have the high throughput and the remaining have the low throughput.
These observations may indicate that there are two branch prediction methods: a fast method tied to the µop cache and the instruction cache, and a slower method using a branch target buffer.
"