Standard way to supply hints to branches (page 10)

On Friday, 13 September 2024 at 20:59:01 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 14/09/2024 8:50 AM, claptrap wrote:
>> On Friday, 13 September 2024 at 18:53:11 UTC, Walter Bright wrote:
>>> On 9/13/2024 4:56 AM, Timon Gehr wrote:
>>>> Well, it is the attribute being associated with the program path being ill-defined that is being criticized in that blog post. The difference is that for path-associated, you are saying that a specific statement is likely or unlikely to be executed, for branch-associated, you are saying in which direction a specific branch is likely to go.
>>>
>>> Ok, thanks for the explanation. The branch predictor on CPUs defaults to a forward branch being considered unlikely, and a backwards branch being considered likely.
>>
>> That was pretty much only the Pentiums, older AMDs just assumed branch not taken if wasn't in the BTB already. Newer CPUs, Core2 onwards, Zen, nobody seems to know for sure what they do, but the Intel SDMs do state that the Core architecture doesn't use static prediction. I think Agner Fog says it's essentially random.
>
> https://www.agner.org/optimize/microarchitecture.pdf
>
> Not quite random, but certainly has changed to a significantly more complicated design since the 90's.

Read 3.7

"Static prediction in PM and Core2

These processors do not use static prediction. The predictor simply makes a random prediction the first time a branch is seen, depending on what happens to be in the BTB entry that is assigned to the new branch. There is simply a 50% chance of making the right prediction of jump or no jump, but the predicted target is correct."

I mean I assume we're talking about static prediction here, because there's no point trying to out think the branch predictor once it's got history for the branch.

On Thursday, 12 September 2024 at 22:59:32 UTC, Manu wrote:
>
> expect() statements are not a good time.

Why not? Expect is known from other languages and is more general (can work with wider range of values than just true/false, and can work with types too), and reduces the problem to the backend domain only: no new parsing rules (don't forget editors and other tooling), and no AST change, etc.

>> I think it would hurt D as a whole to
>> have special stuff for such a rare thing.
>>
>
> How?
> And it's not 'rare'; it's 'niche'. For a microcontroller with no branch
> prediction, it's common and essential.
> It's literally unworkable to write code for the platform without this tool;
> you can't have branches constantly mispredicting.

It would help if your arguments would not use so much hyperbole. Obviously it is not unworkable, demonstrated by decades of microcontroller programming without it (C does not have it).
The "hurt" I meant is in maintenance of the compiler frontend which already is in quite a bad complexity state (subtle bugs existing and introduced upon almost every change). Adding yet another special case is imo a net loss.

-Johan
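For readers who have not seen an expect-style hint, here is a minimal sketch in C using `__builtin_expect`, which is a GCC/Clang extension and not part of standard C. The `EXPECT` macro, the status constants, and both functions are hypothetical names made up for illustration. The sketch shows the two uses mentioned above: a value hint on a non-boolean expression and a plain true/false hint. The hint may influence how the compiler lays out the code, and it never changes the result.

```c
/* GCC/Clang extension, not standard C. On other compilers this
   hypothetical macro falls back to the plain expression. */
#if defined(__GNUC__) || defined(__clang__)
#define EXPECT(expr, value) __builtin_expect((expr), (value))
#else
#define EXPECT(expr, value) (expr)
#endif

enum { STATUS_OK = 0, STATUS_RETRY = 1, STATUS_FATAL = 2 };

/* Value hint: the status is expected to be STATUS_OK most of the time.
   The switch still handles every value correctly. */
int handle_status(long status)
{
    switch (EXPECT(status, STATUS_OK)) {
    case STATUS_OK:    return 0;
    case STATUS_RETRY: return 1;
    default:           return -1;
    }
}

/* Boolean hint: the error check is expected to fail rarely. */
int checked_div(int a, int b, int *out)
{
    if (EXPECT(b == 0, 0)) {
        return -1;        /* assumed cold path */
    }
    *out = a / b;         /* assumed hot path */
    return 0;
}
```

Because the hint is an ordinary expression, it needs no new statement syntax, which is the tooling argument made above. Whether a given backend acts on it is a separate question.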

On 9/13/2024 1:50 PM, claptrap wrote:
> That was pretty much only the Pentiums, older AMDs just assumed branch not taken if wasn't in the BTB already. Newer CPUs, Core2 onwards, Zen, nobody seems to know for sure what they do, but the Intel SDMs do state that the Core architecture doesn't use static prediction. I think Agner Fog says it's essentially random.

You're not wrong, but Manu was interested in this feature on microcontrollers which apparently have a more primitive system.

Thanks for digging this up. I don't see much hope of integrating that into a code generator. Even worse, using different schemes for multiple processors that are supposedly implementing the same instruction set.

September 14

Re: Standard way to supply hints to branches

Posted by Walter Bright
in reply to Manu

On 9/11/2024 3:37 PM, Manu wrote:
>     The rule that the code is laid out in the order the programmer wrote it makes
>     the most sense to me. It gives the programmer the control over how it gets
>     executed. The same applies to switch statements - put the most visited case
>     statements first.
> 
> 
> This isn't a matter of opinion. The compilers do what the compilers do, and that's just the way it is.

As a matter of fact, not opinion, this is the way dmd works. How gdc/ldc work with this is up to Iain and Martin. I don't tell them what to do.

> I can't reproduce this claim of yours.
> I couldn't reproduce a case where your hack produced the code you say, and even if I could I would never accept it to be reliable.

Use the -O switch as I suggested and verified to work with gcc.
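As an illustration of the source-order approach being argued about here, a minimal sketch in C: the case assumed to be common is written first, and the rare case goes in the `else` branch or a later `case`. The function names and the assumption about which path is hot are hypothetical. Whether the generated code actually follows this order depends on the compiler and its flags, which is exactly the point under dispute.

```c
#include <stddef.h>

/* Hypothetical example: sum the non-negative entries of an array.
   Assumption: most entries are non-negative, so that case comes first. */
long sum_non_negative(const int *values, size_t count, size_t *skipped)
{
    long total = 0;
    *skipped = 0;
    for (size_t i = 0; i < count; ++i) {
        if (values[i] >= 0) {
            total += values[i];   /* assumed common case, written first */
        } else {
            ++*skipped;           /* assumed rare case, written last */
        }
    }
    return total;
}

/* Same idea for a switch: the most visited case is listed first. */
int classify(int opcode)
{
    switch (opcode) {
    case 0:  return 10;   /* assumed to be the hottest case */
    case 1:  return 20;
    default: return -1;   /* assumed rare */
    }
}
```

Compiling with `gcc -O -S` and reading the assembly is one way to check whether the layout matches the source order on a given compiler version.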

> ...and even if it DID work reliably, I /still/ wouldn't accept it, because mangling and contorting my code like that is just stupid.

I'm sympathetic to that viewpoint, and generally go for writing the clearest code as a priority. If I really need to micro-optimize a section, I'll write it in inline assembler rather than trying to convince a compiler to do it my way.

There's no guarantee any hints (like `register`) have any effect, either.

> ...and it's not like we're even talking about a trade-off here! An otherwise benign and extremely low-impact hint attribute on a control statement just isn't an edgy or risky move.

It's a complexity issue. The more complex the language, the fewer people will be interested in learning it. (I'm kind of amazed that newbies are still willing to learn C++ with the heft of its spec that literally nobody understands.) The more complex, the more risk of bugs. It is definitely not a low-impact hint. The intermediate code does not support it; support would need to be added to all the code that deals with the block structure.

I know these things look simple on the surface, but they aren't. As more evidence of that, consider the linked article on what a mess happens when [[likely]] is used in a non-trivial manner.

On 15/09/2024 2:37 PM, Walter Bright wrote:
> Thanks for digging this up. I don't see much hope of integrating that into a code generator.

Agreed, you won't be able to compete with LLVM and GCC, when they have the cpu designers contributing.

> Even worse, using different schemes for multiple processors that are supposedly implementing the same instruction set.

The backend does have this knowledge. Normally with GCC and LLVM, you'd give it the specific cpu generation you want to target. Or you can use the JIT option for LLVM and get it sorted out on execution.

https://github.com/dlang/dmd/blob/0e8e67097df1a367eec1cff2069166843ad53eb3/compiler/src/dmd/backend/cdef.d#L221

https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.dynamicCompile.29

The dmd backend understanding of cpu generations is a little out of date though!

Realistically you're going to be relying on user data to optimize this for dmd. Either from PGO, or annotation in code.

I can't see PGO being worth your time to implement. If people care about performance they are simply not going to be using dmd for it.

But for annotation in code... you can do this in the glue code, a simple swap of condition and with that path, should work!

```
test;
jgt True;
False:
	goto End;
True:
End:
	ret;
```

vs

```
test;
jle False;
True:
	goto End;
False:
End:
	ret;
```
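The condition swap described above can be sketched in C roughly as follows. Everything here is hypothetical: `Cond`, `Branch`, `invert`, and `prefer_fallthrough` are made-up names for illustration and are not dmd's glue-layer or backend data structures. The idea is only that, given a branch and a block annotated as likely, the condition is negated and the two targets are exchanged so the likely block becomes the fall-through.

```c
#include <stdbool.h>

/* All names are hypothetical; this is not dmd's backend data model. */
typedef enum { COND_GT, COND_LE, COND_GE, COND_LT, COND_EQ, COND_NE } Cond;

typedef struct {
    Cond cond;        /* jump to `taken` when this condition holds */
    int  taken;       /* block id reached by the conditional jump */
    int  fallthrough; /* block id laid out directly after the branch */
} Branch;

/* Logical negation of a comparison. */
Cond invert(Cond c)
{
    switch (c) {
    case COND_GT: return COND_LE;
    case COND_LE: return COND_GT;
    case COND_GE: return COND_LT;
    case COND_LT: return COND_GE;
    case COND_EQ: return COND_NE;
    default:      return COND_EQ;   /* COND_NE */
    }
}

/* If the block marked likely is currently the jump target, negate the
   condition and swap the targets so the likely block falls through.
   Returns true when a swap happened. */
bool prefer_fallthrough(Branch *b, int likely_block)
{
    if (b->taken != likely_block) {
        return false;               /* nothing to change */
    }
    int old_fallthrough = b->fallthrough;
    b->cond = invert(b->cond);
    b->fallthrough = b->taken;
    b->taken = old_fallthrough;
    return true;
}
```

For a branch `{ COND_GT, 1, 2 }` with block 1 marked likely, the call rewrites it to `{ COND_LE, 2, 1 }`. The program's behaviour is unchanged, and only the block that sits on the fall-through path differs.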
