February 23, 2014
On 2/23/2014 1:53 PM, Walter Bright wrote:
> And yes, performance critical code often suffers from bit rot, and changes in
> the compiler, and needs to be re-tuned now and then.

BTW, just to reiterate, there are *thousands* of optimizations the compiler may or may not do. And yes, performance critical code will often rely on them, and code is often tuned to 'tickle' certain ones.


For example, I know a fellow years ago who thought he had invented a spectacular new string processing algorithm. He had the benchmarks to prove it, and published an article with his with/without benchmark.

Unfortunately, the without benchmark contained an extra DIV instruction that, due to the vagaries of optimization, the compiler hadn't elided. That DIV had nothing to do with the algorithm, but the benchmark timing differences were totally due to its presence/absence.

He would have spotted it if he'd ever looked at the asm generated, and saved himself from some embarrassment.


I understand that in an ideal world one should never have to look at asm, but if you're writing high performance code and don't look at asm, the code is never going to beat the competition.
February 23, 2014
On 2/23/2014 1:41 PM, Namespace wrote:
> pragma(inline, true);
> pragma(inline, false);
> pragma(inline, default);

'default' being a keyword makes for an ugly special case in how pragmas are parsed.

February 23, 2014
On Sunday, 23 February 2014 at 21:55:11 UTC, Walter Bright wrote:
> On 2/23/2014 1:32 PM, Francesco Cattoglio wrote:
>> [...]
>
> I addressed these three messages in another reply to Dmitry.

Read that, and you do make a point. I am no expert on optimization, but as far as I could tell, inlining is usually the easiest and most rewarding of the optimizations one can do. I know you kind of hate warnings, but perhaps we could at least get a warning if something cannot be inlined?
February 23, 2014
As a compromise, diagnostics about refused inlining could be added as a special output category to https://github.com/D-Programming-Language/dmd/pull/645
February 23, 2014
On Sunday, 23 February 2014 at 21:53:43 UTC, Walter Bright wrote:
> I'm aware of that, but once you add the:
>
>     version(BadCompiler) { } else pragma(inline, true);
>
> things will never get better for BadCompiler.

This is exactly what caused the mess with HTTP user-agent info, when browsers tried to present web pages better and web devs tried to tune their pages to browsers with distinct features. Now Chrome says it's Mozilla, KHTML, Gecko and Safari. But is that really a problem? I don't think much code relies on compiler intrinsics. If it does, perhaps a way to specify attributes in one place and then reference those (like a CUSTOM_INLINE define in C) would help.
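
Something like this sketch of what I mean (BadCompiler and forceInline are just placeholder names, and it assumes the pragma(inline, ...) being discussed can appear inside a function body):

    // One central definition instead of repeating the version logic at
    // every declaration:
    version (BadCompiler)
        enum forceInline = "{}";                     // no-op for the weak compiler
    else
        enum forceInline = "pragma(inline, true);";  // everyone else

    int hotHelper(int x)
    {
        mixin(forceInline);   // the function references the single definition
        return x * 2;
    }

That way, when BadCompiler eventually improves, only the one definition has to change.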
February 23, 2014
On 24-Feb-2014 01:53, Walter Bright wrote:
> On 2/23/2014 1:04 PM, Dmitry Olshansky wrote:
>> That programmer is instantly aware that it can't be done due to some
>> reason.
>> Keep in mind that code changes with time and running
>> profiler/disassembler on
>> every tiny change to make sure the stuff is still inlined is highly
>> counter-productive.
>
> I'm aware of that, but once you add the:
>
>      version(BadCompiler) { } else pragma(inline, true);
>
> things will never get better for BadCompiler. And besides, that line
> looks awful.

You're actually arguing against yourself here - for porting you typically suggest:

version(OS1)
    ...
else version(OS2)
    ...
else
    static assert(0);

Why is forced_inline any different from other porting (where you want to fail fast)?
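I.e. something along these lines:

    // A sketch, assuming the proposed pragma(inline, ...) errors out when it
    // cannot comply: demand inlining and fail fast, exactly like static assert(0)
    // fails fast on an unsupported OS.
    pragma(inline, true)
    int hot(int x) { return x * x; }
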
>>> By the time you get to the point of checking on inlining, you're already
>>> looking at the assembler output, because the function is on the top of
>>> the profile of time wasters, and that's how you take it to the next
>>> level of performance.
>>
>> A one-off activity. Now what guarantees you will have that it will
>> keep getting
>> inlined? Right, nothing.
>
> You're always going to have that issue when optimizing at that level,
> and it will be for a large range of constructs. For example, you may
> need variable x to be enregistered. You may need some construct to be
> implemented as a ROL instruction. You may need a switch to be
> implemented as a binary search.

Let's not detract from the original point. ROL is done as an intrinsic, and there are different answers to many of these questions that are BETTER than _always_ triple-checking by hand and doing re-writes. Switch may benefit from pragmas as well, and modern compilers allow tweaking it. In fact, LLVM allows assigning weights to specify which cases are more probable.
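
E.g. the classic rotate idiom (just an illustration, not from any particular codebase) that optimizing back ends pattern-match into a single ROL, so nobody has to keep re-checking the asm for it:

    // Portable rotate-left; good back ends (GCC/LLVM at least) recognize the
    // pattern and emit a single ROL instruction. Valid for any n, including 0.
    uint rol(uint x, uint n)
    {
        return (x << (n & 31)) | (x >> (-n & 31));
    }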

Almost all of the issues you listed above could be addressed better than by dancing around a disassembler and trying to please a PARTICULAR COMPILER.

Yes, looking at ASM is important, but not every single case should require the painful cycle of:
compile --> disassemble --> re-write --> compile --> ...

>>> The trouble with an error message, is what (as the user) can you do
>>> about it?
>> Re-write till compiler loves it, that is what we do today anyway. Else we
>> wouldn't mark it as force_inline in the first place.
>
> In which case there will be two code paths selected with a
> version(BadCompiler). I have a hard time seeing the value in supporting
> both code paths - the programmer would just use the workaround code always.

Your nice tried-and-true way of doing things is EQUALLY FRAGILE (if not more so) and highly coupled to the compiler, but only SILENTLY so.

>> With error - yo get a huge advantage - an _instant_ feedback that it
>> doesn't do
>> what you want it to do. Otherwise it gets the extra pleasure of running
>> disassembler to pinpoint your favorite call sites or observing that your
>> profiler shows the same awful stats.
>
> My point is you're going to have to look at the asm of the top functions
> on the profiler stats anyway, or you're wasting your time trying to
> optimize the code.

As if I didn't already know that, getting into this discussion.

> (Speaking from considerable experience doing that.)

And since you've come to enjoy it as is, you accept no improvements over that process? So you know it's hard fighting the compiler, and yet, like a samurai, you decidedly reject any help in dealing with it. I seriously don't get the point.

GCC has forced inlining; let's look at what GCC does with its always_inline:
http://gcc.gnu.org/ml/gcc-help/2007-01/msg00051.html

Quote of interest:

---

> **5) Could there be any situation, where a function with always_inline
> is _silently_ not embedded?

I hope not.  I don't know of any.

---

> There's a heluva lot more to optimizing effectively than inlining, and
> it takes some back-and-forth tweaking source code and looking at the
> assembler. I gave some examples of that above.

Just because there are other reasons to look at disassembly is not a good reason to force people to double-check the compiler for basic inlining.

> And yes, performance critical code often suffers from bit rot, and
> changes in the compiler, and needs to be re-tuned now and then.

And you accept no safeguards against this because that is "the true old way"?

> I suspect if the compiler errors out on a failed inline, it'll be much
> less useful than one might think.

On the contrary; at the very least I may have to spend less time checking in ASM listings that the intended optimizations are being done.

-- 
Dmitry Olshansky
February 23, 2014
On 2/23/2014 3:00 PM, Dmitry Olshansky wrote:
> You actually going against yourself with this argument - for porting you
> typically suggest:
>
> version(OS1)
>   ...
> else version(OS2)
>   ...
> else
> static assert(0);

There's not much choice about that. I also suggest moving such code into separate modules.


> Your nice tired and true way of doing things is EQUALLY FRAGILE (if not more)
> and highly coupled to the compiler but only SILENTLY so.

That's very true. Do you suggest the compiler emit a list of what optimizations it did or did not do? What makes inlining special, as opposed to, say, enregistering particular variables?

February 23, 2014
On Sunday, 23 February 2014 at 23:49:57 UTC, Walter Bright wrote:
> What makes inlining special, as opposed to, say, enregistering particular variables?

The difference is it was explicitly told to do something and didn't. That's insubordination.

Mike

February 24, 2014
On Sunday, 23 February 2014 at 21:53:43 UTC, Walter Bright wrote:
> I'm aware of that, but once you add the:
>
>     version(BadCompiler) { } else pragma(inline, true);
>
> things will never get better for BadCompiler. And besides, that line looks awful.

If I need to support multiple compilers and one of them is not good enough, I would first try to figure out which statement causes it to fail. If left with no other alternatives, I'd manually inline it in the common path for all compilers, _not_ create version blocks.
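
For example (a made-up sketch, not from any real codebase): instead of forking the code with version(BadCompiler), the tiny helper simply gets expanded by hand at the hot call site, identically for every compiler:

    // The helper stays around for readability elsewhere...
    int square(int x) { return x * x; }

    // ...but on the hot path it is manually inlined, so no compiler's
    // inliner is relied upon and there is only one code path to maintain.
    long sumOfSquares(const(int)[] a)
    {
        long sum = 0;
        foreach (x; a)
            sum += x * x;
        return sum;
    }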

Inspecting asm output doesn't scale well to huge projects. Imagine simply updating the existing codebase to use a new compiler version.

Based on my experience, even if we are profiling and benchmarking a lot and have many performance-based KPIs, they will still never be as fine-grained as the functional test coverage.

Also, not forgetting, some performance issues may only be detected in live usage scenarios on the other side of the earth, where the developers don't even have access to the needed environment (only imperfect simulations). In those scenarios you are quite grateful for every static compilation error/warning you can get...

You are right in that there is nothing special about inlining, but I'd rather add warnings for all other failed optimisation opportunities than not warn about failed inlining. RVCT, for instance, has --diag_warning=optimizations, which gives many helpful hints, such as aliasing issues ("please add restrict") or possible alignment issues, etc.
February 24, 2014
On 2/23/2014 3:55 PM, Mike wrote:
> The difference is it was explicitly told to do something and didn't. That's
> insubordination.

I view this as more in the manner of providing the equivalent of runtime profiling information to the optimizer, in indirectly saying how often a function is executed.

Optimizing is a rather complicated process, and particular optimizations very often have weird and unpredictable interactions with other optimizations.

For example, in the olden days, C compilers had a 'register' storage class. Optimizers' register allocation strategy was so primitive it needed help. Over time, however, it became apparent that uses of 'register' became bit-rotted due to maintenance, resulting in all the wrong variables being enregistered. Compiler register allocation got a lot better, almost always being better than the users'. Not only that, but with generic code, and optimization rewrites of code, many variables would disappear and new ones would take their place. Different CPUs needed different register allocation strategies. What to do with 'register' then?

The result was compilers began to take the 'register' as a hint, and eventually moved to totally ignoring 'register', as it turned out to be a pessimization.

I suspect that elevating one particular optimization hint to being an absolute command may not turn out well. Inlining already has performance issues, as it may increase the size of an inner loop beyond what will fit in the cache, for just one unexpected result. For another it may mess up the register allocation of the caller. "Inlining makes it faster" is not always true. Do you really want to weld this in as an absolute requirement in the language?