February 24, 2014
On 2/23/14, 8:26 PM, Vladimir Panteleev wrote:
> On Monday, 24 February 2014 at 04:14:08 UTC, Andrei Alexandrescu wrote:
>> I'll add an anecdote - in HHVM we owe a lot of speedups to the careful
>> use of "never inline" and "always inline" gcc pragmas IN ADDITION TO
>> the usual "inline" directives. We have factual proof that gcc makes
>> the wrong inline decisions BOTH WAYS if left to decide.
>>
>> If we define pragmas for inlining, "always inline" must mean always
>> inline no questions asked and "never inline" must mean always prevent
>> inlining no questions asked. Anything else would be a frustrating
>> waste of time.
>
> I think there is another, distinct use case for an inline pragma where
> "try to inline" is useful - namely, turning on the equivalent of the
> compiler "-inline" switch for just one function. I believe this is the
> original rationale behind the DIP (enabling inlining for certain
> functions even in debug builds, because otherwise the debug builds
> become so slow as to be unusable). In this case, whether the compiler
> actually succeeds at inlining the function doesn't matter as long as it
> does the same thing as for an optimized (-inline) build.
>
> Thus, I think there should be "try to inline" (same as -inline) and
> "always inline" (failure stops compilation).

Sounds fair enough.

Andrei
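For reference, here is a minimal D sketch of the "always inline" and "never inline" modes discussed above. It assumes the `pragma(inline, ...)` statement form proposed in DIP56 (which D compilers later accepted), placed inside a function body so that it applies to the enclosing function. The function names are hypothetical. This spelling has no separate "try to inline" setting: leaving the pragma out simply leaves the decision to the compiler's normal heuristics.

```d
// Hypothetical functions illustrating DIP56-style pragma(inline, ...).
// Inside a function body, the pragma applies to the enclosing function.

int square(int x)
{
    pragma(inline, true);   // "always inline": a hard request, not a hint
    return x * x;
}

int slowPath(int x)
{
    pragma(inline, false);  // "never inline": keep this out of hot callers
    return x + 1;
}

int ordinary(int x)
{
    // No pragma: the compiler's usual heuristics decide (e.g. under -inline).
    return x - 1;
}

unittest
{
    // Inlining affects code generation only, never the computed results.
    assert(square(3) == 9);
    assert(slowPath(3) == 4);
    assert(ordinary(3) == 2);
}
```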

February 24, 2014
On Feb 24, 2014 1:15 AM, "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote:
>
> On 2/23/14, 4:07 AM, Walter Bright wrote:
>>
>> http://wiki.dlang.org/DIP56
>>
>> Manu has needed always inlining, and I've needed never inlining. This DIP proposes a simple solution.
>
>
> This makes inlining dependent on previously-seen code. Would that make
> parallel compilation more difficult?
>
> I've always thought the obvious/simple way would be an attribute such as
> @forceinline and @noinline that applies to individual functions.
>
>
> Andrei
>

GDC already has both of these as compiler-extended attributes (need to
document these!!!)

import gcc.attribute;

@attribute("forceinline") ...

Being backend attributes, you can't enforce that these attributes actually take effect in user code (no static asserts!), but you do have some guarantee in that the backend will complain if it can't apply the attribute. This is good, because the compiler will always produce a better diagnostic than some user static assert.

Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


February 24, 2014
On Feb 24, 2014 2:10 AM, "Walter Bright" <newshound2@digitalmars.com> wrote:
>
> On 2/23/2014 5:45 PM, Brad Roberts wrote:
>>
>> At this point, you're starting to argue that the entire DIP isn't
>> relevant. I agree with the majority that if you're going to have the
>> directive, then it needs to be enforcement, not suggestion.
>
>
> 1. It provides information to the compiler about runtime frequency that it cannot obtain otherwise. This is very useful information for generating better code.
>
> 2. Making it a hard requirement then means the user will have to put versioning in it. It becomes inherently non-portable. There is no way to predict what some other version of some other compiler on some other system will do.
>
> 3. In the end, the compiler should make the decision. Inlining does not always result in faster code, as I pointed out in another post.
>
> 4. I don't see that users really are asking for inlining or not. They are asking for the fastest code. As such, providing hints about usage frequencies is entirely appropriate. Micromanaging the method used is not so appropriate. After all, the reason one uses a compiler in the first place rather than assembler is to not micromanage the actual instructions.
>
>
> Perhaps the lesson is the word 'inline' carries certain expectations with it, and the feature would be better positioned as something like:
>
>     pragma(usage, often);
>     pragma(usage, rare);

Also known as hot and cold functions.

Regards
-- 
Iain Buclaw



February 24, 2014
24-Feb-2014 03:49, Walter Bright wrote:
> On 2/23/2014 3:00 PM, Dmitry Olshansky wrote:
>> You're actually going against yourself with this argument: for porting you
>> typically suggest:
>>
>> version(OS1)
>>   ...
>> else version(OS2)
>>   ...
>> else
>> static assert(0);
>
> There's not much choice about that. I also suggest moving such code into
> separate modules.
>
>
>> Your nice tried and true way of doing things is EQUALLY FRAGILE (if
>> not more) and highly coupled to the compiler, but only SILENTLY so.
>
> That's very true. Do you suggest the compiler emit a list of what
> optimizations it did or did not do? What makes inlining special, as
> opposed to, say, enregistering particular variables?
>

GCC has these attributes (including flatten, which inlines every call inside a function) for a good reason. Let's face the fact that compilers are nowhere near perfect with decisions about inlining, especially so when building libraries.

Inlining is special in the sense that the compiler doesn't know (there is not a single hint for it in D today) whether any particular function should be part of the object code (following the ABI and referenced elsewhere) or just a logical piece of code that is reused (using any convenient calling convention, or inlined).

Let me turn the question sideways: what if no_inline were only a hint, and the compiler felt free to inline the function anyway? Would you be happy with such a pragma? It's that simple: you either gain control, or stay with wishy-washy hopes.

In contrast, with register allocation (a ridiculously hard problem) it turned out over time, as you said, that people were generally not good at trying to outsmart the compiler.

-- 
Dmitry Olshansky
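The porting idiom quoted in this message can be written out as a small, self-contained D sketch. The `platform` constant and the particular set of branches are illustrative assumptions. The property being argued about is visible here: an unhandled target stops compilation with a message instead of silently building something nobody checked.

```d
// Every supported target gets an explicit branch; any other target
// fails at compile time instead of silently compiling untested code.
version (linux)
    enum string platform = "linux";
else version (Windows)
    enum string platform = "windows";
else version (OSX)
    enum string platform = "osx";
else
    static assert(0, "unsupported platform: add a version branch");

unittest
{
    assert(platform.length > 0);
}
```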
February 24, 2014
24-Feb-2014 04:33, Walter Bright wrote:
> On 2/23/2014 3:55 PM, Mike wrote:
>> The difference is it was explicitly told to do something and didn't.
>> That's insubordination.
>
> I view this as more in the manner of providing the equivalent of runtime
> profiling information to the optimizer, in indirectly saying how often a
> function is executed.
>
> Optimizing is a rather complicated process, and particular optimizations
> very often have weird and unpredictable interactions with other
> optimizations.

Speaking of other optimizations.

There is a thing called tail-call. Funnily enough, compilers still consider it an optimization, whereas in practice the difference usually means "stack overflow" vs "normal execution" for functional-style code. But I'd rather we stay focused on one particular optimization here.

> For example, in the olden days, C compilers had a 'register' storage
> class. Optimizers' register allocation strategy was so primitive it
> needed help. Over time, however, it became apparent that uses of
> 'register' became bit-rotted due to maintenance, resulting in all the
> wrong variables being enregistered. Compiler register allocation got a
> lot better, almost always being better than the users'.

When the time comes that the compiler can actually make the best inlining decisions on its own, these kinds of options may become irrelevant.
However, it may need to run a profiler on relevant input to understand that, and do it all by itself.

> Not only that,
> but with generic code, and optimization rewrites of code, many variables
> would disappear and new ones would take their place. Different CPUs
> needed different register allocation strategies. What to do with
> 'register' then?

Indeed, 'register' was tied to something immaterial (a variable), whereas in fact there are plenty of temporaries and induction variables that a programmer can't label.

In contrast, generic code is functions upon functions passed through other tiny functions. This is in part what makes inlining so special.

> The result was compilers began to take the 'register' as a hint, and
> eventually moved to totally ignoring 'register', as it turned out to be
> a pessimization.
>
> I suspect that elevating one particular optimization hint to being an
> absolute command may not turn out well. Inlining already has performance
> issues, as it may increase the size of an inner loop beyond what will
> fit in the cache, for just one unexpected result. For another it may
> mess up the register allocation of the caller.

> "Inlining makes it faster" is not always true.

Like I'm a bloody idiot. But once your performance problem turns out (after perusing the ASM) to be a particular function not being inlined, dancing around the compiler in the DARK until it strikes home (if ever) isn't a viable option.

And with DMD, in something like 90% of cases my problem is some critical one-liner not being inlined. In contrast, register allocation is mostly fine.
There are some marvelous codegen gems though:
https://d.puremagic.com/issues/show_bug.cgi?id=10932
where the compiler moves a value from ebx to edx via a stack slot for no apparent reason.

> Do you really want to weld this in as an
> absolute requirement in the language?

Aye. That and explicit tail calls, but that's a separate matter.
Experimental compilers may choose to issue warnings saying that they basically can't inline (yet or by design).

-- 
Dmitry Olshansky
February 24, 2014
On 2/23/2014 8:18 PM, Andrei Alexandrescu wrote:
> On 2/23/14, 6:12 PM, Walter Bright wrote:
>> On 2/23/2014 5:12 PM, Andrei Alexandrescu wrote:
>>> This makes inlining dependent on previously-seen code. Would that make
>>> parallel
>>> compilation more difficult?
>>
>> I don't understand the question. Inlining always depends on the compiler
>> having seen the function body.
>
> Decision to inline at line 2000 may be caused by a pragma in line 2.

I still don't understand the question. Successfully compiling anything in D can have dependencies on arbitrary other parts of the code. Why would inlining be any different, or be a special problem?

February 24, 2014
On 2/23/2014 6:12 PM, Lionello Lunesu wrote:
> On 23/02/14 20:07, Walter Bright wrote:
>> http://wiki.dlang.org/DIP56
>>
>> Manu has needed always inlining, and I've needed never inlining. This
>> DIP proposes a simple solution.
>
> void A()
> {
> }
>
> void B()
> {
>    pragma(inline, true) A();

No. This would be:
     pragma(inline, true);
     A();
and then B() will be inlined when it is encountered.

> }
>
> void C()
> {
>    B();
> }
>
> Reading that code, I would guess that within B(), the call to A() would get
> inlined. Reading the DIP, it appears that the pragma controls whether B() gets
> inlined.
>
> When the pragma is used outside of the function body, at the function declaration, it
> would work more like "inline" or "__inline" in C++, correct?

Yes.
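To make the placement rule concrete, here is Lionello's example rewritten the way Walter describes it, as a minimal D sketch. The bodies are changed to return `int` (an assumption made only so the result can be checked). The pragma inside `B` marks `B` itself, so it is calls to `B`, such as the one in `C`, that become inlining candidates; the call to `A` inside `B` is not specially marked.

```d
int A() { return 1; }

int B()
{
    pragma(inline, true); // marks B, the enclosing function, not the A() call
    return A() + 1;
}

int C()
{
    return B();           // this call to B is what the pragma targets
}

unittest
{
    assert(C() == 2);
}
```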

February 24, 2014
On Monday, 24 February 2014 at 01:09:46 UTC, Araq wrote:
> Would you mind backing up your "fact" with some numbers? Afaict 'inline' is more common than __attribute__((forceinline)). (Well, OK, for C code #define is even more common, but most C code is stuck in the 70s anyway, so that doesn't mean anything.)

I can't link you to the closed projects I have worked on before, so you certainly have no reason to trust my memories. Normal `inline` is common in headers because you can't have non-inlined function bodies in headers. In actual translation units, it is used only by those who expect it to have a forceinline effect (I have not met a single case where adding it made any difference to gcc's decision to inline or not). This was my actual point: not that no one uses "inline", but that the very same lax definition has essentially turned it into a no-op, which made compiler-specific alternatives necessary.
February 24, 2014
On Monday, 24 February 2014 at 02:05:31 UTC, Walter Bright wrote:
>     pragma(usage, often);
>     pragma(usage, rare);

This is also a useful feature, especially if it also applies to `if` branches (I have been using __builtin_expect quite a lot with GCC). But it is different; I think we need both.
February 24, 2014
On 2/24/14, 12:55 AM, Walter Bright wrote:
> On 2/23/2014 8:18 PM, Andrei Alexandrescu wrote:
>> On 2/23/14, 6:12 PM, Walter Bright wrote:
>>> On 2/23/2014 5:12 PM, Andrei Alexandrescu wrote:
>>>> This makes inlining dependent on previously-seen code. Would that make
>>>> parallel
>>>> compilation more difficult?
>>>
>>> I don't understand the question. Inlining always depends on the compiler
>>> having seen the function body.
>>
>> Decision to inline at line 2000 may be caused by a pragma in line 2.
>
> I still don't understand the question. Successfully compiling anything
> in D can have dependencies on arbitrary other parts of the code. Why
> would inlining be any different, or be a special problem?

Probably it makes no difference, sorry for the distraction.

Andrei