February 03, 2015
"Tobias Pankrath"  wrote in message news:zdsqgbuoobnhnjrtpjve@forum.dlang.org...

> Why couldn't he just copy paste the functions code?

Why would he want to do that? 

February 03, 2015
On Tuesday, 3 February 2015 at 10:15:38 UTC, Daniel Murphy wrote:
> "Tobias Pankrath"  wrote in message news:zdsqgbuoobnhnjrtpjve@forum.dlang.org...
>
>> Why couldn't he just copy paste the functions code?
>
> Why would he want to do that?

Let me rephrase the question: Why should inlining a function be impossible, if it can be done by a simple AST transformation?
February 03, 2015
On 2/3/2015 1:49 AM, Daniel Murphy wrote:
> This doesn't make sense to me, because even if a function is 'hot' it still
> shouldn't be inlined if inlining is turned off.

'hot' can be interpreted to be inline even if inlining is turned off. (That is what Manu wanted.)


>> There are literally thousands of optimizations applied. Plucking exactly one
>> out and elevating it to a do-or-die status, ignoring the other 999, is a false
>> god. There's far more to a programmer reorganizing his code to make it run
>> faster than just sprinkling it with "forceinline" pixie dust.
>
> Nobody is suggesting that.  forceinline is for when either a) the function is a
> trivial wrapper and should always always be expanded inline (ie where macros are
> typically used in C) or b) the compiler's heuristics have failed and
> profiling/inspecting the generated code has shown that the function should be
> inlined.

It's still elevating inlining above all other optimizations (and inlining is nothing more than just another optimization). For example, register allocation is critical to code performance, and the optimizer frequently doesn't do the best job of it.


>> There is a lot of value to telling the compiler where the hot and cold parts
>> are, because those cannot be statically determined. But exactly how to achieve
>> that goal really should be left up to the compiler implementer. Doing a better
>> or worse job of that is a quality of implementation issue, not a language
>> specification issue.
>
> Yes and no.  It is still useful to have a way to tell the compiler exactly what
> to do, when needed.  Eg we can allocate arrays on the stack, even though the
> compiler could theoretically move heap allocations there without user intervention.

Back in the olden days, with dmc you could individually turn various optimizations on and off. I finally gave up on that because it was useful to nobody. The 'register' keyword was dropped because although it could be used to do better register allocation, in reality it was so misused it would just make things worse.

Like I said, there are thousands of optimizations in the compiler. They all interact with each other in usually unexpected ways. Focusing on just one in isolation is not likely to yield best results. But with hot-or-not, instead you are giving the compiler useful information to guide its heuristics.

It's like the old joke where a captain is asked by a colonel how he'd get a flagpole raised. The captain replied with a detailed set of instructions. The colonel said wrong answer, the correct response would be for the captain to say: "Sergeant, get that flag pole raised!"

Hot-or-not gives information to guide the heuristics of the compiler's decisions.

For a related example, the compiler assumes that loops are executed 10 times when weighting variables for who gets enregistered. Giving hot-or-not guidance may raise it to 20 for hot, and lower it to 1 for not. There are many places in the optimizer where a cost function is used, not just inlining decisions.
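
A rough sketch of what such a cost function could look like. This is an illustration only, not dmd's actual code: the Hint enum and both function names are made up for the example, and the only figures taken from the above are the 10/20/1 trip counts.

```d
import std.stdio : writeln;

// Hypothetical hint attached to a function; not a real dmd data structure.
enum Hint { none, hot, cold }

// Assumed number of iterations used when weighting variables inside a loop.
// 10 is the default mentioned above; 20 and 1 are the suggested adjustments.
int assumedLoopTrips(Hint h)
{
    final switch (h)
    {
        case Hint.none: return 10;
        case Hint.hot:  return 20;
        case Hint.cold: return 1;
    }
}

// Weight of a variable when deciding who gets a register: uses outside a
// loop count once, uses inside a loop count once per assumed iteration.
int registerWeight(int usesOutsideLoop, int usesInsideLoop, Hint h)
{
    return usesOutsideLoop + usesInsideLoop * assumedLoopTrips(h);
}

void main()
{
    // The same variable (2 uses outside a loop, 3 inside) under each hint.
    writeln(registerWeight(2, 3, Hint.none));
    writeln(registerWeight(2, 3, Hint.hot));
    writeln(registerWeight(2, 3, Hint.cold));
}
```

The point of the sketch is that the hint changes an input to the heuristic rather than dictating the outcome.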


>> Perhaps the fault here is calling it pragma(inline,true). Perhaps if it was
>> pragma(hot) and pragma(cold) instead?
>
> That would indeed be a better name, but it still wouldn't be what people are
> asking for.

I understand. And I suggest instead they ask me to "get that flagpole raised, sergeant!"
February 03, 2015
On 2/3/2015 1:56 AM, Daniel Murphy wrote:
> I don't expect this to be a huge problem, because most functions marked with
> forceinline would be trivial.
>
> eg. void setREG(ubyte val) { volatileStore(cast(ubyte*)0x1234, val); }
>
> This function only exists to give a nicer interface to the register.  If the
> compiler can't inline it, I want to know about it at compilation time rather
> than later.
>
> Again, it's for those cases that would just be done with macros in C.  Where the
> code should always be inlined but doing it manually in the source would lead to
> maintenance problems.

To not inline trivial functions when presented with forceinline would indeed be perverse, and while legally possible as I've said before no compiler writer would do that. Even dmd (!) has no trouble at all inlining trivial functions.
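
For reference, a minimal sketch of the kind of trivial wrapper being discussed, written with the pragma(inline, true) spelling. Assumptions: a compiler recent enough to accept that pragma and to ship core.volatile, and a plain global (fakeRegister, a made-up name) standing in for the real hardware address so the example can run on a desktop machine.

```d
import core.volatile : volatileLoad, volatileStore;

// Stand-in for a memory-mapped register. Real code would use a fixed
// address such as cast(ubyte*)0x1234 instead of a global variable.
__gshared ubyte fakeRegister;

// Trivial wrapper: it exists only to give the register a nicer interface.
pragma(inline, true)
void setREG(ubyte val)
{
    volatileStore(&fakeRegister, val);
}

// Matching read wrapper.
pragma(inline, true)
ubyte getREG()
{
    return volatileLoad(&fakeRegister);
}

void main()
{
    setREG(0x5A);
    assert(getREG() == 0x5A);
}
```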

But the trouble is, people will use forceinline on very non-trivial functions, and functions where it would actually make things worse, etc., and then to have the compiler error out on them would not be productive.

See the Rust link I provided on experience with the use and misuse of forceinline.

February 03, 2015
"Tobias Pankrath"  wrote in message news:vzgszrvcxxpethbdlyro@forum.dlang.org...

> Let me rephrase the question: Why should inlining a function be impossible, if it can be done by a simple AST transformation?

It's not impossible, dmd's inliner just can't currently do it.  The transformation isn't all that simple either. 

February 03, 2015
On 2/3/2015 2:10 AM, Daniel Murphy wrote:
> The inliner in dmd fails to inline many constructs, loops for example.

It will inline a loop if the function is called at the statement level. The trouble with inlining a loop inside an expression is that it is not expressible in the expression tree used in the back end.

Obviously, inlining functions with loops tends to have lower payoffs anyway, because the loop time swamps the function call overhead. Inlining a loop can even make things worse, because the loop variables may not get priority for enregistering whereas they would if in a separate function.

I.e. it is not a trivial issue of "inlining is faster".
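
To make the two call forms concrete, here is a small sketch. The function names are invented for the example, and it only shows the shape of the calls; which of them a given compiler version actually inlines is not something the code demonstrates.

```d
// Hypothetical helper containing a loop, returning its result.
int sum(const int[] values)
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

// The same loop, returning through a ref parameter so that the call
// can stand alone as a statement.
void sumInto(const int[] values, ref int total)
{
    total = 0;
    foreach (v; values)
        total += v;
}

void main()
{
    int[] data = [1, 2, 3, 4];

    int a;
    sumInto(data, a);        // call at the statement level

    int b = 10 + sum(data);  // call nested inside a larger expression

    assert(a == 10);
    assert(b == 20);
}
```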

> It would succeed on all of the cases relevant to wrapping mmio.

Yup. I understand the concern that a compiler would opt out of inlining those if it legally could, but I just cannot see that happening in reality. Modern compilers have been inlining for 25 years now, and they're not likely to just stop doing it. It's as unlikely as the compiler failing to rewrite:

   x *= 32;

into:

   x <<= 5;
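
A quick check that the two forms agree, using unsigned values so that overflow wraps the same way in both. The helper names are made up for the example.

```d
// Multiply by 32 the obvious way.
uint mulBy32(uint x)
{
    x *= 32;
    return x;
}

// The strength-reduced form: shift left by 5 bits.
uint shlBy5(uint x)
{
    x <<= 5;
    return x;
}

void main()
{
    // Includes uint.max to cover the wrap-around case.
    foreach (uint x; [0u, 1u, 7u, 1000u, uint.max])
        assert(mulBy32(x) == shlBy5(x));
}
```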
February 03, 2015
"Walter Bright"  wrote in message news:maq7f1$2hka$1@digitalmars.com...

> On 2/3/2015 1:49 AM, Daniel Murphy wrote:
> > This doesn't make sense to me, because even if a function is 'hot' it still
> > shouldn't be inlined if inlining is turned off.
>
> 'hot' can be interpreted to be inline even if inlining is turned off. (That is what Manu wanted.)

It's just a naming thing, it's not important.

> It's still elevating inlining above all other optimizations (and inlining is nothing more than just another optimization). For example, register allocation is critical to code performance, and the optimizer frequently doesn't do the best job of it.

So what?  It's a pragma used in low-level code.  Some C/C++ compilers provide similar hints for loop unrolling, vectorization, etc.  It's certainly not worth a keyword or any major language changes, but a pragma doesn't cost anything to add.  D has inline assembly for a similar reason - sometimes the programmer knows best.

> Back in the olden days, with dmc you could individually turn various optimizations on and off. I finally gave up on that because it was useful to nobody. The 'register' keyword was dropped because although it could be used to do better register allocation, in reality it was so misused it would just make things worse.

And yet you kept -inline as a separate flag in dmd.

> Like I said, there are thousands of optimizations in the compiler. They all interact with each other in usually unexpected ways. Focusing on just one in isolation is not likely to yield best results. But with hot-or-not, instead you are giving the compiler useful information to guide its heuristics.

Hot-or-not is certainly useful, and probably much more widely useful than forceinline.  But that doesn't mean forceinline isn't useful.

> It's like the old joke where a captain is asked by a colonel how he'd get a flagpole raised. The captain replied with a detailed set of instructions. The colonel said wrong answer, the correct response would be for the captain to say: "Sergeant, get that flag pole raised!"
>
> Hot-or-not gives information to guide the heuristics of the compiler's decisions.
>
> For a related example, the compiler assumes that loops are executed 10 times when weighting variables for who gets enregistered. Giving hot-or-not guidance may raise it to 20 for hot, and lower it to 1 for not. There are many places in the optimizer where a cost function is used, not just inlining decisions.

Yes, this information is useful.  So is forceinline.

> I understand. And I suggest instead they ask me to "get that flagpole raised, sergeant!"

We have inline assembler because sometimes being explicit is what's needed. I would consider using forceinline in the same situations where inline assembly is a viable option, eg interfacing with hardware or computation kernels.

February 03, 2015
On 2/1/2015 9:48 PM, Walter Bright wrote:
> On 2/1/2015 9:21 PM, Daniel Murphy wrote:
>    struct Ports {
>      static ubyte B() { return volatileLoad(cast(ubyte *)0x0025); }
>      static void B(ubyte value) { volatileStore(cast(ubyte *)0x0025, value); }
>    }

A somewhat more refined version:

  import core.bitop;

  template Ports(T, uint address) {
    @property T B() { return volatileLoad(cast(T *)address); }
    @property void B(T value) { volatileStore(cast(T *)address, value); }
  }

  alias Ports!(uint, 0x1234) MyPort;

  uint test(uint x) {
   MyPort.B(x);
   MyPort.B(x);
   return MyPort.B();
  }

Compiling with:

  dmd -c foo -O -release -inline

gives:

  _D3foo4testFkZk:
            push    EAX
            mov     ECX,01234h
            mov     [ECX],EAX
            mov     [ECX],EAX   // the redundant store was not optimized away!
            mov     EAX,[ECX]   // nor was the common subexpression removed
            add     ESP,4
            ret

See the volatile semantics noted in the comments.
February 03, 2015
"Walter Bright"  wrote in message news:maq8ao$2idu$1@digitalmars.com...

> Yup. I understand the concern that a compiler would opt out of inlining those if it legally could, but I just cannot see that happening in reality. Modern compilers have been inlining for 25 years now, and they're not likely to just stop doing it.

No, the problem is that the code might accidentally contain a construct that is not inlineable.  The user will expect it to be inlined, but the compiler will silently fail.

eg

void myWrapperFunc()
{
   callSomeFunc(999, 123, "something");
}

This function will not be inlined if callSomeFunc has a default argument that calls alloca, for example.  If a hidden failure becomes a compiler error, the user can trivially correct the problem.
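
A sketch of how that can happen without the wrapper's author noticing. The signature of callSomeFunc is invented for the example; the only thing it relies on is that D evaluates default arguments at the call site.

```d
import core.stdc.stdlib : alloca;

// Hypothetical callee. The caller never writes the fourth argument, but its
// default value is evaluated in the caller, at every call site.
int callSomeFunc(int a, int b, string s, void* scratch = alloca(64))
{
    // Touch the scratch space so the parameter has a visible purpose.
    if (scratch !is null)
        (cast(ubyte*) scratch)[0] = 0;
    return a + b + cast(int) s.length;
}

int myWrapperFunc()
{
    // Looks trivial, but is effectively
    // callSomeFunc(999, 123, "something", alloca(64)),
    // so the alloca call lands inside the wrapper.
    return callSomeFunc(999, 123, "something");
}

void main()
{
    assert(myWrapperFunc() == 999 + 123 + 9);
}
```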

February 03, 2015
On 03.02.15 10:35, Walter Bright wrote:
> On 2/3/2015 1:11 AM, Mike wrote:
>> Another way of putting it:  Does pragma(inline, true) simply allow the user to
>> compile parts of their source file with -inline?
> 
> Yes.
> 

Eh, yes :)

I see now, errors/warnings are invasive compared to this simple, useful addition. And undesirable generally.