February 03, 2015
On 2/2/2015 8:36 PM, Daniel Murphy wrote:
> The user can modify the code to allow it to be inlined.  There are a huge number
> of constructs that cause dmd's inliner to completely give up.  If a function
> _must_ be inlined, the compiler needs to give an error if it fails.

A separate message with a pragmatic difficulty with your suggestion.

Different compilers will have different inlining capabilities. Different versions of the same compiler may behave differently. This means that sometimes a user may get a compilation failure, sometimes not. It's highly brittle.

So enter the workaround code. Different compilers and different versions will require different workaround code. Is this really reasonable for users to put up with? And will they really want to be running the workaround code when they upgrade the compiler and now it could have inlined it?
February 03, 2015
On Tuesday, 3 February 2015 at 08:28:42 UTC, Walter Bright wrote:
>
> The compiler offers a -inline switch, which will inline everything it can. Performance oriented code will use that switch.
>
> pragma(inline,true) tells the compiler that this function is 'hot', and pragma(inline, false) that this function is 'cold'. Knowing the hot and cold paths enables the optimizer to do a better job.
>

Assume I'm creating a bare-metal program with 2 functions: an entry point `void _start` and a function that puts a byte in an MMIO UART's send buffer, `void send(byte b)`. `_start` calls `send`.  There is no Phobos, druntime, or any other library.  It is just my "test.d" source file.  (Please don't nitpick this with irrelevant technicalities.)

scenario A)
compile test.d with -inline
`_start` is pragma(inline, false)
`send` is pragma(inline, true)     -- this is redundant, yes?

scenario B)
compile with -inline
`_start` is pragma(inline, false)
`send` is pragma(inline)

scenario C)
compile without -inline
`_start` is pragma(inline, false)
`send` is pragma(inline, true)

All things being equal, will there be any difference between the resulting binaries for each of these scenarios?

Another way of putting it:  Does pragma(inline, true) simply allow the user to compile parts of their source file with -inline?

Mike
February 03, 2015
Some perspective from a Rust developer:

https://mail.mozilla.org/pipermail/rust-dev/2013-May/004272.html

February 03, 2015
On 2/3/2015 1:11 AM, Mike wrote:
> All things being equal, will there be any difference between the resulting
> binaries for each of these scenarios?

No.

> Another way of putting it:  Does pragma(inline, true) simply allow the user to
> compile parts of their source file with -inline?

Yes.

pragma(inline, false) paradoxically can be used to improve performance. Consider:

  if (cond)
    foo();
  else
    bar();

If cond is nearly always false, then foo() is rarely executed. If the compiler inlines it, it will likely take away registers from being used to inline bar(), and bar() needs those registers. By marking foo() as not inlinable, it won't consume those registers. (Also, inlining foo() may consume much code, making for a less efficient jump around it and making it less likely for the hot code to fit in the cache.)

This is why I'm beginning to think a pragma(hot, true/false) might be a better approach, as there are more optimizations that can be done better if the compiler knows which branches are hot or not.
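
The hot/cold split described above can be sketched roughly as follows. This is only an illustration, assuming a compiler that accepts pragma(inline, ...) as discussed in this thread; `dispatch` and the bodies of `foo` and `bar` are hypothetical stand-ins, not code from any post here.

```d
// Rarely executed path: ask the compiler not to inline it, so it does not
// compete with the hot path for registers or code space.
pragma(inline, false)
int foo(int x)
{
    return x * 31 + 7; // placeholder for some bulky, rarely needed work
}

// Frequently executed path: request inlining.
pragma(inline, true)
int bar(int x)
{
    return x + 1; // placeholder for small, hot work
}

// Hypothetical caller: cond is assumed to be false almost all the time.
int dispatch(bool cond, int x)
{
    if (cond)
        return foo(x); // cold branch, stays an out-of-line call
    else
        return bar(x); // hot branch, candidate for inlining
}

unittest
{
    assert(dispatch(false, 1) == 2);
    assert(dispatch(true, 1) == 38);
}
```

The pragmas only change how the calls are compiled, not what they compute, so the results are the same with or without -inline.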
February 03, 2015
On Tuesday, 3 February 2015 at 09:36:57 UTC, Walter Bright wrote:
> On 2/3/2015 1:11 AM, Mike wrote:
>> All things being equal, will there be any difference between the resulting
>> binaries for each of these scenarios?
>
> No.
>
>> Another way of putting it:  Does pragma(inline, true) simply allow the user to
>> compile parts of their source file with -inline?
>
> Yes.
>
> pragma(inline, false) paradoxically can be used to improve performance. Consider:
>
>   if (cond)
>     foo();
>   else
>     bar();
>
> If cond is nearly always false, then foo() is rarely executed. If the compiler inlines it, it will likely take away registers from being used to inline bar(), and bar() needs those registers. By marking foo() as not inlinable, it won't consume those registers. (Also, inlining foo() may consume much code, making for a less efficient jump around it and making it less likely for the hot code to fit in the cache.)
>
> This is why I'm beginning to think a pragma(hot, true/false) might be a better approach, as there are more optimizations that can be done better if the compiler knows which branches are hot or not.

I think you're misunderstanding each other.

As far as I understand it, Johannes doesn't care much about inline for optimizations. He wants to easily access a fixed memory location for MMIO. Now you're telling him to use volatileLoad and volatileStore to do this which may work but only has a bearable syntax if wrapped. But for his embedded work he needs to be sure that the wrapping is undone and thus needs either pragma(force_inline) or pragma(address).

You're against force_inline, but now you're moving the goal posts by arguing against force_inline in the general case of code optimization. But that's not the problem here, we're talking MMIO with addresses embedded in the instruction stream.

Besides this: Why should a compiler that has an inliner fail to inline a function marked with force_inline? The result may be undesirable but it should always work at least?


February 03, 2015
"Walter Bright"  wrote in message news:maq0rp$2ar8$1@digitalmars.com...

> I'd like to reexamine those assumptions, and do a little rewinding.
>
> The compiler offers a -inline switch, which will inline everything it can. Performance oriented code will use that switch.
>
> So why doesn't the compiler inline everything anyway? Because there's a downside - it can make code difficult to symbolically debug, and it makes for difficulties in getting good profile data.
>
> Manu was having a problem, though. He wanted inlining turned off globally so he could debug his code, but have it left on for a few functions where not inlining them would make the debug version too slow.
>
> pragma(inline,true) tells the compiler that this function is 'hot', and pragma(inline, false) that this function is 'cold'. Knowing the hot and cold paths enables the optimizer to do a better job.

This doesn't make sense to me, because even if a function is 'hot' it still shouldn't be inlined if inlining is turned off.

> There are literally thousands of optimizations applied. Plucking exactly one out and elevating it to a do-or-die status, ignoring the other 999, is a false god. There's far more to a programmer reorganizing his code to make it run faster than just sprinkling it with "forceinline" pixie dust.

Nobody is suggesting that.  forceinline is for when either a) the function is a trivial wrapper and should always be expanded inline (i.e. where macros are typically used in C) or b) the compiler's heuristics have failed and profiling/inspecting the generated code has shown that the function should be inlined.

> There is a lot of value to telling the compiler where the hot and cold parts are, because those cannot be statically determined. But exactly how to achieve that goal really should be left up to the compiler implementer. Doing a better or worse job of that is a quality of implementation issue, not a language specification issue.

Yes and no.  It is still useful to have a way to tell the compiler exactly what to do, when needed.  E.g. we can allocate arrays on the stack, even though the compiler could theoretically move heap allocations there without user intervention.

> Perhaps the fault here is calling it pragma(inline,true). Perhaps if it was pragma(hot) and pragma(cold) instead?

That would indeed be a better name, but it still wouldn't be what people are asking for. 

February 03, 2015
"Walter Bright"  wrote in message news:maq10s$2avu$1@digitalmars.com...

> A separate message with a pragmatic difficulty with your suggestion.
>
> Different compilers will have different inlining capabilities. Different versions of the same compiler may behave differently. This means that sometimes a user may get a compilation failure, sometimes not. It's highly brittle.
>
> So enter the workaround code. Different compilers and different versions will require different workaround code. Is this really reasonable for users to put up with? And will they really want to be running the workaround code when they upgrade the compiler and now it could have inlined it?

I don't expect this to be a huge problem, because most functions marked with forceinline would be trivial.

e.g. void setREG(ubyte val) { volatileStore(cast(ubyte*)0x1234, val); }

This function only exists to give a nicer interface to the register.  If the compiler can't inline it, I want to know about it at compilation time rather than later.

Again, it's for those cases that would just be done with macros in C, where the code should always be inlined but doing it manually in the source would lead to maintenance problems.
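
A slightly fuller sketch of such a wrapper, under some assumptions: it imports volatileLoad/volatileStore from core.volatile (older druntime releases exposed them in core.bitop), and it targets a hypothetical stand-in variable `fakeReg` instead of the fixed address 0x1234 so that it can run on a hosted system. `getREG` is added only for illustration.

```d
import core.volatile : volatileLoad, volatileStore;

// Stand-in for a memory-mapped register (assumption for this sketch).
// On real hardware the pointer would be a fixed address, e.g. cast(ubyte*) 0x1234.
__gshared ubyte fakeReg;

// Trivial wrapper that only exists to give the register a nicer interface;
// the pragma asks the compiler to expand it at every call site.
pragma(inline, true)
void setREG(ubyte val)
{
    volatileStore(&fakeReg, val);
}

// Hypothetical read-side counterpart.
pragma(inline, true)
ubyte getREG()
{
    return volatileLoad(&fakeReg);
}

unittest
{
    setREG(0x2A);
    assert(getREG() == 0x2A);
}
```

Whether a failure to inline such a wrapper is reported as an error is exactly the point under discussion here, so the sketch does not rely on either behaviour.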

February 03, 2015
"Walter Bright"  wrote in message news:maq48d$2enr$1@digitalmars.com... 

> Some perspective from a Rust developer:
> 
> https://mail.mozilla.org/pipermail/rust-dev/2013-May/004272.html

I think that's mostly an argument against misuse of forceinline.
February 03, 2015
"Tobias Pankrath"  wrote in message news:cumpcsdbtreytdxxcnut@forum.dlang.org...

> Besides this: Why should a compiler that has an inliner fail to inline a function marked with force_inline? The result may be undesirable but it should always work at least?

The inliner in dmd fails to inline many constructs, loops for example.  It would succeed on all of the cases relevant to wrapping mmio. 

February 03, 2015
On Tuesday, 3 February 2015 at 10:10:43 UTC, Daniel Murphy wrote:
> "Tobias Pankrath"  wrote in message news:cumpcsdbtreytdxxcnut@forum.dlang.org...
>
>> Besides this: Why should a compiler that has an inliner fail to inline a function marked with force_inline? The result may be undesirable but it should always work at least?
>
> The inliner in dmd fails to inline many constructs, loops for example.  It would succeed on all of the cases relevant to wrapping mmio.

Why couldn't he just copy-paste the function's code?