June 09, 2020
On Tue, Jun 9, 2020 at 11:30 AM Stanislav Blinov via Digitalmars-d < digitalmars-d@puremagic.com> wrote:

> On Tuesday, 9 June 2020 at 00:36:18 UTC, Manu wrote:
>
> > What's funny is, in most cases, whether the function is ACTUALLY inlined is not really interesting in 2020.
> > What inline allows is control over the binary environment as I describe. I read it these days as "inline to the calling CU" rather than "inline to the calling function".
> >
> > There are cases where inline is really important, and I do want an error if it fails; for instance, if you have a leaf function (does not allocate any stack memory), it's only possible to make calls from that function where the callee is inlined... and if inlining fails, your caller will lose its no-stack-frame requirement. I've had this come up numerous times, and in those cases, a really-strong-does-make-compile-error inline would be useful, but C++ doesn't have anything like that.
>
> Maybe it's a case where a clear disambiguation is in order? E.g. make a new
>
> pragma(local);
>
> ...which would instruct the compiler to do what you're describing. Or, perhaps, expand the range of options for the existing pragma(inline), from the current bool to an enum of behaviors.
>

I've suggested that before, and I think that's what I'd encourage:
  pragma(inline, never) do not inline
  pragma(inline, true) like C/C++, emit to calling CU, and hint the preference to the optimiser (if it is capable of receiving hints)
  pragma(inline, force) same as true, but error when it fails
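
For reference, a minimal sketch of the forms `pragma(inline)` accepts today, assuming a current D compiler. The `never` and `force` spellings above are only a proposal and are not existing syntax; the function names `twice` and `thrice` are made up for illustration.

---
import std.stdio : writeln;

// Existing syntax: request that calls to this function be inlined.
pragma(inline, true)
int twice(int x) { return x * 2; }

// Existing syntax: request that this function never be inlined.
pragma(inline, false)
int thrice(int x) { return x * 3; }

void main()
{
    // Both functions behave identically whether or not they are inlined;
    // the pragma only influences code generation.
    writeln(twice(21));
    writeln(thrice(14));
}
---

Under the proposal, `pragma(inline, false)` would be spelled `never`, and `force` would add a hard error when inlining fails.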


June 09, 2020
On 09/06/2020 1:40 PM, Manu wrote:
> I've suggested that before, and I think that's what I'd encourage:
>    pragma(inline, never) do not inline
>    pragma(inline, true) like C/C++, emit to calling CU, and hint the preference to the optimiser (if it is capable of receiving hints)
>    pragma(inline, force) same as true, but error when it fails

+1
June 09, 2020
On Tuesday, 9 June 2020 at 00:36:18 UTC, Manu wrote:
> On Tue, Jun 9, 2020 at 1:00 AM Steven Schveighoffer via Digitalmars-d < digitalmars-d@puremagic.com> wrote:
>
>>[...]
>
> What's funny is, in most cases, whether the function is ACTUALLY inlined is
> not really interesting in 2020.
> What inline allows is control over the binary environment as I describe. I
> read it these days as "inline to the calling CU" rather than "inline to the
> calling function".
>
> [...]

Even when forcing inlining with __attribute__((always_inline)) (in gcc)?
June 09, 2020
On Monday, 8 June 2020 at 06:14:44 UTC, Manu wrote:
> 1. I only want the function to be present in the CALLING binary. I do not want an inline function present in the local binary where it was defined (unless it was called internally). I do not want a linker to see the inline function symbols and be able to link to them externally. [This is about linkage and controlling the binary or distribution environment]

Yes!

> 2. I am unhappy that the optimiser chose to not inline a function call, and I want to override that judgement. [This is about micro-optimisation]

Yes!

> 3. I want to treat the function like an AST macro; I want the function inserted at the callsite, and I want to have total confidence in this mechanic. [This is about articulate mechanical control over code-gen; ie, I know necessary facts about the execution context/callstack that I expect to maintain]

Yes!!!

> I think these are the 3 broad categories of behaviour I have ever wanted
> control over.

The same for me.

I have the same experience. Moreover, non-AST inlining optimizes worse than AST inlining. Even when a function is inlined, it is often inlined badly, ignoring some optimization attributes, local SIMD and FMA instructions, and better loop unrolling patterns (better doesn't mean larger).

AST-like inlining is a critical and killer feature.
June 09, 2020
On 6/8/2020 5:35 AM, Jan Hönig wrote:
> In C++, consider operator[]. There would be a lot of function calls inside a kernel (some function with a lot of loops, easily one billion iterations of the innermost loop). If I then have some kind of stencil or any array accesses, calling a function each time on top of resolving the current pointer would be very costly.

I infer what you're talking about is functions that are "strongly connected" should be located near each other in memory so they'll both be in the cache at the same time.

The best way to achieve this is by runtime profiling, and using the profiling data to group together strongly connected functions in the executable. The Digital Mars C/C++ compiler would do this, and it was a nice optimization.

Trying to do it by hand isn't likely to be very effective.
June 09, 2020
On 6/8/2020 7:09 AM, Manu wrote:
> On Mon, Jun 8, 2020 at 8:20 PM Walter Bright via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> 
>     On 6/7/2020 11:14 PM, Manu wrote:
>      > I think a first part of the conversation to understand, is that since D doesn't
>      > really have first-class `inline` (just a pragma, assumed to be low-level
>      > compiler control), I think most people bring their conceptual definition over
>      > from C/C++, and that definition is a little odd (although it is immensely
>      > useful), but it's not like what D does.
> 
>     C/C++ inline has always been a hint to the compiler, not a command.
> 
> 
> It's not a hint at all. It's a mechanical tool; it marks symbols with internal linkage, and it also doesn't emit them if it's never referenced.
> The compiler may not choose to ignore that behaviour,

The C/C++ inline semantics revolve around the mechanics of .h files because it doesn't have modules. These reasons are irrelevant for D.

> it's absolutely necessary, and very important.

For .h files, sure. Why for D, though?

>     Why does it matter where it is emitted? Why would you want multiple copies of
>     the same function in the binary?
> I want zero copies if it's never called. That's very important.

Why are 0 or N copies fine, but 1 is not?


> I also want copies to appear locally when it is referenced; inline functions should NOT require that you link something to get the code... that's not inline at all.

Why? What problem are you solving?


>>     Why? What is the problem with the emission of one copy where it was defined?
> That's the antithesis of inline. If I wanted that, I wouldn't mark it inline.
> I don't want a binary full of code that shouldn't be there. It's very important to be able to control what code is in your binaries.

I know I'm being boring, but why is it important? Also, check out -gc-sections:

https://gcc.gnu.org/onlinedocs/gnat_ugn/Compilation-options.html

Which is a general solution, not a kludge.


> If it's not referenced, it doesn't exist.

Executables (on virtual memory systems) are not loaded into memory and then run. They are memory-mapped into memory, and then pages are read off of disk on demand. Unmapped code consumes neither memory nor resources.


>>     The PR I have on this makes it an informational warning. You can choose to be
>>     notified if inlining fails.
> That's not sufficient though for all use cases. This is a different kind of inline (I think it's 'force inline').

The default, and pragma(inline,true) are sufficient for all use cases except which ones?

> This #3 mechanic is rare, and #1/2 are overwhelmingly common. You don't want a sea of warnings to apply to cases of 1/2.

You won't get a sea of warnings unless you put pragma(inline,true) on a sea of functions that can't be inlined.

> I think it's important to be able to distinguish #3 from the other 2 cases.

Why?


>>   At its root, inlining is an optimization, like deciding which variables go into
>>   registers.
> No, actually... it's not. It's not an 'optimisation' in any case except maaaaybe #2; it's about control of the binary output and code generation.

Inlining is 100% about optimization.


> Low level control of code generation is important in native languages; that's why we're here.

Optimizing things that don't matter is wasting your valuable time. Optimizing things that are more effectively and thoroughly done with the linker (-gc-sections) - it's like chipping wood with a hatchet rather than a woodchipper.
June 09, 2020
On Tuesday, 9 June 2020 at 09:29:47 UTC, Walter Bright wrote:
> On 6/8/2020 7:09 AM, Manu wrote:
>> No, actually... it's not. It's not an 'optimisation' in any case except maaaaybe #2; it's about control of the binary output and code generation.
>
> Inlining is 100% about optimization.
>

We are not talking about inlining.

The whole idea behind this thread is to explain that C++'s use of "inline" does not actually mean that a function is to be inlined.

What we are talking about is a non-hacky way to say: I want a copy of this function in my object file, regardless of the module it was defined in.

You _can_ get the same effect by doing this:
---
string I_need_this_function()(string x)
{
    return "x: " ~ x;
}
---

But that introduces a template, which comes with another set of problems, such as the body not being type-checked if it's never used.
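
To illustrate the type-checking point, here is a small sketch. The names `broken` and `undefinedSymbol` are hypothetical and exist only for this example; it assumes the usual D behaviour that a template body is only semantically analysed when the template is instantiated.

---
// Template with an empty parameter list: emitted into whichever
// object file instantiates it.
string I_need_this_function()(string x)
{
    return "x: " ~ x;
}

// The body refers to a symbol that does not exist, yet the module
// still compiles because nobody instantiates this template.
int broken()()
{
    return undefinedSymbol;
}

void main()
{
    assert(I_need_this_function("a") == "x: a");

    // Instantiating `broken` is what finally surfaces the error.
    static assert(!__traits(compiles, broken()));
}
---

Dropping the empty `()` from `broken` turns it into a regular function, and the compiler then rejects the module immediately.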


June 09, 2020
On Tuesday, 9 June 2020 at 00:36:18 UTC, Manu wrote:
> for instance, if you have a leaf function (does not allocate any stack memory), it's only possible to make calls from that function where the callee is inlined... and if inlining fails, your caller will lose its no-stack-frame requirement.

Out of interest: What does implementing a function without a stack frame enable?


June 09, 2020
On 6/8/2020 7:54 AM, Stanislav Blinov wrote:
> On Monday, 8 June 2020 at 14:45:46 UTC, H. S. Teoh wrote:
> 
>> Could you just use LTO for this?  LDC's LTO, for example, lets the linker discard unreferenced symbols.
> 
> LTO is a tool that attempts to solve a problem that does not need to exist. This could be said about linkers in general.

In the separate compilation model, compilers know nothing about what other compilation units may or may not call. This is why elision of unreferenced symbols belongs in the linker.
June 09, 2020
On Tuesday, 9 June 2020 at 10:26:45 UTC, Walter Bright wrote:
> On 6/8/2020 7:54 AM, Stanislav Blinov wrote:
>> On Monday, 8 June 2020 at 14:45:46 UTC, H. S. Teoh wrote:
>> 
>>> Could you just use LTO for this?  LDC's LTO, for example, lets the linker discard unreferenced symbols.
>> 
>> LTO is a tool that attempts to solve a problem that does not need to exist. This could be said about linkers in general.
>
> In the separate compilation model, compilers know nothing about what other compilation units may or may not call. This is why elision of unreferenced symbols belongs in the linker.

That is no reason for another [instance of the same] compiler not to have access to the symbol table, and some form of source representation, at compile time, obviating the need for (much of) the linker's work. LTO does what compilers ought to be doing.