inlining...
Mar 14, 2014
Manu
March 14, 2014
So, I'm constantly running into issues with not having control over inline. I've run into it again doing experiments in preparation for my dconf talk...

I have identified 2 cases which come up regularly:
 1. A function that should always be inline unconditionally (std.simd is
effectively blocked on this)
 2. A particular invocation of a function should be inlined for this call
only

The first case is just about having control over code gen. Some functions should effectively be macros or pseudo-intrinsics (ie, intrinsic wrappers in std.simd, beauty wrappers around asm code, etc), and I don't ever want to see a symbol appear in the binary.

My suggestion is the introduction of __forceinline or something like it. We need this.


The second case is interesting, and I've found it comes up a few times on
different occasions.
In my current instance, I'm trying to build a generic framework to perform
efficient composable data processing, and a basic requirement is that the
components are inlined, such that the optimiser can interleave the work
properly.

Let's imagine I have a template which implements a work loop, which wants
to call a bunch of work elements it receives by alias. The issue is, each
of those must be inlined, for this call instance only, and there's no way
to do this.
I'm gonna draw the line at stringified code to use with mixin; I hate that,
and I don't want to encourage use of mixin or stringified code in
user-facing APIs as a matter of practice. Also, some of these work
elements might be useful functions in their own right, which means they can
indeed be a function existing somewhere else that shouldn't itself be
attributed as __forceinline.
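
To make the setup concrete, here is a minimal sketch of such a work loop in present-day D. The names workLoop, scale and offset are made up for illustration and do not come from any library. Each work element arrives as an alias template parameter; whether the calls inside the loop actually get inlined is still left to the compiler's heuristics, which is the problem being described.

```d
// Hypothetical work elements; each is also a useful function in its own right.
int scale(int x)  { return x * 3; }
int offset(int x) { return x + 10; }

// Hypothetical work loop: the stages are received by alias, so every
// instantiation knows statically which functions it calls.
void workLoop(stages...)(int[] data)
{
    foreach (ref v; data)
    {
        // static foreach expands at compile time, one call per stage, in order.
        static foreach (i; 0 .. stages.length)
            v = stages[i](v);
    }
}

void main()
{
    auto data = [1, 2, 3];
    workLoop!(scale, offset)(data);
    assert(data == [13, 16, 19]);
}
```

Note that scale and offset stay ordinary functions; nothing about them says "inline me", which is why a per-call mechanism is being asked for.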

What are the current options to force that some code is inlined?

My feeling is that an ideal solution would be something like an enhancement
which would allow the 'mixin' keyword to be used with regular function
calls.
What this would do is 'mix in' the function call at this location, ie,
effectively inline that particular call, and it leverages a keyword and
concept that we already have. It would obviously produce a compile error if
the code is not available.

I quite like this idea, but there is a potential syntactical problem; how to assign the return value?

int func(int y) { return y*y+10; }

int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
int output = mixin(func(10)); // now i feel paren spammy...
mixin(int output = func(10)); // this doesn't feel right...

My feeling is the first is the best, but I'm not sure about that grammatically.


The other thing that comes to mind is that it seems like this might make a case for AST macros... but I think that's probably overkill for this situation, and I'm not confident we're ever gonna attempt to crack that nut. I'd like to see something practical and unobjectionable preferably.


This problem is fairly far reaching; Phobos receives a lot of lambdas these
days, which I've found don't reliably inline and interfere with the
optimiser's ability to optimise the code.
There was some discussion about a code unrolling API some time back, and
this would apply there (the suggested solution used string mixins! >_<).
Debug build performance is a problem which would be improved with this
feature.


March 14, 2014
On Friday, 14 March 2014 at 06:21:27 UTC, Manu wrote:
> So, I'm constantly running into issues with not having control over inline.
> [...]

As much as I like the idea:

Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.

I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.

In short: why are compilers not good enough at this that the programmer needs to be involved?
March 14, 2014
On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
> As much as I like the idea:
>
> Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.
>
> I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.
>
> In short: why are compilers not good enough at this that the programmer needs to be involved?

I think it's possible for a programmer to make a better decision about what to do than a compiler. Clearly the compiler isn't smart enough to make the right decisions for Manu now, so I think it would be acceptable to at least insert functionality to give him that control now until the compiler can. There is the question of whether or not it's possible for a compiler to make the right decisions in the right places, but I'm not experienced enough to address that.
March 14, 2014
> Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.
>
> I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.
>
> In short: why are compilers not good enough at this that the programmer needs to be involved?

No compiler gets this right 100% of the time, so if it is the compiler's job they are failing. Most C++ compilers will sometimes require use of forceinline with SSE intrinsics.

Unless it has PGO support the compiler has no idea about the runtime usage of that code. It wouldn't know which code the program spends 90% of its time in so it just applies general heuristics when deciding to inline.

What I'd like is the ability to set an inline level per function.

Something like 0 being always inline, and 10 being never inline.

Unless specified otherwise, the default would be 5.

So if you want forceinline behavior

  inline(0) vec3 dot(vec3 a, vec3 b); //always inlined
  inline(10) vec3 cross(vec3 a, vec3 b); //never inlined

And override it at callsite--

inline(10) auto v = dot(a,b);



March 14, 2014
On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
> Something always tells me this is the compiler's job

If all methods are virtual by default, how can the compiler
inline the code? Properties are a great example where I'd want to
both final and inline them in quite a few cases. In those cases,
the existence of inline would negate the need for final entirely
because being a virtual method would never come into the
equation.
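
For anyone following along, a minimal sketch of the property case. The class Sprite and its members are hypothetical, invented only for this illustration. In D, non-private, non-template class methods are virtual unless marked final, so final is currently what lets a property be bound statically and become a candidate for the inliner.

```d
class Sprite
{
    private int _width = 64;

    // Virtual by default: a subclass may override this, so a call through a
    // base-class reference generally has to go through the vtable.
    @property int height() const { return 32; }

    // final: cannot be overridden, so the call target is known statically
    // and the compiler is free to consider inlining it.
    final @property int width() const { return _width; }
}

void main()
{
    auto s = new Sprite;
    assert(s.width == 64);
    assert(s.height == 32);
}
```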

This would also apply to UFCS functions, which I use to wrap D
types such as strings into C++ interface vtables without making
the programmer jump through a bunch of hoops.

Inline in Microsoft's compiler is always considered a strong
hint. There are cases where even __forceinline won't actually
inline a function if the compiler decides you're on crack. I
assume this would be the case here, and you'd just be helping
inform the compiler what you want inlined in case it slips up and
gets it wrong.
March 14, 2014
On 14 March 2014 18:03, John Colvin <john.loughran.colvin@gmail.com> wrote:

> As much as I like the idea:
>
> Something always tells me this is the compilers job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to say SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.
>
> I understand that unoptimised builds for debugging are a problem, but a sensible compiler let's you hand pick your optimisation passes.
>
> In short: why are compilers not good enough at this that the programmer needs to be involved?
>

The compiler applies generalised heuristics, which are certainly for the
'common' case, whatever that happens to be.
The compiler simply doesn't know what you're doing, so it's very hard for
the compiler to do anything really intelligent.

Inlining heuristics are fickle, and they also don't know what you're
actually trying to do.
Is a function 'long'? How long is 'long'? Is the function 'hot'? Do we
prefer code size or execution speed? Is the function called only from this
location, or is it used in many locations? Etc.
Inlining is one of the most fuzzy pieces of logic in the compiler, and
relies on a lot of information that is impossible for the compiler to
deduce, so it applies heuristics to try and do a decent job, but it's
certainly not perfect.

I argue, nothing so fickle can exist in the language without having a manual override. Especially not in a native language.

In my current case, the functions I need to inline are not exactly trivial.
They're really pushing the boundaries of the compiler's inliner heuristics,
and then I'm calling a series of such functions that operate on parallel
data.
If they don't inline, the performance equals the sum of the functions plus
some overhead. If they all inline, the performance is equal to only the
longest one, and no overhead (the others fill in pipeline gaps).
Further, some of these functions embed some shared work... if they don't
inline, this work is repeated. If they do inline, the redundant repeated
work is eliminated.

My experiments with std.algorithm were a failure. I realised quickly that I
couldn't rely on the inliner to do a satisfactory job, and the optimiser
was unable to do its job properly.
std.algorithm could really benefit from the mixin suggestion since things
like predicate functions are always trivial, usually supplied as little
lambdas, and inlining isn't reliable. Especially in the debug builds.
Something like algorithm loop sugar shouldn't run heaps worse than an
explicit loop just because it happens to be implemented by a generic
function.


March 14, 2014
On Friday, 14 March 2014 at 11:04:34 UTC, Manu wrote:
> The compiler applies generalised heuristics, which are certainly for the
> 'common' case, whatever that happens to be.
> [...]

Thanks for the explanations.

Another use case is to aid propagation of compile-time information for optimisation.
A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.

I don't know how good compilers are at taking this sort of thing into account already.
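
A small sketch of what "static loop length" means here. Note this swaps in a different technique from inlining: it uses a template value parameter to hand the length to the callee at compile time, purely to show the effect. The names sumFirst and sumFirstStatic are hypothetical.

```d
// Runtime length: inside this function the loop bound is an ordinary variable.
// The optimiser only sees a constant bound if the call happens to be inlined.
int sumFirst(const(int)[] data, size_t n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += data[i];
    return total;
}

// Compile-time length: n is part of the instantiation, so the loop bound is a
// constant inside the function whether or not the call is inlined.
int sumFirstStatic(size_t n)(const(int)[] data)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += data[i];
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4, 5];
    assert(sumFirst(data, 4) == 10);
    assert(sumFirstStatic!4(data) == 10);
}
```

The template route costs one instantiation per distinct length, so it is not a general substitute for being able to request inlining at a call site.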
March 14, 2014
John Colvin:

> Another use case is to aid propagation of compile-time information for optimisation.
> A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.

If the function is private in a module, and it's called only from one point (or otherwise the loop count is the same in different calls), I think this optimization can be performed even if the function is not inlined.

Bye,
bearophile
March 14, 2014
On 14 March 2014 22:02, John Colvin <john.loughran.colvin@gmail.com> wrote:
>
>
> Thanks for the explanations.
>
> Another use case is to aid propagation of compile-time information for
> optimisation.
> A function might look like a poor candidate for inlining for other
> reasons, but if there's a statically known (to the caller) integer
> parameter coming in that will be used to decide a loop length, inlining
> allows that info to be propagated to the callee. Static loop lengths =>
> well optimised loops, with opportunities for optimal unrolling. Even with
> quite a large function this can be a good choice to inline.
>

Yup, this is a classic example. Extremely relevant.
And it's precisely the sort of thing that an inline heuristic is likely to
fail at.

> I don't know how good compilers are at taking this sort of thing into
> account already.
>

I don't know if they try or not, but I can say from experience that results
are generally unreliable.
I would never depend on the inliner to get this right.


On 14 March 2014 22:08, bearophile <bearophileHUGS@lycos.com> wrote:

> John Colvin:
>
>
> ...
>>
>
> If the function is private in a module, and it's called only from one point (or otherwise the loop count is the same in different calls), I think this optimization can be performed even if the function is not inlined.
>

This is probably true, but I would never rely on it.
You have some carefully tuned code that works well, and then one day, some
random unrelated thing tweaks a balance, and your previously good code is
suddenly slow for unknown reasons.

The point is, there are times when you know your code should be inlined; ie, it's not an 'optimisation', it's an expectation/requirement. A programmer needs to be able to express this.


March 14, 2014
On 2014-03-14 07:21, Manu wrote:
> So, I'm constantly running into issues with not having control over inline.
> I've run into it again doing experiments in preparation for my dconf talk...
>
> I have identified 2 cases which come up regularly:
>   1. A function that should always be inline unconditionally (std.simd
> is effectively blocked on this)
>   2. A particular invocation of a function should be inlined for this
> call only
>
> The first case is just about having control over code gen. Some
> functions should effectively be macros or pseudo-intrinsics (ie,
> intrinsic wrappers in std.simd, beauty wrappers around asm code, etc),
> and I don't ever want to see a symbol appear in the binary.
>
> My suggestion is introduction of __forceinline or something like it. We
> need this.

Haven't we already agreed that a pragma for force inline should be implemented? Or is that something I have dreamed?
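
For readers arriving later: the D language specification does describe an inline pragma. A minimal sketch of how it is written follows. The function names are illustrative only, and how strictly a given compiler honours the request (and what it does when it cannot inline) should be checked against that compiler's documentation rather than taken from this example.

```d
// Request that calls to this function always be inlined.
pragma(inline, true)
int squarePlusTen(int y) { return y * y + 10; }

// Request that calls to this function never be inlined.
pragma(inline, false)
int squarePlusTenOutOfLine(int y) { return y * y + 10; }

void main()
{
    // Both behave identically; only the code generation request differs.
    assert(squarePlusTen(10) == 110);
    assert(squarePlusTenOutOfLine(10) == 110);
}
```

This covers the first use case (a function that should always be inlined); it attaches to the declaration, so it does not by itself address inlining one particular call.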

> The second case is interesting, and I've found it comes up a few times
> on different occasions.
> In my current instance, I'm trying to build a generic framework to perform
> efficient composable data processing, and a basic requirement is that
> the components are inlined, such that the optimiser can interleave the
> work properly.
>
> Let's imagine I have a template which implements a work loop, which
> wants to call a bunch of work elements it receives by alias. The issue
> is, each of those must be inlined, for this call instance only, and
> there's no way to do this.
> I'm gonna draw the line at stringified code to use with mixin; I hate
> that, and I don't want to encourage use of mixin or stringified code in
> user-facing APIs as a matter of practice. Also, some of these work
> elements might be useful functions in their own right, which means they
> can indeed be a function existing somewhere else that shouldn't itself
> be attributed as __forceinline.
>
> What are the current options to force that some code is inlined?
>
> My feeling is that an ideal solution would be something like an
> enhancement which would allow the 'mixin' keyword to be used with
> regular function calls.
> What this would do is 'mix in' the function call at this location, ie,
> effectively inline that particular call, and it leverages a keyword and
> concept that we already have. It would obviously produce a compile error
> if the code is not available.
>
> I quite like this idea, but there is a potential syntactical problem;
> how to assign the return value?
>
> int func(int y) { return y*y+10; }
>
> int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get

I think this is the best syntax of these three alternatives.

> in the way' if the output
> int output = mixin(func(10)); // now i feel paren spammy...

This syntax can't work. It's already interpreted as calling "func" and using the result as a string mixin.
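
To illustrate the clash, a small sketch of what the parenthesised form already means: the argument is evaluated at compile time and the resulting text is compiled as code. makeExpr is a made-up helper for this example, not an existing function.

```d
import std.conv : to;

// Hypothetical helper: returns D source text, not a value to use directly.
string makeExpr(int y)
{
    return to!string(y) ~ " * " ~ to!string(y) ~ " + 10";
}

void main()
{
    // makeExpr(10) runs at compile time and yields the text "10 * 10 + 10",
    // which the string mixin then compiles as an expression.
    int output = mixin(makeExpr(10));
    assert(output == 110);
}
```

So mixin(func(10)) is already taken: it would mix in whatever func returns rather than inline the call to func.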

> mixin(int output = func(10)); // this doesn't feel right...

No.

> My feeling is the first is the best, but I'm not sure about that
> grammatically.

Yeah, I agree.

> The other thing that comes to mind is that it seems like this might make
> a case for AST macros... but I think that's probably overkill for this
> situation, and I'm not confident we're ever gonna attempt to crack that
> nut. I'd like to see something practical and unobjectionable preferably.

AST macros would solve it. They could solve the first use case as well. I would not implement AST macros just to support force inline, but we have many other use cases as well. I would have implemented AST macros a long time ago. Hopefully this would avoid the need to create new language features in some cases.

First use case, just define a macro that returns the AST for the content of the function you would create.

macro func (Ast!(int) a)
{
    return <[ $a * $a; ]>;
}

int output = func(10); // always inlined

Second use case, define a macro, "inline", that takes the function you want to call as a parameter. The macro will basically inline the body.

macro inline (T, U...) (Ast!(T function (U)) func)
{
    // this would probably be more complicated
    return func.body;
}

int output = func(10); // not inlined
int output = inline(func(10)); // always inlined

> This problem is fairly far reaching; phobos receives a lot of lambdas
> these days, which I've found don't reliably inline and interfere with
> the optimiser's ability to optimise the code.

I thought since lambdas are passed as template parameters they would always be inlined.

-- 
/Jacob Carlborg