March 14, 2014 inlining...
So, I'm constantly running into issues with not having control over inline. I've run into it again doing experiments in preparation for my dconf talk...

I have identified 2 cases which come up regularly:
1. A function that should always be inlined unconditionally (std.simd is effectively blocked on this)
2. A particular invocation of a function should be inlined for this call only

The first case is just about having control over code gen. Some functions should effectively be macros or pseudo-intrinsics (ie, intrinsic wrappers in std.simd, beauty wrappers around asm code, etc), and I don't ever want to see a symbol appear in the binary.

My suggestion is introduction of __forceinline or something like it. We need this.

The second case is interesting, and I've found it comes up a few times on different occasions.
In my current instance, I'm trying to build a generic framework to perform efficient composable data processing, and a basic requirement is that the components are inlined, such that the optimiser can interleave the work properly.

Let's imagine I have a template which implements a work loop, which wants to call a bunch of work elements it receives by alias. The issue is, each of those must be inlined, for this call instance only, and there's no way to do this.
I'm gonna draw the line at stringified code to use with mixin; I hate that, and I don't want to encourage use of mixin or stringified code in user-facing APIs as a matter of practice. Also, some of these work elements might be useful functions in their own right, which means they can indeed be a function existing somewhere else that shouldn't itself be attributed as __forceinline.

What are the current options to force that some code is inlined?

My feeling is that an ideal solution would be something like an enhancement which would allow the 'mixin' keyword to be used with regular function calls.
What this would do is 'mix in' the function call at this location, ie, effectively inline that particular call, and it leverages a keyword and concept that we already have. It would obviously produce a compile error if the code is not available.

I quite like this idea, but there is a potential syntactical problem; how to assign the return value?

int func(int y) { return y*y+10; }

int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
int output = mixin(func(10)); // now i feel paren spammy...
mixin(int output = func(10)); // this doesn't feel right...

My feeling is the first is the best, but I'm not sure about that grammatically.

The other thing that comes to mind is that it seems like this might make a case for AST macros... but I think that's probably overkill for this situation, and I'm not confident we're ever gonna attempt to crack that nut. I'd like to see something practical and unobjectionable preferably.

This problem is fairly far reaching; phobos receives a lot of lambdas these days, which I've found don't reliably inline and interfere with the optimiser's ability to optimise the code.
There was some discussion about a code unrolling API some time back, and this would apply there (the suggested solution used string mixins! >_<).
Debug build performance is a problem which would be improved with this feature.
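A minimal sketch of the two cases described above, written in D. It relies on `pragma(inline, true)`, which was added to the language after this thread was written; `halve` and `workLoop` are hypothetical names introduced only for illustration. The pragma addresses the first case (a function that should always be inlined). For the second case the sketch only shows the setup, a work loop that receives its work elements by alias; it does not force inlining at the individual call site, which is the part the post says is missing.

```d
// Case 1: request unconditional inlining of this function.
// pragma(inline, true) is a later addition to D, not something
// that was available when the post was written.
pragma(inline, true)
int func(int y) { return y * y + 10; }

// An ordinary function that is useful in its own right and is
// deliberately not marked for forced inlining.
int halve(int y) { return y / 2; }

// Case 2 setup: a work loop that receives its work elements by alias.
// Hypothetical helper; whether each call below is inlined is still
// decided by the compiler.
void workLoop(work...)(int[] data)
{
    foreach (ref x; data)
    {
        // Expanded at compile time: one call per work element, in order.
        static foreach (i; 0 .. work.length)
            x = work[i](x);
    }
}

void main()
{
    auto data = [1, 2, 3];
    workLoop!(func, halve)(data);
    assert(data == [5, 7, 9]);
}
```

Passing the work elements as alias parameters gives each instantiation of `workLoop` direct knowledge of its callees, but that only makes inlining possible; it does not guarantee it.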
March 14, 2014 Re: inlining...
Posted in reply to Manu | On Friday, 14 March 2014 at 06:21:27 UTC, Manu wrote:
> So, I'm constantly running into issues with not having control over inline.
> I've run into it again doing experiments in preparation for my dconf talk...
>
> I have identified 2 cases which come up regularly:
> 1. A function that should always be inline unconditionally (std.simd is
> effectively blocked on this)
> 2. A particular invocation of a function should be inlined for this call
> only
>
> The first case is just about having control over code gen. Some functions
> should effectively be macros or pseudo-intrinsics (ie, intrinsic wrappers
> in std.simd, beauty wrappers around asm code, etc), and I don't ever want
> to see a symbol appear in the binary.
>
> My suggestion is introduction of __forceinline or something like it. We
> need this.
>
>
> The second case is interesting, and I've found it comes up a few times on
> different occasions.
> In my current instance, I'm trying to build generic framework to perform
> efficient composable data processing, and a basic requirement is that the
> components are inlined, such that the optimiser can interleave the work
> properly.
>
> Let's imagine I have a template which implements a work loop, which wants
> to call a bunch of work elements it receives by alias. The issue is, each
> of those must be inlined, for this call instance only, and there's no way
> to do this.
> I'm gonna draw the line at stringified code to use with mixin; I hate that,
> and I don't want to encourage use of mixin or stringified code in
> user-facing APIs as a matter of practice. Also, some of these work
> elements might be useful functions in their own right, which means they can
> indeed be a function existing somewhere else that shouldn't itself be
> attributed as __forceinline.
>
> What are the current options to force that some code is inlined?
>
> My feeling is that an ideal solution would be something like an enhancement
> which would allow the 'mixin' keyword to be used with regular function
> calls.
> What this would do is 'mix in' the function call at this location, ie,
> effectively inline that particular call, and it leverages a keyword and
> concept that we already have. It would obviously produce a compile error if
> the code is not available.
>
> I quite like this idea, but there is a potential syntactical problem; how
> to assign the return value?
>
> int func(int y) { return y*y+10; }
>
> int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
> int output = mixin(func(10)); // now i feel paren spammy...
> mixin(int output = func(10)); // this doesn't feel right...
>
> My feeling is the first is the best, but I'm not sure about that
> grammatically.
>
>
> The other thing that comes to mind is that it seems like this might make a
> case for AST macros... but I think that's probably overkill for this
> situation, and I'm not confident we're ever gonna attempt to crack that
> nut. I'd like to see something practical and unobjectionable preferably.
>
>
> This problem is fairly far reaching; phobos receives a lot of lambdas these
> days, which I've found don't reliably inline and interfere with the
> optimiser's ability to optimise the code.
> There was some discussion about a code unrolling API some time back, and
> this would apply there (the suggested solution used string mixins! >_<).
> Debug build performance is a problem which would be improved with this
> feature.
As much as I like the idea:
Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.
I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.
In short: why are compilers not good enough at this that the programmer needs to be involved?
March 14, 2014 Re: inlining...
Posted in reply to John Colvin | On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
> As much as I like the idea:
>
> Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.
>
> I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.
>
> In short: why are compilers not good enough at this that the programmer needs to be involved?
I think it's possible for a programmer to make a better decision about what to do than a compiler. Clearly the compiler isn't smart enough to make the right decisions for Manu now, so I think it would be acceptable to at least insert functionality to give him that control now until the compiler can. There is the question of whether or not it's possible for a compiler to make the right decisions in the right places, but I'm not experienced enough to address that.
March 14, 2014 Re: inlining...
Posted in reply to John Colvin
> Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.
>
> I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.
>
> In short: why are compilers not good enough at this that the programmer needs to be involved?
No compiler gets this right 100% of the time, so if it is the compiler's job, they are failing. Most C++ compilers will sometimes require use of forceinline with SSE intrinsics.
Unless it has PGO support the compiler has no idea about the runtime usage of that code. It wouldn't know which code the program spends 90% of its time in so it just applies general heuristics when deciding to inline.
What I'd like is the ability to set an inline level per function.
Something like 0 being always inline, and 10 being never inline.
Unless specified otherwise, the default would be 5.
So if you want forceinline behavior:
inline(0) vec3 dot(vec3 a, vec3 b); //always inlined
inline(10) vec3 cross(vec3 a, vec3 b); //never inlined
And override it at the call site:
inline(10) auto v = dot(a,b);
March 14, 2014 Re: inlining...
Posted in reply to John Colvin | On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
> Something always tells me this is the compiler's job
If all methods are virtual by default, how can the compiler
inline the code? Properties are a great example where I'd want to
make them both final and inline in quite a few cases. In those cases,
the existence of inline would negate the need for final entirely
because being a virtual method would never come into the
equation.
This would also apply to UFCS functions, which I use to wrap D
types such as strings into C++ interface vtables without making
the programmer jump through a bunch of hoops.
Inline in Microsoft's compiler is always considered a strong
hint. There are cases where even __forceinline won't actually
inline a function if the compiler decides you're on crack. I
assume this would be the case here, and you'd just be helping
inform the compiler what you want inlined in case it slips up and
gets it wrong.
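The virtual-by-default point can be sketched as follows; the class and member names are hypothetical. In D, public class member functions are virtual unless marked `final`, so a call through a base-class reference is resolved at run time, whereas a `final` member has a statically known target that the inliner can at least consider.

```d
class Widget
{
    private int width_ = 3;

    // Virtual by default: a call through a Widget reference is dispatched
    // at run time, which generally stands in the way of inlining.
    @property int width() { return width_; }

    // final: cannot be overridden, so the call target is known statically.
    final @property int area() { return width_ * width_; }
}

class WideWidget : Widget
{
    // Overrides the virtual property.
    override @property int width() { return 10; }
}

void main()
{
    Widget w = new WideWidget;
    assert(w.width == 10); // dynamic dispatch picks the override
    assert(w.area == 9);   // statically bound call to Widget.area
}
```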
March 14, 2014 Re: inlining...
Posted in reply to John Colvin
On 14 March 2014 18:03, John Colvin <john.loughran.colvin@gmail.com> wrote:
> As much as I like the idea:
>
> Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.
>
> I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.
>
> In short: why are compilers not good enough at this that the programmer needs to be involved?

The compiler applies generalised heuristics, which are certainly for the 'common' case, whatever that happens to be.
The compiler simply doesn't know what you're doing, so it's very hard for the compiler to do anything really intelligent.

Inlining heuristics are fickle, and they also don't know what you're actually trying to do.
Is a function 'long'? How long is 'long'? Is the function 'hot'? Do we prefer code size or execution speed? Is the function called only from this location, or is it used in many locations? Etc.
Inlining is one of the most fuzzy pieces of logic in the compiler, and relies on a lot of information that is impossible for the compiler to deduce, so it applies heuristics to try and do a decent job, but it's certainly not perfect.

I argue, nothing so fickle can exist in the language without having a manual override. Especially not in a native language.

In my current case, the functions I need to inline are not exactly trivial. They're really pushing the boundaries of the compiler's inliner heuristics, and then I'm calling a series of such functions that operate on parallel data.
If they don't inline, the performance equals the sum of the functions plus some overhead. If they all inline, the performance is equal to only the longest one, and no overhead (the others fill in pipeline gaps).
Further, some of these functions embed some shared work... if they don't inline, this work is repeated. If they do inline, the redundant repeated work is eliminated.

My experiments with std.algorithm were a failure. I realised quickly that I couldn't rely on the inliner to do a satisfactory job, and the optimiser was unable to do its job properly.
std.algorithm could really benefit from the mixin suggestion since things like predicate functions are always trivial, usually supplied as little lambdas, and inlining isn't reliable. Especially in the debug builds.
Something like algorithm loop sugar shouldn't run heaps worse than an explicit loop just because it happens to be implemented by a generic function.
March 14, 2014 Re: inlining...
Posted in reply to Manu | On Friday, 14 March 2014 at 11:04:34 UTC, Manu wrote:
> On 14 March 2014 18:03, John Colvin <john.loughran.colvin@gmail.com> wrote:
>
>> As much as I like the idea:
>>
>> Something always tells me this is the compiler's job... What clever
>> reasoning are you applying that the compiler's inliner can't? It seems like
>> a different situation to, say, SIMD code, where correctly structuring loops
>> can require a lot of gymnastics that the compiler can't or won't (floating
>> point conformance) do. The inlining decision seems easily automatable in
>> comparison.
>>
>> I understand that unoptimised builds for debugging are a problem, but a
>> sensible compiler lets you hand pick your optimisation passes.
>>
>> In short: why are compilers not good enough at this that the programmer
>> needs to be involved?
>>
>
> The compiler applies generalised heuristics, which are certainly for the
> 'common' case, whatever that happens to be.
> The compiler simply doesn't know what you're doing, so it's very hard for
> the compiler to do anything really intelligent.
>
> Inlining heuristics are fickle, and they also don't know what you're
> actually trying to do.
> Is a function 'long'? How long is 'long'? Is the function 'hot'? Do we
> prefer code size or execution speed? Is the function called only from this
> location, or is it used in many locations? Etc.
> Inlining is one of the most fuzzy pieces of logic in the compiler, and
> relies on a lot of information that is impossible for the compiler to
> deduce, so it applies heuristics to try and do a decent job, but it's
> certainly not perfect.
>
> I argue, nothing so fickle can exist in the language without having a
> manual override. Especially not in a native language.
>
> In my current case, the functions I need to inline are not exactly trivial.
> They're really pushing the boundaries of the compiler's inliner heuristics,
> and then I'm calling a series of such functions that operate on parallel
> data.
> If they don't inline, the performance equals the sum of the functions plus
> some overhead. If they all inline, the performance is equal to only the
> longest one, and no overhead (the others fill in pipeline gaps).
> Further, some of these functions embed some shared work... if they don't
> inline, this work is repeated. If they do inline, the redundant repeated
> work is eliminated.
>
> My experiments with std.algorithm were a failure. I realised quickly that I
> couldn't rely on the inliner to do a satisfactory job, and the optimiser
> was unable to do its job properly.
> std.algorithm could really benefit from the mixin suggestion since things
> like predicate functions are always trivial, usually supplied as little
> lambdas, and inlining isn't reliable. Especially in the debug builds.
> Something like algorithm loop sugar shouldn't run heaps worse than an
> explicit loop just because it happens to be implemented by a generic
> function.
Thanks for the explanations.
Another use case is to aid propagation of compile-time information for optimisation.
A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.
I don't know how good compilers are at taking this sort of thing into account already.
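A sketch of the loop-length case, using the hypothetical functions `sumFirst` and `sumFirstStatic`. In the first, the loop bound is a run-time parameter, so a constant passed by the caller becomes a static trip count only if the call is inlined (or the compiler specialises the function some other way). The second is shown for contrast: passing the length as a template argument is one way to get a static bound without depending on the inliner, at the cost of one instantiation per length.

```d
// Run-time length: compiled on its own, this function cannot know n.
// The caller's constant reaches the loop only if the call is inlined.
int sumFirst(const(int)[] a, size_t n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += a[i];
    return total;
}

// Compile-time length: n is a template argument, so every instantiation
// has a fixed trip count whether or not the call is inlined.
int sumFirstStatic(size_t n)(const(int)[] a)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += a[i];
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4, 5];
    assert(sumFirst(data, 4) == 10);       // bound known only to the caller
    assert(sumFirstStatic!4(data) == 10);  // bound baked into the instantiation
}
```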
March 14, 2014 Re: inlining...
Posted in reply to John Colvin | John Colvin:
> Another use case is to aid propagation of compile-time information for optimisation.
> A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.
If the function is private in a module, and it's called only from one point (or otherwise the loop count is the same in different calls), I think this optimization can be performed even if the function is not inlined.
Bye,
bearophile
March 14, 2014 Re: inlining...
Posted in reply to John Colvin
On 14 March 2014 22:02, John Colvin <john.loughran.colvin@gmail.com> wrote:
> Thanks for the explanations.
>
> Another use case is to aid propagation of compile-time information for optimisation.
> A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.

Yup, this is a classic example. Extremely relevant. And it's precisely the sort of thing that an inline heuristic is likely to fail at.

> I don't know how good compilers are at taking this sort of thing into account already.

I don't know if they try or not, but I can say from experience that results are generally unreliable. I would never depend on the inliner to get this right.

On 14 March 2014 22:08, bearophile <bearophileHUGS@lycos.com> wrote:
> John Colvin:
>> ...
>
> If the function is private in a module, and it's called only from one point (or otherwise the loop count is the same in different calls), I think this optimization can be performed even if the function is not inlined.

This is probably true, but I would never rely on it. You have some carefully tuned code that works well, and then one day, some random unrelated thing tweaks a balance, and your previously good code is suddenly slow for unknown reasons.

The point is, there are times when you know your code should be inlined; ie, it's not an 'optimisation', it's an expectation/requirement. A programmer needs to be able to express this.
March 14, 2014 Re: inlining...
Posted in reply to Manu | On 2014-03-14 07:21, Manu wrote:
> So, I'm constantly running into issues with not having control over inline.
> I've run into it again doing experiments in preparation for my dconf talk...
>
> I have identified 2 cases which come up regularly:
> 1. A function that should always be inlined unconditionally (std.simd is effectively blocked on this)
> 2. A particular invocation of a function should be inlined for this call only
>
> The first case is just about having control over code gen. Some functions should effectively be macros or pseudo-intrinsics (ie, intrinsic wrappers in std.simd, beauty wrappers around asm code, etc), and I don't ever want to see a symbol appear in the binary.
>
> My suggestion is introduction of __forceinline or something like it. We need this.

Haven't we already agreed that a pragma for force inline should be implemented? Or is that something I have dreamed?

> The second case is interesting, and I've found it comes up a few times on different occasions.
> In my current instance, I'm trying to build a generic framework to perform efficient composable data processing, and a basic requirement is that the components are inlined, such that the optimiser can interleave the work properly.
>
> Let's imagine I have a template which implements a work loop, which wants to call a bunch of work elements it receives by alias. The issue is, each of those must be inlined, for this call instance only, and there's no way to do this.
> I'm gonna draw the line at stringified code to use with mixin; I hate that, and I don't want to encourage use of mixin or stringified code in user-facing APIs as a matter of practice. Also, some of these work elements might be useful functions in their own right, which means they can indeed be a function existing somewhere else that shouldn't itself be attributed as __forceinline.
>
> What are the current options to force that some code is inlined?
>
> My feeling is that an ideal solution would be something like an enhancement which would allow the 'mixin' keyword to be used with regular function calls.
> What this would do is 'mix in' the function call at this location, ie, effectively inline that particular call, and it leverages a keyword and concept that we already have. It would obviously produce a compile error if the code is not available.
>
> I quite like this idea, but there is a potential syntactical problem; how to assign the return value?
>
> int func(int y) { return y*y+10; }
>
> int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output

I think this is the best syntax of these three alternatives.

> int output = mixin(func(10)); // now i feel paren spammy...

This syntax can't work. It's already interpreted as calling "func" and using the result as a string mixin.

> mixin(int output = func(10)); // this doesn't feel right...

No.

> My feeling is the first is the best, but I'm not sure about that grammatically.

Yeah, I agree.

> The other thing that comes to mind is that it seems like this might make a case for AST macros... but I think that's probably overkill for this situation, and I'm not confident we're ever gonna attempt to crack that nut. I'd like to see something practical and unobjectionable preferably.

AST macros would solve it. It could solve the first use case as well. I would not implement AST macros just to support force inline, but we have many other use cases as well. I would have implemented AST macros a long time ago. Hopefully this would avoid the need to create new language features in some cases.

First use case, just define a macro that returns the AST for the content of the function you would create.

macro func (Ast!(int) a)
{
    return <[ $a * $a; ]>;
}

int output = func(10); // always inlined

Second use case, define a macro, "inline", that takes the function you want to call as a parameter. The macro will basically inline the body.

macro inline (T, U...) (Ast!(T function (U)) func)
{
    // this would probably be more complicated
    return func.body;
}

int output = func(10); // not inlined
int output = inline(func(10)); // always inlined

> This problem is fairly far reaching; phobos receives a lot of lambdas these days, which I've found don't reliably inline and interfere with the optimiser's ability to optimise the code.

I thought since lambdas are passed as template parameters they would always be inlined.

--
/Jacob Carlborg
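For illustration, here is how a lambda reaches a Phobos algorithm as a template alias argument, next to the explicit loop it stands in for. `viaLoop` and `viaAlgorithm` are hypothetical names. The alias argument means `map` is instantiated specifically for that lambda, which makes inlining possible; as far as this thread indicates, it does not make it guaranteed, and that gap is what the original post is about.

```d
import std.algorithm.iteration : map, sum;

// Explicit loop: the work is written directly in the loop body.
int viaLoop(const(int)[] a)
{
    int total = 0;
    foreach (x; a)
        total += x * x + 10;
    return total;
}

// Generic version: the lambda is passed to map as a template alias
// argument. Whether the lambda call is inlined into map's iteration
// code is still the optimiser's decision.
int viaAlgorithm(const(int)[] a)
{
    return a.map!(x => x * x + 10).sum;
}

void main()
{
    int[] data = [1, 2, 3];
    assert(viaLoop(data) == 44);
    assert(viaAlgorithm(data) == 44);
}
```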
Copyright © 1999-2021 by the D Language Foundation