March 19, 2014
On Wednesday, 19 March 2014 at 08:35:53 UTC, Manu wrote:
> The idea of eliminating common sub-expressions suggests that there _are_
> common sub-expressions, which aren't affected by the function arguments.
> This case is highly unusual in my experience.

Not if you delay optimization until profiling and focus on higher-level structures during the initial implementation. Or if you use composition (as in generic programming).

If you hand optimize right from the start then you might be right, but if you never call a function with the same parameters then you are doing premature optimization IMHO.

>> When the function is generated code (not hand written).
>
> I'm not sure what you mean here?

Code that is generated by a tool (or composable templates or whatever) tends to be repetitive and suboptimal, i.e. boilerplate code that looks like it was written by a monkey…

>> You already have that. It is sugar.
>>
>
> I don't already have it, otherwise I'd be making use of it. D has no control over the inliner.

I meant that if you have explicit inline hints like C++ then you also have call-site inlining if you want to.

> I also don't think that suggestion of yours works. I suspect the compiler
> will see the outer function as a trivial wrapper which will fall within the
> compiler's normal inline heuristics, and it will all inline anyway.

That should be considered a bug if it is called from more than one location.
March 19, 2014
On 19 March 2014 19:16, <7d89a89974b0ff40.invalid@internationalized.invalid> wrote:

> On Wednesday, 19 March 2014 at 08:35:53 UTC, Manu wrote:
>
>> The idea of eliminating common sub-expressions suggests that there _are_ common sub-expressions, which aren't affected by the function arguments. This case is highly unusual in my experience.
>>
>
> Not if you delay optimization until profiling and focus on higher-level structures during the initial implementation. Or if you use composition (as in generic programming).
>
> If you hand optimize right from the start then you might be right, but if you never call a function with the same parameters then you are doing premature optimization IMHO.
>

Okay, do you have use cases for any of this stuff? Are you just making it
up, or do you have significant experience to say this is what you need?
I can say for a fact that recursive inlining would make almost everything I
want it for much more annoying. I would find myself doing stupid stuff to
fight the recursive inliner in every instance.

>>> When the function is generated code (not hand written).
>>
>> I'm not sure what you mean here?
>
> Code that is generated by a tool (or composable templates or whatever) tends to be repetitive and suboptimal, i.e. boilerplate code that looks like it was written by a monkey…
>

I'm not sure where the value is... why would you want to inline this?

>>> You already have that. It is sugar.
>>
>> I don't already have it, otherwise I'd be making use of it. D has no control over the inliner.
>
> I meant that if you have explicit inline hints like C++ then you also have call-site inlining if you want to.


I still don't follow. C++ doesn't have call-site inlining. C/C++ has macros, and there is no way to achieve the same functionality in D right now; that's a key motivation for the proposal.

>> I also don't think that suggestion of yours works. I suspect the compiler will see the outer function as a trivial wrapper which will fall within the compiler's normal inline heuristics, and it will all inline anyway.
>
> That should be considered a bug if it is called from more than one location.
>

Seriously, you're making 'inline' about 10 times more complicated than it
should ever be.
If you ask me, I see no value in recursive inlining; in fact, that would
anger me considerably.

By making this hard, you're also making it equally unlikely. Let inline exist first, then if/when it doesn't suit your use cases, argue for the details.


March 19, 2014
On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:
> Okay, do you have use cases for any of this stuff? Are you just making it
> up, or do you have significant experience to say this is what you need?

I don't need anything, I hand optimize prematurely. And I don't want to do that.

But yes, inner loops benefit from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off.

I do it by hand. I don't want to do it by hand.
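
To make the rewrite concrete, here is a minimal sketch (hypothetical names, deliberately trivial case): once the body is inlined into the loop, the loop-invariant subexpression is exposed and can be hoisted out, by hand or by the optimiser.

// Hypothetical helper: 'scale * 2.0' is recomputed on every call.
double scaled(double x, double scale, double offset)
{
    return x * (scale * 2.0) + offset;
}

// Before inlining: the loop-invariant work is hidden behind the call.
void beforeInlining(double[] xs, double scale, double offset)
{
    foreach (ref x; xs)
        x = scaled(x, scale, offset);
}

// After inlining: the common expression is exposed and moved out of the loop.
void afterInlining(double[] xs, double scale, double offset)
{
    immutable s2 = scale * 2.0;
    foreach (ref x; xs)
        x = x * s2 + offset;
}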

> If you ask me, I see no value in recursive inlining; in fact, that would
> anger me considerably.

Why? You could always set the depth to 1, or make 1 the default.

And it isn't difficult to implement.
March 20, 2014
On 20 March 2014 06:23, <7d89a89974b0ff40.invalid@internationalized.invalid> wrote:

> On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:
>
>> Okay, do you have use cases for any of this stuff? Are you just making it up, or do you have significant experience to say this is what you need?
>>
>
> I don't need anything, I hand optimize prematurely. And I don't want to do that.
>
> But yes, inner loops benefit from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off.
>
> I do it by hand. I don't want to do it by hand.
>
>
>> If you ask me, I see no value in recursive inlining; in fact, that would anger me considerably.
>>
>
> Why? You could always set the depth to 1, or make 1 the default.
>
> And it isn't difficult to implement.
>

The problem is upside down. If you want to inline multiple levels, you
start from the leaves and move downwards, not from the root moving upwards
(leaving a bunch of leaves perhaps not inlined), which is what you're
really suggesting.
Inlining should be strictly deliberate; there's nothing to say that every
function called in a tree should be inlined. There's a high probability
there are one or some that shouldn't be among a few that should.

Remember too, that call-site inlining isn't the only method, there would also be always-inline. I think always-inline is what you want for some decidedly trivial functions (although these will probably be heuristically inlined anyway), not call-site inlining.

I just don't see how recursive call-site inlining is appropriate, considering that call trees are often complex, subject to change, and may even call functions that you don't have source for. You can cascade the mixin keyword if you want to, that's very simple.

I'd be highly surprised if you ever encountered a call tree where you wanted to inline everything (and the optimiser didn't do it for you). As soon as you encounter a single function in the tree that shouldn't be inlined, then you'll be forced to do it one level at a time anyway.


March 20, 2014
On Thursday, 20 March 2014 at 02:08:16 UTC, Manu wrote:
> The problem is upside down. If you want to inline multiple levels, you
> start from the leaves and move downwards, not from the root moving upwards

Yes, that is true in cases where leaves are frequently visited. Good point. I am most interested in full inlining, but the heuristics should probably start with the leaves for people not interested in that. Agree.

Anyway, in the case of ray tracing (or any search structure) I could see the value of having the opposite in combination with CTFE/partial evaluation.

Example: Define a static scene (of objects) and let the compiler turn it into "a state machine" of code.

Another example: Define an array of data, use partial evaluation to turn it into a binary tree, then turn the binary tree into code.
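
As a much-simplified sketch of that "data to code" idea using D's existing CTFE and string mixins (a flat comparison chain rather than a binary tree; all names here are hypothetical):

import std.conv : to;

// Hypothetical compile-time data table.
enum int[] thresholds = [10, 100, 1000];

// Evaluated at compile time (CTFE): returns D source for an unrolled
// classifier, which the string mixin below pastes into the module.
string genClassify(const int[] ts)
{
    string code = "int classify(int x)\n{\n";
    foreach (i, t; ts)
        code ~= "    if (x < " ~ t.to!string ~ ") return " ~ i.to!string ~ ";\n";
    code ~= "    return " ~ ts.length.to!string ~ ";\n}\n";
    return code;
}

mixin(genClassify(thresholds));

unittest
{
    assert(classify(5) == 0 && classify(50) == 1 && classify(5000) == 3);
}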

> Inlining should be strictly deliberate; there's nothing to say that every
> function called in a tree should be inlined. There's a high probability
> there are one or some that shouldn't be among a few that should.

In the case of a long-running loop it does not really matter. What it does get you is a chance to use generic code (or libraries) and then do a first-resort optimization. I basically see it as a time-saving feature (programmer's time). A tool for cutting development costs.

> Remember too, that call-site inlining isn't the only method, there would
> also be always-inline...

Yes, that is the first one. In another thread some time ago I suggested a solution that uses weighted inlining to aid compiler heuristics:

http://forum.dlang.org/thread/szjkyfpnachnnyknnfwp@forum.dlang.org#post-szjkyfpnachnnyknnfwp:40forum.dlang.org

As you can see I also suggested call-site inlining, so I am fully behind you in this. :-) Lack of inlining and GC are my main objections to D.

> I think always-inline is what you want for some
> decidedly trivial functions (although these will probably be heuristically
> inlined anyway), not call-site inlining.

I agree. Compiler heuristics can change. It is desirable to be able to express intent no matter what the current heuristics are.

> I just don't see how recursive
> call-site inlining is appropriate, considering that call trees are often
> complex, subject to change, and may even call functions that you don't have
> source for.

You should not use it blindly.

> You can cascade the mixin keyword if you want to, that's very simple.

Not if you build the inner loop using generic components. I want this:

inline_everything while (condition) {
    statement;
    statement;
}

> I'd be highly surprised if you ever encountered a call tree where
> you wanted to inline everything (and the optimiser didn't do it for you).

Not if you move to high-level programming using prewritten code and only go low level after profiling.

> As soon as you encounter a single function in the tree that shouldn't be
> inlined, then you'll be forced to do it one level at a time anyway.

But then you have to change the libraries you are using!?

Nothing prevents you from introducing exceptions as an extension, though. I want inline(0.5) as the default, but also to be able to write inline(1) for inline always and inline(0) for inline never.

func1(){} // implies inline(0.5) weighting
inline func2(){} // same as inline(1) weighting, inline always
inline(0.75) func3(){} // increase the heuristics weighting
inline(0) func4(){} // never-ever inline

Ola.
March 20, 2014
I just want to add these reasons for having inlining despite having compiler heuristics:

1. If you compile for embedded or PNaCl on the web, you want a small executable. That means the heuristics should not inline if it increases the code size, unless the programmer specified it in the code. (Or you specify a target size, and do compiler re-runs until it fits.)

2. If you use profile-guided optimization you should inline based on call frequency, but the input set might have missed some scenarios, and you should be able to overrule the profile by explicit inlining in code where you know that it matters (e.g. a tight loop in an exception handler).
March 20, 2014
On 20 March 2014 18:35, <7d89a89974b0ff40.invalid@internationalized.invalid> wrote:

> On Thursday, 20 March 2014 at 02:08:16 UTC, Manu wrote:
>
>> The problem is upside down. If you want to inline multiple levels, you start from the leaves and move downwards, not from the root moving upwards
>>
>
> Yes, that is true in cases where leaves are frequently visited. Good point. I am most interested in full inlining, but the heuristics should probably start with the leaves for people not interested in that. Agree.
>
> Anyway, in the case of ray tracing (or any search structure) I could see the value of having the opposite in combination with CTFE/partial evaluation.
>
> Example: Define a static scene (of objects) and let the compiler turn it into "a state machine" of code.
>
> Another example: Define an array of data, use partial evaluation to turn it into a binary tree, then turn the binary tree into code.
>
>
>> Inlining should be strictly deliberate; there's nothing to say that every function called in a tree should be inlined. There's a high probability there are one or some that shouldn't be among a few that should.
>>
>
> In the case of a long-running loop it does not really matter. What it does get you is a chance to use generic code (or libraries) and then do a first-resort optimization. I basically see it as a time-saving feature (programmer's time). A tool for cutting development costs.
>
>> Remember too, that call-site inlining isn't the only method, there would also be always-inline...
>>
>
> Yes, that is the first one. In another thread some time ago I suggested a solution that uses weighted inlining to aid compiler heuristics:
>
> http://forum.dlang.org/thread/szjkyfpnachnnyknnfwp@forum.dlang.org#post-szjkyfpnachnnyknnfwp:40forum.dlang.org
>
> As you can see I also suggested call-site inlining, so I am fully behind you in this. :-) Lack of inlining and GC are my main objections to D.
>
>
>> I think always-inline is what you want for some decidedly trivial functions (although these will probably be heuristically inlined anyway), not call-site inlining.
>>
>
> I agree. Compiler heuristics can change. It is desirable to be able to express intent no matter what the current heuristics are.
>
>
>> I just don't see how recursive call-site inlining is appropriate, considering that call trees are often complex, subject to change, and may even call functions that you don't have source for.
>>
>
> You should not use it blindly.
>
>
>> You can cascade the mixin keyword if you want to, that's very simple.
>>
>
> Not if you build the inner loop using generic components. I want this:
>
> inline_everything while (condition) {
>     statement;
>     statement;
> }
>
>> I'd be highly surprised if you ever encountered a call tree where you wanted to inline everything (and the optimiser didn't do it for you).
>>
>
> Not if you move to high-level programming using prewritten code and only go low level after profiling.
>
>
>> As soon as you encounter a single function in the tree that shouldn't be inlined, then you'll be forced to do it one level at a time anyway.
>>
>
> But then you have to change the libraries you are using!?
>
> Nothing prevents you from introducing exceptions as an extension, though. I
> want inline(0.5) as the default, but also to be able to write inline(1) for inline
> always and inline(0) for inline never.
>
> func1(){} // implies inline(0.5) weighting
> inline func2(){} // same as inline(1) weighting, inline always
> inline(0.75) func3(){} // increase the heuristics weighting
> inline(0) func4(){} // never-ever inline
>
> Ola.
>

I'm sorry. I really can't support any of these wildly complex ideas. I just don't feel they're useful, and they're not very well founded.
A numeric weight? What scale is it in? I'm not aware of any 'standard-inline-weight-measure' that any programmer would be able to intuitively gauge the magic number against. That will simply never be agreed on by the devs.
It also doesn't make much sense... different platforms will assign very different weights and apply different heuristics in the inliner. It's not a numeric quantity; it's a complex determination of whether a function is a good candidate or not.
The value you specify is likely highly context-sensitive and probably not portable. Heuristic-based inlining should be left to the optimiser to decide.

And I totally object to recursive inlining. It has a kind of absolute
nature that removes control all the way down the call tree, and I don't
feel it's likely that you would often (ever?) want to explicitly inline an
entire call tree.
If you want to inline a second level, then write mixin in the second level.
Recurse.
You are talking about generic code as if this isn't appropriate, but I
specifically intend to use this in generic code very similar to what you
suggest; so I don't see the incompatibility.
I think you're saying that manually specifying it all the way down the call
tree is inconvenient, but I would argue that manually specifying
*exclusions* throughout the call tree after specifying a recursive inline
is even more inconvenient. It requires more language (a feature to mark an
exclusion), has a kind of obtuse double-negative logic about it, and it's
equally invasive to your code.

If you can prove that single level call-site inlining doesn't satisfy your needs at some later time, make a proposal then, along with your real-world use cases. But by throwing it in this thread right now, you're kinda just killing the thread, and making it very unlikely that anything will happen at all, which is annoying, because I REALLY need this (I've been trying to motivate inline support for over 3 years), and I get the feeling you're just throwing hypotheticals around.

You're still fairly new here, but be aware that feature requests will become exponentially less likely to be accepted with every degree of complexity added. By making this seem hard, you're also making it almost certain not to happen, which isn't in either of our interests.

My OP suggestion is the simplest solution I can conceive of that will definitely satisfy all the real-world use cases I've ever encountered. It is predictable, portable, and simple.


March 20, 2014
On Thursday, 20 March 2014 at 12:31:33 UTC, Manu wrote:
> I'm sorry. I really can't support any of these wildly complex ideas.

They aren't actually complex, except tail-call optimization (but that is well understood).

> If you want to inline a second level, then write mixin in the second level.

You might as well do copy-paste then. You cannot add inlining to an imported library without modifying it.

> at all, which is annoying, because I REALLY need this (I've been trying to
> motivate inline support for over 3 years), and I get the feeling you're
> just throwing hypotheticals around.

You need inlining, agreed, but not 1-level mixin, because you can do that with regular inlining.
March 20, 2014
Please note that 1-level mixin is not sufficient in the case of libraries. In too many cases you will not inline the function that does the work, only the interface wrapper.
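
A minimal sketch of the pattern I mean (hypothetical library code): inlining one level at the call site merges only the wrapper, while the function that does the work stays behind a call.

// Hypothetical library: the public entry point is a thin wrapper and the
// real work lives one level down.
double sumOfSquares(const double[] data)             // interface wrapper
{
    assert(data.length > 0, "empty input");          // argument checking only
    return sumOfSquaresImpl(data);                   // the call that actually matters
}

private double sumOfSquaresImpl(const double[] data) // where the work happens
{
    double sum = 0;
    foreach (x; data)
        sum += x * x;
    return sum;
}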
March 20, 2014
On 21 March 2014 00:10, <7d89a89974b0ff40.invalid@internationalized.invalid> wrote:

> Please note that 1-level mixin is not sufficient in the case of libraries. In too many cases you will not inline the function that does the work, only the interface wrapper.
>

I don't think I would ever want to inline the whole call tree of a library.
I've certainly never wanted to do anything like that in 20 years or so, and
I've worked on some really performance-critical systems, like the Amiga,
Dreamcast, and PS2.
It still sounds really sketchy. If the function that does the work is a few
levels deep, then there is probably a good reason for that. What if there's
an error check that writes log output or something? Or some branch that
leads to other uncommon paths?
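
As a minimal sketch of what I mean (hypothetical names): blanket-inlining from the top of a tree like this would drag the cold error/logging path into the hot loop along with everything else.

import std.stdio : stderr;

// Hypothetical call tree: 'update' and 'safeDivide' are hot,
// 'reportError' is a cold logging path.
void reportError(string msg)
{
    stderr.writeln("error: ", msg);   // cold; inlining it buys nothing
}

float safeDivide(float a, float b)
{
    if (b == 0)
    {
        reportError("division by zero");
        return 0;
    }
    return a / b;                     // hot path, a good inline candidate
}

void update(float[] xs, float d)
{
    foreach (ref x; xs)
        x = safeDivide(x, d);         // recursively inlining from here would
                                      // pull reportError along as well
}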

I think you're making this problem up. Can you demonstrate where this has
been a problem for you in the past?
The call tree would have to be so very particular for this to be
appropriate, and then you say this is a library, which you have no control
over... so the call tree is just perfect by chance? What if the library
changes?