February 19, 2017 Force inline

Is it possible to force a function to be inlined?

Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I can force g++ to inline a function, while I cannot find any way to do the same in D.
February 19, 2017 Re: Force inline
Posted in reply to berni

On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:
> Is it possible to force a function to be inlined?
>
> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I can force g++ to inline a function, while I cannot find any way to do the same in D.

yes
https://wiki.dlang.org/DIP56
February 19, 2017 Re: Force inline
Posted in reply to berni

On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
> Is it possible to force a function to be inlined?

https://dlang.org/spec/pragma.html#inline
February 19, 2017 Re: Force inline
Posted in reply to berni

On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:
> Is it possible to force a function to be inlined?
>
> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I can force g++ to inline a function, while I cannot find any way to do the same in D.

https://dlang.org/spec/pragma.html#inline
February 19, 2017 Re: Force inline
Posted in reply to berni

On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
> Is it possible to force a function to be inlined?
>
> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I can force g++ to inline a function, while I cannot find any way to do the same in D.

Or make it a template, maybe...

void foo()() {
}
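To make the template suggestion concrete, here is a minimal sketch. The helper names `twicePlain` and `twiceTemplate` are hypothetical and not from the thread. The extra empty `()` turns the function into a template with no parameters, which is instantiated in the module that calls it, so its body is visible to the compiler at the call site. This only makes inlining possible; it is an assumption, not a guarantee, that any particular compiler will then inline it.

```d
import std.stdio : writeln;

// Hypothetical helper: an ordinary function.
int twicePlain(int x)
{
    return 2 * x;
}

// The same helper as a template with an empty template parameter list.
// It is called exactly like the ordinary function.
int twiceTemplate()(int x)
{
    return 2 * x;
}

void main()
{
    writeln(twicePlain(21));    // prints 42
    writeln(twiceTemplate(21)); // prints 42
}
```

Call sites do not change, so switching between the two forms is a one-line edit.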
February 20, 2017 Re: Force inline
Posted in reply to Daniel Kozak

On Sunday, 19 February 2017 at 20:00:00 UTC, Daniel Kozak wrote:
> On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:
>
>> Is it possible to force a function to be inlined?
>>
>> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I can force g++ to inline a function, while I cannot find any way to do the same in D.
> yes
> https://wiki.dlang.org/DIP56

pragma(inline, true) doesn't work out well:

>int bar;
>
>void main(string[] args)
>{
>    if (foo()) {}
>}
>
>bool foo()
>{
>    pragma(inline, true)
>
>    if (bar==1) return false;
>    if (bar==2) return false;
>
>    return true;
>}

with

> dmd -inline test.d

I get

> test.d(8): Error: function test.foo cannot inline function

When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.

It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

I haven't tried the approach with templates yet, due to my lack of understanding of templates.
February 20, 2017 Re: Force inline
Posted in reply to berni

On Monday, February 20, 2017 12:47:43 berni via Digitalmars-d-learn wrote:
> with
>
> > dmd -inline test.d
>
> I get
>
> > test.d(8): Error: function test.foo cannot inline function
>
> When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.
>
> It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...
For better or worse, the whole point of pragma(inline, true) is to produce an error when the compiler fails to inline the function. It doesn't force inlining in any way. So, the fact that it produces an error means that the compiler can't inline that function. And it's not going to inline if you're not using -inline.
The reality of the matter is that the inliner in the D frontend needs some serious work. So, it's not going to do a very good job. It's better than nothing, but in comparison to what you'd see with your typical C++ compiler, it just isn't as good. Also, there are a number of compiler bugs that get triggered when both -O and -inline are enabled. So, you're likely better off just using -O for now.
Regardless, if performance is your #1 concern, then I would suggest that you compile with ldc and not dmd. dmd is great for fast compilation and therefore it's great for development. However, while it produces decent binaries, and it may very well do certain optimizations better than the gcc or llvm backends do, on the whole, dmd's optimizer really can't compare with those of gcc or llvm. ldc almost always produces a faster binary than dmd does (though it does take longer to compile).
- Jonathan M Davis
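As a small illustration of the pragma discussed above, the sketch below marks one function with `pragma(inline, true)` and one with `pragma(inline, false)`. The helper names `addOne` and `addTwo` are hypothetical. Both functions are deliberately trivial, on the assumption that even a conservative inliner can handle them; per the post above, dmd would report an error for the first one if it could not inline it.

```d
import std.stdio : writeln;

// Hypothetical helpers; the names are for illustration only.

// Request inlining. Per the post above, dmd reports an error
// when it cannot inline a function marked this way.
pragma(inline, true)
int addOne(int x)
{
    return x + 1;
}

// Ask the compiler never to inline this function.
pragma(inline, false)
int addTwo(int x)
{
    return x + 2;
}

void main()
{
    writeln(addOne(1)); // prints 2
    writeln(addTwo(1)); // prints 3
}
```

The pragma changes only how calls are compiled, not what the functions return, so results are identical with or without it.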
February 20, 2017 Re: Force inline
Posted in reply to berni

On Monday, 20 February 2017 at 12:47:43 UTC, berni wrote:
> pragma(inline, true) doesn't work out well:
>
>>int bar;
>>
>>void main(string[] args)
>>{
>>    if (foo()) {}
>>}
>>
>>bool foo()
>>{
>>    pragma(inline, true)
>>
>>    if (bar==1) return false;
>>    if (bar==2) return false;
>>
>>    return true;
>>}
>
> with
>
>> dmd -inline test.d
>
> I get
>
>> test.d(8): Error: function test.foo cannot inline function

Because dmd's semantic analysis determined that it doesn't know how to inline the function and since you insisted that it must be inlined, you received an error. This is an issue with dmd. ldc2 happily inlines your function:

---
$ ldc2 --version
LDC - the LLVM D compiler (1.1.0):
  based on DMD v2.071.2 and LLVM 3.9.1
  built with DMD64 D Compiler v2.072.2
  Default target: x86_64-pc-linux-gnu
$ ldc2 -c test.d
$ objdump -dr test.o

test.o:     file format elf64-x86-64

Disassembly of section .text._Dmain:

0000000000000000 <_Dmain>:
   0: 53                      push   %rbx
   1: 48 83 ec 20             sub    $0x20,%rsp
   5: 48 89 7c 24 10          mov    %rdi,0x10(%rsp)
   a: 48 89 74 24 18          mov    %rsi,0x18(%rsp)
   f: 66 48 8d 3d 00 00 00    data16 lea 0x0(%rip),%rdi        # 17 <_Dmain+0x17>
  16: 00
          13: R_X86_64_TLSGD _D4test3bari-0x4
  17: 66 66 48 e8 00 00 00    data16 data16 callq 1f <_Dmain+0x1f>
  1e: 00
          1b: R_X86_64_PLT32 __tls_get_addr-0x4
  1f: 8b 18                   mov    (%rax),%ebx
  21: 83 fb 01                cmp    $0x1,%ebx
  24: 75 0a                   jne    30 <_Dmain+0x30>
  26: 31 c0                   xor    %eax,%eax
  28: 88 c1                   mov    %al,%cl
  2a: 88 4c 24 0f             mov    %cl,0xf(%rsp)
  2e: eb 29                   jmp    59 <_Dmain+0x59>
  30: 66 48 8d 3d 00 00 00    data16 lea 0x0(%rip),%rdi        # 38 <_Dmain+0x38>
  37: 00
          34: R_X86_64_TLSGD _D4test3bari-0x4
  38: 66 66 48 e8 00 00 00    data16 data16 callq 40 <_Dmain+0x40>
  3f: 00
          3c: R_X86_64_PLT32 __tls_get_addr-0x4
  40: 8b 18                   mov    (%rax),%ebx
  42: 83 fb 02                cmp    $0x2,%ebx
  45: 75 0a                   jne    51 <_Dmain+0x51>
  47: 31 c0                   xor    %eax,%eax
  49: 88 c1                   mov    %al,%cl
  4b: 88 4c 24 0f             mov    %cl,0xf(%rsp)
  4f: eb 08                   jmp    59 <_Dmain+0x59>
  51: b0 01                   mov    $0x1,%al
  53: 88 44 24 0f             mov    %al,0xf(%rsp)
  57: eb 00                   jmp    59 <_Dmain+0x59>
  59: 8a 44 24 0f             mov    0xf(%rsp),%al
  5d: a8 01                   test   $0x1,%al
  5f: 75 02                   jne    63 <_Dmain+0x63>
  61: eb 02                   jmp    65 <_Dmain+0x65>
  63: eb 00                   jmp    65 <_Dmain+0x65>
  65: 31 c0                   xor    %eax,%eax
  67: 48 83 c4 20             add    $0x20,%rsp
  6b: 5b                      pop    %rbx
  6c: c3                      retq
---

> When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.

I'd suggest inspecting the generated assembly in order to determine whether your function was inlined or not (see above using objdump for Linux).

> It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

I don't know, but I'd guess that the length of a function is not as important for the consideration of being inlined as its semantics.
February 20, 2017 Re: Force inline
Posted in reply to Moritz Maxeiner

Moritz Maxeiner wrote:
> I don't know, but I'd guess that the length of a function is not as important for the consideration of being inlined as its semantics.
yep. basically, dmd doesn't like anything other than very simple if/else conditions. sometimes it likes
if (cond0) return n0; else if (cond1) return n1; ...
more than the same code without else.
don't even try to inline loops. ;-)
anyway, in my real-life code inlining is never worth the MASSIVELY increased compile times: the speedup is never actually noticeable. if "dmd -O" doesn't satisfy your needs, there is usually no reason to try "-inline"; it is better to switch to ldc/gdc.
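The if / else-if shape mentioned above could look like the sketch below, applied to the `foo` from earlier in the thread. The names `fooEarlyReturns` and `fooElseChain` are hypothetical. The thread does not confirm that any particular dmd version inlines the second form, so `pragma(inline, true)` is left out here; adding it back is one way to have the compiler report whether inlining succeeded.

```d
int bar;

// Shape used earlier in the thread: two independent early returns.
bool fooEarlyReturns()
{
    if (bar == 1) return false;
    if (bar == 2) return false;
    return true;
}

// Same logic written as a single if / else-if / else chain.
bool fooElseChain()
{
    if (bar == 1) return false;
    else if (bar == 2) return false;
    else return true;
}

void main()
{
    // Both versions must agree for every value of bar that matters.
    foreach (value; 0 .. 4)
    {
        bar = value;
        assert(fooEarlyReturns() == fooElseChain());
    }
}
```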
February 20, 2017 Re: Force inline

On Mon, Feb 20, 2017 at 05:16:15AM -0800, Jonathan M Davis via Digitalmars-d-learn wrote:
[...]
> Regardless, if performance is your #1 concern, then I would suggest that you compile with ldc and not dmd.
[...]

+1. If you are concerned about performance enough to worry whether the compiler will inline something, it's time to use gdc or ldc. Dmd's inliner is rudimentary at best, and its optimizer, while serviceable, is not up to par with gdc or ldc's optimizers. If you want top performance, use gdc / ldc.

IME gdc -O3 consistently produces code that runs about 20-30% faster than code produced by dmd -O (even with -inline). Sometimes I've seen performance gains of up to 40-50%. This is especially likely when your code consists of deep call trees involving small(ish) functions: I've looked at the assembly output before and it seems that dmd's inliner just gives up too easily, thus missing the opportunities for further reductions and further inlining.

Even after discounting the inliner, though, I find that gdc is simply better at loop optimization than dmd, such as hoisting, strength reduction, unrolling, etc. So if your code involves complex loops, expect gdc -O3 to produce better code than dmd. Well, "better" may be debatable, but certainly gdc is far more aggressive at optimizing loops (and optimizing in general) than dmd, and I find in the cases I've looked at that aggressive optimization often leads to further optimization opportunities, whereas if the optimizer is too conservative, opportunities are missed that may lead to other opportunities, so the resulting code can end up being vastly different in performance.

Having said all that, though, have you used a profiler to determine whether or not your performance bottleneck is really at the function in question? I find that 90% of the time what I truly believe should be inlined actually doesn't make much difference; the bottleneck is usually somewhere else that I didn't expect. I used to spend lots of time trying to hyper-optimize everything, only to discover later that 90% of my efforts have been wasted on gaining a meager 1% of performance, whereas if I had just used a profiler in the first place, I would have gotten a 50% performance improvement with only 10% of the effort.

T

--
Tech-savvy: euphemism for nerdy.
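In the spirit of measuring before optimizing, a crude timing harness around the function from the thread might look like the sketch below. It is not a substitute for a real profiler (dmd, for example, has a `-profile` switch). The function `countHits` and the iteration count are assumptions made for illustration, and `std.datetime.stopwatch` requires a reasonably recent Phobos.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

int bar;

// The function from the thread whose call cost is in question.
bool foo()
{
    if (bar == 1) return false;
    if (bar == 2) return false;
    return true;
}

// Hypothetical driver: calls foo() `iterations` times while cycling
// bar through 0, 1, 2, and counts how often foo() returns true.
size_t countHits(int iterations)
{
    size_t hits;
    foreach (i; 0 .. iterations)
    {
        bar = i % 3;
        if (foo()) ++hits;
    }
    return hits;
}

void main()
{
    enum iterations = 10_000_000; // arbitrary choice for illustration
    auto sw = StopWatch(AutoStart.yes);
    immutable hits = countHits(iterations);
    sw.stop();
    writefln("%s calls took %s ms (%s returned true)",
             iterations, sw.peek.total!"msecs", hits);
}
```

Building this once with dmd and once with ldc or gdc, and comparing the reported times, gives a rough idea of whether the call itself matters before any effort goes into forcing inlining.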
Copyright © 1999-2021 by the D Language Foundation