Force inline
February 19, 2017
Is it possible to force a function to be inlined?

Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I manage to force g++ to inline a function, while I cannot find any means to do the same in D.
February 19, 2017
On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:

> Is it possible to force a function to be inlined?
>
> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I manage to force g++ to inline a function, while I cannot find any means to do the same in D.
yes
https://wiki.dlang.org/DIP56
February 19, 2017
On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
> Is it possible to force a function to be inlined?

https://dlang.org/spec/pragma.html#inline
February 19, 2017
On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:

> Is it possible to force a function to be inlined?
>
> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I manage to force g++ to inline a function, while I cannot find any means to do the same in D.
https://dlang.org/spec/pragma.html#inline

February 19, 2017
On Sunday, 19 February 2017 at 19:19:25 UTC, berni wrote:
> Is it possible to force a function to be inlined?
>
> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I manage to force g++ to inline a function, while I cannot find any means to do the same in D.


Or make it as template, maybe...


void foo()() {

}
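
A slightly fuller sketch of the template suggestion above, for illustration only. The name `twice` and its body are hypothetical. The empty first parameter list turns the function into a template; as far as I know, that makes the body available in every module that calls it, which gives the inliner a chance but does not force inlining.

```d
import std.stdio : writeln;

// Hypothetical example: the empty template parameter list `()` makes
// `twice` a function template rather than a plain function.
int twice()(int x)
{
    return 2 * x;
}

void main()
{
    // Called like an ordinary function; the template is instantiated
    // implicitly at the call site.
    writeln(twice(21));
}
```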
February 20, 2017
On Sunday, 19 February 2017 at 20:00:00 UTC, Daniel Kozak wrote:
> On 19.2.2017 at 20:19, berni via Digitalmars-d-learn wrote:
>
>> Is it possible to force a function to be inlined?
>>
>> Comparing a C++ and a D program, the main difference in speed (about 20-30%) arises because I manage to force g++ to inline a function, while I cannot find any means to do the same in D.
> yes
> https://wiki.dlang.org/DIP56

pragma(inline, true) doesn't work out well:

>int bar;
>
>void main(string[] args)
>{
>    if (foo()) {}
>}
> 
>bool foo()
>{
>    pragma(inline, true)
>
>    if (bar==1) return false;
>    if (bar==2) return false;
>
>    return true;
>}

with

> dmd -inline test.d

I get

> test.d(8): Error: function test.foo cannot inline function

When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.

It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

I haven't tried the approach with templates yet, because I don't understand templates well enough.
February 20, 2017
On Monday, February 20, 2017 12:47:43 berni via Digitalmars-d-learn wrote:
> with
>
> > dmd -inline test.d
>
> I get
>
> > test.d(8): Error: function test.foo cannot inline function
>
> When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.
>
> It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

For better or worse, the whole point of pragma(inline, true) is to produce an error when the compiler fails to inline the function. It doesn't force inlining in any way. So, the fact that it produces an error means that the compiler can't inline that function. And it's not going to inline if you're not using -inline.
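
For reference, a minimal sketch of how the pragma can be attached to a declaration, assuming a function simple enough for dmd's inliner to accept. The name `isOther` and its single-expression body are hypothetical stand-ins, not berni's real code. In line with the explanation above, the pragma here should be read as a checked request, not as a guarantee of faster code.

```d
int bar;

// Attribute form: the pragma applies to the declaration that follows.
// (Assumption: a body with one return expression is simple enough to inline.)
pragma(inline, true)
bool isOther()
{
    return bar != 1 && bar != 2;
}

void main()
{
    bar = 2;
    assert(!isOther());
    bar = 3;
    assert(isOther());
}
```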

The reality of the matter is that the inliner in the D frontend needs some serious work. So, it's not going to do a very good job. It's better than nothing, but in comparison to what you'd see with your typical C++ compiler, it just isn't as good. Also, there are a number of compiler bugs that get triggered when both -O and -inline are enabled. So, you're likely better off just using -O for now.

Regardless, if performance is your #1 concern, then I would suggest that you compile with ldc and not dmd. dmd is great for fast compilation and therefore it's great for development. However, while it produces decent binaries, and it may very well do certain optimizations better than the gcc or llvm backends do, on the whole, dmd's optimizer really can't compare with those of gcc or llvm. ldc almost always produces a faster binary than dmd does (though it does take longer to compile).

- Jonathan M Davis

February 20, 2017
On Monday, 20 February 2017 at 12:47:43 UTC, berni wrote:
> pragma(inline, true) doesn't work out well:
>
>>int bar;
>>
>>void main(string[] args)
>>{
>>    if (foo()) {}
>>}
>> 
>>bool foo()
>>{
>>    pragma(inline, true)
>>
>>    if (bar==1) return false;
>>    if (bar==2) return false;
>>
>>    return true;
>>}
>
> with
>
>> dmd -inline test.d
>
> I get
>
>> test.d(8): Error: function test.foo cannot inline function

Because dmd's semantic analysis determined that it doesn't know how to inline the function and since you insisted that it must be inlined, you received an error. This is an issue with dmd. ldc2 happily inlines your function:

---
$ ldc2 --version
LDC - the LLVM D compiler (1.1.0):
  based on DMD v2.071.2 and LLVM 3.9.1
  built with DMD64 D Compiler v2.072.2
  Default target: x86_64-pc-linux-gnu
$ ldc2 -c test.d
$ objdump -dr test.o
test.o:     file format elf64-x86-64


Disassembly of section .text._Dmain:

0000000000000000 <_Dmain>:
   0:	53                   	push   %rbx
   1:	48 83 ec 20          	sub    $0x20,%rsp
   5:	48 89 7c 24 10       	mov    %rdi,0x10(%rsp)
   a:	48 89 74 24 18       	mov    %rsi,0x18(%rsp)
   f:	66 48 8d 3d 00 00 00 	data16 lea 0x0(%rip),%rdi        # 17 <_Dmain+0x17>
  16:	00
			13: R_X86_64_TLSGD	_D4test3bari-0x4
  17:	66 66 48 e8 00 00 00 	data16 data16 callq 1f <_Dmain+0x1f>
  1e:	00
			1b: R_X86_64_PLT32	__tls_get_addr-0x4
  1f:	8b 18                	mov    (%rax),%ebx
  21:	83 fb 01             	cmp    $0x1,%ebx
  24:	75 0a                	jne    30 <_Dmain+0x30>
  26:	31 c0                	xor    %eax,%eax
  28:	88 c1                	mov    %al,%cl
  2a:	88 4c 24 0f          	mov    %cl,0xf(%rsp)
  2e:	eb 29                	jmp    59 <_Dmain+0x59>
  30:	66 48 8d 3d 00 00 00 	data16 lea 0x0(%rip),%rdi        # 38 <_Dmain+0x38>
  37:	00
			34: R_X86_64_TLSGD	_D4test3bari-0x4
  38:	66 66 48 e8 00 00 00 	data16 data16 callq 40 <_Dmain+0x40>
  3f:	00
			3c: R_X86_64_PLT32	__tls_get_addr-0x4
  40:	8b 18                	mov    (%rax),%ebx
  42:	83 fb 02             	cmp    $0x2,%ebx
  45:	75 0a                	jne    51 <_Dmain+0x51>
  47:	31 c0                	xor    %eax,%eax
  49:	88 c1                	mov    %al,%cl
  4b:	88 4c 24 0f          	mov    %cl,0xf(%rsp)
  4f:	eb 08                	jmp    59 <_Dmain+0x59>
  51:	b0 01                	mov    $0x1,%al
  53:	88 44 24 0f          	mov    %al,0xf(%rsp)
  57:	eb 00                	jmp    59 <_Dmain+0x59>
  59:	8a 44 24 0f          	mov    0xf(%rsp),%al
  5d:	a8 01                	test   $0x1,%al
  5f:	75 02                	jne    63 <_Dmain+0x63>
  61:	eb 02                	jmp    65 <_Dmain+0x65>
  63:	eb 00                	jmp    65 <_Dmain+0x65>
  65:	31 c0                	xor    %eax,%eax
  67:	48 83 c4 20          	add    $0x20,%rsp
  6b:	5b                   	pop    %rbx
  6c:	c3                   	retq
---

>
> When I remove -inline, it compiles, but seems not to inline. I cannot tell from this small example, but with the large program, there is no speed gain.

I'd suggest inspecting the generated assembly in order to determine whether your function was inlined or not (see above using objdump for Linux).

>
> It also compiles with -inline when I remove the "if (bar==2)...". I guess, it's now really inlining, but the function is ridiculously short...

I don't know, but I'd guess that the length of a function is not as important for the consideration of being inlined as its semantics.
February 20, 2017
Moritz Maxeiner wrote:

> I don't know, but I'd guess that the length of a function is not as important for the consideration of being inlined as its semantics.
yep. basically, dmd doesn't like anything other than very simple if/else conditions. sometimes it likes

 if (cond0) return n0; else if (cond1) return n1; ...

more than the same code without else.

don't even try to inline loops. ;-)

anyway, in my real-life code inlining is never worth the MASSIVELY increased compile times: the speedup is never actually noticeable. if "dmd -O" doesn't satisfy your needs, there is usually no reason to try "-inline"; it is better to switch to ldc/gdc.
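
To make the if/else remark above concrete, here is berni's `foo` reshaped into a single else-if chain. This is only a sketch: whether a given dmd version actually inlines it is an assumption, so it is written without the pragma and would need to be checked with -inline or by looking at the generated assembly.

```d
int bar;

// Same logic as the original foo, expressed as one if / else-if / else
// chain so that every path is an explicit branch ending in a return.
bool foo()
{
    if (bar == 1) return false;
    else if (bar == 2) return false;
    else return true;
}

void main()
{
    bar = 2;
    assert(!foo());
    bar = 0;
    assert(foo());
}
```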
February 20, 2017
On Mon, Feb 20, 2017 at 05:16:15AM -0800, Jonathan M Davis via Digitalmars-d-learn wrote: [...]
> Regardless, if performance is your #1 concern, then I would suggest that you compile with ldc and not dmd.
[...]

+1.  If you are concerned about performance enough to worry whether the compiler will inline something, it's time to use gdc or ldc.  Dmd's inliner is rudimentary at best, and its optimizer, while serviceable, is not up to par with gdc or ldc's optimizers.  If you want top performance, use gdc / ldc.

IME gdc -O3 consistently produces code that runs about 20-30% faster than code produced by dmd -O (even with -inline).  Sometimes I've seen performance gains of up to 40-50%. This is especially likely when your code consists of deep call trees involving small(ish) functions: I've looked at the assembly output before and it seems that dmd's inliner just gives up too easily, thus missing the opportunities for further reductions and further inlining.  Even after discounting the inliner, though, I find that gdc is simply better than dmd at loop optimizations such as hoisting, strength reduction, unrolling, etc.  So if your code involves complex loops, expect gdc -O3 to produce better code than dmd.

Well, "better" may be debatable, but certainly gdc is far more aggressive at optimizing loops (and optimizing in general) than dmd, and I find in the cases I've looked at that aggressive optimization often leads to further optimization opportunities, whereas if the optimizer is too conservative, opportunities are missed that may lead to other opportunities, so the resulting code can end up being vastly different in performance.

Having said all that, though, have you used a profiler to determine whether or not your performance bottleneck is really at the function in question?  I find that 90% of the time what I truly believe should be inlined actually doesn't make much difference; the bottleneck is usually somewhere else that I didn't expect.  I used to spend lots of time trying to hyper-optimize everything, only to discover later that 90% of my efforts have been wasted on gaining a meager 1% of performance, whereas if I had just used a profiler in the first place, I would have gotten a 50% performance improvement with only 10% of the effort.


T

-- 
Tech-savvy: euphemism for nerdy.