Jump to page: 1 2
Thread overview
Optimize away immediately-called delegate literals?
Mar 11, 2012
Nick Sabalausky
Mar 11, 2012
H. S. Teoh
[OT] Hanlon's Razor (Was: Optimize away immediately-called delegate literals?)
Mar 11, 2012
Nick Sabalausky
Mar 12, 2012
H. S. Teoh
Mar 12, 2012
Peter Alexander
Mar 13, 2012
H. S. Teoh
Mar 13, 2012
Brad Roberts
Mar 13, 2012
Nick Sabalausky
Mar 13, 2012
Brad Roberts
Mar 13, 2012
Nick Sabalausky
March 11, 2012
Suppose you have a delegate literal and immediately call it:

auto a = x + (){ doStuff(); return y; }() + z;

Does DMD ever (or always?) optimize away a delegate if it's executed immediately and never stored into a variable? If not, can it, and would it be a simple change? Is something like this already on the table?


March 11, 2012
On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
> Suppose you have a delegate literal and immediately call it:
> 
> auto a = x + (){ doStuff(); return y; }() + z;
> 
> Does DMD ever (or always?) optimize away a delegate if it's executed immediately and never stored into a variable? If not, can it, and would it be a simple change? Is something like this already on the table?
[...]

I've always wondered about whether delegates passed to opApply ever get inlined. Seems to be pretty silly to do the entire function pointer + context pointer and full function call thing, if both opApply and the delegate are very simple. Especially if opApply is not much more than a for loop of some sort, perhaps with a line or two of private member access code inserted.


T

-- 
Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte
March 11, 2012
"H. S. Teoh" <hsteoh@quickfur.ath.cx> wrote in message news:mailman.455.1331448575.4860.digitalmars-d@puremagic.com...
>
> Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte

Pardon me veering offtopic at one of your taglines yet again, but I have to say, this is one that I've believed very strongly in for years. It's practically a mantra of mine, a big part of my own personal philosophy. I've never heard it attributed to Napoleon, though. I always knew it as "Hanlon's Razor", somewhat of a corollary to Occam's Razor.


March 12, 2012
On Sun, Mar 11, 2012 at 03:03:39AM -0400, Nick Sabalausky wrote:
> "H. S. Teoh" <hsteoh@quickfur.ath.cx> wrote in message news:mailman.455.1331448575.4860.digitalmars-d@puremagic.com...
> >
> > Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte
> 
> Pardon me veering offtopic at one of your taglines yet again, but I have to say, this is one that I've believed very strongly in for years. It's practically a mantra of mine, a big part of my own personal philosophy. I've never heard it attributed to Napoleon, though. I always knew it as "Hanlon's Razor", somewhat of a corollary to Occam's Razor.
[...]

I guess my sources aren't that reliable. I just copy-n-pasted that quote from somebody else's sig. :-)


T

-- 
Some ideas are so stupid that only intellectuals could believe them. -- George Orwell
March 12, 2012
On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>> Suppose you have a delegate literal and immediately call it:
>> 
>> auto a = x + (){ doStuff(); return y; }() + z;
>> 
>> Does DMD ever (or always?) optimize away a delegate if it's executed
>> immediately and never stored into a variable? If not, can it, and
>> would it be a simple change? Is something like this already on the
>> table?
> [...]
>
> I've always wondered about whether delegates passed to opApply ever get
> inlined.

Don't wonder. Find out!

import std.stdio;
void doStuff() { writeln("Howdy!"); }
void main() {
    int x = 1, y = 2, z = 3;
    auto a = x + (){ doStuff(); return y; }() + z;
    writeln(a);
}

$ dmd test.d -O -release -inline

__Dmain:
000000010000106c	pushq	%rbp
000000010000106d	movq	%rsp,%rbp
0000000100001070	pushq	%rax
0000000100001071	pushq	%rbx
0000000100001072	movq	$0x000000000000000c,%rdi
000000010000107c	callq	0x1000237f0	; symbol stub for: __d_allocmemory
0000000100001081	movq	%rax,%rbx
0000000100001084	movq	$0x00000000,(%rbx)
000000010000108b	movl	$0x00000002,0x08(%rbx)
0000000100001092	movq	%rbx,%rdi
0000000100001095	call	*0x0002318c(%rip)
000000010000109c	leal	0x04(%rax),%edx
000000010000109f	movl	$0x0000000a,%esi
00000001000010a4	leaq	0x00033eed(%rip),%rdi
00000001000010ab	callq	0x10002319c	; symbol stub for: _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv
00000001000010b0	xorl	%eax,%eax
00000001000010b2	popq	%rbx
00000001000010b3	movq	%rbp,%rsp
00000001000010b6	popq	%rbp
00000001000010b7	ret

In short. No! It doesn't currently inline in this case.

Even if the lambda just returns a constant, it doesn't get inlined.
March 13, 2012
On Tue, Mar 13, 2012 at 12:15:02AM +0100, Peter Alexander wrote:
> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
> >On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
> >>Suppose you have a delegate literal and immediately call it:
> >>
> >>auto a = x + (){ doStuff(); return y; }() + z;
> >>
> >>Does DMD ever (or always?) optimize away a delegate if it's executed immediately and never stored into a variable? If not, can it, and would it be a simple change? Is something like this already on the table?
> >[...]
> >
> >I've always wondered about whether delegates passed to opApply
> >ever get
> >inlined.
> 
> Don't wonder. Find out!
> 
> import std.stdio;
> void doStuff() { writeln("Howdy!"); }
> void main() {
>     int x = 1, y = 2, z = 3;
>     auto a = x + (){ doStuff(); return y; }() + z;
>     writeln(a);
> }
> 
> $ dmd test.d -O -release -inline
> 
> __Dmain:
> 000000010000106c	pushq	%rbp
> 000000010000106d	movq	%rsp,%rbp
> 0000000100001070	pushq	%rax
> 0000000100001071	pushq	%rbx
> 0000000100001072	movq	$0x000000000000000c,%rdi
> 000000010000107c	callq	0x1000237f0	; symbol stub for:
> __d_allocmemory
> 0000000100001081	movq	%rax,%rbx
> 0000000100001084	movq	$0x00000000,(%rbx)
> 000000010000108b	movl	$0x00000002,0x08(%rbx)
> 0000000100001092	movq	%rbx,%rdi
> 0000000100001095	call	*0x0002318c(%rip)
> 000000010000109c	leal	0x04(%rax),%edx
> 000000010000109f	movl	$0x0000000a,%esi
> 00000001000010a4	leaq	0x00033eed(%rip),%rdi
> 00000001000010ab	callq	0x10002319c	; symbol stub for:
> _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv
> 00000001000010b0	xorl	%eax,%eax
> 00000001000010b2	popq	%rbx
> 00000001000010b3	movq	%rbp,%rsp
> 00000001000010b6	popq	%rbp
> 00000001000010b7	ret
> 
> In short. No! It doesn't currently inline in this case.
> 
> Even if the lambda just returns a constant, it doesn't get inlined.

Hmph.

I tried this code:

	import std.stdio;
	struct A {
		int[] data;
		int opApply(int delegate(ref int) dg) {
			foreach (d; data) {
				if (dg(d)) return 1;
			}
			return 0;
		}
	}
	void main() {
		A a;
		int n = 0;

		a.data = [1,2,3,4,5];
		foreach (d; a) {
			n++;
		}
	}

With both dmd and gdc, the delegate is never inlined. :-(  Compiling with gdc -O3 causes opApply to get inlined and loop-unrolled, but the call to the delegate is still there. With dmd -O, even opApply is not inlined, and the code is generally much longer per loop iteration than gdc -O3.

Here's the code generated by gdc for the foreach() loop in main():

  404839:	48 89 c3             	mov    %rax,%rbx
  40483c:	48 8b 04 24          	mov    (%rsp),%rax
  404840:	48 8d 74 24 3c       	lea    0x3c(%rsp),%rsi
  404845:	48 8d 7c 24 20       	lea    0x20(%rsp),%rdi
  40484a:	48 89 03             	mov    %rax,(%rbx)
  40484d:	48 8b 44 24 08       	mov    0x8(%rsp),%rax
  404852:	48 89 43 08          	mov    %rax,0x8(%rbx)
  404856:	8b 44 24 10          	mov    0x10(%rsp),%eax
  40485a:	89 43 10             	mov    %eax,0x10(%rbx)
  40485d:	8b 03                	mov    (%rbx),%eax
  40485f:	89 44 24 3c          	mov    %eax,0x3c(%rsp)
  404863:	ff d5                	callq  *%rbp
  404865:	85 c0                	test   %eax,%eax
  404867:	75 58                	jne    4048c1 <_Dmain+0xd1>
  404869:	8b 43 04             	mov    0x4(%rbx),%eax
  40486c:	48 8d 74 24 3c       	lea    0x3c(%rsp),%rsi
  404871:	48 8d 7c 24 20       	lea    0x20(%rsp),%rdi
  404876:	89 44 24 3c          	mov    %eax,0x3c(%rsp)
  40487a:	ff d5                	callq  *%rbp
  40487c:	85 c0                	test   %eax,%eax
  40487e:	75 41                	jne    4048c1 <_Dmain+0xd1>
  404880:	8b 43 08             	mov    0x8(%rbx),%eax
  404883:	48 8d 74 24 3c       	lea    0x3c(%rsp),%rsi
  404888:	48 8d 7c 24 20       	lea    0x20(%rsp),%rdi
  40488d:	89 44 24 3c          	mov    %eax,0x3c(%rsp)
  404891:	ff d5                	callq  *%rbp
  404893:	85 c0                	test   %eax,%eax
  404895:	75 2a                	jne    4048c1 <_Dmain+0xd1>
  404897:	8b 43 0c             	mov    0xc(%rbx),%eax
  40489a:	48 8d 74 24 3c       	lea    0x3c(%rsp),%rsi
  40489f:	48 8d 7c 24 20       	lea    0x20(%rsp),%rdi
  4048a4:	89 44 24 3c          	mov    %eax,0x3c(%rsp)
  4048a8:	ff d5                	callq  *%rbp
  4048aa:	85 c0                	test   %eax,%eax
  4048ac:	75 13                	jne    4048c1 <_Dmain+0xd1>
  4048ae:	8b 43 10             	mov    0x10(%rbx),%eax
  4048b1:	48 8d 74 24 3c       	lea    0x3c(%rsp),%rsi
  4048b6:	48 8d 7c 24 20       	lea    0x20(%rsp),%rdi
  4048bb:	89 44 24 3c          	mov    %eax,0x3c(%rsp)
  4048bf:	ff d5                	callq  *%rbp
  4048c1:	48 83 c4 48          	add    $0x48,%rsp
  4048c5:	31 c0                	xor    %eax,%eax
  4048c7:	5b                   	pop    %rbx
  4048c8:	5d                   	pop    %rbp
  4048c9:	c3                   	retq

Notice that each loop iteration is only 7 instructions, with array elements loaded directly via an offset.  The loop in opApply generated by dmd looks like this:

08049314 <_D4test1A7opApplyMFDFKiZiZi>:
 8049314:	55                   	push   %ebp
 8049315:	8b ec                	mov    %esp,%ebp
 8049317:	83 ec 18             	sub    $0x18,%esp
 804931a:	53                   	push   %ebx
 804931b:	56                   	push   %esi
 804931c:	89 45 f8             	mov    %eax,-0x8(%ebp)
 804931f:	83 7d f8 00          	cmpl   $0x0,-0x8(%ebp)
 8049323:	75 1f                	jne    8049344 <_D4test1A7opApplyMFDFKiZiZi+0x30>
 8049325:	6a 06                	push   $0x6
 8049327:	ff 35 d4 a3 05 08    	pushl  0x805a3d4
 804932d:	ff 35 d0 a3 05 08    	pushl  0x805a3d0
 8049333:	ff 35 ec a3 05 08    	pushl  0x805a3ec
 8049339:	ff 35 e8 a3 05 08    	pushl  0x805a3e8
 804933f:	e8 8c 04 00 00       	call   80497d0 <_d_assert_msg>
 8049344:	8b 45 f8             	mov    -0x8(%ebp),%eax
 8049347:	8b 50 04             	mov    0x4(%eax),%edx
 804934a:	8b 00                	mov    (%eax),%eax
 804934c:	89 45 e8             	mov    %eax,-0x18(%ebp)
 804934f:	89 55 ec             	mov    %edx,-0x14(%ebp)
 8049352:	c7 45 f0 00 00 00 00 	movl   $0x0,-0x10(%ebp)
 8049359:	8b 4d f0             	mov    -0x10(%ebp),%ecx
 804935c:	3b 4d e8             	cmp    -0x18(%ebp),%ecx
 804935f:	73 41                	jae    80493a2 <_D4test1A7opApplyMFDFKiZiZi+0x8e>
 8049361:	8b 5d f0             	mov    -0x10(%ebp),%ebx
 8049364:	3b 5d e8             	cmp    -0x18(%ebp),%ebx
 8049367:	72 0a                	jb     8049373 <_D4test1A7opApplyMFDFKiZiZi+0x5f>
 8049369:	b8 07 00 00 00       	mov    $0x7,%eax
 804936e:	e8 b9 00 00 00       	call   804942c <_D4test7__arrayZ>
 8049373:	8b 55 ec             	mov    -0x14(%ebp),%edx
 8049376:	8b 45 e8             	mov    -0x18(%ebp),%eax
 8049379:	8b 34 9a             	mov    (%edx,%ebx,4),%esi
 804937c:	89 75 f4             	mov    %esi,-0xc(%ebp)
 804937f:	8d 4d f4             	lea    -0xc(%ebp),%ecx
 8049382:	51                   	push   %ecx
 8049383:	8b 45 08             	mov    0x8(%ebp),%eax
 8049386:	8b 55 0c             	mov    0xc(%ebp),%edx
 8049389:	8b 5d 08             	mov    0x8(%ebp),%ebx
 804938c:	ff d2                	call   *%edx
 804938e:	85 c0                	test   %eax,%eax
 8049390:	74 0b                	je     804939d <_D4test1A7opApplyMFDFKiZiZi+0x89>
 8049392:	b8 01 00 00 00       	mov    $0x1,%eax
 8049397:	5e                   	pop    %esi
 8049398:	5b                   	pop    %ebx
 8049399:	c9                   	leave
 804939a:	c2 08 00             	ret    $0x8
 804939d:	ff 45 f0             	incl   -0x10(%ebp)
 80493a0:	eb b7                	jmp    8049359 <_D4test1A7opApplyMFDFKiZiZi+0x45>
 80493a2:	31 c0                	xor    %eax,%eax
 80493a4:	5e                   	pop    %esi
 80493a5:	5b                   	pop    %ebx
 80493a6:	c9                   	leave
 80493a7:	c2 08 00             	ret    $0x8
 80493aa:	90                   	nop
 80493ab:	90                   	nop

The loop body is significantly longer, about 22 instructions when the exit branch isn't taken, and includes a call to a bounds check routine per iteration.

I suppose the gdc case has been heavily optimized by gcc's optimizing backend. :-) Though I'm kinda disappointed that it didn't inline the trivial delegate. Or is that because of the way the front-end generates the AST?


T

-- 
Nothing in the world is more distasteful to a man than to take the path that leads to himself. -- Herman Hesse
March 13, 2012
On 3/12/2012 4:15 PM, Peter Alexander wrote:
> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
>> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>>> Suppose you have a delegate literal and immediately call it:
>>>
>>> auto a = x + (){ doStuff(); return y; }() + z;
>>>
>>> Does DMD ever (or always?) optimize away a delegate if it's executed immediately and never stored into a variable? If not, can it, and would it be a simple change? Is something like this already on the table?
>> [...]
>>
>> I've always wondered about whether delegates passed to opApply ever get inlined.
> 
> Don't wonder. Find out!
> 
> import std.stdio;
> void doStuff() { writeln("Howdy!"); }
> void main() {
>     int x = 1, y = 2, z = 3;
>     auto a = x + (){ doStuff(); return y; }() + z;
>     writeln(a);
> }

See also: bug 4440

The patch in there, if it hasn't bit rotten to badly (I suspect it has) will handle _this_ case.  But almost no other case of inlining delegates.

It'd be a good area for someone who wants an interesting and non-trivial problem to dig into.  It wouldn't touch all that much of the codebase as the inliner is fairly self-contained.  At least, that's what I recall from when I looked at this stuff a couple years ago.

Later,
Brad
March 13, 2012
"Peter Alexander" <peter.alexander.au@gmail.com> wrote in message news:thetmhnnbeepmxgusntr@forum.dlang.org...
> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
>> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>>> Suppose you have a delegate literal and immediately call it:
>>>
>>> auto a = x + (){ doStuff(); return y; }() + z;
>>>
>>> Does DMD ever (or always?) optimize away a delegate if it's executed immediately and never stored into a variable? If not, can it, and would it be a simple change? Is something like this already on the table?
>> [...]
>>
>> I've always wondered about whether delegates passed to opApply ever get inlined.
>
> Don't wonder. Find out!
>
> import std.stdio;
> void doStuff() { writeln("Howdy!"); }
> void main() {
>     int x = 1, y = 2, z = 3;
>     auto a = x + (){ doStuff(); return y; }() + z;
>     writeln(a);
> }
>
> $ dmd test.d -O -release -inline
>
> __Dmain:
> 000000010000106c pushq %rbp
[...]

I keep forgetting I can do that. ;)

>
> In short. No! It doesn't currently inline in this case.
>
> Even if the lambda just returns a constant, it doesn't get inlined.

Darn.

The reason this came up is I've gotten back into some more work on HaxeD (Haxe -> D converter). Haxe allows statements to be used as expressions. For example:

x = 5 + if(cond) { foo(); 1; } else 2;

At the moment, I'm converting those by just tossing them into an immediately-called anonymous delegate:

x = 5 + (){ ...blah... }();

But as I was afraid of, there's a performance hit with that. So eventually, I'll have to either do some fancier reworking:

// Something like
typeof(1) __tmp1;
if(cond) { foo(); __tmp1 = 1; } else __tmp1 = 2;
x = 5 + __tmp1;

...which might even run into some problems with order-of-execution. Or delegate inlining would have to get added to DMD.

Now that I actually look (heh :) ), it looks like there's an old Buzilla issue for it with an apperently overly-limited patch:

http://d.puremagic.com/issues/show_bug.cgi?id=4440

Unfortunately, according to Brad Roberts in the last comment: "Getting delegate inlining in dmd is going to take serious work." (Ugh, I wish Windows had an easy way to copy/paste text *without* the styling.)


March 13, 2012
"Brad Roberts" <braddr@puremagic.com> wrote in message news:mailman.582.1331607753.4860.digitalmars-d@puremagic.com...
> On 3/12/2012 4:15 PM, Peter Alexander wrote:
>> On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
>>> On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
>>>> Suppose you have a delegate literal and immediately call it:
>>>>
>>>> auto a = x + (){ doStuff(); return y; }() + z;
>>>>
>>>> Does DMD ever (or always?) optimize away a delegate if it's executed immediately and never stored into a variable? If not, can it, and would it be a simple change? Is something like this already on the table?
>>> [...]
>>>
>>> I've always wondered about whether delegates passed to opApply ever get inlined.
>>
>> Don't wonder. Find out!
>>
>> import std.stdio;
>> void doStuff() { writeln("Howdy!"); }
>> void main() {
>>     int x = 1, y = 2, z = 3;
>>     auto a = x + (){ doStuff(); return y; }() + z;
>>     writeln(a);
>> }
>
> See also: bug 4440
>
> The patch in there, if it hasn't bit rotten to badly (I suspect it has)
> will handle _this_ case.  But almost no other
> case of inlining delegates.
>
> It'd be a good area for someone who wants an interesting and non-trivial
> problem to dig into.  It wouldn't touch all
> that much of the codebase as the inliner is fairly self-contained.  At
> least, that's what I recall from when I looked at
> this stuff a couple years ago.
>

Do you think that patch would be a good starting place for further work, or would a proper solution likely necessitate an entirely different approach?


March 13, 2012
On 3/12/2012 8:10 PM, Nick Sabalausky wrote:
> "Brad Roberts" <braddr@puremagic.com> wrote:
>>
>> See also: bug 4440
>>
>> The patch in there, if it hasn't bit rotten to badly (I suspect it has)
>> will handle _this_ case.  But almost no other
>> case of inlining delegates.
>>
>> It'd be a good area for someone who wants an interesting and non-trivial
>> problem to dig into.  It wouldn't touch all
>> that much of the codebase as the inliner is fairly self-contained.  At
>> least, that's what I recall from when I looked at
>> this stuff a couple years ago.
>>
> 
> Do you think that patch would be a good starting place for further work, or would a proper solution likely necessitate an entirely different approach?
> 

I suspect that, to fix it for a broader set of cases, it'll need much larger changes.  During those changes the simple version in that patch would likely disappear.  But take that all with a huge grain of salt.
« First   ‹ Prev
1 2