Jump to page: 1 2
Thread overview
[Issue 4440] New: inlined delegates produces different asm than straight lined code
Jul 09, 2010
Brad Roberts
Jul 09, 2010
Leandro Lucarella
Jul 11, 2010
Brad Roberts
Jul 11, 2010
Brad Roberts
Jul 11, 2010
Leandro Lucarella
Jul 11, 2010
Brad Roberts
Jul 11, 2010
Leandro Lucarella
Jul 11, 2010
Brad Roberts
[Issue 4440] [patch] Inlining delegate literals
Aug 29, 2010
David Simcha
Aug 29, 2010
David Simcha
Aug 29, 2010
Brad Roberts
July 09, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440

           Summary: inlined delegates produces different asm than straight
                    lined code
           Product: D
           Version: D1 & D2
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: braddr@puremagic.com


--- Comment #0 from Brad Roberts <braddr@puremagic.com> 2010-07-08 22:59:51 PDT ---
split from bug 859:

Leandro Lucarella      2010-06-27 18:49:49 PDT

I'm having some performance problems moving some stuff from a lower-level C-style to a higher-lever D-style. Here is an example:

---
int find_if(bool delegate(ref int) predicate)
{
        for (int i = 0; i < 100; i++)
                if (predicate(i))
                        return i;
        return -1;
}

int main()
{
//      for (int i = 0; i < 100; i++)
//              if (i == 99)
//                      return i;
//      return -1;
        return find_if((ref int i) { return i == 99; });
}
---

The program produced by this source executes 4 times more instructions than the more direct (lower-level) version commented out. I would expect DMD to inline all functions/delegates and produce the same asm for both, but that's not the case.

This is a reduced test-case, but I'm working on improving the GC and I'm really hitting this problem. If I use this higher-level style in the GC, a Dil run for generating the Tango docs is 3.33 times slower than the C-ish style used by the current GC.

So I think this is a real problem for D, it's really important to be able to encourage people to use the higher-level D constructs.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 09, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440


Leandro Lucarella <llucax@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |llucax@gmail.com


--- Comment #1 from Leandro Lucarella <llucax@gmail.com> 2010-07-09 08:38:57 PDT ---
I'm not very keen at reading asm, but I'm seeing a couple of 'call's in the generated code:

---
$ dmd -release -O -gc -inline -c test.d
$ objdump -d test.o

test.o:     file format elf32-i386


Disassembly of section .text:

00000000 <.text>:
   0:    c3                       ret
   1:    60                       pusha
   2:    b8 38 00 00 00           mov    $0x38,%eax
   7:    b9 00 00 00 00           mov    $0x0,%ecx
   c:    8b 11                    mov    (%ecx),%edx
   e:    89 10                    mov    %edx,(%eax)
  10:    89 01                    mov    %eax,(%ecx)
  12:    61                       popa
  13:    c3                       ret

Disassembly of section .text._D4test7find_ifFDFKiZbZi:

00000000 <_D4test7find_ifFDFKiZbZi>:
   0:    55                       push   %ebp
   1:    8b ec                    mov    %esp,%ebp
   3:    83 ec 0c                 sub    $0xc,%esp
   6:    53                       push   %ebx
   7:    8b 55 0c                 mov    0xc(%ebp),%edx
   a:    8b 45 08                 mov    0x8(%ebp),%eax
   d:    c7 45 fc 00 00 00 00     movl   $0x0,-0x4(%ebp)
  14:    89 d3                    mov    %edx,%ebx
  16:    8d 4d fc                 lea    -0x4(%ebp),%ecx
  19:    8b 45 08                 mov    0x8(%ebp),%eax
  1c:    51                       push   %ecx
  1d:    ff d3                    call   *%ebx
  1f:    84 c0                    test   %al,%al
  21:    74 0a                    je     2d <_D4test7find_ifFDFKiZbZi+0x2d>
  23:    8b 45 fc                 mov    -0x4(%ebp),%eax
  26:    5b                       pop    %ebx
  27:    8b e5                    mov    %ebp,%esp
  29:    5d                       pop    %ebp
  2a:    c2 08 00                 ret    $0x8
  2d:    ff 45 fc                 incl   -0x4(%ebp)
  30:    83 7d fc 64              cmpl   $0x64,-0x4(%ebp)
  34:    7c e0                    jl     16 <_D4test7find_ifFDFKiZbZi+0x16>
  36:    5b                       pop    %ebx
  37:    8b e5                    mov    %ebp,%esp
  39:    b8 ff ff ff ff           mov    $0xffffffff,%eax
  3e:    5d                       pop    %ebp
  3f:    c2 08 00                 ret    $0x8
  42:    90                       nop
  43:    90                       nop

Disassembly of section .text._Dmain:

00000000 <_Dmain>:
   0:    55                       push   %ebp
   1:    8b ec                    mov    %esp,%ebp
   3:    b9 00 00 00 00           mov    $0x0,%ecx
   8:    51                       push   %ecx
   9:    31 c0                    xor    %eax,%eax
   b:    50                       push   %eax
   c:    e8 fc ff ff ff           call   d <_Dmain+0xd>
  11:    5d                       pop    %ebp
  12:    c3                       ret
  13:    90                       nop

Disassembly of section .text._D4test4mainFZi12__dgliteral1MFKiZb:

00000000 <_D4test4mainFZi12__dgliteral1MFKiZb>:
   0:    55                       push   %ebp
   1:    8b ec                    mov    %esp,%ebp
   3:    50                       push   %eax
   4:    8b 4d 08                 mov    0x8(%ebp),%ecx
   7:    b8 01 00 00 00           mov    $0x1,%eax
   c:    83 39 63                 cmpl   $0x63,(%ecx)
   f:    74 02                    je     13
<_D4test4mainFZi12__dgliteral1MFKiZb+0x13>
  11:    31 c0                    xor    %eax,%eax
  13:    8b e5                    mov    %ebp,%esp
  15:    5d                       pop    %ebp
  16:    c2 04 00                 ret    $0x4
---

Am I understanding it all wrong?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 11, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440


Brad Roberts <braddr@puremagic.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|nobody@puremagic.com        |braddr@puremagic.com


--- Comment #2 from Brad Roberts <braddr@puremagic.com> 2010-07-11 10:13:14 PDT ---
Reducing the test case further:
extern(C) int printf(const char*, ...);

bool if_delegate(int i, bool delegate(ref int) predicate)
{
    return predicate(i);
}

bool if_nodelegate(int i)
{
    return i == 99;
}

void main()
{
    foreach(i; 1 .. 100)
        if (if_delegate(i, (ref int j) { return j == 99; }))
            printf("i = %d\n", i);

    foreach(i; 1 .. 100)
        if (if_nodelegate(i))
            printf("i = %d\n", i);
}

In the previous version of the code, the use of the for loops makes the cost of the functions too high for the inliner to bother inlining find_if into main. Also,in the non-inlined version of the function, there's no way for it to inline the delegate since it's a variable, not a constant thing.

This changed version reduces the problem to: will it inline the delegate?  The answer is still no, but allows that specific problem to be explored.

Leandro, does this stray too far away from the underlying code that led you to file this bug?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 11, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440



--- Comment #3 from Brad Roberts <braddr@puremagic.com> 2010-07-11 11:19:18 PDT ---
Ok.. still doesn't inline the delegate in this version:

extern(C) int printf(const char*, ...);

void main()
{
    auto d = (ref int j) { return j == 99; };

    foreach(i; 1 .. 100)
        if (d(i))
            printf("i = %d\n", i);
}

There's nothing in the dmd frontend inliner that even tries to deal with this sort of inlining right now.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 11, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440



--- Comment #4 from Leandro Lucarella <llucax@gmail.com> 2010-07-11 11:27:57 PDT ---
(In reply to comment #2)
> Reducing the test case further:
> extern(C) int printf(const char*, ...);
> 
> bool if_delegate(int i, bool delegate(ref int) predicate)
> {
>     return predicate(i);
> }
> 
> bool if_nodelegate(int i)
> {
>     return i == 99;
> }
> 
> void main()
> {
>     foreach(i; 1 .. 100)
>         if (if_delegate(i, (ref int j) { return j == 99; }))
>             printf("i = %d\n", i);
> 
>     foreach(i; 1 .. 100)
>         if (if_nodelegate(i))
>             printf("i = %d\n", i);
> }
> 
> In the previous version of the code, the use of the for loops makes the cost of the functions too high for the inliner to bother inlining find_if into main. Also,in the non-inlined version of the function, there's no way for it to inline the delegate since it's a variable, not a constant thing.
>
> This changed version reduces the problem to: will it inline the delegate?  The answer is still no, but allows that specific problem to be explored.
> 
> Leandro, does this stray too far away from the underlying code that led you to file this bug?


Yes, the loop is important, because this happens a lot with opApply(), which
almost always includes a loop, and when I use (x having an opApply() method):
foreach (a; x) {
   // some code
}
I almost always want // some code to be inlined, since it's not even a function
(from the POV of the programmer), but maybe that should be another bug report,
I really don't know.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 11, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440



--- Comment #5 from Brad Roberts <braddr@puremagic.com> 2010-07-11 11:31:38 PDT ---
Ok.  I'll keep that in mind.  One step at a time for now though; need to get _any_ delegate inlining to happen before worrying about popping up the layers up to an opApply function.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 11, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440



--- Comment #6 from Leandro Lucarella <llucax@gmail.com> 2010-07-11 13:22:56 PDT ---
(In reply to comment #5)
> Ok.  I'll keep that in mind.  One step at a time for now though; need to get _any_ delegate inlining to happen before worrying about popping up the layers up to an opApply function.

Agreed, thanks for giving it some time to this :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 11, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440



--- Comment #7 from Brad Roberts <braddr@puremagic.com> 2010-07-11 14:39:39 PDT ---
Created an attachment (id=690)
add support for inlining function literals

This patch adds support for inlining function literals.  This won't cover many cases as most hide behind a variable, but it's a building block in the right direction:

The test case for this patch.. very contrived:

extern(C) int printf(const char*, ...);

void main()
{
    int i = 99;
    if ((delegate(ref int j) { return j == 99; })(i))
        printf("i = %d\n", i);
}

same result if the keyword 'delegate' is omitted.

It passes an abbreviated (only used ARGS="-O -inline -release") run of the test
suite.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 29, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440


David Simcha <dsimcha@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |4264
            Summary|[patch] inlined delegates   |[patch] Inlining delegate
                   |produces different asm than |literals
                   |straight lined code         |


--- Comment #8 from David Simcha <dsimcha@yahoo.com> 2010-08-29 13:55:31 PDT ---
Blocks 4264 because I need delegate literal inlining to get decent performance out of opApply-based map, filter, etc.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 29, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4440



--- Comment #9 from David Simcha <dsimcha@yahoo.com> 2010-08-29 14:26:50 PDT ---
I just tried this patch out and on second thought I'm not sure it's worth integrating as-is because it's so limited.  It won't even handle the case where a function that calls a delegate gets inlined and the delegate passed from the call site is a literal.  Even the following doesn't get inlined:

bool evaluateDelegate(bool delegate() dg) {
    return dg();
}

void main() {
    evaluateDelegate({return false;});
}


Here's the disassembly of main():

__Dmain PROC NEAR
;  COMDEF __Dmain
        sub     esp, 24                                 ; 0000 _ 83. EC, 18
        xor     eax, eax                                ; 0003 _ 31. C0
        mov     ecx, offset _D5test94mainFZv12__dgliteral1MFZb; 0005 _ B9,
00000000(segrel)
        push    ebx                                     ; 000A _ 53
        mov     edx, ecx                                ; 000B _ 8B. D1
        mov     dword ptr [esp+4H], eax                 ; 000D _ 89. 44 24, 04
        mov     eax, dword ptr [esp+4H]                 ; 0011 _ 8B. 44 24, 04
        mov     ebx, dword ptr [esp+4H]                 ; 0015 _ 8B. 5C 24, 04
        mov     dword ptr [esp+8H], ecx                 ; 0019 _ 89. 4C 24, 08
        call    edx                                     ; 001D _ FF. D2
        xor     eax, eax                                ; 001F _ 31. C0
        pop     ebx                                     ; 0021 _ 5B
        add     esp, 24                                 ; 0022 _ 83. C4, 18
        ret                                             ; 0025 _ C3
__Dmain ENDP

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2