Thread overview
[Issue 9920] New: [Optimizer] Use mul/imul for integer division by constant
Apr 11, 2013
Dmitry Olshansky
Apr 11, 2013
Dmitry Olshansky
Apr 11, 2013
Dmitry Olshansky
Apr 11, 2013
Dmitry Olshansky
Apr 14, 2013
Walter Bright
Apr 16, 2013
yebblies
Apr 16, 2013
Walter Bright
April 11, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920

           Summary: [Optimizer] Use mul/imul for integer division by
                    constant
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: dmitry.olsh@gmail.com


--- Comment #0 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-04-11 08:44:06 PDT ---
Created an attachment (id=1208)
fast divison by constant for uint values

It's a common knowledge and is totally expected for any modern compiler to re-write divisions by constant to double word width _multiplication_ by a specific constant followed by shift.

DMD still DOESN'T DO it.

Attached is an example of this optimization done via meta-programming in D along with test and a benchmark. On average (on my PC) the mul version is around 3 times faster then compiler-generated div. On LDC and GDC results are that compiler-generate version is a bit faster.

Compiler can easily do a better job especially with 64bit values (as 2x64 accumulator is completely unaccessible for the programmer).

For full description of one such technique see Agner Fog's manuals on assembly optimizations: http://www.agner.org/optimize/optimizing_assembly.pdf

See the chapter 16. "Problematic instructions", section on DIV/IDIV is 16.9.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 11, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #1 from bearophile_hugs@eml.cc 2013-04-11 11:00:21 PDT ---
Dupe of 5607 ?

(Even if it's a dupe you have good code in attach)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 11, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920



--- Comment #2 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-04-11 11:16:38 PDT ---
(In reply to comment #1)
> Dupe of 5607 ?

5607 is a special case of this enhancement.
This one is generalization since any division by constant can be optimized.
Using small divisors only allows some more specialized varations of this
optimization (less accurate but enough for numbers in question).

> 
> (Even if it's a dupe you have good code in attach)

I'd say this one supersedes 5607, so how about merge your example here and close 5607 ? (the 2 have to be merged anyway since the root cause is the same)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 11, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920



--- Comment #3 from bearophile_hugs@eml.cc 2013-04-11 11:33:55 PDT ---
(In reply to comment #2)

> any division by constant can be optimized.

I didn't know this.


> I'd say this one supersedes 5607, so how about merge your example here and close 5607 ? (the 2 have to be merged anyway since the root cause is the same)

If you think this supersedes 5607, then copy what you think should be copied, and close it.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 11, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920



--- Comment #4 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-04-11 12:28:08 PDT ---
*** Issue 5607 has been marked as a duplicate of this issue. ***

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 11, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920



--- Comment #5 from Dmitry Olshansky <dmitry.olsh@gmail.com> 2013-04-11 12:33:09 PDT ---
Another data point taken from issue 5607 on div/mod heavy program seems to confirm my estimate of 3x+ speed difference with mul-trick applied.

The program is bound on div/mul operaiton so other differences of GCC vs DMC codegen has little effect.

---------------------------------

Timings, n = 100_000_000:
  GCC:  3.13 s
  DMD: 11.69 s


dmd -O -release -inline testd.d
gcc -O3 -s testc.c -o testc.exe

32 bit
GCC 4.5.2
DMD 2.052beta

---------------------------------

// D2 code
import std.c.stdio: printf;

int main() {
    uint total = 0;
    uint i;
    for (i = 0; i < 100000000; i++) {
        uint n = i;
        uint res = n % 10;
        n /= 10;
        while (n != 0) {
            res = (n % 10) + (10 * res);
            n /= 10;
        }
        total += res;
    }
    printf("%u\n", total); // 461784529
    return 0;
}

---------------------------------

// C code
#include "stdio.h"
typedef unsigned long uint;

int main() {
    uint total = 0;
    uint i;
    for (i = 0; i < 100000000; i++) {
        uint n = i;
        uint res = n % 10;
        n /= 10;
        while (n != 0) {
            res = (n % 10) + (10 * res);
            n /= 10;
        }
        total += res;
    }
    printf("%u\n", total); // 461784529
    return 0;
}

---------------------------------

DMD inner loop:

L1C:    lea    EBX,[EBX*4][EBX]
        mov    EAX,ESI
        mov    ECX,0Ah
        xor    EDX,EDX
        add    EBX,EBX
        div    ECX
        add    EBX,EDX
        test EAX,EAX
        mov    ESI,EAX
        jne    L1C

---------------------------------

GCC inner loop:

L7:
    movl    %ecx, %eax
    mull    %esi
    leal    (%ebx,%ebx,4), %ebx
    shrl    $3, %edx
    leal    (%edx,%edx,4), %eax
    addl    %eax, %eax
    subl    %eax, %ecx
    testl    %edx, %edx
    leal    (%ecx,%ebx,2), %ebx
    movl    %edx, %ecx
    jne    L7

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 14, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@digitalmars.com
            Version|D2                          |D1 & D2


--- Comment #6 from Walter Bright <bugzilla@digitalmars.com> 2013-04-14 01:19:01 PDT ---
https://github.com/D-Programming-Language/dmd/pull/1893

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 16, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920


yebblies <yebblies@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yebblies@gmail.com


--- Comment #7 from yebblies <yebblies@gmail.com> 2013-04-16 15:18:14 EST ---
D2 commit: https://github.com/D-Programming-Language/dmd/commit/ccb0c9b37d3340f63cf52148b540e282bcdc3ba4

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 16, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920



--- Comment #8 from bearophile_hugs@eml.cc 2013-04-16 10:18:30 PDT ---
(In reply to comment #5)

> Timings, n = 100_000_000:
>   GCC:  3.13 s
>   DMD: 11.69 s

The new timings are acceptable:

GCC: 3.03 s
DMD: 3.74 s


dmd -O -release -noboundscheck -inline
gcc -Ofast -flto -s

gcc V.4.8


The asm of the D version below seems to contains a bit too many "mov":

---------------------------------

New DMD inner loop:

L28:    lea    EBX, [EBX*4][EBX]
        mov    EAX, ESI
        mov    EDX, 0CCCCCCCDh
        mul    EDX
        shr    EDX, 3
        mov    ECX, ESI
        imul EAX, EDX, 0Ah
        sub    ECX, EAX
        mov    EAX, EDX
        add    EBX, EBX
        mov    EDX, ECX
        add    EBX, EDX
        test EAX, EAX
        mov    ESI, EAX
        jne    L28

---------------------------------

GCC:

L5:
    movl    %edi, %eax
    mull    %esi
    movl    %edx, %ebx
    shrl    $3, %ebx
    leal    (%ebx,%ebx,4), %eax
    movl    %ebx, %ecx
    addl    %eax, %eax
    movl    %edi, %ebx
    subl    %eax, %ebx
    testl    %ecx, %ecx
    je    L2
    .p2align 4,,7
L4:
    movl    %ecx, %eax
    mull    %esi
    leal    (%ebx,%ebx,4), %ebx
    shrl    $3, %edx
    leal    (%edx,%edx,4), %eax
    addl    %eax, %eax
    subl    %eax, %ecx
    testl    %edx, %edx
    leal    (%ecx,%ebx,2), %ebx
    movl    %edx, %ecx
    jne    L4
    addl    $1, %edi
    addl    %ebx, 12(%esp)
    cmpl    $100000000, %edi
    jne    L5

---------------------------------

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 16, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9920


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------