Jump to page: 1 2
Thread overview
[Issue 6658] New: Slow short array equality
Apr 01, 2013
Vladimir Panteleev
Apr 01, 2013
Vladimir Panteleev
Apr 02, 2013
Iain Buclaw
Apr 02, 2013
Vladimir Panteleev
Apr 02, 2013
Vladimir Panteleev
Apr 02, 2013
Vladimir Panteleev
Apr 02, 2013
Iain Buclaw
Apr 02, 2013
Vladimir Panteleev
Apr 02, 2013
Iain Buclaw
Apr 02, 2013
Vladimir Panteleev
September 12, 2011
http://d.puremagic.com/issues/show_bug.cgi?id=6658

           Summary: Slow short array equality
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Keywords: performance
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2011-09-12 16:20:16 PDT ---
Equality comparisons between very short arrays is quite slow in D/DMD. When the arrays are both fixed-sized and short, I suggest to inline item comparisons instead of calling a runtime function, to speed up the code significantly.

A benchmark:


bool eq1(T, size_t N)(const ref T[N] a, const ref T[N] b) pure nothrow {
    return a == b;
}

bool eq2(T, size_t N)(const ref T[N] a, const ref T[N] b) pure nothrow {
    foreach (size_t i; 0 .. a.length)
        if (a[i] != b[i])
            return false;
    return true;
}

void main() {
    import std.stdio, std.random;
    enum size_t N = 3;
    enum size_t M = 100;
    rndGen.seed(1);

    int[N][M] data;
    foreach (ref d; data)
        foreach (ref x; d)
            x = uniform(0, 4);

    int total;
    foreach (i; 0 .. 20_000)
        foreach (j; 0 .. M)
            foreach (k; 0 .. M)
                static if (1)
                    total += eq1(data[j], data[k]); // 11.5 seconds
                else
                    total += eq2(data[j], data[k]); // 2.45 seconds

    writeln(total);
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 31, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658



--- Comment #1 from bearophile_hugs@eml.cc 2013-03-31 03:29:16 PDT ---
After closing Issue 9477  the situation is changed a little.

On the same PC with DMD 2.063alpha the run-time with eq2() is 2.04 seconds and
the run-time with eq1() is 6.37 seconds (3.12X).

The asm produced by dmd (-O -release -inline -noboundscheck) for the two
versions:


eq1:
        push EAX
        mov ECX, 0Ch
        push ESI
        mov ESI, 0Ch[ESP]
        push EDI
        mov EDI, EAX
        xor EAX, EAX
        rep
        cmpsb
        je  L19
        sbb EAX, EAX
        sbb EAX, 0FFFFFFFFh
L19:    neg EAX
        sbb EAX, EAX
        inc EAX
        pop EDI
        pop ESI
        pop ECX
        ret 4


eq2:
        push EBX
        mov EDX, EAX
        mov ECX, 8[ESP]
        xor EBX, EBX
L9:     mov EAX, [EBX*4][ECX]
        cmp EAX, [EBX*4][EDX]
        je  L17
        pop EBX
        xor EAX, EAX
        ret 4
L17:    inc EBX
        cmp EBX, 3
        jb  L9
        pop EBX
        mov EAX, 1
        ret 4


Is cmpsb efficient on modern CPUs? Maybe the answer is negative: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658


Vladimir Panteleev <thecybershadow@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thecybershadow@gmail.com


--- Comment #2 from Vladimir Panteleev <thecybershadow@gmail.com> 2013-04-02 00:56:15 EEST ---
Can we close this issue as a duplicate of issue 5282?

> Is cmpsb efficient on modern CPUs?

As the linked discussion mentions, it's not as efficient for short, fixed-size data blocks. Backends other than DMD might take advantage of a known array length, but DMD does not.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658



--- Comment #3 from bearophile_hugs@eml.cc 2013-04-01 16:07:15 PDT ---
(In reply to comment #2)
> Can we close this issue as a duplicate of issue 5282?
> 
> > Is cmpsb efficient on modern CPUs?
> 
> As the linked discussion mentions, it's not as efficient for short, fixed-size data blocks. Backends other than DMD might take advantage of a known array length, but DMD does not.

Do you know why DMD can't be smarter here? The compiler knows the types and lengths, so it knows when one of such situations happens. At worst calling a template function defined in the object module is able to solve the situation.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 01, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658



--- Comment #4 from Vladimir Panteleev <thecybershadow@gmail.com> 2013-04-02 02:10:35 EEST ---
It can, it's just that no one has implemented such an optimization yet. Keep in mind that the patch would be against DMDBE (assuming the strategy would be to optimize the memcmp intrinsic), so it would only improve the non-free, non-redistributable part of one D compiler. D users who want their programs to be fast should use GDC or LDC. In contrast, the patch to issue 9477 was in the frontend, so it benefits all of GDC, LDC and DMD.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 02, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658


Iain Buclaw <ibuclaw@ubuntu.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw@ubuntu.com


--- Comment #5 from Iain Buclaw <ibuclaw@ubuntu.com> 2013-04-02 00:32:11 PDT ---
(In reply to comment #4)
> It can, it's just that no one has implemented such an optimization yet. Keep in mind that the patch would be against DMDBE (assuming the strategy would be to optimize the memcmp intrinsic), so it would only improve the non-free, non-redistributable part of one D compiler. D users who want their programs to be fast should use GDC or LDC. In contrast, the patch to issue 9477 was in the frontend, so it benefits all of GDC, LDC and DMD.

I'd just like to say that statement is false.  It wasn't a patch in the frontend, and no one but DMD benefits.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 02, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658



--- Comment #6 from Vladimir Panteleev <thecybershadow@gmail.com> 2013-04-02 10:35:39 EEST ---
(In reply to comment #5)
> I'd just like to say that statement is false.  It wasn't a patch in the frontend, and no one but DMD benefits.

Really??

We're talking about this, right? https://github.com/D-Programming-Language/dmd/commit/c984175cf25dfa17b3956e8e33ff83547fa56b0a

I thought GDC/LDC simply provided their own implementation of el_* and the elem type.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 02, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658



--- Comment #7 from Vladimir Panteleev <thecybershadow@gmail.com> 2013-04-02 10:42:00 EEST ---
I don't understand why GDC and LDC did not use the existing code in e2ir. This would imply that every implementation would have to reimplement exactly the same things as DMD (emit calls to Druntime, for example).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 02, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658



--- Comment #8 from Vladimir Panteleev <thecybershadow@gmail.com> 2013-04-02 10:55:41 EEST ---
DMD: https://github.com/D-Programming-Language/dmd/blob/v2.062/src/e2ir.c#L2431

GDC: https://github.com/D-Programming-GDC/GDC/blob/master/gcc/d/d-elem.cc#L100

LDC: https://github.com/ldc-developers/ldc/blob/master/gen/arrays.cpp#L812

It's all reimplementations of the same thing. I guess there was a reason for why the layer of abstraction was chosen to be this high, but in this situation it's not doing anyone any favors.

Well, I guess that now that DMD is faster than both GDC and LDC at something, the burden is on those compilers' maintainers to catch up with it :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
April 02, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=6658



--- Comment #9 from Iain Buclaw <ibuclaw@ubuntu.com> 2013-04-02 02:45:00 PDT ---
(In reply to comment #8)
> DMD: https://github.com/D-Programming-Language/dmd/blob/v2.062/src/e2ir.c#L2431
> 
> GDC: https://github.com/D-Programming-GDC/GDC/blob/master/gcc/d/d-elem.cc#L100
> 
> LDC: https://github.com/ldc-developers/ldc/blob/master/gen/arrays.cpp#L812
> 
> It's all reimplementations of the same thing. I guess there was a reason for why the layer of abstraction was chosen to be this high, but in this situation it's not doing anyone any favors.
> 

When David first released gdc, the (In reply to comment #6)
> (In reply to comment #5)
> > I'd just like to say that statement is false.  It wasn't a patch in the frontend, and no one but DMD benefits.
> 
> Really??
> 
> We're talking about this, right? https://github.com/D-Programming-Language/dmd/commit/c984175cf25dfa17b3956e8e33ff83547fa56b0a
> 
> I thought GDC/LDC simply provided their own implementation of el_* and the elem type.

When gdc was first released, it provided it's own implementation of dt* and the dt_t type.  This is now what I widely regard of as a bad move.  The DMD implementation of toobj.c; todt.c; typinf.c; has been getting further from GDC's copy of that part of the frontend, thus has become a pain to maintain upon each frontend update.  Frankly I can never see it working trying to hack and wrap dmd backend calls into producing something that another backend see's as correct.  So I'll be yanking those three files out of dfrontend/ sometime around this fall too.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2