Thread overview
[Issue 4244] AA insert from fixed-sized array much slower than from equivalent struct
[Issue 4244] Built-in static array hash function is 8-9 times slower than hash function of a POD struct with equivalent size
May 16, 2019
Seb
June 30, 2015
https://issues.dlang.org/show_bug.cgi?id=4244

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh@quickfur.ath.cx

--- Comment #1 from hsteoh@quickfur.ath.cx ---
Confirmed that this performance issue still happens on git HEAD
(1ceb899e9ee430bcd223ddeeb907245ea5be3d47). Tested on Linux/64bit, AMD Phenom
II X6 1055T (800 MHz).

The original code runs a little too fast on my system to have any measurable performance numbers (S/N ratio too low to be accurate), so I bumped N to 1800, and noted that the foo1() version takes on average about 1.4 to 1.5 seconds to complete, whereas the foo2() version takes only about 0.8 to 0.9 seconds on average. Both versions compiled with `dmd -O -release -inline`. (Surprisingly enough, not specifying -inline seems to produce faster code?! It's a small difference, though, so could just be background noise.)

Also surprisingly, gdc-5.1.1 produces much slower code than dmd in this case: the foo1() version takes on average about 3.9 seconds, whereas the foo2() version takes on average about 3.4 seconds. Both compiled with `gdc -O3 -finline -frelease`. This may possibly be caused by gdc using an older version of druntime, as recently the AA code in git HEAD has had some performance improvements.

Will take a look into the generated assembly code next.

--
June 30, 2015
https://issues.dlang.org/show_bug.cgi?id=4244

--- Comment #2 from hsteoh@quickfur.ath.cx ---
Looking at the generated assembly code, it seems that foo1 and foo2 are essentially identical. So it looks like the problem must come from the AA code, probably from the different treatments given to different typeinfo's for the two key types.

--
June 30, 2015
https://issues.dlang.org/show_bug.cgi?id=4244

--- Comment #3 from hsteoh@quickfur.ath.cx ---
The problem is caused by the hash computation of arrays (including static arrays). Here's the benchmark code I used:


------
void hashTest() {
    import std.conv : to;
    import std.datetime : Duration, benchmark;
    import std.stdio : writefln;

    static struct Pair { int x, y; }

    int[2] staticArray;
    Pair pair;

    auto result = benchmark!(() => typeid(staticArray).toHash(),
                             () => typeid(pair).toHash())(30_000_000);

    writefln("staticArrayHash: %s", to!Duration(result[0]));
    writefln("structHash: %s", to!Duration(result[1]));
}

void main() {
    hashTest();
}
------


Here are some trial runs showing the typical measurements:
------
$ ./test
staticArrayHash: 8 secs, 947 ms, 130 μs, and 7 hnsecs
structHash: 1 sec, 196 ms, 711 μs, and 1 hnsec
$ ./test
staticArrayHash: 9 secs, 130 ms, 423 μs, and 2 hnsecs
structHash: 1 sec, 196 ms, and 737 μs
$ ./test
staticArrayHash: 8 secs, 922 ms, 449 μs, and 9 hnsecs
structHash: 1 sec, 260 ms, 688 μs, and 8 hnsecs
$ ./test
staticArrayHash: 9 secs, 43 ms, 237 μs, and 6 hnsecs
structHash: 1 sec, 196 ms, and 983 μs
------

This shows that computing the hash of a static array is about 8-9 times slower than computing the hash of a struct of equivalent size.

Looking into the druntime code, it seems that a likely cause of the performance problem is the getArrayHash() function, which calls an inner function hasCustomHash() that in turn calls getElement(), which uses a loop to walk up the TypeInfo tree to ascertain whether the array elements have a custom hash function. I copied the druntime code into my test file, and modified hasCustomToHash() to just return false, and the benchmark shows that the array hash function is now approximately on par with the struct hash function.

So, it looks like the culprit is the call to getElement(). I'll see if I can cook up a PR to fix this.

--
June 30, 2015
https://issues.dlang.org/show_bug.cgi?id=4244

hsteoh@quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|dmd                         |druntime
            Summary|AA insert from fixed-sized  |Built-in static array hash
                   |array much slower than from |function is 8-9 times
                   |equivalent struct           |slower than hash function
                   |                            |of a POD struct with
                   |                            |equivalent size
                 OS|Windows                     |All

--
May 16, 2019
https://issues.dlang.org/show_bug.cgi?id=4244

Seb <greeenify@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |greeenify@gmail.com
         Resolution|---                         |FIXED

--- Comment #4 from Seb <greeenify@gmail.com> ---
They now use the same implementation and thus result in the same time:

https://run.dlang.io/is/hOpc5R

--