Thread overview
A hash table implementation benchmark
Oct 01, 2014
Ali Çehreli
Oct 01, 2014
Marco Leise
Oct 01, 2014
Marco Leise
Oct 01, 2014
bearophile
Oct 01, 2014
Ali Çehreli
Oct 01, 2014
bearophile
Oct 02, 2014
thedeemon
Oct 02, 2014
Ali Çehreli
October 01, 2014
Found on Reddit:


http://lonewolfer.wordpress.com/2014/03/13/benchmarking-hash-table-implementations-in-different-languages/

Are you motivated enough to compare D's associative arrays with those results? :)

Ali
October 01, 2014
Am Wed, 01 Oct 2014 14:40:01 -0700
schrieb Ali Çehreli <acehreli@yahoo.com>:

> Found on Reddit:
> 
> 
> http://lonewolfer.wordpress.com/2014/03/13/benchmarking-hash-table-implementations-in-different-languages/
> 
> Are you motivated enough to compare D's associative arrays with those results? :)
> 
> Ali

The question is ... do you dare with the current state of the GC :D

-- 
Marco

October 01, 2014
Oh waaaait! It is a read-only benchmark.
October 01, 2014
Ali Çehreli:

> Found on Reddit:

Where's the Reddit thread?


> Are you motivated enough to compare D's associative arrays with those results? :)

D associative arrays are often even slower than CPython ones, so I don't expect D to shine in this comparison.

This is a D port of the Java code, but before using it I suggest you to review it well line by line against the other implementations, because I have not tested it.


import std.stdio, std.conv, std.random, std.datetime;

ulong lookup(in uint[uint] m, in uint[] b) @safe {
    ulong tot = 0;

    StopWatch sw;
    sw.start;
    foreach (immutable bi; b) {
        const ptr = bi in m;
        if (ptr != null)
            tot += *ptr;
    }
    sw.stop;

    return sw.peek.msecs;
}

void randomizeInput(uint[] a, uint[] b, in double p, ref Xorshift rng) @safe {
    foreach (ref ai; a)
        ai = uniform!"[]"(0, uint.max, rng);

    foreach (ref bi; b)
        bi = (uniform01(rng) <= p) ?
             a[uniform(0, $, rng)] :
             uniform!"[]"(0, uint.max, rng);
}

int main(in string[] args) {
    if (args.length != 4) {
        writeln("Usage: benchmark <size> <requests> <measurements> <hit probability>");
        return 1;
    }

    immutable n = args[1].to!uint;
    immutable r = args[2].to!uint;
    immutable k = args[3].to!uint;
    immutable p = args[4].to!double;

    auto rng = Xorshift(0);
    auto a = new uint[n];
    auto b = new uint[r];
    ulong t = 0;

    foreach (immutable _; 0 .. k) {
        uint[uint] m;
        randomizeInput(a, b, p, rng);
        foreach (immutable i, immutable ai; a)
            m[ai] = i;

        t += lookup(m, b);
        m.clear; // Deprecated?
    }

    writefln("%.2f MOPS\n", double(r) * k / t);
    return 0;
}


Bye,
bearophile
October 01, 2014
On 10/01/2014 03:32 PM, bearophile wrote:

> Ali Çehreli:
>
>> Found on Reddit:
>
> Where's the Reddit thread?

There was just a single comment on it so I didn't think it was important:


http://www.reddit.com/r/programming/comments/2hzur4/benchmarking_hash_table_implementations_in/

> D associative arrays are often even slower than CPython ones, so I don't
> expect D to shine in this comparison.

Never mind then. I remember posts by the user named 'spir' who used to be interested in hash table implementations. I recall that he was impressed by the speed of D's associative arrays. I may be wrong.

Ali

P.S. It is a pity that spir seems to have disappeared from online life. (?)

October 01, 2014
Ali Çehreli:

> Never mind then.

Well, now the D code is present, so why don't you benchmark it (but I don't know how much correct it is)? :-)

Bye,
bearophile
October 02, 2014
On Wednesday, 1 October 2014 at 21:40:01 UTC, Ali Çehreli wrote:
>
> Are you motivated enough to compare D's associative arrays with those results? :)
>

Here's another benchmark:
D AAs vs. Vibe.d's open addressing hashes vs. Robin Hood hashing:
http://www.infognition.com/blog/2014/on_robin_hood_hashing.html
There's a link to a table with timings, meaning of which described in the post.

Also compared a bit with C++'s CAtlMap<int, int> in MSVC 10 which I was told was pretty good. Generated 10 million random ints from range of 1 million, made a histogram and then read it. Exactly same data for both implementations (CAtlMap and Robin Hood in D).  With default settings CAtlMap makes histo in 2.19 sec, reads it in 1.19 sec. Robin Hood in D makes same histo in 1.27 sec, reads in 1.09 sec (and it's DMD 32-bit!).

After adding Rehash() between making and reading the histogram, CAtlMap makes histo in 2.37 sec, reads in 0.92.

October 02, 2014
On 10/01/2014 10:00 PM, thedeemon wrote:

> Here's another benchmark:
> D AAs vs. Vibe.d's open addressing hashes vs. Robin Hood hashing:
> http://www.infognition.com/blog/2014/on_robin_hood_hashing.html

What a coincidence. :) Your blog article was written just two weeks ago.

Ali