Fun project - faster associative array algorithm
April 07, 2015
The current D associative array algorithm

    https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d

uses an array of buckets with a linked list attached to the buckets to resolve collisions.

Linked lists are perhaps the worst (i.e. slowest) data structure possible on modern CPUs with memory caches.

A possible cache-friendly replacement would be an array of buckets with local probing to resolve collisions. D programs are often heavily dependent on associative arrays, so this could be a big win for us.

And it's a fun project, too. Any takers?
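
To make the local-probing idea concrete, here is a minimal sketch of an open-addressed table with linear probing. The entry layout and names are purely illustrative, not a proposal for aaA.d:

// Sketch: array of buckets with linear (local) probing on collision.
// Assumes a power-of-two table length and at least one free slot.
struct Entry(K, V)
{
    K key;
    V value;
    bool used;  // is this slot occupied?
}

struct OpenTable(K, V)
{
    Entry!(K, V)[] slots;

    // Returns the slot holding `key`, or the empty slot where it would go.
    // Neighbouring slots are probed on collision, so a miss usually stays
    // within the same cache line or the next one.
    Entry!(K, V)* lookup(K key)
    {
        immutable mask = slots.length - 1;
        size_t i = typeid(K).getHash(&key) & mask;
        while (slots[i].used && slots[i].key != key)
            i = (i + 1) & mask;
        return &slots[i];
    }
}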



Interestingly,


https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c

uses quadratic probing instead of linked lists, but this is not cache friendly, either.
April 07, 2015
On Tuesday, 7 April 2015 at 17:25:00 UTC, Walter Bright wrote:
> The current D associative array algorithm [...] uses an array of buckets with a linked list attached to the buckets to resolve collisions.
>
> [...]
>
> A possible cache-friendly replacement would be an array of buckets with local probing to resolve collisions. D programs are often heavily dependent on associative arrays, so this could be a big win for us.
>
> And it's a fun project, too. Any takers?

I have already been experimenting with this myself, actually. My hashmap now uses a bucket array with quadratic probing, following Martin Nowak's advice. My code for it is here:

https://github.com/w0rp/dstruct/blob/master/source/dstruct/map.d

I have a benchmark program available in my repository for testing it against the druntime version. It doesn't amaze me at the moment: it's slightly faster for integers and slightly slower for strings. I would appreciate any advice on it. I'm sure someone will look at my code and say, "Noo! Don't do that!"
April 07, 2015
I'll also take this chance to once again plug 'setDefault' as a useful associative array primitive. It's in my library. I don't care what name is used for it, just the semantics. Nowak suggested 'getOrSet', which is a good name.

// A string -> int[] map
HashMap!(string, int[]) map;

// set an element as usual.
map["a"] = [1, 2, 3];

// Returns [1, 2, 3] by reference and appends 4 to it.
map.setDefault("a") ~= 4;

// Finds nothing in the map, so (int[]).init is created in the map,
// returned by reference, and appended to.
map.setDefault("b") ~= 1;

assert(map["b"] == [1]);

// setDefault with a lazy parameter.
int[] arr = map.setDefault("matey", [1, 2]);

assert(arr == [1, 2] && arr == map["matey"]);
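
For reference, a rough sketch of how such a primitive could look, written here against the built-in AAs rather than HashMap (names and signature are illustrative, not the dstruct implementation):

// Sketch only: getOrSet/setDefault semantics on a built-in AA.
ref V setDefault(K, V)(ref V[K] aa, K key, lazy V def = V.init)
{
    auto p = key in aa;
    if (p is null)
    {
        aa[key] = def;   // the lazy default is only evaluated on a miss
        p = key in aa;
    }
    return *p;
}

With that, map.setDefault("b") ~= 1; creates the entry on first use and appends to it by reference, as in the examples above.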
April 07, 2015
w0rp:

> It doesn't amaze me at the moment: it's slightly faster for integers and slightly slower for strings.

One problem with D strings is that they don't cache their hash value.

Bye,
bearophile
April 07, 2015
On 04/07/2015 07:24 PM, Walter Bright wrote:
> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c
> 
> uses quadratic probing instead of linked lists, but this is not cache friendly, either.

Quite the contrary, it's very cache friendly.
The triangular numbers are

1, 3, 6, 10, 15, 21

With 8 bytes per entry, as in stringtable, you need 2 collisions on the same "bucket" to land 48 bytes away, which might still be on the same cache line.
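
For concreteness, a probe loop along those lines might look like this (illustrative only, not the actual stringtable.c code); with a power-of-two table length, triangular-number increments eventually visit every slot:

// Sketch: probing at triangular-number offsets 1, 3, 6, 10, ... from the
// home bucket. Assumes a power-of-two table length and at least one
// empty slot so the loop terminates; 0 marks an empty slot here.
size_t findSlot(const(size_t)[] keys, size_t hash, size_t key)
{
    immutable mask = keys.length - 1;
    size_t i = hash & mask;
    size_t step = 1;
    while (keys[i] != 0 && keys[i] != key)
    {
        i = (i + step) & mask;  // cumulative steps give triangular offsets
        ++step;
    }
    return i;
}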

The problem with linear probing is primary clustering, especially when your hash isn't too good, which can lead to long streaks of neighbouring buckets that are filled. Any insert that hashes into such a lump (the chances increase the bigger it becomes) will make it grow even further.

For higher load factors such as 0.75, this can lead to a huge variance in probe sequence length and extreme lengths at the 95th/99th percentiles.

Here is the ticket for open addressing, btw. https://issues.dlang.org/show_bug.cgi?id=14385
April 07, 2015
On Tuesday, 7 April 2015 at 18:28:22 UTC, bearophile wrote:
> w0rp:
>
>> It doesn't amaze me at the moment: it's slightly faster for integers and slightly slower for strings.
>
> One problem with D strings is that they don't cache their hash value.
>
> Bye,
> bearophile

I probably need to put that back in for some types; I tried taking the hash values away. I know that for integers whose size is <= size_t.sizeof, I can keep the hash codes out of the array, as the key is always the same as the hash.
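
A sketch of the entry layout that implies (illustrative only, not the actual dstruct code): keep a cached hash for keys that are expensive to hash, and drop the field when the key itself already serves as the hash.

// Sketch: cache the hash for expensive-to-hash keys such as strings,
// omit it for small integer keys whose value doubles as the hash.
struct Entry(K, V)
{
    static if (is(K : size_t) && K.sizeof <= size_t.sizeof)
    {
        @property size_t hash() const { return cast(size_t) key; }
    }
    else
    {
        size_t cachedHash;  // computed once on insert, reused on rehash
        @property size_t hash() const { return cachedHash; }
    }
    K key;
    V value;
}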
April 07, 2015
On Tuesday, 7 April 2015 at 18:35:27 UTC, Martin Nowak wrote:
> For higher load factors such as 0.75, this can lead to a huge variance in
> probe sequence length and extreme lengths at the 95th/99th percentiles.
>
> Here is the ticket for open addressing, btw.
> https://issues.dlang.org/show_bug.cgi?id=14385

One thing I was wondering about, which you might know more about: I had to cap my load factor at 0.5, because quadratic probing seems to fail once more than half the buckets are filled. Is that correct, or did I make a mistake?
April 07, 2015
On Tuesday, 7 April 2015 at 17:25:00 UTC, Walter Bright wrote:
> The current D associative array algorithm

I did a quick scan of the source and didn't see any Bloom filter there.
I wonder if it would be best to have the Bloom filters completely external or if they should be included in the associative array?
E.g. the Bloom filter does require extra memory, which is not always desirable.
On the other hand, a Bloom filter could be opt-in where high speed is required.
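
To illustrate, a tiny opt-in Bloom filter along those lines might look like the following (my own sketch; the sizing and hash mixing are arbitrary and it is not part of druntime):

// Sketch: a Bloom filter answering "definitely not present" or "maybe
// present" before touching the table. Assumes bits.length is a power of two.
struct BloomFilter
{
    ulong[] bits;

    private size_t[2] positions(size_t hash) const
    {
        immutable mask = bits.length * 64 - 1;
        return [hash & mask, (hash * 2654435761u) & mask];
    }

    void insert(size_t hash)
    {
        foreach (p; positions(hash))
            bits[p / 64] |= 1UL << (p % 64);
    }

    // false means the key was never inserted; true means it might have been.
    bool maybeContains(size_t hash) const
    {
        foreach (p; positions(hash))
            if (!(bits[p / 64] & (1UL << (p % 64))))
                return false;
        return true;
    }
}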
April 07, 2015
Food for thought:

http://codecapsule.com/2013/11/11/robin-hood-hashing/
http://people.csail.mit.edu/shanir/publications/disc2008_submission_98.pdf

Also, it is probably worthwhile to adopt different strategies depending on the characteristics of the element types.
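
For context, Robin Hood hashing (the first link) is open addressing where an insert that has probed far from its home bucket may displace a resident entry that sits closer to its own home, which evens out probe lengths. A rough sketch, with illustrative names and a power-of-two table assumed:

import std.algorithm.mutation : swap;

// Sketch of Robin Hood insertion: steal the slot from any resident entry
// whose distance from its home bucket is smaller than ours. Assumes the
// key is not already present and at least one slot is free.
struct RHEntry(K, V)
{
    K key;
    V value;
    bool used;
}

void rhInsert(K, V)(RHEntry!(K, V)[] slots, K key, V value)
{
    immutable mask = slots.length - 1;
    size_t i = typeid(K).getHash(&key) & mask;
    size_t dist = 0;  // our current distance from our home bucket
    while (slots[i].used)
    {
        immutable home = typeid(K).getHash(&slots[i].key) & mask;
        immutable theirDist = (i + slots.length - home) & mask;
        if (theirDist < dist)  // the resident is "richer": take its slot
        {
            swap(key, slots[i].key);
            swap(value, slots[i].value);
            dist = theirDist;  // continue inserting the displaced entry
        }
        i = (i + 1) & mask;
        ++dist;
    }
    slots[i] = RHEntry!(K, V)(key, value, true);
}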
April 07, 2015
On 4/7/15 3:07 PM, w0rp wrote:
> On Tuesday, 7 April 2015 at 18:35:27 UTC, Martin Nowak wrote:
>> For higher load factors such as 0.75, this can lead to a huge variance in
>> probe sequence length and extreme lengths at the 95th/99th percentiles.
>>
>> Here is the ticket for open addressing, btw.
>> https://issues.dlang.org/show_bug.cgi?id=14385
>
> One thing I was wondering about, which you might know more about: I had
> to cap my load factor at 0.5, because quadratic probing seems to fail
> once more than half the buckets are filled. Is that correct, or did I
> make a mistake?

BTW, fun fact about D AAs: the load factor is 4, as in nodes = 4 × buckets will be added before rehashing.

-Steve