Thread overview
Default hashing function for AA's
Oct 09, 2017
RazvanN
Oct 09, 2017
Per Nordlöw
May 31, 2018
Robert M. Münch
October 09, 2017
Hi all,

We in the UPB dlang group have been having discussions about the hashing functions of associative arrays. In particular, we were wondering why is the AA implementation in druntime is not using the hash function implemented in druntime/src/core/internal/hash.hashOf for classes that don't define toHash().

For us, that seems to be a very good default hashing function.

Best regards,
On behalf of the UPB Dlang group
October 09, 2017
On Monday, 9 October 2017 at 14:11:13 UTC, RazvanN wrote:
> We in the UPB dlang group have been having discussions about the hashing functions of associative arrays. In particular, we were wondering why is the AA implementation in druntime is not using the hash function implemented in druntime/src/core/internal/hash.hashOf for classes that don't define toHash().
>
> For us, that seems to be a very good default hashing function.

Further, I haven't found any instructions on changing the default hash-digest for `hashOf`. Is this in conflict with `hashOf` being `pure`? Could the interface to builtin AA's be extended to support changing the default hash algorithm (which in turn `hashOf` will use) upon AA instantiation?
October 10, 2017
On 10/9/17 10:11 AM, RazvanN wrote:
> Hi all,
> 
> We in the UPB dlang group have been having discussions about the hashing functions of associative arrays. In particular, we were wondering why is the AA implementation in druntime is not using the hash function implemented in druntime/src/core/internal/hash.hashOf for classes that don't define toHash().

AA uses typeid(Key).getHash. [1]

For objects, this calls the virtual function `toHash`. [2]

Please keep in mind that all you are hashing is the class reference pointer, as that is the default comparison for `opEquals`. It might make sense to shuffle those bits a bit, since the bucket algorithm only looks at the lower bits of the hash, and this basically guarantees keys with the default hash will be empty in the first few buckets, since class objects are aligned on a power-of-2 boundary.

But I'm not sure running a full blown hash on the pointer is necessary. Perhaps just xor-ing the upper bits with the lower bits makes more sense.

Alternatively, you could change the default `opEquals` but that may break things more than expected.

What you *can't* do is change what the hash is based on without changing `opEquals`.

Note that I wouldn't recommend using an Object as a key without defining toHash and opEquals anyway.

-Steve

[1] https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L310
[2] https://github.com/dlang/druntime/blob/master/src/object.d#L66
May 31, 2018
On 2017-10-10 15:22:05 +0000, Steven Schveighoffer said:

> AA uses typeid(Key).getHash. [1]
> 
> For objects, this calls the virtual function `toHash`. [2]
> 
> Please keep in mind that all you are hashing is the class reference pointer, as that is the default comparison for `opEquals`. It might make sense to shuffle those bits a bit, since the bucket algorithm only looks at the lower bits of the hash, and this basically guarantees keys with the default hash will be empty in the first few buckets, since class objects are aligned on a power-of-2 boundary.
> 
> But I'm not sure running a full blown hash on the pointer is necessary. Perhaps just xor-ing the upper bits with the lower bits makes more sense.
> 
> Alternatively, you could change the default `opEquals` but that may break things more than expected.
> 
> What you *can't* do is change what the hash is based on without changing `opEquals`.
> 
> Note that I wouldn't recommend using an Object as a key without defining toHash and opEquals anyway.
> 
> -Steve
> 
> [1] https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L310
> [2] https://github.com/dlang/druntime/blob/master/src/object.d#L66

Not sure this thread fits or a new one would be better. Anyway, here is an interesting article about a very fast hashtable and maybe it's a good candidate for the default AA implementation in D. IMO providing a very performant AA implementation is a big asset:

https://probablydance.com/2018/05/28/a-new-fast-hash-table-in-response-to-googles-new-fast-hash-table/ 



-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster