Thread overview
is there any reason to use SuperFastHash in druntime? (except speed)
May 28, 2015
ketmar
May 28, 2015
Per Nordlöw
May 28, 2015
ketmar
May 28, 2015
Vladimir Panteleev
May 28, 2015
Kagamin
May 28, 2015
ketmar
May 29, 2015
Martin Nowak
RE: is there any reason to use SuperFastHash in druntime? (exceptspeed)
May 29, 2015
Daniel Kozák
Re: is there any reason to use SuperFastHash in druntime? (exceptspeed)
May 29, 2015
ketmar
May 29, 2015
ketmar
May 28, 2015
SuperFastHash has known distribution problems[1]. there is fasthash[2] function, which is comparable to murmur, but requires one multiplication per 8 bytes (and it yields 64-bit hash values) and using ulongs. i think that there is some sense in trading some speed for better distribution. what do you think?

p.s. i made a stupid tester which using "/usr/share/dict/words" to test hashing functions. it hashes each word as all-lower, all-upper and all- lower with first char as upper. the results are:

fastHash64 ... perfect for 115857 words
fastHash32 ... perfect for 115857 words
sfHash ... 6 collisions in total

that's not that bad, but fasthash is definitely better. ;-)

and i believe that people that care about performance will roll their own hash functions anyway, so having not lightning fast, but good hash in druntime is better than having fast, but worse function.


[1] google://superfasthash+problems
[2] http://code.google.com/p/fast-hash/

May 28, 2015
On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
> SuperFastHash has known distribution problems[1]. there is fasthash[2]

Is there an easy to tweak the default hashing of key type, say `K`, when using a builtin D AA, say `V[K]`?
May 28, 2015
On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
> SuperFastHash has known distribution problems[1].

BTW, something I noticed while browsing Rust docs today:

> The hashes are all keyed by the thread-local random number generator on creation by default. This means that the ordering of the keys is randomized, but makes the tables more resistant to denial-of-service attacks (Hash DoS).

http://doc.rust-lang.org/std/collections/struct.HashMap.html

As I understand, D is certainly vulnerable to Hash DoS.

May 28, 2015
On Thursday, 28 May 2015 at 12:40:07 UTC, Vladimir Panteleev wrote:
> As I understand, D is certainly vulnerable to Hash DoS.

https://issues.dlang.org/show_bug.cgi?id=14414
May 28, 2015
On Thu, 28 May 2015 12:02:44 +0000, Per Nordlöw wrote:

> On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
>> SuperFastHash has known distribution problems[1]. there is fasthash[2]
> 
> Is there an easy to tweak the default hashing of key type, say `K`, when using a builtin D AA, say `V[K]`?

nope.

May 28, 2015
On Thu, 28 May 2015 12:39:52 +0000, Vladimir Panteleev wrote:

> On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
>> SuperFastHash has known distribution problems[1].
> 
> BTW, something I noticed while browsing Rust docs today:
> 
>> The hashes are all keyed by the thread-local random number generator on creation by default. This means that the ordering of the keys is randomized, but makes the tables more resistant to denial-of-service attacks (Hash DoS).
> 
> http://doc.rust-lang.org/std/collections/struct.HashMap.html
> 
> As I understand, D is certainly vulnerable to Hash DoS.

yes. and the interesting thing is that `toHash()` accepts seed, but it's not used in other code. looks to me like low-hanging fruit.

May 29, 2015
On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
> i think
> that there is some sense in trading some speed for better distribution.
> what do you think?

We discussed most part of this already.
http://forum.dlang.org/post/mff4id$hj8$1@digitalmars.com

And we're already using MurmurHash3 for the new CTFEable hash function.
https://github.com/D-Programming-Language/druntime/blob/master/src/core/internal/hash.d

Though I don't get any feedback for the library AA.
http://forum.dlang.org/post/mjsma6$196h$1@digitalmars.com

> and i believe that people that care about performance will roll their own
> hash functions anyway

Anyone will use the built-in hash, performance is of utmost importance.

> [2] http://code.google.com/p/fast-hash/

It's called fast hash, but how fast is it?
May 29, 2015
A little bit faster than murnurhash3 in my scenerio. But I prefer farmhash. http://code.google.com/p/farmhash/

----- Původní zpráva -----
Od:"Martin Nowak via Digitalmars-d" <digitalmars-d@puremagic.com>
Odesláno:‎29. ‎5. ‎2015 11:41
Komu:"digitalmars-d@puremagic.com" <digitalmars-d@puremagic.com>
Předmět:Re: is there any reason to use SuperFastHash in druntime? (exceptspeed)

On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
> i think
> that there is some sense in trading some speed for better
> distribution.
> what do you think?

We discussed most part of this already. http://forum.dlang.org/post/mff4id$hj8$1@digitalmars.com

And we're already using MurmurHash3 for the new CTFEable hash
function.
https://github.com/D-Programming-Language/druntime/blob/master/src/core/internal/hash.d

Though I don't get any feedback for the library AA. http://forum.dlang.org/post/mjsma6$196h$1@digitalmars.com

> and i believe that people that care about performance will roll
> their own
> hash functions anyway

Anyone will use the built-in hash, performance is of utmost importance.

> [2] http://code.google.com/p/fast-hash/

It's called fast hash, but how fast is it?


May 29, 2015
On Fri, 29 May 2015 09:36:22 +0000, Martin Nowak wrote:

> On Thursday, 28 May 2015 at 03:09:07 UTC, ketmar wrote:
>> i think that there is some sense in trading some speed for better
>> distribution.
>> what do you think?
> We discussed most part of this already. http://forum.dlang.org/post/mff4id$hj8$1@digitalmars.com

i was more interested in reasons behind choosing SFH, yet worded my question poorly. sorry.

> Though I don't get any feedback for the library AA. http://forum.dlang.org/post/mjsma6$196h$1@digitalmars.com

me has nothing valuable to say. ;-) sadly, i can see things done wrong only after they were done, not in the design stage.

>> and i believe that people that care about performance will roll their own hash functions anyway
> Anyone will use the built-in hash, performance is of utmost importance.

i believe that it should be "acceptable", not "fastest possible". i like to trade some speed for better output, for example. what is the reason of having superfast, but not good hash function? code will spend alot of time resolving collisions then. ;-)

>> [2] http://code.google.com/p/fast-hash/
> It's called fast hash, but how fast is it?

i didn't do any serious measurement, but for me it's slightly faster than Murmur3 (which is seen even in code, as it uses one multiplication where Murmur3 is using two), and the quality of output is comparable.

May 29, 2015
On Fri, 29 May 2015 11:52:58 +0200, Daniel Kozák via Digitalmars-d wrote:

> A little bit faster than murnurhash3 in my scenerio. But I prefer farmhash. http://code.google.com/p/farmhash/

thank you, i didn't hit that one before.