fun project - improving calcHash - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » fun project - improving calcHash

Thread overview

fun project - improving calcHash
Jun 23, 2013 Walter Bright
Jun 23, 2013 Timon Gehr
Jun 24, 2013 Juan Manuel Cabo
Jun 24, 2013 Juan Manuel Cabo
Jun 24, 2013 Martin Nowak
Jun 24, 2013 Martin Nowak
Jun 24, 2013 Timothee Cour
Jun 24, 2013 Michael
Jun 24, 2013 qznc
Jun 24, 2013 Anders Halager
Jun 24, 2013 Walter Bright
Jun 24, 2013 Anders Halager
Jun 24, 2013 Tyro[17]
Jun 26, 2013 Kagamin
Jun 24, 2013 qznc
Jun 24, 2013 bearophile
Jun 24, 2013 deadalnix
Jun 24, 2013 monarch_dodra

June 23, 2013

fun project - improving calcHash

Posted by Walter Bright

Walter Bright

https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21

Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.

There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster!

A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests.

Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.

June 23, 2013

Re: fun project - improving calcHash

Posted by Timon Gehr
in reply to Walter Bright

Timon Gehr

Posted in reply to Walter Bright

On 06/23/2013 11:22 PM, Walter Bright wrote:
> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21
>
>
> Profiling shows the calcHash function is a significant contributor to
> compilation time (3.25% of total time). So making it faster is a win.
> Even making dmd 1% faster would be a nice win - all those little drops
> add up.
>
> There are many, many string hash functions findable through google.
> Anyone want to spend the effort to make a faster one? Remember, ya gotta
> prove it's faster!
> ...

It seems to be easy to make it faster without changing the algorithm significantly. There should be only one switch executed, not one per 4 characters. If the length is short enough from the start, there is no point in doing any arithmetic.

June 24, 2013

Re: fun project - improving calcHash

Posted by Juan Manuel Cabo
in reply to Walter Bright

Juan Manuel Cabo

Posted in reply to Walter Bright

On 06/23/2013 06:22 PM, Walter Bright wrote:
> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21
> 
> Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.
> 
> There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster!
> 
> A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests.
> 
> Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.

I performed a quick test, and I don't think that the original function can be improved for speed (though it might be improved for less collisions):

    https://gist.github.com/jmcabo/5847036

I used words and lines from the complete works of Shakespeare. I tested separating the loop from the switch, as was suggested in Timon Gehr post. It didn't improve the speed when compiled with "g++ -O3", though it improved it a bit without -O3.

I also tested removing the multiplication by 37. It didn't improve the speed. With "g++ -O3" they are all the same.

So, unless I'm measuring it wrong, the algorithm is as fast
as can be (as fast as just adding chars).

--jm

June 24, 2013

Re: fun project - improving calcHash

Posted by Martin Nowak
in reply to Walter Bright

Martin Nowak

Posted in reply to Walter Bright

On 06/23/2013 11:22 PM, Walter Bright wrote:
> Profiling shows the calcHash function is a significant contributor to
> compilation time (3.25% of total time). So making it faster is a win.
> Even making dmd 1% faster would be a nice win - all those little drops
> add up.
>

You'll find a benchmark at the end of the post.
http://forum.dlang.org/thread/op.v2yky9pdsqugbd@dawg-freebsd.lan

Obviously the switch in a for loop is a branch prediction killer.
This should work much better.

for (size_t i = 0; i < len / 4; ++i)
{
    hash *= 37;
    hash += *(const uint32_t *)str;
    str += 4;
}
switch (len & 3)
{
case 0:
    break;
case 1:
    hash *= 37;
    hash += *(const uint8_t *)str;
    break;
case 2:
    hash *= 37;
    hash += *(const uint16_t *)str;
    break;
case 3:
    hash *= 37;
    hash += (*(const uint16_t *)str << 8) +
             ((const uint8_t *)str)[2];
    break;
default:
    assert(0);
}
return hash;

Finding a faster hash algorithm isn't simple because prime multiplication is so simple. At least we're not diving the hash result by 37 any longer.
I think using the Hsieh hash is a good choice because it has quite an OK distribution so it leads to fewer collisions. It's also pretty fast for short strings and we already use it in druntime.

June 24, 2013

Re: fun project - improving calcHash

Posted by Juan Manuel Cabo
in reply to Juan Manuel Cabo

Juan Manuel Cabo

Posted in reply to Juan Manuel Cabo

On 06/23/2013 09:20 PM, Juan Manuel Cabo wrote:
> On 06/23/2013 06:22 PM, Walter Bright wrote:
>> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21
>>
>> Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.
>>
>> There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster!
>>
>> A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests.
>>
>> Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.
> 
> 
> I performed a quick test, and I don't think that the original function can be improved for speed (though it might be improved for less collisions):
> 
>     https://gist.github.com/jmcabo/5847036
> 
> I used words and lines from the complete works of Shakespeare. I tested separating the loop from the switch, as was suggested in Timon Gehr post. It didn't improve the speed when compiled with "g++ -O3", though it improved it a bit without -O3.
> 
> I also tested removing the multiplication by 37. It didn't improve the speed. With "g++ -O3" they are all the same.
> 
> So, unless I'm measuring it wrong, the algorithm is as fast
> as can be (as fast as just adding chars).
> 
> --jm
> 
> 

Oh, it might be improved by loading 128bits at a time instead of 32bits... but that would benefit strings of more than 16 bytes. Google's CitiHash seems tuned for large strings.

--jm

June 24, 2013

Re: fun project - improving calcHash

Posted by Timothee Cour
in reply to Walter Bright

Timothee Cour

Posted in reply to Walter Bright

Attachments:

text/html part

On Sun, Jun 23, 2013 at 2:22 PM, Walter Bright <newshound2@digitalmars.com>wrote:

> https://github.com/D-**Programming-Language/dmd/blob/** master/src/root/stringtable.c#**L21<https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21>
>
> Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.
>
> There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster!
>
> A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests.
>
> Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.
>

implementing this "proposal: lazy compilation model for compiling binaries" http://forum.dlang.org/post/mailman.1357.1371876331.13711.digitalmars-d@puremagic.com would have much larger impact on compile time performance.

June 24, 2013

Re: fun project - improving calcHash

Posted by Michael
in reply to Walter Bright

Michael

Posted in reply to Walter Bright

https://code.google.com/p/xxhash/
BSD 2-Clause License

June 24, 2013

Re: fun project - improving calcHash

Posted by qznc
in reply to Walter Bright

qznc

Posted in reply to Walter Bright

On Sunday, 23 June 2013 at 21:22:40 UTC, Walter Bright wrote:
> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21
>
> Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.
>
> There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster one? Remember, ya gotta prove it's faster!
>
> A nice timing test would be the time expending compiling Phobos. I would suggest that the 64 bit build of dmd be used for timing tests.
>
> Also, be careful, many of those hash functions on the intarnets have a license that makes it unusable for dmd.

My first idea was to look at Python, since it requires a lot of hash calculation dynamically and probably is one of the most tuned implementations. Interesting how naive it is:

http://hg.python.org/cpython/file/7aab60b70f90/Objects/object.c#l759

Just a simple for loop over chars. No switch.

Wikipedia: "Python Software Foundation License (PSFL) is a BSD-style permissive free software license"

June 24, 2013

Re: fun project - improving calcHash

Posted by Martin Nowak
in reply to Juan Manuel Cabo

Martin Nowak

Posted in reply to Juan Manuel Cabo

On 06/24/2013 03:43 AM, Juan Manuel Cabo wrote:
>> I used words and lines from the complete works of Shakespeare.
>> I tested separating the loop from the switch, as was suggested
>> in Timon Gehr post. It didn't improve the speed when compiled
>> with "g++ -O3", though it improved it a bit without -O3.
>>
This is limited by the memory bandwidth so your testing the wrong thing.

June 24, 2013

Re: fun project - improving calcHash

Posted by qznc
in reply to Walter Bright

qznc

Posted in reply to Walter Bright

On Sunday, 23 June 2013 at 21:22:40 UTC, Walter Bright wrote:
> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21
>
> Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.

To make dmd 1% faster, calcHash must be reduced to 69% run time. Does not look possible.

Also, why is it implemented twice -- in src/root/stringtable.c and in src/root/root.c?

Improving AAs by changing the data structure seems to be more worthwhile. They use a linked list internally, which is not exactly cache-friendly. From your gprof run the potential is mostly 5.19% _aaGetRvalue + 1.30% _aaGet.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation