October 12, 2006  Honey, I sped up the associated array
Some claim, so numbers first:

As a benchmark, I've used my implementation of the program mentioned in the thread "Lisp vs. C++ (not off-topic)", run in a critical-priority thread. Results are averaged over 8 runs. System: AMD X2 4600+, Windows Vista RC1 x64, 1GB.

Old AA: 563ms
New AA: 445ms

The only difference is the indexing. The old AA uses "index = hash % prime" whereas the new AA uses "index = (hash * MAGICNUMBER) >>> shift" (MAGICNUMBER being a uint constant, 0x9E3779B9, approximately (sqrt(5) - 1) * 2^31, aka the golden ratio [1]).

The performance of the AA also depends on the size of the underlying array, and since the sizes of the two implementations are never the same (the old one uses primes, the new one powers of 2), it's impossible to find a benchmark that covers all usage cases. But, comparing only the changed lines of code, the one involving the multiplication and shift is twice as fast as the one with the modulo (= division).

Attached: the new src/phobos/internal/aaA.d. To try it out, simply overwrite the file and rebuild phobos (make -fwin32.mak). Don't forget to copy phobos.lib to the lib folder.

L.

[1] The Art of Computer Programming, Donald E. Knuth
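For readers who want to try the difference in isolation, here is a minimal sketch of the two indexing schemes in D. This is illustrative code, not the attached aaA.d; the function names and the 1021/1024 table sizes are just examples.

// Minimal sketch of the two indexing schemes, not the actual aaA.d code.
import std.stdio;

// Old scheme: reduce the hash with a modulo by a prime table length.
size_t indexByPrime(uint hash, size_t primeLength)
{
    return hash % primeLength;               // integer division: comparatively slow
}

// New scheme: multiply by the golden-ratio constant and keep the top bits.
// For a table of 2^n slots, shift = 32 - n (for a 32-bit hash).
enum uint MAGICNUMBER = 0x9E3779B9;          // ~ (sqrt(5) - 1) * 2^31, per Knuth

size_t indexByMultiply(uint hash, uint shift)
{
    return (hash * MAGICNUMBER) >>> shift;   // multiply + shift: no division
}

void main()
{
    uint h = 123_456_789;
    writefln("prime index (1021 slots):    %s", indexByPrime(h, 1021));
    writefln("multiply index (1024 slots): %s", indexByMultiply(h, 32 - 10));
}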
October 13, 2006  Re: Honey, I sped up the associated array
Posted in reply to Lionello Lunesu

Lionello Lunesu wrote:
> Some claim, so numbers first:
>
> As a benchmark, I've used my implementation of the program mentioned in the thread "Lisp vs. C++ (not off-topic)", run in a critical-priority thread. Results are averaged over 8 runs. System: AMD X2 4600+, Windows Vista RC1 x64, 1GB.
>
> Old AA: 563ms
> New AA: 445ms

There were 4 Shootout Benchmark tests using hash tables - one is still active:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=dlang&id=2

Old AA: 1.85
New AA: 1.85

For the three older tests:

hash      Old: 2.55   New: 3.02
hash2     Old: 3.36   New: 2.88
wordfreq  Old: 0.178  New: 0.170

(These were run as they are/were on the Shootout.)

So it's kind of 6 of one and half a dozen of the other for those tests, but your new version sure seems more elegant!

Based on this, and looking at the code for hash.d where the new version doesn't perform as well, it looks like the new version has more collisions. If the getHash() in std/typeinfo/ti_Aa.d could be changed to provide a more unique hash without being made a lot slower, then I'd think your new version would be faster in the general case (at least for char[] AA keys).

> The only difference is the indexing. The old AA uses "index = hash % prime" whereas the new AA uses "index = (hash * MAGICNUMBER) >>> shift" (MAGICNUMBER being a uint constant, 0x9E3779B9, approximately (sqrt(5) - 1) * 2^31, aka the golden ratio [1]).
>
> The performance of the AA also depends on the size of the underlying array, and since the sizes of the two implementations are never the same (the old one uses primes, the new one powers of 2), it's impossible to find a benchmark that covers all usage cases. But, comparing only the changed lines of code, the one involving the multiplication and shift is twice as fast as the one with the modulo (= division).
>
> Attached: the new src/phobos/internal/aaA.d. To try it out, simply overwrite the file and rebuild phobos (make -fwin32.mak). Don't forget to copy phobos.lib to the lib folder.
>
> L.
>
> [1] The Art of Computer Programming, Donald E. Knuth
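For context, the getHash() being discussed is, roughly, a multiply-and-add loop over the characters. The sketch below is an assumed shape, not a verbatim copy of std/typeinfo/ti_Aa.d; the multiplier of 11 is the value mentioned later in this thread.

// Assumed shape of the char[] hash under discussion; not a verbatim
// copy of std/typeinfo/ti_Aa.d.
size_t stringHash(const(char)[] s, size_t multiplier = 11)
{
    size_t hash = 0;
    foreach (char c; s)
        hash = hash * multiplier + c;   // the multiplier (11 vs. 13) is what gets tweaked below
    return hash;
}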
October 13, 2006  Re: Honey, I sped up the associated array
Posted in reply to Dave

Dave wrote:
> Lionello Lunesu wrote:
>> Some claim, so numbers first:
>>
>> As a benchmark, I've used my implementation of the program mentioned in the thread "Lisp vs. C++ (not off-topic)", run in a critical-priority thread. Results are averaged over 8 runs. System: AMD X2 4600+, Windows Vista RC1 x64, 1GB.
>>
>> Old AA: 563ms
>> New AA: 445ms
>
> There were 4 Shootout Benchmark tests using hash tables - one is still active:
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=dlang&id=2
>
> Old AA: 1.85
> New AA: 1.85
>
> For the three older tests:
>
> hash      Old: 2.55   New: 3.02
> hash2     Old: 3.36   New: 2.88
> wordfreq  Old: 0.178  New: 0.170
>
> (These were run as they are/were on the Shootout.)
>
> So it's kind of 6 of one and half a dozen of the other for those tests, but your new version sure seems more elegant!

I also like powers-of-2 more than primes ;)

> Based on this, and looking at the code for hash.d where the new version doesn't perform as well, it looks like the new version has more collisions. If the getHash() in std/typeinfo/ti_Aa.d could be changed to provide a more unique hash without being made a lot slower, then I'd think your new version would be faster in the general case (at least for char[] AA keys).

How have you tested the number of collisions? It would be fairly trivial to add a counter to aaA.d to keep track of the number of collisions.

The size of the AA's internal array is the biggest factor, I've noticed. Depending on the benchmark, the new AA might end up having either a much bigger or smaller array than the old AA, and therefore far fewer/more collisions. To correctly test the two implementations, we should either add a collision metric (testing the hashing quality) or only compare arrays of similar size (which is not possible given the way the two implementations resize).

A good thing about the new AA is that its sizing can be more easily controlled, and the AA could, in theory, be easily tuned for the task at hand. The old AA uses that static list of primes (half of which, in fact, are never used; see the thread in D.learn).

I'll try the string hash method from Java, "h=7; foreach(b;a) h=(h<<5)+b-h;", but I doubt that there'll be any useful result. Although, one less multiplication might make it faster still ;)

L.
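In the same spirit, a stand-alone harness that counts bucket collisions for a batch of string keys under either indexing scheme could look like the following. This is a hypothetical sketch, not a patch to aaA.d; the key set and the 8191/8192 table sizes are arbitrary examples.

// Hypothetical collision counter, run outside of phobos.
import std.conv : to;
import std.stdio;

size_t countCollisions(uint[] hashes, size_t delegate(uint) index, size_t tableLength)
{
    auto occupied = new bool[tableLength];
    size_t collisions = 0;
    foreach (h; hashes)
    {
        auto i = index(h);
        if (occupied[i])
            ++collisions;        // slot already taken: would become a chained entry
        else
            occupied[i] = true;
    }
    return collisions;
}

void main()
{
    enum uint MAGIC = 0x9E3779B9;

    uint[] hashes;
    foreach (k; 0 .. 10_000)
    {
        uint h = 0;
        foreach (char c; to!string(k))
            h = h * 11 + c;      // same style of string hash as discussed above
        hashes ~= h;
    }

    writefln("prime modulo (8191 slots):     %s", countCollisions(hashes, h => h % 8191, 8191));
    writefln("multiply + shift (8192 slots): %s", countCollisions(hashes, h => (h * MAGIC) >>> (32 - 13), 8192));
}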
October 13, 2006  Re: Honey, I sped up the associated array
Posted in reply to Lionello Lunesu | Lionello Lunesu wrote:
> Dave wrote:
>> Lionello Lunesu wrote:
>>> Some claim, so numbers first:
>>>
>>> As a benchmark, I've used my implementation of the program mentioned in the thread "Lisp vs. C++ (not off-topic)", run in a critical-priority thread. Results are averaged over 8 runs. System: AMD X2 4600+, Windows Vista RC1 x64, 1GB.
>>>
>>> Old AA: 563ms
>>> New AA: 445ms
>>>
>>
>> There were 4 Shootout Benchmark tests using hash tables - one is still active:
>>
>> http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=dlang&id=2
>>
>>
>> Old AA: 1.85
>> New AA: 1.85
>>
>> For the three older tests:
>>
>> hash Old: 2.55
>> New: 3.02
>>
>> hash2 Old: 3.36
>> New: 2.88
>>
>> wordfreq Old: 0.178
>> New: 0.170
>>
>> (These were run as they are/were on the Shootout)
>>
>> So it's kind of 6 of one and half a dozen of the other for those tests, but your new version sure seems more elegant!
>
> I also like powers-of-2 more than primes ;)
>
>> Based on this, and looking at the code for hash.d where the new version doesn't perform as well, it looks like the new version has more collisions. If the getHash() in std/typeinfo/ti_Aa.d could be changed to provide a more unique hash without being made a lot slower, then I'd think your new version would be faster in the general case (at least for char[] AA keys).
>
> How have you tested the number of collisions? It would be fairly trivial to add a counter to aaA.d to keep track of the number of collisions.
>
> The size of the AA's internal array is the biggest factor, I've noticed. Depending on the benchmark, the new AA might end up having either a much bigger or smaller array than the old AA, and therefore far fewer/more collisions. To correctly test the two implementations, we should either add a collision metric (testing the hashing quality) or only compare arrays of similar size (which is not possible given the way the two implementations resize).
>
> A good thing about the new AA is that its sizing can be more easily controlled, and the AA could, in theory, be easily tuned for the task at hand. The old AA uses that static list of primes (half of which, in fact, are never used; see the thread in D.learn).
>
> I'll try the string hash method from Java, "h=7; foreach(b;a) h=(h<<5)+b-h;", but I doubt that there'll be any useful result. Although, one less multiplication might make it faster still ;)
>
> L.
The Java algorithm doesn't improve any of my tests and is even slower for hash.d.
Just for kicks, I tried changing the current getHash() multiplier (using your new AA) from 11 to 13 and got slightly better results for everything.
knuc      Old: 2.44 (*)
           11: 2.03
           13: 1.93
hash       11: 3.02
           13: 2.94
hash2      11: 2.88
           13: 2.68
wordfreq   11: 0.170
           13: 0.160
The last two are about 5% better - probably worth the minor change to getHash().
hash2.d is lookup-intensive, and wordfreq, knuc, and your Lisp port have sparse keys, so I'd say combining your new AA code with a slight modification to the getHash() multiplier would be the way to go.
The only one that's slower (hash.d) will probably improve as the GC improves, because I think part of the difference is also more frequent allocations (sorry, I don't have the time right now to instrument this stuff).
Nice work! I don't really like the idea of the static prime number array sizes either.
* The knuc in my first message was using a hashtable library, not the built-in AA's. This last comparison was with a version using the built-in AA's, and it is now within 1/10 of a second of the prior version - no need for the custom hashtable!
October 13, 2006  Re: Honey, I sped up the associated array
Posted in reply to Lionello Lunesu

Lionello Lunesu wrote:
> The size of the AA's internal array is the biggest factor, I've noticed.

Not exactly: it's the number of collisions that matters, and the size of the array influences this, as does the hash algorithm. For mathematical reasons I do not understand, taking the remainder of division by a prime number gives the most even 'spread' across the array, minimizing collisions for a given array size.

> The old AA uses that static list of primes (half of which, in fact, are never used; see the thread in D.learn).

It used to; the algorithm changed and the table wasn't updated.
October 13, 2006  Re: Honey, I sped up the associated array
Posted in reply to Walter Bright

On Fri, 13 Oct 2006 12:05:02 -0700, Walter Bright wrote:
> Lionello Lunesu wrote:
>> The size of the AA's internal array is the biggest factor, I've noticed.
>
> Not exactly: it's the number of collisions that matters, and the size of the array influences this, as does the hash algorithm. For mathematical reasons I do not understand, taking the remainder of division by a prime number gives the most even 'spread' across the array, minimizing collisions for a given array size.

If the hash code and the hash array size share a common factor k, then the first can be written in the form k*x and the second in the form k*y, so in the division the k's cancel and the spread function is really only a modulus by y rather than by k*y. Picking a prime number for the size of the array eliminates that possibility for every k smaller than the table size.

This link has the same values used in gcc's hashtable implementation:

http://planetmath.org/encyclopedia/GoodHashTablePrimes.html
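To make the common-factor argument concrete, a tiny hypothetical experiment: feed hashes that are all multiples of 8 into a table whose length shares that factor, and into a prime-length table. The 64/61 lengths below are just examples.

// Illustration of the common-factor argument: hashes of the form k*x
// with a table length that is a multiple of k can only reach 1/k of
// the slots, while a prime length reaches all of them.
import std.stdio;

size_t slotsHit(size_t tableLength, size_t k, size_t keys)
{
    auto used = new bool[tableLength];
    foreach (x; 0 .. keys)
        used[(k * x) % tableLength] = true;
    size_t n = 0;
    foreach (b; used)
        if (b) ++n;
    return n;
}

void main()
{
    // k = 8; a length of 64 shares the factor, 61 is prime and does not
    writefln("slots hit, length 64: %s of 64", slotsHit(64, 8, 1000)); // only 8 distinct slots
    writefln("slots hit, length 61: %s of 61", slotsHit(61, 8, 1000)); // all 61 slots
}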
October 13, 2006  Re: Honey, I sped up the associated array
Posted in reply to Walter Bright

Walter Bright wrote:
> taking the remainder of
> division by a prime number gives the most even 'spread' across the array, minimizing collisions for a given array size.
If I understand the problem correctly, the entire point of having a hash in the first place is minimizing collisions. And the _best_ way to achieve this is to divide by a prime.
October 14, 2006  Re: Honey, I sped up the associated array
Posted in reply to Lionello Lunesu

Lionello Lunesu wrote:
> the new AA uses "index = (hash * MAGICNUMBER) >>> shift"
Because the hash value will often be the address of the entry in memory, one will get lots of collisions once the hashtable is filled up to 25%.
October 14, 2006  Re: Honey, I sped up the associated array
Posted in reply to Karen Lanrap

"Karen Lanrap" <karen@digitaldaemon.com> wrote in message news:Xns985C4FF109739digitaldaemoncom@63.105.9.61...
> Lionello Lunesu wrote:
>
>> the new AA uses "index = (hash * MAGICNUMBER) >>> shift"
>
> Because the hash value will often be the address of the entry in memory, one will get lots of collisions once the hashtable is filled up to 25%.

What do you mean? Give me an example, please.

L.
October 15, 2006  Re: Honey, I sped up the associated array
Posted in reply to Lionello Lunesu

Lionello Lunesu wrote:
> Give me an example, please.
I cannot give an example because my statement seems to be wrong -- kudos to D. Knuth.
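For what it's worth, a quick hypothetical demo of why the worry doesn't materialize: even when every hash value is a 16-byte-aligned address, the multiplication by 0x9E3779B9 varies the top bits, so the multiply-and-shift index still spreads the keys across the table. The base address and sizes below are arbitrary.

// Hypothetical demo: 256 evenly spaced, 16-byte-aligned "addresses"
// hashed into 1024 slots with the multiply-and-shift scheme.
import std.stdio;

void main()
{
    enum uint MAGIC = 0x9E3779B9;
    enum uint shift = 32 - 10;              // table of 2^10 = 1024 slots

    bool[1024] used;
    size_t collisions = 0;

    foreach (uint i; 0 .. 256)              // fill a quarter of the table
    {
        uint addr = 0x0040_0000 + i * 16;   // low 4 bits always zero
        auto slot = (addr * MAGIC) >>> shift;
        if (used[slot]) ++collisions;
        else used[slot] = true;
    }
    writefln("collisions among 256 aligned keys in 1024 slots: %s", collisions);
}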