Thread overview

Shammah Chancellor (Apr 23, 2015)
Laeeth Isharc (Apr 25, 2015)
Daniel Murphy (Apr 26, 2015)
Martin Nowak (Apr 26, 2015)
Martin Nowak (Apr 25, 2015)
Chris (Apr 24, 2015)
Chris (Apr 24, 2015)
Chris (Apr 24, 2015)
April 23, 2015
So, I was tooling around with one of the benchmarks I saw listed on twitter:

https://github.com/kostya/benchmarks/tree/master/brainfuck

This D benchmark spends most of its time on AA lookups. Discussion of the implementation aside, I am curious why the D AA is so slow even after calling AA.rehash.

I took a look at the Nim tables implementation, and it looks like they use power-of-two table sizes with a full-period linear congruential step to find the next slot in the case of a collision:

```
proc nextTry(h, maxHash: THash): THash {.inline.} =
  result = ((5 * h) + 1) and maxHash
```
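For intuition, here is a minimal sketch in D (my own illustration; only nextTry mirrors the Nim code above) of why that step works as a probe sequence: with a power-of-two table size the mask is length - 1, and h -> (5*h + 1) & mask is a full-period linear congruential generator, so it visits every slot exactly once before repeating and a probe can never loop forever:

```
import std.stdio : writeln;

// The same step function as Nim's nextTry, in D.
size_t nextTry(size_t h, size_t maxHash)
{
    return (5 * h + 1) & maxHash;
}

void main()
{
    enum size_t maxHash = 16 - 1; // 16-slot table, mask 0b1111
    size_t h = 3;                 // an arbitrary starting bucket
    bool[16] seen;
    foreach (_; 0 .. 16)
    {
        seen[h] = true;
        h = nextTry(h, maxHash);
    }
    writeln(seen); // all true: every slot visited once per cycle
}
```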

So, my questions:

1) What are the performance implications of a scheme like this in the general case?
2) I saw a thread a while ago where someone had posted an updated implementation of the AA, with benchmarks showing it was substantially faster in all cases.  I can't find the thread again -- what happened with this work?

-Shammah
April 23, 2015
On 4/23/15 10:40 AM, Shammah Chancellor wrote:

> 2) I saw a thread a while ago where someone had posted an updated
> implementation of the AA, with benchmarks showing it was
> substantially faster in all cases.  I can't find the thread again --
> what happened with this work?

There is a PR being discussed right now. Please take a look and chime in:

https://github.com/D-Programming-Language/druntime/pull/1229

It would be interesting to know how the new AA performs in this test.

-Steve
April 24, 2015
On Thursday, 23 April 2015 at 14:40:58 UTC, Shammah Chancellor wrote:
> [...]
>
> This D benchmark spends most of its time on AA lookups. Discussion of the implementation aside, I am curious why the D AA is so slow even after calling AA.rehash.
>
> [...]

Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.
April 24, 2015
On Friday, 24 April 2015 at 09:18:48 UTC, Chris wrote:
> [...]
>
> Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.

AA lookup is (more often than not) slightly slower than (or at least as fast as) generating the data on demand:

```
import std.datetime : benchmark, Duration;
import std.conv : to;
import std.stdio : writeln;

// Module-level variables rather than enum manifest constants: an enum AA
// literal is re-created at every point of use, so rehash would act on a
// temporary and lookup() would measure construction as well as lookup.
string[string] words;
string[] letters = ["B", "l", "a", "h"];

static this() {
  words = [
    "Data1":"Blah",
    "Data2":"Blub",
  ];
}

void main() {
  words.rehash();
  auto result = benchmark!(lookup, make)(100_000);
  writeln(to!Duration(result[0])); // time for 100_000 lookups
  writeln(to!Duration(result[1])); // time for 100_000 constructions
}

string lookup() {
  return words["Data1"];
}

string make() {
  string res;
  foreach (c; letters)
    res ~= c;
  return res;
}
```

dmd v2.067.0 -release -O -boundscheck=off -inline

April 24, 2015
On Friday, 24 April 2015 at 11:30:14 UTC, Chris wrote:
> [...]
>
> AA lookup is (more often than not) slightly slower than (or at least as fast as) generating the data on demand:
>
> [...]

Correction: I meant to say creating the data is slower here, of course. Ah, well, it's Friday!
April 25, 2015
On Thursday, 23 April 2015 at 15:29:37 UTC, Steven Schveighoffer wrote:
> [...]
>
> There is a PR being discussed right now. Please take a look and chime in:
>
> https://github.com/D-Programming-Language/druntime/pull/1229
>
> It would be interesting to know how the new AA performs in this test.
>
> -Steve

At a slight tangent: is there a way to reserve capacity for an associative array?  The obvious way does not seem to work.

I also noticed in a talk at nativecode (possibly by Herb Sutter) that C++ lets you tune almost any parameter of the associative-array implementation in the standard library. (I'm not a C++ guy.) Is this entirely spurious, or would it be helpful for D?


Laeeth.
April 25, 2015
On 04/23/2015 05:29 PM, Steven Schveighoffer wrote:
> 
> https://github.com/D-Programming-Language/druntime/pull/1229
> 
> It would be interesting to know how the new AA performs in this test.

Went from 11s down to 9s, ~20% improvement.
April 26, 2015
"Laeeth Isharc"  wrote in message news:hnihyhgtmdzhszkpnftl@forum.dlang.org...

> At a slight tangent: is there a way to reserve capacity for an associative array?  The obvious way does not seem to work.

No.
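There is no reserve member on the built-in AA at all; the closest you can get is rehash after bulk insertion, which reorganizes the table for efficient lookups but cannot pre-size it. A minimal illustration:

```
void main()
{
    int[string] aa;
    // aa.reserve(1024); // does not compile: built-in AAs have no reserve
    foreach (i, word; ["alpha", "beta", "gamma"])
        aa[word] = cast(int) i;
    aa.rehash; // optimizes lookups after bulk insertion, but no pre-sizing
}
```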

> I also noticed in a talk at nativecode (possibly by Herb Sutter) that C++ lets you tune almost any parameter of the associative-array implementation in the standard library. (I'm not a C++ guy.) Is this entirely spurious, or would it be helpful for D?

Yes, if we had an AA in phobos. 

April 26, 2015
On 04/26/2015 12:58 PM, Daniel Murphy wrote:
> 
> Yes, if we had an AA in phobos.

Yes, I think phobos should get a few more optimized containers.

SparseSet
DenseSet
SparseHash
DenseHash

They should be configurable w.r.t. load factor, equality comparison, and allocation policy (when std.allocator is available).

http://sparsehash.googlecode.com/svn/trunk/doc/index.html

As the built-in druntime AA will likely always remain a general-purpose hash table, we can't do all the possible optimizations.
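To make the configurability concrete, here is a hypothetical sketch (illustrative only, not a proposed Phobos API; all names and defaults are mine) of a dense open-addressing table whose hash, equality, and maximum load factor are compile-time policies, reusing the probe step from the Nim snippet earlier in the thread. An allocation policy would become one more template parameter once std.allocator lands:

```
// Hypothetical sketch, not a proposed Phobos API: a dense open-addressing
// table whose hash, equality, and maximum load factor are template policies.
struct DenseHash(Key, Value,
                 alias hasher = (Key k) => hashOf(k),
                 alias keyEqual = (Key a, Key b) => a == b,
                 size_t loadNum = 3, size_t loadDen = 4) // max load 3/4
{
    private static struct Slot { Key key; Value value; bool used; }
    private Slot[] slots; // length is always a power of two
    private size_t count;

    void put(Key k, Value v)
    {
        if (slots.length == 0)
            slots = new Slot[8];
        else if ((count + 1) * loadDen > slots.length * loadNum)
            grow();
        auto i = probe(k);
        if (!slots[i].used)
        {
            slots[i].used = true;
            slots[i].key = k;
            ++count;
        }
        slots[i].value = v;
    }

    // `key in table` returns a pointer to the value, like the built-in AA.
    Value* opBinaryRight(string op : "in")(Key k)
    {
        if (slots.length == 0)
            return null;
        auto i = probe(k);
        return slots[i].used ? &slots[i].value : null;
    }

    private size_t probe(Key k)
    {
        auto mask = slots.length - 1;
        auto i = hasher(k) & mask;
        while (slots[i].used && !keyEqual(slots[i].key, k))
            i = (5 * i + 1) & mask; // the probe from the Nim snippet
        return i;
    }

    private void grow()
    {
        auto old = slots;
        slots = new Slot[old.length * 2];
        count = 0;
        foreach (ref s; old)
            if (s.used)
                put(s.key, s.value);
    }
}

unittest
{
    DenseHash!(string, int) h;
    h.put("one", 1);
    h.put("two", 2);
    assert(*("one" in h) == 1);
    assert(("three" in h) is null);
}
```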