Thread overview

Shammah Chancellor (Apr 23, 2015)
Laeeth Isharc (Apr 25, 2015)
Daniel Murphy (Apr 26, 2015)
Martin Nowak (Apr 26, 2015)
Martin Nowak (Apr 25, 2015)
Chris (Apr 24, 2015)
Chris (Apr 24, 2015)
Chris (Apr 24, 2015)
April 23, 2015
So, I was tooling around with one of the benchmarks I saw listed on twitter:

https://github.com/kostya/benchmarks/tree/master/brainfuck

This D benchmark spends most of its time on AA lookups. Discussion of the implementation aside, I am curious why the D AA is so slow even after calling AA.rehash.

I took a look at the Nim tables implementation, and it looks like they use power-of-two table sizes with a full-period linear congruential step to find the next slot in the case of a collision:

```
proc nextTry(h, maxHash: THash): THash {.inline.} =
  result = ((5 * h) + 1) and maxHash
```
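For intuition, here is a minimal sketch in D (my own illustration; only nextTry mirrors the Nim code above) of why that step works as a probe sequence: with a power-of-two table size the mask is length - 1, and h -> (5*h + 1) & mask is a full-period linear congruential generator, so it visits every slot exactly once before repeating and a probe can never loop forever:

```
import std.stdio : writeln;

// The same step function as Nim's nextTry, in D.
size_t nextTry(size_t h, size_t maxHash)
{
    return (5 * h + 1) & maxHash;
}

void main()
{
    enum size_t maxHash = 16 - 1; // 16-slot table, mask 0b1111
    size_t h = 3;                 // an arbitrary starting bucket
    bool[16] seen;
    foreach (_; 0 .. 16)
    {
        seen[h] = true;
        h = nextTry(h, maxHash);
    }
    writeln(seen); // all true: every slot visited once per cycle
}
```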

So, my questions:

1) What are the performance implications of a scheme like this in the general case?
2) I saw a thread a while ago where someone had posted an updated implementation of the AA, with benchmarks showing it was substantially faster in all cases.  I can't find the thread again -- what happened with this work?

-Shammah
April 23, 2015
On 4/23/15 10:40 AM, Shammah Chancellor wrote:

> 2) I saw a thread a while ago where someone had posted an updated
> implementation of the AA, with benchmarks showing it was
> substantially faster in all cases.  I can't find the thread again --
> what happened with this work?

There is a PR being discussed right now. Please take a look and chime in:

https://github.com/D-Programming-Language/druntime/pull/1229

It would be interesting to know how the new AA performs in this test.

-Steve
April 24, 2015
On Thursday, 23 April 2015 at 14:40:58 UTC, Shammah Chancellor wrote:
> [...]
>
> This D benchmark spends most of its time on AA lookups. Discussion of the implementation aside, I am curious why the D AA is so slow even after calling AA.rehash.
>
> [...]

Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.
April 24, 2015
On Friday, 24 April 2015 at 09:18:48 UTC, Chris wrote:
> [...]
>
> Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.

AA lookup is (more often than not) slightly slower than (or at least as fast as) generating the data on demand:

```
import std.datetime : benchmark, Duration;
import std.conv : to;
import std.stdio : writeln;

// Module-level variables rather than enum manifest constants: an enum AA
// literal is re-created at every point of use, so rehash would act on a
// temporary and lookup() would measure construction as well as lookup.
string[string] words;
string[] letters = ["B", "l", "a", "h"];

static this() {
  words = [
    "Data1":"Blah",
    "Data2":"Blub",
  ];
}

void main() {
  words.rehash();
  auto result = benchmark!(lookup, make)(100_000);
  writeln(to!Duration(result[0])); // time for 100_000 lookups
  writeln(to!Duration(result[1])); // time for 100_000 constructions
}

string lookup() {
  return words["Data1"];
}

string make() {
  string res;
  foreach (c; letters)
    res ~= c;
  return res;
}
```

dmd v2.067.0 -release -O -boundscheck=off -inline

April 24, 2015
On Friday, 24 April 2015 at 11:30:14 UTC, Chris wrote:
> [...]
>
> AA lookup is (more often than not) slightly slower than (or at least as fast as) generating the data on demand:
>
> [...]

Correction: I meant to say creating the data is slower here, of course. Ah, well, it's Friday!
April 25, 2015
On Thursday, 23 April 2015 at 15:29:37 UTC, Steven Schveighoffer wrote:
> [...]
>
> There is a PR being discussed right now. Please take a look and chime in:
>
> https://github.com/D-Programming-Language/druntime/pull/1229
>
> It would be interesting to know how the new AA performs in this test.
>
> -Steve

At a slight tangent: is there a way to reserve capacity for an associative array?  The obvious way does not seem to work.

I also noticed in a talk at nativecode (possibly by Herb Sutter) that C++ lets you tune almost any parameter of the associative-array implementation in the standard library. (I'm not a C++ guy.) Is this entirely spurious, or would it be helpful for D?


Laeeth.
April 25, 2015
On 04/23/2015 05:29 PM, Steven Schveighoffer wrote:
> 
> https://github.com/D-Programming-Language/druntime/pull/1229
> 
> It would be interesting to know how the new AA performs in this test.

Went from 11s down to 9s, ~20% improvement.
April 26, 2015
"Laeeth Isharc"  wrote in message news:hnihyhgtmdzhszkpnftl@forum.dlang.org...

> At a slight tangent: is there a way to reserve capacity for an associative array?  The obvious way does not seem to work.

No.
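There is no reserve member on the built-in AA at all; the closest you can get is rehash after bulk insertion, which reorganizes the table for efficient lookups but cannot pre-size it. A minimal illustration:

```
void main()
{
    int[string] aa;
    // aa.reserve(1024); // does not compile: built-in AAs have no reserve
    foreach (i, word; ["alpha", "beta", "gamma"])
        aa[word] = cast(int) i;
    aa.rehash; // optimizes lookups after bulk insertion, but no pre-sizing
}
```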

> I also noticed in a talk at nativecode (possibly by Herb Sutter) that C++ lets you tune almost any parameter of the associative-array implementation in the standard library. (I'm not a C++ guy.) Is this entirely spurious, or would it be helpful for D?

Yes, if we had an AA in phobos. 

April 26, 2015
On 04/26/2015 12:58 PM, Daniel Murphy wrote:
> 
> Yes, if we had an AA in phobos.

Yes, I think phobos should get a few more optimized containers.

SparseSet
DenseSet
SparseHash
DenseHash

They should be configurable w.r.t. load factor, equality comparison, and allocation policy (when std.allocator is available).

http://sparsehash.googlecode.com/svn/trunk/doc/index.html

As the built-in druntime AA will likely always remain a general-purpose hash table, we can't do all the possible optimizations.
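To make the configurability concrete, here is a hypothetical sketch (illustrative only, not a proposed Phobos API; all names and defaults are mine) of a dense open-addressing table whose hash, equality, and maximum load factor are compile-time policies, reusing the probe step from the Nim snippet earlier in the thread. An allocation policy would become one more template parameter once std.allocator lands:

```
// Hypothetical sketch, not a proposed Phobos API: a dense open-addressing
// table whose hash, equality, and maximum load factor are template policies.
struct DenseHash(Key, Value,
                 alias hasher = (Key k) => hashOf(k),
                 alias keyEqual = (Key a, Key b) => a == b,
                 size_t loadNum = 3, size_t loadDen = 4) // max load 3/4
{
    private static struct Slot { Key key; Value value; bool used; }
    private Slot[] slots; // length is always a power of two
    private size_t count;

    void put(Key k, Value v)
    {
        if (slots.length == 0)
            slots = new Slot[8];
        else if ((count + 1) * loadDen > slots.length * loadNum)
            grow();
        auto i = probe(k);
        if (!slots[i].used)
        {
            slots[i].used = true;
            slots[i].key = k;
            ++count;
        }
        slots[i].value = v;
    }

    // `key in table` returns a pointer to the value, like the built-in AA.
    Value* opBinaryRight(string op : "in")(Key k)
    {
        if (slots.length == 0)
            return null;
        auto i = probe(k);
        return slots[i].used ? &slots[i].value : null;
    }

    private size_t probe(Key k)
    {
        auto mask = slots.length - 1;
        auto i = hasher(k) & mask;
        while (slots[i].used && !keyEqual(slots[i].key, k))
            i = (5 * i + 1) & mask; // the probe from the Nim snippet
        return i;
    }

    private void grow()
    {
        auto old = slots;
        slots = new Slot[old.length * 2];
        count = 0;
        foreach (ref s; old)
            if (s.used)
                put(s.key, s.value);
    }
}

unittest
{
    DenseHash!(string, int) h;
    h.put("one", 1);
    h.put("two", 2);
    assert(*("one" in h) == 1);
    assert(("three" in h) is null);
}
```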