April 23, 2015 - AA Performance in Benchmarks
So, I was tooling around with one of the benchmarks I saw listed on Twitter:

https://github.com/kostya/benchmarks/tree/master/brainfuck

This D benchmark spends most of its time on AA lookups. Discussions about the implementation aside, I am curious as to why the D AA is so slow even after calling AA.rehash.

I took a look at the Nim tables implementation, and it looks like they use power-of-two vectors and the property of primitive roots modulo n to get the next slot in the case of a collision:

```
proc nextTry(h, maxHash: THash): THash {.inline.} =
  result = ((5 * h) + 1) and maxHash
```

So, my questions:

1) What are the general-purpose performance implications of something like this?
2) I saw a thread a while ago where someone had posted an updated implementation of the AA with benchmarks and showed that it was substantially faster in all cases. I can't find the thread again -- what happened with this work?

-Shammah
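As an illustrative aside, here is a minimal D sketch of the probing scheme the Nim snippet implements; this is not druntime's or Nim's actual code, just the same idea spelled out. With a power-of-two table, the next probe index is a full-period LCG step, so the probe sequence visits every slot exactly once before repeating:

```
size_t nextTry(size_t h, size_t mask) pure nothrow @nogc @safe
{
    // mask = tableSize - 1 with tableSize a power of two, so `& mask`
    // is a cheap modulo. The step h -> (5*h + 1) mod 2^k has full
    // period (multiplier is 1 mod 4, increment is odd), so a probe
    // scans the entire table before it can cycle.
    return (5 * h + 1) & mask;
}

unittest
{
    enum size_t mask = 15;       // a 16-slot table
    bool[16] seen;
    size_t h = 3;
    foreach (_; 0 .. 16)         // one full probe cycle
    {
        seen[h] = true;
        h = nextTry(h, mask);
    }
    foreach (hit; seen)
        assert(hit);             // every slot was visited exactly once
}
```

Regarding question 1, the broad trade-off is that this replaces a division or modulo-by-prime with a multiply-add and a mask, at the cost of depending heavily on a well-mixed hash: all keys that land on the same slot share the same probe sequence.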
April 23, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Shammah Chancellor

On 4/23/15 10:40 AM, Shammah Chancellor wrote:

> 2) I saw a thread a while ago where someone had posted an updated implementation of the AA with benchmarks and showed that it was substantially faster in all cases. I can't find the thread again -- what happened with this work?

There is a PR being discussed right now. Please take a look and chime in:

https://github.com/D-Programming-Language/druntime/pull/1229

It would be interesting to know how the new AA performs in this test.

-Steve
April 24, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Shammah Chancellor

On Thursday, 23 April 2015 at 14:40:58 UTC, Shammah Chancellor wrote:

> [...]
>
> This D benchmark spends most of its time on AA lookups. Discussions about the implementation aside, I am curious as to why the D AA is so slow even after calling AA.rehash.
>
> [...]
Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.
April 24, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Chris

On Friday, 24 April 2015 at 09:18:48 UTC, Chris wrote:

> [...]
>
> Interesting. I noticed while doing some benchmarking that looking up data that is stored in an AA is slower than generating the data on the fly. I was really surprised.

AA lookup is (more often than not) slightly slower than (or at least as fast as) generating the data on demand:

```
import std.datetime : benchmark, Duration;
import std.conv : to;
import std.stdio : writeln;

enum {
    string[string] words = [
        "Data1": "Blah",
        "Data2": "Blub",
    ],
    string[] letters = ["B", "l", "a", "h"]
}

void main() {
    words.rehash();
    auto result = benchmark!(lookup, make)(100_000);
    writeln(to!Duration(result[0]));
    writeln(to!Duration(result[1]));
}

string lookup() {
    auto blah = words["Data1"];
    return blah;
}

string make() {
    string res;
    foreach (c; letters)
        res ~= c;
    return res;
}
```

Compiled with: dmd v2.067.0 -release -O -boundscheck=off -inline
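A caveat worth flagging about this benchmark (an observation of mine, not raised in the thread): `words` is declared as an `enum`, i.e. a manifest constant, and a manifest-constant AA literal is evaluated anew at each point of use, so the `lookup` branch also pays for building the table on every call (and the `rehash` in `main` acts on a temporary). A minimal sketch of the distinction, assuming dmd's usual enum semantics:

```
import std.stdio : writeln;

enum string[string] manifest = ["k": "v"];  // manifest constant

void main() {
    // Copying the manifest constant into a variable builds a real,
    // reusable table once; each direct use of `manifest` instead
    // re-evaluates the AA literal at that point.
    string[string] runtime = manifest;
    writeln(manifest["k"]);  // builds a fresh AA, then looks up "k"
    writeln(runtime["k"]);   // plain lookup into the stored table
}
```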
April 24, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Chris

On Friday, 24 April 2015 at 11:30:14 UTC, Chris wrote:

> [...]
>
> AA lookup is (more often than not) slightly slower than (or at least as fast as) generating the data on demand:
>
> [...]

Correction: I meant to say creating the data is slower here, of course. Ah, well, it's Friday!
April 25, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Steven Schveighoffer

On Thursday, 23 April 2015 at 15:29:37 UTC, Steven Schveighoffer wrote:
> On 4/23/15 10:40 AM, Shammah Chancellor wrote:
>
>> 2) I saw a thread a while ago where someone had posted an updated
>> implementation of the AA with benchmarks and showed that it was
>> substantially faster in all cases. I can't find the thread again --
>> what happened with this work?
>
> There is a PR being discussed right now. Please take a look and chime in:
>
> https://github.com/D-Programming-Language/druntime/pull/1229
>
> It would be interesting to know how the new AA performs in this test.
>
> -Steve
At a slight tangent: is there a way to reserve capacity for an associative array? The obvious way does not seem to work.
I also noticed in a talk at nativecode (possibly by Herb Sutter) that C++ lets you tune almost any parameter of the associative array implementation in the standard library. (I'm not a C++ guy.) Is this entirely spurious, or would it be helpful for D?
Laeeth.
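As an aside on "the obvious way" above: presumably it refers to something like the following (an assumption on my part; the post shows no code). In D, `reserve` exists for dynamic arrays but has no associative-array counterpart, so the AA call does not compile:

```
void main()
{
    int[] arr;
    arr.reserve(1024);    // fine: dynamic arrays support reserve

    int[string] aa;
    // aa.reserve(1024); // does not compile: AAs have no reserve
}
```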
April 25, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Steven Schveighoffer

On 04/23/2015 05:29 PM, Steven Schveighoffer wrote:
>
> https://github.com/D-Programming-Language/druntime/pull/1229
>
> It would be interesting to know how the new AA performs in this test.
Went from 11s down to 9s, ~20% improvement.
April 26, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Laeeth Isharc

"Laeeth Isharc" wrote in message news:hnihyhgtmdzhszkpnftl@forum.dlang.org...

> At a slight tangent: is there a way to reserve capacity for an associative array? The obvious way does not seem to work.

No.

> I noticed also in a talk at nativecode (possibly by Herb Sutter) that C++ gives you the ability to tune almost any parameter of the associative array implementation in the standard library. (I'm not a C++ guy). Is this entirely spurious, or would it be helpful for D?

Yes, if we had an AA in phobos.
April 26, 2015 - Re: AA Performance in Benchmarks
Posted in reply to Daniel Murphy

On 04/26/2015 12:58 PM, Daniel Murphy wrote:

> Yes, if we had an AA in phobos.

Yes, I think phobos should get a few more optimized containers:

- SparseSet
- DenseSet
- SparseHash
- DenseHash

They should be configurable w.r.t. load factor, equality comparison, and allocation policy (when std.allocator is available).

http://sparsehash.googlecode.com/svn/trunk/doc/index.html

As the builtin druntime AA will likely always remain a general-purpose hash table, we can't do all the possible optimizations there.
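As a sketch of what "configurable" could mean in D, the knobs would plausibly become template parameters. Every name and default below is hypothetical, not a proposed Phobos API:

```
// Hypothetical outline of a tunable hash container: hashing, equality,
// and the load factor are compile-time parameters instead of being
// fixed by the implementation, as they are in the builtin AA.
struct DenseHash(Key, Value,
                 alias hashFn   = (Key k) => hashOf(k),
                 alias keyEqual = (Key a, Key b) => a == b,
                 double maxLoadFactor = 0.75)
{
    private Key[]   keys;
    private Value[] values;
    private size_t  used;

    // Growth policy driven by the user-chosen load factor.
    private bool needsGrow() const pure nothrow @safe
    {
        return used + 1 > cast(size_t)(keys.length * maxLoadFactor);
    }
}

// Usage: a map tuned to tolerate a denser table than the default.
alias TightMap = DenseHash!(string, int,
                            (string k) => hashOf(k),
                            (string a, string b) => a == b,
                            0.9);
```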