Thread overview
AA dsource project
Nov 04, 2009
dsimcha
Nov 05, 2009
Moritz Warning
Nov 05, 2009
dsimcha
Nov 05, 2009
Walter Bright
Nov 05, 2009
dsimcha
Nov 05, 2009
Moritz Warning
Nov 05, 2009
Moritz Warning
Nov 05, 2009
Saaa
Nov 05, 2009
bearophile
Nov 05, 2009
dsimcha
November 04, 2009
A few weeks ago I mentioned that I was going to create some kind of forum for people to post candidate associative array implementations to replace the current, much-despised implementation as the new "builtin" AA for D.  It's now up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access should work for all registered dsource users.  If you have an associative array implementation that you feel is a significant improvement over the current builtin and are able/willing to license it under the Boost license, please post it for comment.

Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.
November 05, 2009
On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:

> A few weeks ago I mentioned that I was going to create some kind of forum for people to post candidate associative array implementations to replace the current, much-despised implementation as the new "builtin" AA for D.  It's now up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access should work for all registered dsource users.  If you have an associative array implementation that you feel is a significant improvement over the current builtin and are able/willing to license it under the Boost license, please post it for comment.
> 
> Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.

I've committed a port of Pythons dictionary implementation. Here is my announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html
November 05, 2009
== Quote from Moritz Warning (moritzwarning@web.de)'s article
> On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:
> > A few weeks ago I mentioned that I was going to create some kind of forum for people to post candidate associative array implementations to replace the current, much-despised implementation as the new "builtin" AA for D.  It's now up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access should work for all registered dsource users.  If you have an associative array implementation that you feel is a significant improvement over the current builtin and are able/willing to license it under the Boost license, please post it for comment.
> >
> > Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.
> I've committed a port of Pythons dictionary implementation. Here is my announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html

Nice job.  I ported your code to D2, (quick and dirty, doesn't work right with const and all, just enough to be able to test it), and uploaded that version for D2 users.  I had in the back of my mind that this was to be D2-centric, but I guess D1 could easily get better AAs, too, since it wouldn't affect the spec.

I couldn't find your benchmarks, but here are my results:

Builtin:
Start Benchmark.
1 x 10000000 iterations
inserts:  394601/s (25.342000s)
lookups: 9813542/s (1.019000s)

PyDict:
Start Benchmark.
1 x 10000000 iterations
inserts:  2044153/s (4.892000s)
lookups: 3401360/s (2.940000s)

This pretty much confirms my suspicion that the builtin AAs are optimized for lookup speed at the expense of insertion speed/GC overhead to an absurd degree.

Also, regarding the license for Python's dictionary, apparently the Python License
doesn't allow removing of copyright notices from stuff distributed in binary form.
 This might prove problematic if this were to be integrated into druntime.
November 05, 2009
dsimcha wrote:
> I couldn't find your benchmarks, but here are my results:
> 
> Builtin:
> Start Benchmark.
> 1 x 10000000 iterations
> inserts:  394601/s (25.342000s)
> lookups: 9813542/s (1.019000s)
> 
> PyDict:
> Start Benchmark.
> 1 x 10000000 iterations
> inserts:  2044153/s (4.892000s)
> lookups: 3401360/s (2.940000s)
> 
> This pretty much confirms my suspicion that the builtin AAs are optimized for
> lookup speed at the expense of insertion speed/GC overhead to an absurd degree.

Perhaps some combination would work. I'd be reluctant to go with a 3x slower lookup speed.

> Also, regarding the license for Python's dictionary, apparently the Python License
> doesn't allow removing of copyright notices from stuff distributed in binary form.
>  This might prove problematic if this were to be integrated into druntime.

That's kind of an insurmountable problem. The core stuff needs to be free of licensing problems.
November 05, 2009
== Quote from Walter Bright (newshound1@digitalmars.com)'s article
> dsimcha wrote:
> > I couldn't find your benchmarks, but here are my results:
> >
> > Builtin:
> > Start Benchmark.
> > 1 x 10000000 iterations
> > inserts:  394601/s (25.342000s)
> > lookups: 9813542/s (1.019000s)
> >
> > PyDict:
> > Start Benchmark.
> > 1 x 10000000 iterations
> > inserts:  2044153/s (4.892000s)
> > lookups: 3401360/s (2.940000s)
> >
> > This pretty much confirms my suspicion that the builtin AAs are optimized for lookup speed at the expense of insertion speed/GC overhead to an absurd degree.
> Perhaps some combination would work. I'd be reluctant to go with a 3x slower lookup speed.

I've also just added my RandAA candidate, which uses parallel arrays and linear congruential randomized probing.  It gives insertion speed comparable to the PyDict, with lookup speed only about 30-40% slower instead of 150-200%.  For more benchmark details, see the wiki http://dsource.org/projects/aa/wiki/WikiStart, which I've just updated.
November 05, 2009
"dsimcha" <dsimcha@yahoo.com> wrote in message news:hcspk0$pk5$1@digitalmars.com...
>A few weeks ago I mentioned that I was going to create some kind of forum for
> people to post candidate associative array implementations to replace the
> current, much-despised implementation as the new "builtin" AA for D.  It's
> now
> up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access
> should
> work for all registered dsource users.  If you have an associative array
> implementation that you feel is a significant improvement over the current
> builtin and are able/willing to license it under the Boost license, please
> post it for comment.
>
> Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.

Slightly related, I recently looked at the typecons code and was a bit
surprised that the string to enum works by checking the string against all
possible enum elements.
Couldn't a compile time perfect hashed aa fix this? :)

private template enumParserImpl(string name, bool first, T...)
{
    static if (first)
    {
        enum string enumParserImpl = "bool enumFromString(string s, ref "
            ~name~" v) {\n"
            ~enumParserImpl!(name, false, T)
            ~"return false;\n}\n";
    }
    else
    {
        static if (T.length)
            enum string enumParserImpl =
                "if (s == `"~T[0]~"`) return (v = "~name~"."~T[0]~"),
true;\n"
                ~enumParserImpl!(name, false, T[1 .. $]);
        else
            enum string enumParserImpl = "";
    }
}


November 05, 2009
On Thu, 05 Nov 2009 03:52:22 +0000, dsimcha wrote:

> == Quote from Moritz Warning (moritzwarning@web.de)'s article
>> On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:
>> > A few weeks ago I mentioned that I was going to create some kind of forum for people to post candidate associative array implementations to replace the current, much-despised implementation as the new "builtin" AA for D.  It's now up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access should work for all registered dsource users.  If you have an associative array implementation that you feel is a significant improvement over the current builtin and are able/willing to license it under the Boost license, please post it for comment.
>> >
>> > Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.
>> I've committed a port of Pythons dictionary implementation. Here is my announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html
> 
> Nice job.  I ported your code to D2, (quick and dirty, doesn't work right with const and all, just enough to be able to test it), and uploaded that version for D2 users.  I had in the back of my mind that this was to be D2-centric, but I guess D1 could easily get better AAs, too, since it wouldn't affect the spec.
> 
> I couldn't find your benchmarks, but here are my results:
> 
> Builtin:
> Start Benchmark.
> 1 x 10000000 iterations
> inserts:  394601/s (25.342000s)
> lookups: 9813542/s (1.019000s)
> 
> PyDict:
> Start Benchmark.
> 1 x 10000000 iterations
> inserts:  2044153/s (4.892000s)
> lookups: 3401360/s (2.940000s)
> 
> This pretty much confirms my suspicion that the builtin AAs are optimized for lookup speed at the expense of insertion speed/GC overhead to an absurd degree.
> 
> Also, regarding the license for Python's dictionary, apparently the
> Python License doesn't allow removing of copyright notices from stuff
> distributed in binary form.
>  This might prove problematic if this were to be integrated into
>  druntime.

I've committed an initial test bench (test.d).
dmd2 test.d pydict/pyDictD2.d randAA/RandAA.d -O -release

Linear test, GC enabled, 10 x 5_000_000 inserts/lookups:

RandAA(K,V,bool storeHash = shouldStoreHash!(K)):
10 x 1000000 iterations
inserts:  677460/s (1.476100s)
lookups: 3549875/s (0.281700s)

BuildIn:
10 x 1000000 iterations
inserts:  218627/s (4.574000s)
lookups: 4766444/s (0.209800s)

PyDict(K,V):
10 x 1000000 iterations
inserts:  1621271/s (0.616800s)
lookups: 3834355/s (0.260800s)
November 05, 2009
On Thu, 05 Nov 2009 15:44:49 +0000, Moritz Warning wrote:

> On Thu, 05 Nov 2009 03:52:22 +0000, dsimcha wrote:
> 
>> == Quote from Moritz Warning (moritzwarning@web.de)'s article
>>> On Wed, 04 Nov 2009 20:53:20 +0000, dsimcha wrote:
>>> > A few weeks ago I mentioned that I was going to create some kind of forum for people to post candidate associative array implementations to replace the current, much-despised implementation as the new "builtin" AA for D.  It's now up at http://dsource.org/projects/aa/wiki/WikiStart .  SVN write access should work for all registered dsource users.  If you have an associative array implementation that you feel is a significant improvement over the current builtin and are able/willing to license it under the Boost license, please post it for comment.
>>> >
>>> > Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.
>>> I've committed a port of Pythons dictionary implementation. Here is my announcement along with some benchmark figures: http://digitalmars.com/d/archives/digitalmars/D/ Python_dictionary_inspired_hash_table_implementation_73176.html
>> 
>> Nice job.  I ported your code to D2, (quick and dirty, doesn't work right with const and all, just enough to be able to test it), and uploaded that version for D2 users.  I had in the back of my mind that this was to be D2-centric, but I guess D1 could easily get better AAs, too, since it wouldn't affect the spec.
>> 
>> I couldn't find your benchmarks, but here are my results:
>> 
>> Builtin:
>> Start Benchmark.
>> 1 x 10000000 iterations
>> inserts:  394601/s (25.342000s)
>> lookups: 9813542/s (1.019000s)
>> 
>> PyDict:
>> Start Benchmark.
>> 1 x 10000000 iterations
>> inserts:  2044153/s (4.892000s)
>> lookups: 3401360/s (2.940000s)
>> 
>> This pretty much confirms my suspicion that the builtin AAs are optimized for lookup speed at the expense of insertion speed/GC overhead to an absurd degree.
>> 
>> Also, regarding the license for Python's dictionary, apparently the
>> Python License doesn't allow removing of copyright notices from stuff
>> distributed in binary form.
>>  This might prove problematic if this were to be integrated into
>>  druntime.
> 
> I've committed an initial test bench (test.d). dmd2 test.d pydict/pyDictD2.d randAA/RandAA.d -O -release
> 
[..]

Could you check the test bench and see how to integrate the buildin for nicely? (isn't there some alias this that can be used?)

I wasn't able to reproduce your results.
Could you retry with the test.d please?

About the license issue.
I have spoken to some python folks about it,
but I need to search my logs first.
November 05, 2009
dsimcha:
> Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.

You may try this too (C code, the "khash.h" file, plus few demos), the original code is not mine, with MIT License:
http://www.fantascienza.net/leonardo/js/khash.zip
I have found it to be quick for some of my usages.

Bye,
bearophile
November 05, 2009
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
> dsimcha:
> > Also, if you have a good benchmark for AAs, please create a benchmarks/ directory and submit it.
> You may try this too (C code, the "khash.h" file, plus few demos), the original
code is not mine, with MIT License:
> http://www.fantascienza.net/leonardo/js/khash.zip
> I have found it to be quick for some of my usages.
> Bye,
> bearophile

For the purposes of this project, I only accept D code.  If you want to submit it, port it to D.  Also, if you want to submit it, please submit it so we can browse the source code online instead of linking to a zip file that we then have to unzip, etc.