November 20, 2010
I've created a sealed container implementation of RandAA for inclusion in std.container.  The code is located at:

http://www.dsource.org/projects/aa/browser/trunk/randAA/randaasealed.d

The main advantages RandAA has over builtin AAs are:

1.  Deterministic, ref counted freeing of memory, which is important for huge AAs or very memory-tight systems.

2.  Plays nicely with conservative GC.  Keys, values and hashes are in separate arrays, meaning that they don't all get GC scanned just because one needs to be GC scanned.  This also saves on alignment overhead.

3.  The implementation is based on parallel arrays, and allows for space to be reserved such that a certain number of elements can be added with a guarantee of no reallocation.  Because of this, insertions are much faster for large arrays.

4.  The linear congruential probing retains its O(1) expected lookup time as long as there are minimal collisions in full (32- or 64-bit) hash space, regardless of how many collisions there are in modulus hash space.

The main disadvantages are:

1.  The combination of parallel arrays and randomized probing wreak havoc on cache performance, meaning lookups are somewhat slower than with the builtin AA.

2.  The worst-case lookup time is unbounded, though this is more theoretical trivial than a practical issue (except maybe in high security situations where you shouldn't be using a hash table anyhow). The distribution of lookup times is geometric, meaning that requiring more than a few probing iterations is absurdly unlikely.

The two main known issues are lack of const correctness (I made no attempt and won't bother until inout works reasonably well) and lack of 64-bit support (because I haven't gotten around to searching for good 64-bit linear congruential parameters yet).

Please review the code and let me know whether it flies for std.container.

--David Simcha
January 29, 2011
I've also uploaded some documentation to http://cis.jhu.edu/~dsimcha/randaasealed.html, though interface-wise it's just a standard AA implementation, so there's not much to document.  Please comment.

On 11/20/2010 2:14 PM, David Simcha wrote:
> I've created a sealed container implementation of RandAA for inclusion in std.container.  The code is located at:
>
> http://www.dsource.org/projects/aa/browser/trunk/randAA/randaasealed.d
>
> The main advantages RandAA has over builtin AAs are:
>
> 1.  Deterministic, ref counted freeing of memory, which is important for huge AAs or very memory-tight systems.
>
> 2.  Plays nicely with conservative GC.  Keys, values and hashes are in separate arrays, meaning that they don't all get GC scanned just because one needs to be GC scanned.  This also saves on alignment overhead.
>
> 3.  The implementation is based on parallel arrays, and allows for space to be reserved such that a certain number of elements can be added with a guarantee of no reallocation.  Because of this, insertions are much faster for large arrays.
>
> 4.  The linear congruential probing retains its O(1) expected lookup time as long as there are minimal collisions in full (32- or 64-bit) hash space, regardless of how many collisions there are in modulus hash space.
>
> The main disadvantages are:
>
> 1.  The combination of parallel arrays and randomized probing wreak havoc on cache performance, meaning lookups are somewhat slower than with the builtin AA.
>
> 2.  The worst-case lookup time is unbounded, though this is more theoretical trivial than a practical issue (except maybe in high security situations where you shouldn't be using a hash table anyhow).  The distribution of lookup times is geometric, meaning that requiring more than a few probing iterations is absurdly unlikely.
>
> The two main known issues are lack of const correctness (I made no attempt and won't bother until inout works reasonably well) and lack of 64-bit support (because I haven't gotten around to searching for good 64-bit linear congruential parameters yet).
>
> Please review the code and let me know whether it flies for std.container.
>
> --David Simcha