October 07, 2011
During profiling I found that the StringTable lookups took always at least
as much time as the whole scanning(lexing).
Upon closer inspection the StringTable showed some serious flaws.

  - The use of a fixed number (37) of BTree buckets results in averaged
log(N/37) strcmp calls
    for every lookup. N is easily in the order of a few thousand entries.

  - The string table calculates a mildly costly hash and almost dismisses
the result.

    Simplified discourse:
      StringTable: hash = calcHash(str, len);

      Hash: hash_t hash = 0;
            for (size_t i = 0; i < len; i+=4)
               hash = hash * 37 + *(uint32_t*)input[i];

      StringTable: idx = hash % tabledim;

      tabledim: I'm always 37.

    Which sums down to ((((n0 * 37) + n1) * 37 + n2) * 37 + n3) % 37 == n3.

  - The Dchar::calcHash function is not so good (see Hash Alternatives).

    It has also a few bugs:
      - The windows DMC inline assembly version multiplies by 9 not by 37.
      - Nobody defines LITTLE_ENDIAN so each int is reassembled out of
bytes.
      - The M_UNICODE calcHash adds first and then multiplies by 37.
      - Using a for loop with an inner switch is a branch prediction killer.

    It remains a little unclear what parts of the dchar.c file are actively
used by different DMC/DMD versions.


A much better solution to reduce a hash to a bucket index is using only
the N lowest bits(hash & 2^N-1) instead of taking the modulo with a prime.
This requires a hash that does decent bit avalanching (see Hash
Alternatives)
but provides two advantages.
  - One can used the well working double size/half size pattern.
  - Doubling the size can be done with a simple partition of buckets.
    Reducing consists only of merging two buckets.

With small value types and load factors it might be better to use arrays instead of singly linked lists.


I consider to merge the root/AA with the backend/AArray interface and
refine the implementation,
probably sharing improvements with druntime.
Are there any reasons (licensing?) to have two and a half AAs in dmd and
another in druntime?


Hash Alternatives:

    The Hsieh hash used in druntime isn't too good either but provides
    reasonably good hashes with fast performance for small keys.
    For bigger keys Google's City Hash is exceptionally fast.
    MIT Licensed at http://code.google.com/p/cityhash/.
    Some speed comparisons recored with SMHasher on a Core i5-2400S
processor can be found here http://codepad.org/kCVQ8eoq.
    I won't show the collisions etc. for the Dchar hash but it FAILS in
about every category.