2 days ago
  1. No function to create with desired capability
    The initial size is default to 8 and increased of power of 2. There are two problems with this scheme. To populate to size of few thousands, there are too many memory relocations and rebuild the list. The other problem is wasted of unused slots; example a list of 2_100, the list will have 4_096 of slots

  2. Cascade of collision
    If hash causes a collision, it will use the subsequent slot as an overflow. However, if a next item hash points to that subsequent slot and it is already occupied by that overflow item, hence the cascade collision

  3. Duplicate hash calculation
    For a key that fits into register, size_t, there is no hash calculation but when it is added into the list, it will apply final MurmurHash2; for a key size bigger than size_t, it will do hash calculation with applied final MurmurHah2 and later added to the list, it will apply MurmurHash2 again. There are two problems with this scheme, the first one is obvious un-needed calculation; the other is if an integer key value fitted into list length/capability, applied MurmurHash2 potentially causes hash collision

Some suggestions

  1. Use prime number length/capability to avoid wasted of unused slots
  2. Add a way to initialize list with length/capability
  3. A way to avoid final MurmurHash2 or get rid of it completely and add function/delegate to customized hash calculation
  4. Use different scheme to handle collision such as plain array to hold key-value pair and hash list with integer index pointed to key-value pair. This scheme has two advantages; key-value pair will have integer index for collision and associated list will have order of items added when do iteration (possible use in json...); The dis-advantage is indirect access and it will take more work when doing deletion. However, deletion is rarely used and not a big deal for this use case

Happy codings

2 days ago

On Sunday, 30 March 2025 at 16:30:34 UTC, An Pham wrote:

>
  1. No function to create with desired capability
    The initial size is default to 8 and increased of power of 2. There are two problems with this scheme. To populate to size of few thousands, there are too many memory relocations and rebuild the list. The other problem is wasted of unused slots; example a list of 2_100, the list will have 4_096 of slots

The memory allocation of the bucket list is not significant, neither is having the bucket list be larger than needed. And the growth factor is actually 4.

For an AA, you want to have a load factor that is not 1, or the hash lookup creeps towards linear complexity.

>
  1. Cascade of collision
    If hash causes a collision, it will use the subsequent slot as an overflow. However, if a next item hash points to that subsequent slot and it is already occupied by that overflow item, hence the cascade collision

This is not the case. The algorithm to find an appropriate slot is not linear in this fashion. Please try reading the code again.

>
  1. Duplicate hash calculation
    For a key that fits into register, size_t, there is no hash calculation but when it is added into the list, it will apply final MurmurHash2; for a key size bigger than size_t, it will do hash calculation with applied final MurmurHah2 and later added to the list, it will apply MurmurHash2 again. There are two problems with this scheme, the first one is obvious un-needed calculation; the other is if an integer key value fitted into list length/capability, applied MurmurHash2 potentially causes hash collision

The hash function is used once, and is calculated by the TypeInfo. If you want a different hash algorithm, you can create a type and define toHash.

>

Some suggestions

Please implement your suggestions, test the performance, and get back with numbers. It's often quite surprising what makes the difference in performance! So suggesting changes without testing isn't fruitful.

-Steve