Thread overview
Pre-expanding alloc cell(s) / reserving space for an associative array
Jul 09, 2023
Cecil Ward
Jul 10, 2023
IchorDev
Jul 10, 2023
H. S. Teoh
July 09, 2023
Before I posted a question about avoiding unnecessary allocs/reallocs when adding entries to an array like so
    uint[ dstring ]  arr;

when I build it up from nothing with successive insertions.

The array is accessed by a key that is a dstring. I was told that I can’t use .reserve or the like on it? Is that correct? My memory fails me, powerful pain drugs.

Is there an alternate method I could use ? To be honest, the number of entries is likely to be extremely modest, so this is not a huge performance issue, six entries would be considered quite a lot. The string keys are probably not very very long. But I’m learning good habits for the day when I might have a bigger such database.
July 09, 2023

On 7/9/23 4:24 PM, Cecil Ward wrote:

>

Before I posted a question about avoiding unnecessary allocs/reallocs when adding entries to an array like so
    uint[ dstring ]  arr;

when I build it up from nothing with successive insertions.

The array is accessed by a key that is a dstring. I was told that I can’t use .reserve or the like on it? Is that correct? My memory fails me, powerful pain drugs.

Is there an alternate method I could use ? To be honest, the number of entries is likely to be extremely modest, so this is not a huge performance issue, six entries would be considered quite a lot. The string keys are probably not very very long. But I’m learning good habits for the day when I might have a bigger such database.

No there is no such .reserve call for associative arrays.

It might be possible to implement, but it's quite a bit different from a normal array reserve -- an associative array allocates all its elements as individual memory blocks, so there is no place to reserve things. You would have to add e.g. a free-list, that is reserved from the allocator itself.

-Steve

July 10, 2023

On Sunday, 9 July 2023 at 20:24:24 UTC, Cecil Ward wrote:

>

Before I posted a question about avoiding unnecessary allocs/reallocs when adding entries to an array like so
uint[ dstring ] arr;

when I build it up from nothing with successive insertions.

The array is accessed by a key that is a dstring. I was told that I can’t use .reserve or the like on it? Is that correct? My memory fails me, powerful pain drugs.

Is there an alternate method I could use ? To be honest, the number of entries is likely to be extremely modest, so this is not a huge performance issue, six entries would be considered quite a lot. The string keys are probably not very very long. But I’m learning good habits for the day when I might have a bigger such database.

From the spec it sounds as though (but good luck testing for sure) that if you have (for example) 6 big dummy key-value pairs in the AA to begin with, then if you use .clear it "Removes all remaining keys and values from [the] associative array. The array is not rehashed after removal, to allow for the existing storage to be reused." (source) I'm not 100% sure this would work how you want, and it's such a hack, but it's something.
D has a few user-made hash-map libraries, so it might be worth investigating those to see if they offer a pre-allocation method, too.

July 10, 2023
On Mon, Jul 10, 2023 at 09:30:57AM +0000, IchorDev via Digitalmars-d-learn wrote: [...]
> From the spec it sounds as though (but good luck testing for sure) that if you have (for example) 6 big dummy key-value pairs in the AA to begin with, then if you use `.clear` it "Removes all remaining keys and values from [the] associative array. The array is not rehashed after removal, __to allow for the existing storage to be reused.__"
[...]

This is not an accurate understanding of what actually happens.  The AA implementation consists of a primary hashtable (an array), each slot of which points to a list of buckets. Clearing the AA does not discard the hashtable, but does dispose of the buckets, so adding new keys afterwards will allocate new buckets.  So the buckets used by the dummy key-value pairs do not get reused without a reallocation.


T

-- 
This is a tpyo.
July 10, 2023

On 7/10/23 8:44 AM, H. S. Teoh wrote:

>

On Mon, Jul 10, 2023 at 09:30:57AM +0000, IchorDev via Digitalmars-d-learn wrote:
[...]

>

From the spec it sounds as though (but good luck testing for sure)
that if you have (for example) 6 big dummy key-value pairs in the AA
to begin with, then if you use .clear it "Removes all remaining keys
and values from [the] associative array. The array is not rehashed
after removal, to allow for the existing storage to be reused."
[...]

This is not an accurate understanding of what actually happens. The AA
implementation consists of a primary hashtable (an array), each slot of
which points to a list of buckets. Clearing the AA does not discard the
hashtable, but does dispose of the buckets, so adding new keys
afterwards will allocate new buckets. So the buckets used by the dummy
key-value pairs do not get reused without a reallocation.

This used to be the case, but in the latest implementation, the AA uses
open addressing. That is, if a slot is full, it linearly searches the table for the next open slot. This cuts down significantly on cache misses, since the hash slot itself stores the hash, so the pointer only needs to be followed if the hash matches.

Fun historical fact, the original AA used a tree to store colliding elements, which means that all AA keys had to support both opEquals and opCmp.

BUT, you are right that the elements themselves are individual blocks of memory, and only the bucket array is reused.

If you want to see how the hash table works, without having to deal with all the typeinfo acrobatics that the actual builtin AA uses, you can look at my newaa library:

https://github.com/schveiguy/newaa

This is the exact implementation of the AA internals, but uses compile-time introspection instead of typeinfo.

-Steve