April 07, 2015
On 4/7/15 10:24 AM, Walter Bright wrote:
> The current D associative array algorithm
>
>
> https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d
>
> uses an array of buckets with a linked list attached to the buckets to
> resolve collisions.
>
> Linked lists are perhaps the worst (i.e. slowest) data structure
> possible on modern CPUs with memory caches.
>
> A possible cache-friendly replacement would be an array of buckets with
> local probing to resolve collisions.

Arrays would need to move data. Current hashtables rely on values staying put. -- Andrei

April 07, 2015
Andrei Alexandrescu:

>> A possible cache-friendly replacement would be an array of buckets with local probing to resolve collisions.
>
> Arrays would need to move data. Current hashtables rely on values staying put. -- Andrei

The efficiency behavour of modern CPUs+memory pyramid are rather not linear and not intuitive. As Walter has said at the the start of this thread, arrays come out as more efficient in a large number of cases...

Bye,
bearophile
April 07, 2015
On Tuesday, 7 April 2015 at 22:14:46 UTC, Andrei Alexandrescu wrote:
> Arrays would need to move data. Current hashtables rely on values staying put. -- Andrei

I think it is fair to say current AA are bankrupt and need a revamping anyway.

We can make the in operator return a wrapper that cast to bool (safely) and get/update the data (systemely).
April 07, 2015
On 4/7/15 3:18 PM, bearophile wrote:
> Andrei Alexandrescu:
>
>>> A possible cache-friendly replacement would be an array of buckets
>>> with local probing to resolve collisions.
>>
>> Arrays would need to move data. Current hashtables rely on values
>> staying put. -- Andrei
>
> The efficiency behavour of modern CPUs+memory pyramid are rather not
> linear and not intuitive. As Walter has said at the the start of this
> thread, arrays come out as more efficient in a large number of cases...

You must have replied to the wrong post. -- Andrei

April 07, 2015
On 4/7/15 3:22 PM, deadalnix wrote:
> On Tuesday, 7 April 2015 at 22:14:46 UTC, Andrei Alexandrescu wrote:
>> Arrays would need to move data. Current hashtables rely on values
>> staying put. -- Andrei
>
> I think it is fair to say current AA are bankrupt and need a revamping
> anyway.

Doesn't strike me as a fair statement.

> We can make the in operator return a wrapper that cast to bool (safely)
> and get/update the data (systemely).

That's not enough. People may take the address of elements in the hashtable and assume the data stays put. This is currently safe and legal in D.


Andrei

April 07, 2015
On 4/7/15 6:27 PM, Andrei Alexandrescu wrote:
> On 4/7/15 3:22 PM, deadalnix wrote:
>> On Tuesday, 7 April 2015 at 22:14:46 UTC, Andrei Alexandrescu wrote:
>>> Arrays would need to move data. Current hashtables rely on values
>>> staying put. -- Andrei
>>
>> I think it is fair to say current AA are bankrupt and need a revamping
>> anyway.
>
> Doesn't strike me as a fair statement.
>
>> We can make the in operator return a wrapper that cast to bool (safely)
>> and get/update the data (systemely).
>
> That's not enough. People may take the address of elements in the
> hashtable and assume the data stays put. This is currently safe and
> legal in D.

The correct way forward is to implement the AA in the library in the safest way possible. Then, make the library implementation customizable for specialized situations. But it needs to be fully in the library, not partly in the compiler as it is now. Until that point, all these discussions on AA updates are useless.

-Steve
April 07, 2015
On 4/7/2015 3:14 PM, Andrei Alexandrescu wrote:
> Arrays would need to move data. Current hashtables rely on values staying put.

If the array of buckets were enhanced to include the hashes, the key/value objects could stay untouched when the AA is grown. The key/value objects would only be dereferenced if the hashes matched.
April 07, 2015
On 4/7/15 3:56 PM, Walter Bright wrote:
> On 4/7/2015 3:14 PM, Andrei Alexandrescu wrote:
>> Arrays would need to move data. Current hashtables rely on values
>> staying put.
>
> If the array of buckets were enhanced to include the hashes, the
> key/value objects could stay untouched when the AA is grown. The
> key/value objects would only be dereferenced if the hashes matched.

Not sure I understand. What if the collision array needs to grow? That might move it. -- Andrei
April 07, 2015
On 4/7/2015 3:33 PM, Steven Schveighoffer wrote:
> Until that point, all these discussions on AA updates are useless.

The data structure has changed before (used to be a binary tree) without disruption and can change again.

April 07, 2015
On 4/7/2015 3:59 PM, Andrei Alexandrescu wrote:
> On 4/7/15 3:56 PM, Walter Bright wrote:
>> On 4/7/2015 3:14 PM, Andrei Alexandrescu wrote:
>>> Arrays would need to move data. Current hashtables rely on values
>>> staying put.
>>
>> If the array of buckets were enhanced to include the hashes, the
>> key/value objects could stay untouched when the AA is grown. The
>> key/value objects would only be dereferenced if the hashes matched.
>
> Not sure I understand. What if the collision array needs to grow? That might
> move it. -- Andrei

The collision array is an array of pairs:

    KeyValue *kv;
    Hash_t hash;

growing the array moves those around, but the locations of the array elements are never exposed to the users. The addresses of kv are exposed, but those don't move.