December 22, 2021

On Tuesday, 21 December 2021 at 17:06:04 UTC, Nick Treleaven wrote:

> Dennis is right that conceptually a key for an array is an index. An array element value is the data associated with an index.

What it is conceptually, depends on how you conceptualize it. The most common conceptualization of "in" is as a set operation and you cannot represent a set with array indexes.

December 22, 2021

On Sunday, 19 December 2021 at 19:34:18 UTC, Ola Fosheim Grøstad wrote:

> On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer wrote:
>> That is what we claim (and what hash algorithms in general claim): amortized O(1) lookups and inserts.
>
> Yes, if you don't shrink.
>
> You need to prove that the claim holds for all possible orderings of inserts, deletions and lookups.
>
> So you need to prove that you always perform O(N) operations between rehashes, so that you get O(N) operations of complexity O(1) plus one rehash of complexity O(N) = O(N) total.

To guarantee amortized O(1) deletion, it is sufficient to ensure that shrink_threshold * growth_factor < growth_threshold. And if you look at the source code for D's built-in AAs, you will see that this is, in fact, ensured:

https://github.com/dlang/druntime/blob/v2.098.1/src/rt/aaA.d#L17-L27

So, we can safely put this discussion to rest, I think.
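The condition can also be checked empirically. Below is a minimal resize-model sketch in Python; the thresholds (grow at 4/5 full, shrink at 1/8 full) and the factor-4 resize are illustrative assumptions, not druntime's exact constants (those are in the linked aaA.d). It counts the elements moved by rehashing over a fill/oscillate/drain sequence and confirms that the total work stays linear in the number of operations:

```python
# Toy hash-table resize model: count elements moved by rehashing over a
# worst-case-ish sequence of inserts and deletes, and check the total is
# bounded by a constant times the operation count (amortized O(1)).
# Thresholds below are illustrative, not druntime's exact values.
GROW_THRESHOLD = 4 / 5    # grow when used/dim exceeds this
SHRINK_THRESHOLD = 1 / 8  # shrink when used/dim falls below this
GROW_FAC = 4              # resize factor in both directions

def simulate(ops):
    used, dim, rehash_work = 0, 8, 0
    for op in ops:
        if op == "+":
            used += 1
            if used > dim * GROW_THRESHOLD:
                rehash_work += used   # rehash moves every live element
                dim *= GROW_FAC
        else:
            used -= 1
            if dim > 8 and used < dim * SHRINK_THRESHOLD:
                rehash_work += used
                dim //= GROW_FAC
    return rehash_work

# The key invariant: SHRINK_THRESHOLD * GROW_FAC < GROW_THRESHOLD,
# so a shrink can never immediately re-trigger a grow (and vice versa).
# Fill up, oscillate at the boundary, then drain:
ops = "+" * 100_000 + "+-" * 50_000 + "-" * 100_000
print(simulate(ops) / len(ops))  # stays a small constant, well below 1
```

If the invariant were violated (say shrink at 1/4 with a factor-4 resize), the oscillation phase could trigger a full rehash on every other operation, which is exactly the degenerate case the condition rules out.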

December 29, 2021

On Wednesday, 22 December 2021 at 14:27:32 UTC, Paul Backus wrote:

> https://github.com/dlang/druntime/blob/v2.098.1/src/rt/aaA.d#L17-L27
>
> So, we can safely put this discussion to rest, I think.

I haven't looked at the implementation of AAs or the allocator used, so I cannot tell whether it is O(1) amortized or not. You can test it by repeatedly adding/removing items near the thresholds.

A preliminary test suggests that the observed behaviour is rather wasteful, which should be a warning for anyone who considers using it in production.

It grows by a factor of 4, which is a lot of waste. It also appears to shrink on deletion:

adding
changed 0 : 0
changed 6 : 8
changed 25 : 32
changed 102 : 128
changed 409 : 512
changed 1638 : 2048
changed 6553 : 8192

removing
changed 4095 : 32768
changed 1023 : 8192
changed 255 : 2048
changed 63 : 512
changed 15 : 128
changed 3 : 32

import std.stdio;

// Mirrors the layout of druntime's AA implementation so we can peek at
// the current bucket count (an implementation detail, for testing only).
struct AA
{
    Impl* impl;
}

struct Impl
{
    void*[] buckets;
}

void main()
{
    immutable N = 6554;
    int[int] a;
    ulong nn = 0;

    writeln("adding");
    for (int i = 0; i < N; i++)
    {
        a[i] = i;
        auto aa = cast(AA*) &a;
        auto n = aa.impl.buckets.length;
        if (n != nn)
        {
            // prints the *previous* bucket count at the index where it changed
            writeln("changed ", i, " : ", nn);
            nn = n;
        }
    }

    writeln("\nremoving");
    for (int i = N - 1; i >= 0; i--)
    {
        a.remove(i);
        auto aa = cast(AA*) &a;
        auto n = aa.impl.buckets.length;
        if (n != nn)
        {
            writeln("changed ", i, " : ", nn);
            nn = n;
        }
    }
}
December 29, 2021

On Wednesday, 29 December 2021 at 10:53:13 UTC, Ola Fosheim Grøstad wrote:

> removing
> changed 4095 : 32768
> changed 1023 : 8192
> changed 255 : 2048
> changed 63 : 512
> changed 15 : 128
> changed 3 : 32

To put numbers on this: when deleting a long series of keys, the hash table consistently has less than 13% of its slots filled, and right before shrinking only 3% (1024/32768). That is a lot of wasted memory, 97%. As a rule of thumb, a hash table should not waste more than 75% of its slots.

With such high constant factors, O(1) notation is of little value.

December 29, 2021

On Wednesday, 29 December 2021 at 11:26:48 UTC, Ola Fosheim Grøstad wrote:

> On Wednesday, 29 December 2021 at 10:53:13 UTC, Ola Fosheim Grøstad wrote:
>> removing
>> changed 4095 : 32768
>> changed 1023 : 8192
>> changed 255 : 2048
>> changed 63 : 512
>> changed 15 : 128
>> changed 3 : 32
>
> To put numbers on this: when deleting a long series of keys, the hash table consistently has less than 13% of its slots filled, and right before shrinking only 3% (1024/32768). That is a lot of wasted memory, 97%. As a rule of thumb, a hash table should not waste more than 75% of its slots.
>
> With such high constant factors, O(1) notation is of little value.

Seems like an issue worth filing a bugzilla report about.

January 01, 2022

On Wednesday, 22 December 2021 at 12:43:26 UTC, Ola Fosheim Grøstad wrote:

> What it is conceptually, depends on how you conceptualize it. The most common conceptualization of "in" is as a set operation and you cannot represent a set with array indexes.

`in` is defined for RedBlackTree, but it returns bool:
https://dlang.org/phobos/std_container_rbtree.html#.RedBlackTree.opBinaryRight

So aside from the complexity argument, `element in arr` could be defined if it returned bool rather than a pointer (to be consistent with AAs, which do not return a pointer to the key). But due to time complexity it makes more sense to define `index in arr`.
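For comparison, Python made the opposite choice, and its `in` illustrates both conceptualizations side by side: on a list it is a linear-time membership test over values returning bool, while on a dict it tests keys. A small sketch of the distinction being debated here:

```python
# Python's "in" shows both conceptualizations at once:
# on sequences it is O(n) membership over *values*,
# on dicts it is expected-O(1) lookup over *keys*.
arr = [10, 20, 30]
print(20 in arr)   # True  -- value membership, linear scan
print(1 in arr)    # False -- 1 is a valid index, but indexes don't count

d = {0: 10, 1: 20, 2: 30}
print(1 in d)      # True  -- key membership
print(20 in d)     # False -- 20 is a value, values don't count
```

Under D's convention, `element in arr` would correspond to the first (linear) reading, and `index in arr` to the second, which is why the complexity argument cuts the way it does.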

January 02, 2022
On Saturday, 1 January 2022 at 16:40:02 UTC, Nick Treleaven wrote:
> On Wednesday, 22 December 2021 at 12:43:26 UTC, Ola Fosheim Grøstad wrote:
>> What it is conceptually, depends on how you conceptualize it. The most common conceptualization of "in" is as a set operation and you cannot represent a set with array indexes.
>
> `in` is defined for RedBlackTree, but it returns bool:
> https://dlang.org/phobos/std_container_rbtree.html#.RedBlackTree.opBinaryRight

A red-black tree is not a sequence, though; its elements are not keyed in any way.
January 02, 2022

On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:

> Seems like an issue worth filing a bugzilla report about.

Probably. I don't use AAs myself, but I can file an issue if nobody else has already.

January 02, 2022
On Sunday, 2 January 2022 at 02:33:22 UTC, Elronnd wrote:
> A red-black tree is not a sequence, though; its elements are not keyed in any way.

Right, "in" is a set operator. You can implement a set as a dict with null values; then you can think of the keys as values. If you think of the ADT as the set concept, then what is a key and what is a value is a matter of interpretation. If you think of a binary tree (or red-black tree) as a set, then the distinction between key and value becomes blurred.
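The set-as-a-dict-with-null-values idea is easy to sketch (shown in Python here for illustration; in D one could similarly use an AA with a dummy value type):

```python
# A set implemented as a dict whose values carry no information:
# "in" consults only the keys, so the key/value distinction
# effectively disappears.
s = {"apple": None, "pear": None}

# Membership is a pure set operation on the keys.
print("apple" in s)   # True
print("grape" in s)   # False

# Adding/removing elements is just key insertion/deletion.
s["grape"] = None
del s["pear"]
print(sorted(s))      # ['apple', 'grape']
```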
January 03, 2022

On Tuesday, 21 December 2021 at 17:00:49 UTC, Steven Schveighoffer wrote:

> One thing not really being discussed is that there is a difference between "some library" defining slow `in` operators, or slow `opIndex`, and dlang/Phobos doing it.
>
> D picked the path of trying to ensure complexity consistency for `in`, but it's more of a stylistic rule, not necessarily a requirement.

Then maybe `in` should be implemented like this: have it check whether the array is sorted (unless you use the assumeSorted template). If it is sorted, naturally use binary search; if not, it could fall back to a linear search but emit a warning with the file and line number so the call site can be fixed/traced. (Or just require binary search and assert if it isn't sorted.)

As for bool vs. pointer return: probably return a pointer, as that can easily be tested as a bool at no extra cost.
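The sorted-search idea above can be sketched as follows (in Python for illustration, with `bisect` standing in for Phobos' SortedRange machinery; the `contains` helper and its warning fallback are hypothetical, not an existing API):

```python
import bisect
import warnings

def contains(arr, element):
    """Membership test: binary search when arr is sorted, otherwise a
    linear scan with a warning (the 'fix/trace' idea from the post)."""
    if all(arr[i] <= arr[i + 1] for i in range(len(arr) - 1)):
        i = bisect.bisect_left(arr, element)
        return i < len(arr) and arr[i] == element
    warnings.warn("unsorted array: falling back to linear search")
    return element in arr

print(contains([1, 3, 5, 7], 5))   # True, via binary search
print(contains([1, 3, 5, 7], 4))   # False
print(contains([3, 1, 2], 2))      # True, linear scan plus a warning
```

Of course the sortedness check is itself O(n), which rather defeats the purpose at runtime; that is presumably why D instead pushes the promise to the type level, where assumeSorted returns a SortedRange whose contains does a binary search.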