Thread overview
ArrayBoundsError for associative array operation
Dec 05, 2006
Egor Starostin
Dec 05, 2006
xs0
Dec 05, 2006
Egor Starostin
Dec 05, 2006
xs0
Dec 05, 2006
Frits van Bommel
Dec 06, 2006
Egor Starostin
December 05, 2006
I have a testcase (simplified from real data) when access to associative array by existing key produces ArrayBoundsError.

Here it is:
***
import std.stdio;
void main() {
    struct SKey {
        int dep;
        char[] hv;
        uint toHash() {
            uint hkey;
            for (int i=0;i<hv.length;i++) {
                hkey = hkey*10 + (hv[i]-'0');
            }
            return hkey + dep;
        }
    }
    int[SKey] aa;
    SKey ck;

// populate aa (in that exact order)
    ck.dep=0; ck.hv="1236448822";
    aa[ck] = 1;
    ck.dep=0; ck.hv="2716102924";
    aa[ck] = 1;
    ck.dep=0; ck.hv="315901071";
    aa[ck] = 1;

// remove one key
    ck.dep=0; ck.hv="1236448822";
    aa.remove(ck);

// get an error ArrayBoundsError
// while trying to access by one of remaining keys
    ck.dep=0; ck.hv="2716102924";
    writefln(aa[ck]);
}
***

Why I get an ArrayBoundsError for existing key?
Is this a bug in D? If yes -- can somebody suggest a workaround?


--
Egor
December 05, 2006
Egor Starostin wrote:
> I have a testcase (simplified from real data) when access to
> associative array by existing key produces ArrayBoundsError.
> 
> Here it is:
> ***
> [snip]
> ***
> 
> Why I get an ArrayBoundsError for existing key?
> Is this a bug in D? If yes -- can somebody suggest a workaround?
> 

Most likely you forgot to implement a function or two:

http://www.digitalmars.com/d/arrays.html#associative

Structs or unions can be used as the KeyType. For this to work, the struct or union definition must define the following member functions:

    * hash_t toHash()
    * int opEquals(S) or int opEquals(S*)
    * int opCmp(S) or int opCmp(S*)

Note that the parameter to opCmp and opEquals can be either the struct or union type, or a pointer to the struct or untion type.


xs0
December 05, 2006
> Most likely you forgot to implement a function or two:

Sorry that I oversimplified posted testcase (throwed away opEquals
and opCmp).

In my working code I have the following functions in struct
definition:
***
        int opEquals(SKey* s) {
            return ((dep == s.dep) && (hv == s.hv));
        }
        int opCmp(SKey* s) {
            if (dep == s.dep) {
                return cmp(hv,s.hv);
            } else {
                return dep - s.dep;
            }
        }
***


--
Egor
December 05, 2006
Egor Starostin wrote:
>> Most likely you forgot to implement a function or two:
> 
> Sorry that I oversimplified posted testcase (throwed away opEquals
> and opCmp).
> 
> In my working code I have the following functions in struct
> definition:
> ***
>         int opEquals(SKey* s) {
>             return ((dep == s.dep) && (hv == s.hv));
>         }
>         int opCmp(SKey* s) {
>             if (dep == s.dep) {
>                 return cmp(hv,s.hv);
>             } else {
>                 return dep - s.dep;
>             }
>         }
> ***
> 
> 
> --
> Egor

Yes, I think it's a bug.. The reason it happens is the following:

1) all the nodes happen to fall into the same bucket in the AA
2) when that is the case, entries get placed into a binary tree
3) the ordering in the binary tree is determined by hashcodes first, opCmp second (in this case, opCmp can be ignored, as it isn't called at all)
4) after that node is deleted, the tree looks like this:
     root.hash        =  315901071
     root->right.hash = 2716102924
5) and, finally, the bug: in aaA.d around line 319, the following takes place:

     c = key_hash - e.hash;
     // snip
     if (c < 0)
         e = e.left;
     else
         e = e.right;

   because those two hash values are too far apart, c becomes negative (-1894765443) and the left subtree is followed instead of the right subtree, causing the key to not be found.


xs0
December 05, 2006
xs0 wrote:
> Yes, I think it's a bug.. The reason it happens is the following:
> 
> 1) all the nodes happen to fall into the same bucket in the AA
> 2) when that is the case, entries get placed into a binary tree
> 3) the ordering in the binary tree is determined by hashcodes first, opCmp second (in this case, opCmp can be ignored, as it isn't called at all)
> 4) after that node is deleted, the tree looks like this:
>      root.hash        =  315901071
>      root->right.hash = 2716102924
> 5) and, finally, the bug: in aaA.d around line 319, the following takes place:
> 
>      c = key_hash - e.hash;
>      // snip
>      if (c < 0)
>          e = e.left;
>      else
>          e = e.right;
> 
>    because those two hash values are too far apart, c becomes negative (-1894765443) and the left subtree is followed instead of the right subtree, causing the key to not be found.

Well, at least the fix is easy. Comparing the hashes directly should solve the problem.
December 06, 2006
> Yes, I think it's a bug.. The reason it happens is the following:
Thank you for the explanation.

One more question: I've found out that when I explicitly call
aa.rehash after every aa.remove() then problem disappears. Is it
really so or it's just hides (and will reappears on different data
set)?


--
Egor