Thread overview
associative arrays ate my lunch (and my support)
Jul 26, 2004
Kris
Jul 27, 2004
Derek Parnell
Jul 27, 2004
Michael Jørgensen
Jul 27, 2004
Florian
Jul 27, 2004
Arcane Jill
Jul 27, 2004
Stewart Gordon
Jul 28, 2004
James McComb
July 26, 2004
So, I've been happily thinking this whole time that associative arrays (AA) in D are rather cool; being built into the language and so on ... then today I spent several hours tracking down what seemed to be a truly bizarre bug, which turned out to be caused by the way D associative arrays operate. Here's what happened:

I want to check if there's an entry in the AA; use it if so, otherwise continue. The documentation say to use this construct for checking:

class X
{
     MyClass [char[]] aa;

    void test ()
    {
         if ("hello" in aa)
             ....
     }
}

and then (presumably) you have to subsequently retrieve the entry as
follows:

    void test ()
    {
         if ("hello" in aa)
            {
            MyClass x = aa ["hello"];
            // do something with x ...
            }
     }

Of course, you've just performed two full lookups instead of one: first you look to see if the entry exists, then you lookup again to retrieve it. Being the freak that I am, I don't want that overhead. So ... I proceeded with this instead:

    void test ()
    {
         MyClass x = aa ["hello"];

         if (x)
            {
            // do something with x ...
            }
     }

Much better! Right?

Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...

This sneaky behavior really wasted a whole lot of effort (I can almost hear Walter laughing in the background :-)

That aside, one is forced to do a double lookup in such cases. This is not good at all, and I simply cannot fathom what benefit there is to such an approach. In my particular application, the double-lookup (and, of course, double hash-computation) is entirely unacceptable. Lookup is the most common operation on a hash-table, by far. It has to be efficient, and this "side-effect" totally flies in the face of reason. If it were a Phobos module it would have been replaced long ago.

For the moment I have to assume the behavior is "by design", so fair-warning to everyone.

(Ben? Do you have a proper hash-table I can use please? If not, I'll do an FNV port; AA's be damned. Out! Damn spot! Out, I say!)




July 27, 2004
On Mon, 26 Jul 2004 14:48:05 -0700, Kris wrote:

> class X
> {
>      MyClass [char[]] aa;

It does seem a tad inefficient. Here is some simple code to demonstrate the
effect...
<code>
import std.stdio;

void main()
{
    long[char[]] aa;
    long a = 9;
    long b = 9;

    a = aa["hello"];
    aa["hello"] = 2;
    b = aa["hello"];
    writef("%d %d\n", a,b);
}
</code>

We sort of need a conditional fetch mechanism that maybe throws an exception if the key is not in the AA.

   try { a = aa[?"hello"]; } catch (AA_KeyMissing) { . . .};

-- 
Derek
Melbourne, Australia
27/Jul/04 10:57:11 AM
July 27, 2004
"Kris" <someidiot@earthlink.dot.dot.dot.net> wrote in message news:ce3u1t$1h7i$1@digitaldaemon.com...

>     void test ()
>     {
>          MyClass x = aa ["hello"];
>
>          if (x)
>             {
>             // do something with x ...
>             }
>      }
>
> Much better! Right?
>
> Well, apparently not. Now clearly I did not assign anything to the AA in
the
> above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...

Well, I don't know about D [Just arrived here], but the same behaviour is present in C++: The STL map<> construct provides the same functionality (side-effect).

However, in the STL there is also a find() method that returns an interator
(a pointer) to the element - provided it exists - or to a fictive
end-of-array:

MyClass  *xx = aa.find("hello");
if (xx != aa.end())
{
// do something with *xx.
}

Presumably, something similar is possible in D.

-Michael.


July 27, 2004
In article <ce3u1t$1h7i$1@digitaldaemon.com>, Kris says...

>Now, I've used many
>hash-table implementations before, but this behavior is truly a first.

No, C++'s std::map got there first.


>Hands-up please, all those who would have expected this ...

Me. Because I'm used to std::map.

Jill


July 27, 2004
>>     void test ()
>>     {
>>          MyClass x = aa ["hello"];
>>
>>          if (x)
>>             {
>>             // do something with x ...
>>             }
>>      }
>>
>> Much better! Right?
>>
>> Well, apparently not. Now clearly I did not assign anything to the AA in
>the
>> above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...
>
>Well, I don't know about D [Just arrived here], but the same behaviour is present in C++: The STL map<> construct provides the same functionality (side-effect).
>
>However, in the STL there is also a find() method that returns an interator
>(a pointer) to the element - provided it exists - or to a fictive
>end-of-array:

You may also note that in c++ stl (at least microsoft's implementation of it), the [] opperator on the hash_map does not only have this side effect, but also is not doing the single look-up it seems to do, but actualy a double lookup in some cases : There is always one lookup to find the element, but if it is not there, a second lookup starts to find where it should be inserted.

I don't know if D works the same way, but it wouldn't be surprising to me.


July 27, 2004
Kris wrote:

> So, I've been happily thinking this whole time that associative arrays (AA)
> in D are rather cool; being built into the language and so on ... then today
> I spent several hours tracking down what seemed to be a truly bizarre bug,

Which bug?  The one that looking up a value adds it, or that your program is slow?

<snip>
> and then (presumably) you have to subsequently retrieve the entry as
> follows:
> 
>     void test ()
>     {
>          if ("hello" in aa)
>             {
>             MyClass x = aa ["hello"];
>             // do something with x ...
>             }
>      }
<snip>
> That aside, one is forced to do a double lookup in such cases. This is not
> good at all, and I simply cannot fathom what benefit there is to such an
> approach.

The AA implementation could be improved to cache lookups.  A simple idea would be to keep a note of the last key to be looked up, its hash and the memory address (if there) of the value.  The next lookup (or even assignment) would check the key against the one in the cache (could get away with === rather than == for efficiency) and use the cached information to repeat the lookup.

Even better, the double lookup itself could be optimised to a single lookup.

<snip>
> For the moment I have to assume the behavior is "by design", so fair-warning
> to everyone.

I think the idea is that you can nest AAs to make a sparse matrix or something....

    int[int][int] qwert;
    qwert[13][105] = 42;

> (Ben? Do you have a proper hash-table I can use please? If not, I'll do an
> FNV port; AA's be damned. Out! Damn spot! Out, I say!)

Depends on what you mean by 'proper'.  But mine happens to implement lookup without adding....

http://smjg.port5.com/pr/d/

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment.  Please keep replies on the 'group where everyone may benefit.
July 28, 2004
Kris wrote:

> Well, apparently not. Now clearly I did not assign anything to the AA in the
> above code, but an entry is created regardless. That is, AA entries are
> created as a /side-effect/ of doing that lookup.

Read operations with side-effects raise issues of exception safety.

For example,

x = aa["hello"]

may throw an exception if creates the "hello" entry. Then the value of aa.length would not be predictable: It could equal the old .length or .length + 1 depending upon whether the exception was thrown before of after the .length was incremented.

I know that std::map also behaves this way. Can anyone confirm whether std::map's [] operator suffers from the same issue?

James McComb