Jump to page: 1 24  
Page
Thread overview
associative-arrays ate my lunch (and my support)
Jul 26, 2004
Kris
Jul 26, 2004
Sean Kelly
Jul 26, 2004
Regan Heath
Jul 26, 2004
Ben Hinkle
Jul 26, 2004
Sean Kelly
Jul 27, 2004
Kris
Jul 27, 2004
Ben Hinkle
Jul 27, 2004
Ben Hinkle
Jul 27, 2004
Kris
Re: associative-arrays ate my lunch (and my shorts)
Jul 27, 2004
Kris
Jul 28, 2004
Ben Hinkle
Jul 28, 2004
Regan Heath
Jul 28, 2004
Kris
Jul 28, 2004
Regan Heath
Jul 28, 2004
Kris
Jul 28, 2004
Regan Heath
Jul 28, 2004
Regan Heath
Jul 28, 2004
Kris
Jul 29, 2004
Ben Hinkle
Jul 30, 2004
Regan Heath
Jul 28, 2004
Ben Hinkle
Jul 28, 2004
Regan Heath
Jul 28, 2004
Kris
Jul 28, 2004
Sam McCall
Jul 28, 2004
Kris
Jul 28, 2004
Regan Heath
Jul 29, 2004
Gold Dragon
Jul 30, 2004
Regan Heath
Jul 30, 2004
Gold Dragon
Jul 29, 2004
Juanjo Álvarez
Jul 29, 2004
Matthew
Jul 30, 2004
Gold Dragon
Jul 30, 2004
Matthew
Jul 30, 2004
Gold Dragon
Jul 30, 2004
Matthew
July 26, 2004
So, I've been happily thinking this whole time that associative arrays (AA) in D are rather cool; being built into the language and so on ... then today I spent several hours tracking down what seemed to be a truly bizarre bug, which turned out to be caused by the way D associative arrays operate. Here's what happened:

I want to check if there's an entry in the AA; use it if so, otherwise continue. The documentation say to use this construct for checking:

class X
{
     MyClass [char[]] aa;

    void test ()
    {
         if ("hello" in aa)
             ....
     }
}

and then (presumably) you have to subsequently retrieve the entry as
follows:

    void test ()
    {
         if ("hello" in aa)
            {
            MyClass x = aa ["hello"];
            // do something with x ...
            }
     }

Of course, you've just performed two full lookups instead of one: first you look to see if the entry exists, then you lookup again to retrieve it. Being the freak that I am, I don't want that overhead. So ... I proceeded with this instead:

    void test ()
    {
         MyClass x = aa ["hello"];

         if (x)
            {
            // do something with x ...
            }
     }

Much better! Right?

Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...

This sneaky behavior really wasted a whole lot of effort (I can almost hear Walter laughing in the background :-)

That aside, one is forced to do a double lookup in such cases. This is not good at all, and I simply cannot fathom what benefit there is to such an approach. In my particular application, the double-lookup (and, of course, double hash-computation) is entirely unacceptable. Lookup is the most common operation on a hash-table, by far. It has to be efficient, and this "side-effect" totally flies in the face of reason. If it were a Phobos module it would have been replaced long ago.

For the moment I have to assume the behavior is "by design", so fair-warning to everyone.

(Ben? Do you have a proper hash-table I can use please? If not, I'll do an FNV port; AA's be damned. Out! Damn spot! Out, I say!)



July 26, 2004
In article <ce3tul$1h6o$1@digitaldaemon.com>, Kris says...
>
>Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...

Not I.  Though I'd still want to look at the D code that's doing this because I still don't quite believe it.  What if MyClass is not default constructible?

>That aside, one is forced to do a double lookup in such cases. This is not good at all, and I simply cannot fathom what benefit there is to such an approach. In my particular application, the double-lookup (and, of course, double hash-computation) is entirely unacceptable. Lookup is the most common operation on a hash-table, by far. It has to be efficient, and this "side-effect" totally flies in the face of reason. If it were a Phobos module it would have been replaced long ago.

So how would you propose a built in associative array handle both class and primitive types?  Obviously "in" can return a bit.  But what does the subscript operator return for failed lookups on primitive types?  I would likely return Type.init in all cases.  Besides, this is a hashtable we're talking about, not a btree.  A double lookup might not be ideal but it's still far less expensive than a double tree traversal.


Sean


July 26, 2004
On Mon, 26 Jul 2004 22:44:23 +0000 (UTC), Sean Kelly <sean@f4.ca> wrote:

> In article <ce3tul$1h6o$1@digitaldaemon.com>, Kris says...
>>
>> Well, apparently not. Now clearly I did not assign anything to the AA in the
>> above code, but an entry is created regardless. That is, AA entries are
>> created as a /side-effect/ of doing that lookup. Now, I've used many
>> hash-table implementations before, but this behavior is truly a first.
>> Hands-up please, all those who would have expected this ...
>
> Not I.  Though I'd still want to look at the D code that's doing this because I
> still don't quite believe it.  What if MyClass is not default constructible?
>
>> That aside, one is forced to do a double lookup in such cases. This is not
>> good at all, and I simply cannot fathom what benefit there is to such an
>> approach. In my particular application, the double-lookup (and, of course,
>> double hash-computation) is entirely unacceptable. Lookup is the most common
>> operation on a hash-table, by far. It has to be efficient, and this
>> "side-effect" totally flies in the face of reason. If it were a Phobos
>> module it would have been replaced long ago.
>
> So how would you propose a built in associative array handle both class and
> primitive types?  Obviously "in" can return a bit.  But what does the subscript
> operator return for failed lookups on primitive types?  I would likely return
> Type.init in all cases.  Besides, this is a hashtable we're talking about, not a
> btree.  A double lookup might not be ideal but it's still far less expensive
> than a double tree traversal.

The problem is, reference types like pointers, arrays, and classes can be null, value types like int, float, structs etc cannot, yet you want some what of saying "does not exist" for value types.

One solution is to wrap them in a reference type, i.e. a class, but this IMO is un-necessary baggage, instead you want to be able to call a method on the AA, something like:

  int[char[]] foo;
  int value;

  if (foo.find("label",value)) {
  }

where find looks something like

bool find(K key, out V value) {
  ..lookup key, assign to value..
  ..return true on success..
  ..else false..
}

This allows you to check whether it exists (the return value) and get it's value (inout value) all at once for both reference and value types.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
July 26, 2004
Sean Kelly wrote:

> In article <ce3tul$1h6o$1@digitaldaemon.com>, Kris says...
>>
>>Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...
> 
> Not I.  Though I'd still want to look at the D code that's doing this
> because I
> still don't quite believe it.  What if MyClass is not default
> constructible?

The value put into the array is null (more generally the Type.init value) so no constructor needs to be called.

>>That aside, one is forced to do a double lookup in such cases. This is not good at all, and I simply cannot fathom what benefit there is to such an approach. In my particular application, the double-lookup (and, of course, double hash-computation) is entirely unacceptable. Lookup is the most common operation on a hash-table, by far. It has to be efficient, and this "side-effect" totally flies in the face of reason. If it were a Phobos module it would have been replaced long ago.
> 
> So how would you propose a built in associative array handle both class
> and
> primitive types?  Obviously "in" can return a bit.  But what does the
> subscript
> operator return for failed lookups on primitive types?  I would likely
> return
> Type.init in all cases.  Besides, this is a hashtable we're talking about,
> not a
> btree.  A double lookup might not be ideal but it's still far less
> expensive than a double tree traversal.

The mintl.map implementation has a private function that performs a lookup and if it fails returns in an inout parameter the parent node for the add call. This is tough to do without exposing some datatype internals so it isn't public.

> 
> Sean

July 26, 2004
In article <ce43vb$1j7m$1@digitaldaemon.com>, Ben Hinkle says...
>
>Sean Kelly wrote:
>
>> In article <ce3tul$1h6o$1@digitaldaemon.com>, Kris says...
>>>
>>>Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...
>> 
>> Not I.  Though I'd still want to look at the D code that's doing this
>> because I
>> still don't quite believe it.  What if MyClass is not default
>> constructible?
>
>The value put into the array is null (more generally the Type.init value) so no constructor needs to be called.

Oops.  I misread Kris' post.  Yes, an entry will be created as a side effect but a lookup should return null for class value types.


Sean


July 27, 2004
"Ben Hinkle"  wrote ...
> The value put into the array is null (more generally the Type.init value)
so

Why does it add an entry when used as an rValue? Any ideas Ben?

- Kris


July 27, 2004
Kris wrote:

> "Ben Hinkle"  wrote ...
>> The value put into the array is null (more generally the Type.init value)
> so
> 
> Why does it add an entry when used as an rValue? Any ideas Ben?
> 
> - Kris

The only reason I can think of is that it makes the code for AA's a little easier since the routine that gets a node (_aaGet in src/phobos/internal/aaA.d) returns a pointer to the value slot and so if the context is rvalue it assigns through the pointer and if it is lvalue it dereferences the pointer. Having the get routine know about rvalue vs lvalue would probably mean the context has to be passed to it or there would be two versions of get: _aaGetRValue and _aaGetLValue.

-Ben
July 27, 2004
Kris wrote:

> So, I've been happily thinking this whole time that associative arrays (AA) in D are rather cool; being built into the language and so on ... then today I spent several hours tracking down what seemed to be a truly bizarre bug, which turned out to be caused by the way D associative arrays operate. Here's what happened:
> 
> I want to check if there's an entry in the AA; use it if so, otherwise continue. The documentation say to use this construct for checking:
> 
> class X
> {
>      MyClass [char[]] aa;
> 
>     void test ()
>     {
>          if ("hello" in aa)
>              ....
>      }
> }
> 
> and then (presumably) you have to subsequently retrieve the entry as
> follows:
> 
>     void test ()
>     {
>          if ("hello" in aa)
>             {
>             MyClass x = aa ["hello"];
>             // do something with x ...
>             }
>      }
> 
> Of course, you've just performed two full lookups instead of one: first you look to see if the entry exists, then you lookup again to retrieve it. Being the freak that I am, I don't want that overhead. So ... I proceeded with this instead:
> 
>     void test ()
>     {
>          MyClass x = aa ["hello"];
> 
>          if (x)
>             {
>             // do something with x ...
>             }
>      }
> 
> Much better! Right?
> 
> Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...

I don't know about hash tables, but for STL maps I think the semantics are the same. The operator[] will insert if the key isn't there. For STL this makes sense since it must return a reference to the value. Since D returns the value without a reference it doesn't have to insert it into the table. So it does seem surprising that it also inserts. I can live with either way, though, since the value returned is Type.init.

> This sneaky behavior really wasted a whole lot of effort (I can almost hear Walter laughing in the background :-)
> 
> That aside, one is forced to do a double lookup in such cases. This is not good at all, and I simply cannot fathom what benefit there is to such an approach. In my particular application, the double-lookup (and, of course, double hash-computation) is entirely unacceptable. Lookup is the most common operation on a hash-table, by far. It has to be efficient, and this "side-effect" totally flies in the face of reason. If it were a Phobos module it would have been replaced long ago.

Hash tables are O(1) at least. I tried many variations on D's AA data structure to try to speed it up across the board and couldn't get anything that performed as well for a range of applications. Things I tried mostly were around preallocating and changing the string hash function. There are undoubtably faster implementations for specific needs but all around I'd say the AA in D is as good a work-horse as one can expect.

> For the moment I have to assume the behavior is "by design", so fair-warning to everyone.

It is subtle if you are expecting something else.

> (Ben? Do you have a proper hash-table I can use please? If not, I'll do an FNV port; AA's be damned. Out! Damn spot! Out, I say!)

I don't have another hash-table in mind. What is FNV? "frickin no vay"? ;-)
July 27, 2004
"Ben Hinkle" <bhinkle4@juno.com> wrote in message news:ce4gf5$1nbl$1@digitaldaemon.com...
> Hash tables are O(1) at least. I tried many variations on D's AA data structure to try to speed it up across the board and couldn't get anything that performed as well for a range of applications. Things I tried mostly were around preallocating and changing the string hash function. There are undoubtably faster implementations for specific needs but all around I'd say the AA in D is as good a work-horse as one can expect.

Yes, I followed your post on that with interest. The problem I have is the 2*O(1) aspect. If I'm using hash strings of any reasonable size, that adds up pretty quickly. In this case, the keys can be long.

> > (Ben? Do you have a proper hash-table I can use please? If not, I'll do
an
> > FNV port; AA's be damned. Out! Damn spot! Out, I say!)
>
> I don't have another hash-table in mind. What is FNV? "frickin no vay"?
;-)

It's worse than that <g>  http://www.isthe.com/chongo/tech/comp/fnv/

Do you think Walter would add my port of this to Phobos?
(chortle! chortle!)


July 27, 2004
"Ben Hinkle" wrote ...
> Kris wrote:
> > Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...
>
> > That aside, one is forced to do a double lookup in such cases. This is
not
> > good at all, and I simply cannot fathom what benefit there is to such an approach. In my particular application, the double-lookup (and, of
course,
> > double hash-computation) is entirely unacceptable. Lookup is the most common operation on a hash-table, by far. It has to be efficient, and
this
> > "side-effect" totally flies in the face of reason. If it were a Phobos module it would have been replaced long ago.
> >
> > For the moment I have to assume the behavior is "by design", so fair-warning to everyone.
>
> It is subtle if you are expecting something else.

I'd like to just ignore this and go on, but the semantics here are just too badly broken to serve as an example of what containers should do. Since this is the unofficial "container zone", I ought to say something ...

To recap; referencing an associative array as an rValue modifies the array itself. That is

MyClass [char[]]  aa;

MyClass x = aa ["key"];
if (x)
    ...

actually /adds/ a null entry to the array. Instead of having 0 entries in the AA, you now have 1. And all you did was nonchalantly read the thing.

If you question the sanity of such side-effects, you'd be well within your rights. For example, this is akin to saying "if I read from a byte in memory, then I expect something else to be modified". Or "if I open a file and read from it, I expect the file length to change". Would you be content with that? I would hope not. Talk about confusing!

This really appears to defy the very grain of fundamental read/write assumptions. Sure, there are some rather complex and domain-specific scenarios whereby you'd /perhaps/ consider doing something similar regarding side-effects, but this AA stuff is a fundamental language type, built into the language itself ... it's only a hashmap, so there's just no excuse for such, uhhh, vagaries. If you're not convinced, please ask yourself if you'd expect this code to modify the array itself:

static const char[] ca = "can't touch this";

char c = ca[0];

I haven't tried it, but I would certainly hope the array isn't modified! Note that the basic semantics are identical to the AA case, and a major factor for building AA into the language was such that it would be just an array "extension". I'd call it a "stretch" instead <g>

So, the reason for writing this is twofold:

1) being the "container zone", I beg and plead with all container writers to avoid this sorry approach at all cost. After all, even if you don't agree with the lunacy aspect, you can't deny the fact that a value-lookup takes twice as long as it should (first check for existence, then perform a get).

2) containers should follow the principle of "least surprise" in all aspects. Please consider that.

Maybe a third reason would be to try and get the AA semantics changed ... Naaaaaa .... that would be asking for waaaay too much.



« First   ‹ Prev
1 2 3 4