Thread overview
AA's and mutating keys
Oct 31, 2007
BCS
Nov 01, 2007
Paul Findlay
October 31, 2007
There isn't any documentation that I can tell on the Arrays page that determines whether a mutating key can be used for an AA.  For example, if I use char[] as a key:

char[][char[]] map;

char[] keyvalue = "hello";

map[keyvalue] = "world";
keyvalue[] = "cello";

char[] *val = "hello" in map;

So does val get set to non-null?

-Steve


October 31, 2007
"Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message news:fga4ah$kov$1@digitalmars.com...
> There isn't any documentation that I can tell on the Arrays page that determines whether a mutating key can be used for an AA.  For example, if I use char[] as a key:
>
> char[][char[]] map;
>
> char[] keyvalue = "hello";
>
> map[keyvalue] = "world";
> keyvalue[] = "cello";
>
> char[] *val = "hello" in map;
>
> So does val get set to non-null?
>
> -Steve

Never ever ever ever ever ever ever do this.  Really.  I'm pretty sure what you get from that 'in' is undefined.

Some languages go so far as to disallow using mutable types as map keys (i.e. Python).  It's just bad news.


October 31, 2007
It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode).

My code is less likely to be encountered in real life, but I have a more probable example:

Let's say you were counting the number of occurrences of each word in a file.  You may do it by reading lines into a buffer.  Then for each word in the line, increment a counter in an AA that uses strings as the key and integers as the value.

If you re-use the buffer, you might overwrite the key that you used to insert an element into an AA!  But it's not obvious to someone who normally codes in C++ or Java, because strings are either pass by value or immutable. I've used D for a few months now, and it isn't even obvious to me :)

I am running into this very situation, so I want to know whether an AA will dup the key automatically or if I have to do it myself.

Plus, if I have to dup the key every time I just increment the count (not the first insertion), that is a lot of wasted memory and copying.  So I have to do something like:

int *x = key in map;
if(x)
  (*x)++;
else
  map[key.dup] = 1;

which would be nice if it was done automatically for me :)

Also, if I do have to dup it myself, then the website should specify that keys should not be modified once they are used to insert nodes into an AA.

-Steve

"Jarrett Billingsley" wrote
> "Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message news:fga4ah$kov$1@digitalmars.com...
>> There isn't any documentation that I can tell on the Arrays page that determines whether a mutating key can be used for an AA.  For example, if I use char[] as a key:
>>
>> char[][char[]] map;
>>
>> char[] keyvalue = "hello";
>>
>> map[keyvalue] = "world";
>> keyvalue[] = "cello";
>>
>> char[] *val = "hello" in map;
>>
>> So does val get set to non-null?
>>
>> -Steve
>
> Never ever ever ever ever ever ever do this.  Really.  I'm pretty sure what you get from that 'in' is undefined.
>
> Some languages go so far as to disallow using mutable types as map keys (i.e. Python).  It's just bad news.
> 


October 31, 2007
Steven Schveighoffer wrote:

> I am running into this very situation, so I want to know whether an AA will dup the key automatically or if I have to do it myself.
> 

AA's do NOT dup the string. AA's should requirer that keys be invariant.

It's a bit expensive but I try to so this

int[char[]] a;
char[] s = something();
a[a.dup] = 5l;
October 31, 2007
"Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message news:fgaf4j$1d18$1@digitalmars.com...
> It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode).

Not really.  Consider:

int[char[]] map;
char[] key = new char[5];
key[] = "hello"[];
map[key] = 8;
key[0] = 'c';
map["cello"] = 2;

What happens is that you insert ("hello" => 8) into the table.  But then you go and modify the key so it's "cello" instead.  However now you have an inconsistent key: the key says "cello" but its hash is the hash of "hello". So, when you try to change "cello", the new "cello" maps to another slot and ("cello" => 2) is inserted.  In fact, if you run this code and then print out the key value pairs, you'll get something like:

cello 8
cello 2

In fact it's now impossible to modify or remove the original key-value pair because even if you can get something to hash to the right slot, the keys won't compare equal, and so the AA implementation won't find it.

> Let's say you were counting the number of occurrences of each word in a file.  You may do it by reading lines into a buffer.  Then for each word in the line, increment a counter in an AA that uses strings as the key and integers as the value.
>
> If you re-use the buffer, you might overwrite the key that you used to insert an element into an AA!  But it's not obvious to someone who normally codes in C++ or Java, because strings are either pass by value or immutable. I've used D for a few months now, and it isn't even obvious to me :)
>
> I am running into this very situation, so I want to know whether an AA will dup the key automatically or if I have to do it myself.
>
> Plus, if I have to dup the key every time I just increment the count (not the first insertion), that is a lot of wasted memory and copying.  So I have to do something like:
>
> int *x = key in map;
> if(x)
>  (*x)++;
> else
>  map[key.dup] = 1;
>
> which would be nice if it was done automatically for me :)

Easy, nested functions to the rescue:

char[80] buffer;
int[char[]] map;

void inc(char[] key)
{
    int* x = key in map;

    if(x)
        (*x)++;
    else
        map[key.dup] = 1;
}

foreach(thing; input)
{
    fill buffer;
    inc(buffer[0 .. end]);
}

The nice thing about it not doing it automatically for you is performance. Most of the time you _don't_ need this behavior, and when you do it's easy to implement it on top of the existing behavior.

> Also, if I do have to dup it myself, then the website should specify that keys should not be modified once they are used to insert nodes into an AA.

Agreed.


November 01, 2007
Steven Schveighoffer wrote:
> int *x = key in map;
> if(x)
>   (*x)++;
> else
>   map[key.dup] = 1;
That's exactly what I was doing when trying Tim Bray's "Wide Finder" thing in D. I didn't mind it so much, but I found myself wishing it could do something like dup the key automatically and insert a default value.

 - Paul
November 01, 2007
"Jarrett Billingsley" wrote
>
> "Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message news:fgaf4j$1d18$1@digitalmars.com...
>> It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode).
>
> Not really.  Consider:
>
> int[char[]] map;
> char[] key = new char[5];
> key[] = "hello"[];
> map[key] = 8;
> key[0] = 'c';
> map["cello"] = 2;
>
> What happens is that you insert ("hello" => 8) into the table.  But then you go and modify the key so it's "cello" instead.  However now you have an inconsistent key: the key says "cello" but its hash is the hash of "hello". So, when you try to change "cello", the new "cello" maps to another slot and ("cello" => 2) is inserted.  In fact, if you run this code and then print out the key value pairs, you'll get something like:
>
> cello 8
> cello 2

You are half right :)  The first part of my statement says "if the AA makes a copy of the key", so I think that is ok (and I think you agree).

But you are right about the hash table.  I understand that portion of it, but if the AA had a near-perfect hash function (like for an extreme example, md5sum of the text), then the AA would not even need to store the key, it would just store the hash.

That's besides the point.  I probably shouldn't have said the second part in my original statement.

>
> In fact it's now impossible to modify or remove the original key-value pair because even if you can get something to hash to the right slot, the keys won't compare equal, and so the AA implementation won't find it.

<splitting hairs>
In fact it is possible :)

key[0] = 'h';
map.remove(key);
</splitting hairs>

> Easy, nested functions to the rescue:

This is a good point, I absolutely love nested functions, and didn't think of that.

>
> The nice thing about it not doing it automatically for you is performance. Most of the time you _don't_ need this behavior, and when you do it's easy to implement it on top of the existing behavior.

The performance hit of copying the string on first insertion is very minimal in most cases.  I think the benefit far outweighs the cost.  C++ std::map does this, and nobody has ever complained about the performance hit.

It would be nice to have a class/struct that wraps an AA with this behavior.

-Steve


November 02, 2007
"Steven Schveighoffer" <schveiguy@yahoo.com> wrote in message news:fgctpf$1acr$1@digitalmars.com...
>>> It's ok if the AA makes a copy of the key, or uses something based on the key (like a hashcode).
>
> You are half right :)  The first part of my statement says "if the AA makes a copy of the key", so I think that is ok (and I think you agree).

I must have interpreted the 'or' in your first statement as 'xor' ;)

> <splitting hairs>
> In fact it is possible :)
>
> key[0] = 'h';
> map.remove(key);
> </splitting hairs>

PSSSSSHHHHH

> The performance hit of copying the string on first insertion is very minimal in most cases.  I think the benefit far outweighs the cost.  C++ std::map does this, and nobody has ever complained about the performance hit.
>
> It would be nice to have a class/struct that wraps an AA with this behavior.

struct DupAA(V, K)
{
    V[K] data;

    static DupAA!(V, K) opCall(V[K] data)
    {
        DupAA!(V, K) ret;
        ret.data = data;
        return ret;
    }

    size_t length()
    {
        return data.length;
    }

    K[] keys()
    {
        return data.keys;
    }

    V[] values()
    {
        return data.values;
    }

    void rehash()
    {
        data.rehash;
    }

    V opIndex(K k)
    {
        return data[k];
    }

    void opIndexAssign(V v, K k)
    {
        if(auto val = k in data)
            *val = v;
        else
        {
            static if(is(typeof(k.dup)))
                data[k.dup] = v;
            else
                data[k] = v;
        }
    }

    int opApply(int delegate(inout V) dg)
    {
        foreach(v; data)
            if(auto result = dg(v))
                return result;

        return 0;
    }

    int opApply(int delegate(inout K, inout V) dg)
    {
        foreach(k, v; data)
            if(auto result = dg(k, v))
                return result;

        return 0;
    }
}

void main()
{
    DupAA!(int, char[]) map;

    char[] key = new char[5];
    key[] = "hello"[];
    map[key] = 8;
    key[0] = 'c';
    map["cello"] = 2;

    foreach(k, v; map)
        Stdout.formatln("{}: {}", k, v);
}


WHAM BAM