Thread overview
Vote: AA builtin method get()
Jul 13, 2005
David Medlock
Re: AA builtin method get()
Jul 13, 2005
Ben Hinkle
Jul 13, 2005
David Medlock
Jul 13, 2005
Ben Hinkle
Jul 13, 2005
David Medlock
Jul 13, 2005
Ben Hinkle
Jul 13, 2005
David Medlock
July 13, 2005
Walter,

I really think the following is a good solution to the AA issues.

1. Add a get method to all AAs:

bool get( in key, out value );

if the value is there, you get it and no double lookups.  A great side effect of this is the out value is initialized to the default! Which mimics the old behavior and allows you to determine if it was there.

2. Deprecate the in syntax.  It is really not useful because it cannot be overridden if I wish to replace an AA with a class.  Semantically it has no advantage to a method.  It also requires pointers which isn't really required elsewhere in the rest of D's core.

3. allow remove(..) to return a boolean if the key was removed.  An overloaded remove with an 'out key' paramter might be useful as well.

4. Upon implementing those, the don't create behavior is kept for normal lookups.

Comments?

Here's hoping Walter is reading...

-DavidM
July 13, 2005
"David Medlock" <noone@nowhere.com> wrote in message news:db3026$629$1@digitaldaemon.com...
> Walter,
>
> I really think the following is a good solution to the AA issues.
>
> 1. Add a get method to all AAs:
>
> bool get( in key, out value );
>
> if the value is there, you get it and no double lookups.  A great side effect of this is the out value is initialized to the default! Which mimics the old behavior and allows you to determine if it was there.

question: would get() insert if not present? I assume not but that last
sentance above made me not so sure.
It might also be nice to have insert(key) which would equivalent to &aa[key]
assuming & inserts key if it isn't present.

> 2. Deprecate the in syntax.  It is really not useful because it cannot be overridden if I wish to replace an AA with a class.  Semantically it has no advantage to a method.  It also requires pointers which isn't really required elsewhere in the rest of D's core.

overloading shall come someday. The current AA indexing behavior isn't overloadable either.

> 3. allow remove(..) to return a boolean if the key was removed.  An overloaded remove with an 'out key' paramter might be useful as well.

sounds good to me.

> 4. Upon implementing those, the don't create behavior is kept for normal lookups.

What do you mean by normal lookups? Do you mean rvalue aa[key]?

> Comments?
>
> Here's hoping Walter is reading...
>
> -DavidM


July 13, 2005
Ben Hinkle wrote:

> "David Medlock" <noone@nowhere.com> wrote in message news:db3026$629$1@digitaldaemon.com...
> 
>>Walter,
>>
>>I really think the following is a good solution to the AA issues.
>>
>>1. Add a get method to all AAs:
>>
>>bool get( in key, out value );
>>
>>if the value is there, you get it and no double lookups.  A great side effect of this is the out value is initialized to the default! Which mimics the old behavior and allows you to determine if it was there.
> 
> 
> question: would get() insert if not present? I assume not but that last sentance above made me not so sure.
> It might also be nice to have insert(key) which would equivalent to &aa[key] assuming & inserts key if it isn't present.
> 
> 

I would assume it would not insert, which could be handled if it returned false.  Since an insert into an AA isnt a lookup, this is still efficient.  With get() I don't see a case where insert would be needed?

MyClass var;
if ( aa.get( key, var )==false ) aa[key] = ( var = new Class() );

>>2. Deprecate the in syntax.  It is really not useful because it cannot be overridden if I wish to replace an AA with a class.  Semantically it has no advantage to a method.  It also requires pointers which isn't really required elsewhere in the rest of D's core.
> 
> 
> overloading shall come someday. 

Overloading in? Whats wrong with a method? Seems like a novelty....

The current AA indexing behavior isn't
> overloadable either.
> 
Why not? Do you mean the &aa[key] syntax?

> 
>>3. allow remove(..) to return a boolean if the key was removed.  An overloaded remove with an 'out key' paramter might be useful as well.
> 
> 
> sounds good to me.
> 
> 
>>4. Upon implementing those, the don't create behavior is kept for normal lookups.
> 
> 
> What do you mean by normal lookups? Do you mean rvalue aa[key]?
> 
Yes, rvalues.  LValues would ostensibly not be affected.
> 
>>Comments?
>>
>>Here's hoping Walter is reading...
>>
>>-DavidM 
> 
> 
> 
July 13, 2005
"David Medlock" <noone@nowhere.com> wrote in message news:db342u$97b$1@digitaldaemon.com...
> Ben Hinkle wrote:
>
>> "David Medlock" <noone@nowhere.com> wrote in message news:db3026$629$1@digitaldaemon.com...
>>
>>>Walter,
>>>
>>>I really think the following is a good solution to the AA issues.
>>>
>>>1. Add a get method to all AAs:
>>>
>>>bool get( in key, out value );
>>>
>>>if the value is there, you get it and no double lookups.  A great side effect of this is the out value is initialized to the default! Which mimics the old behavior and allows you to determine if it was there.
>>
>> question: would get() insert if not present? I assume not but that last
>> sentance above made me not so sure.
>> It might also be nice to have insert(key) which would equivalent to
>> &aa[key] assuming & inserts key if it isn't present.
>>
>
> I would assume it would not insert, which could be handled if it returned false.  Since an insert into an AA isnt a lookup, this is still efficient. With get() I don't see a case where insert would be needed?
>
> MyClass var;
> if ( aa.get( key, var )==false ) aa[key] = ( var = new Class() );

That's still a double-lookup. I don't know what you mean exactly when you
say "an insert into an AA isn't a lookup". In order to insert a key you need
to compute the hash value and find an available slot for it. In particular
if the key is already in the AA then an insert is 99% the same amount of
work as looking up a key, which computes the hash and find the existing
slot.
I thought you were looking for an API that would mimic the C++ API - the
point being that missing keys are inserted and an lvalue is returned.

>>>2. Deprecate the in syntax.  It is really not useful because it cannot be overridden if I wish to replace an AA with a class.  Semantically it has no advantage to a method.  It also requires pointers which isn't really required elsewhere in the rest of D's core.
>>
>>
>> overloading shall come someday.
>
> Overloading in? Whats wrong with a method? Seems like a novelty....
>
> The current AA indexing behavior isn't
>> overloadable either.
>>
> Why not? Do you mean the &aa[key] syntax?

yes. The only overloads are opIndex and opIndexAssign



July 13, 2005
Ben Hinkle wrote:

> "David Medlock" <noone@nowhere.com> wrote in message news:db342u$97b$1@digitaldaemon.com...
> 
>>Ben Hinkle wrote:
>>
>>
>>>"David Medlock" <noone@nowhere.com> wrote in message news:db3026$629$1@digitaldaemon.com...
>>>
>>>
>>>>Walter,
>>>>
>>>>I really think the following is a good solution to the AA issues.
>>>>
>>>>1. Add a get method to all AAs:
>>>>
>>>>bool get( in key, out value );
>>>>
>>>>if the value is there, you get it and no double lookups.  A great side effect of this is the out value is initialized to the default! Which mimics the old behavior and allows you to determine if it was there.
>>>
>>>question: would get() insert if not present? I assume not but that last sentance above made me not so sure.
>>>It might also be nice to have insert(key) which would equivalent to &aa[key] assuming & inserts key if it isn't present.
>>>
>>
>>I would assume it would not insert, which could be handled if it returned false.  Since an insert into an AA isnt a lookup, this is still efficient. With get() I don't see a case where insert would be needed?
>>
>>MyClass var;
>>if ( aa.get( key, var )==false ) aa[key] = ( var = new Class() );
> 
> 
> That's still a double-lookup. I don't know what you mean exactly when you say "an insert into an AA isn't a lookup". In order to insert a key you need to compute the hash value and find an available slot for it. In particular if the key is already in the AA then an insert is 99% the same amount of work as looking up a key, which computes the hash and find the existing slot.

I assume Walter uses a bucket hashmap.  This means a linked list has an entry inserted at its head for the key's bucket.  This is a constant time operation(basically nil). Computing the key is not usually a hefty task(most are ints most likely), but searching through the bucket of entries can be(depending on the load). With cache-sensitive processors I would bet searching through the bucket list takes quite a bit longer.

Besides, even in the cases where the key takes a couple of extra milliseconds to compute, it only happens the *first time*.

If that is the crux of your program's performance, I would say it either:
1. Needs a better data strategy.
2. Is as fast as it is going to be, barring assembly.

> I thought you were looking for an API that would mimic the C++ API - the
> point being that missing keys are inserted and an lvalue is returned.

I actually like the idea of a Rvalue (var= aa[100]) lookup simply returning the default and *not creating* an entry, but this would break due to arrays being null proof...

-DavidM

July 13, 2005
"David Medlock" <noone@nowhere.com> wrote in message news:db36p6$bt9$1@digitaldaemon.com...
> Ben Hinkle wrote:
>
>> "David Medlock" <noone@nowhere.com> wrote in message news:db342u$97b$1@digitaldaemon.com...
>>
>>>Ben Hinkle wrote:
>>>
>>>
>>>>"David Medlock" <noone@nowhere.com> wrote in message news:db3026$629$1@digitaldaemon.com...
>>>>
>>>>
>>>>>Walter,
>>>>>
>>>>>I really think the following is a good solution to the AA issues.
>>>>>
>>>>>1. Add a get method to all AAs:
>>>>>
>>>>>bool get( in key, out value );
>>>>>
>>>>>if the value is there, you get it and no double lookups.  A great side effect of this is the out value is initialized to the default! Which mimics the old behavior and allows you to determine if it was there.
>>>>
>>>>question: would get() insert if not present? I assume not but that last
>>>>sentance above made me not so sure.
>>>>It might also be nice to have insert(key) which would equivalent to
>>>>&aa[key] assuming & inserts key if it isn't present.
>>>>
>>>
>>>I would assume it would not insert, which could be handled if it returned false.  Since an insert into an AA isnt a lookup, this is still efficient. With get() I don't see a case where insert would be needed?
>>>
>>>MyClass var;
>>>if ( aa.get( key, var )==false ) aa[key] = ( var = new Class() );
>>
>>
>> That's still a double-lookup. I don't know what you mean exactly when you say "an insert into an AA isn't a lookup". In order to insert a key you need to compute the hash value and find an available slot for it. In particular if the key is already in the AA then an insert is 99% the same amount of work as looking up a key, which computes the hash and find the existing slot.
>
> I assume Walter uses a bucket hashmap.  This means a linked list has an entry inserted at its head for the key's bucket.  This is a constant time operation(basically nil). Computing the key is not usually a hefty task(most are ints most likely), but searching through the bucket of entries can be(depending on the load). With cache-sensitive processors I would bet searching through the bucket list takes quite a bit longer.
>
> Besides, even in the cases where the key takes a couple of extra milliseconds to compute, it only happens the *first time*.
>
> If that is the crux of your program's performance, I would say it either:
> 1. Needs a better data strategy.
> 2. Is as fast as it is going to be, barring assembly.

Actually the current AA implementation doesn't use a linked list of
buckets - it uses an unbalanced tree (or perhaps more accurately the tree
will be balanced if the hashing algorithm is good). In any case "inserting"
a key that is already in the map needs to locate and reuse the existing
bucket otherwise you'll end up with duplicate entries in the map. So insert
is essentially the same as a lookup. I agree if you know a key is not in the
map and the map is using a linked list of buckets that inserting is faster
than lookup but neither of those assumptions would be true for the "insert"
function I was talking about.
But I agree performance isn't the crucial difference. Convenience and making
the code more explicit would probably be the true benefits.

> > I thought you were looking for an API that would mimic the C++ API - the point being that missing keys are inserted and an lvalue is returned.
>
> I actually like the idea of a Rvalue (var= aa[100]) lookup simply returning the default and *not creating* an entry, but this would break due to arrays being null proof...

ok.


July 13, 2005
Ben Hinkle wrote:

> "David Medlock" <noone@nowhere.com> wrote in message news:db36p6$bt9$1@digitaldaemon.com...
> 
>>I assume Walter uses a bucket hashmap.  This means a linked list has an entry inserted at its head for the key's bucket.  This is a constant time operation(basically nil). Computing the key is not usually a hefty task(most are ints most likely), but searching through the bucket of entries can be(depending on the load). With cache-sensitive processors I would bet searching through the bucket list takes quite a bit longer.
>>
>>Besides, even in the cases where the key takes a couple of extra milliseconds to compute, it only happens the *first time*.
>>
>>If that is the crux of your program's performance, I would say it either:
>>1. Needs a better data strategy.
>>2. Is as fast as it is going to be, barring assembly.
> 
> 
> Actually the current AA implementation doesn't use a linked list of buckets - it uses an unbalanced tree (or perhaps more accurately the tree will be balanced if the hashing algorithm is good). 

Interesting, I didn't know this.

In any case "inserting"
> a key that is already in the map needs to locate and reuse the existing bucket otherwise you'll end up with duplicate entries in the map. So insert is essentially the same as a lookup. 

True. Mea culpa.

> But I agree performance isn't the crucial difference. Convenience and making the code more explicit would probably be the true benefits.
> 

Hehe, right on.

My work time is a constant, while the computer gets more 'work time' constantly(Moore's law).

Thanks Ben.

-DavidM