July 28, 2004
Kris wrote:

> "Ben Hinkle" wrote ...
>> Kris wrote:
>> > Well, apparently not. Now clearly I did not assign anything to the AA in the above code, but an entry is created regardless. That is, AA entries are created as a /side-effect/ of doing that lookup. Now, I've used many hash-table implementations before, but this behavior is truly a first. Hands-up please, all those who would have expected this ...
>>
>> > That aside, one is forced to do a double lookup in such cases. This is
> not
>> > good at all, and I simply cannot fathom what benefit there is to such an approach. In my particular application, the double-lookup (and, of
> course,
>> > double hash-computation) is entirely unacceptable. Lookup is the most common operation on a hash-table, by far. It has to be efficient, and
> this
>> > "side-effect" totally flies in the face of reason. If it were a Phobos module it would have been replaced long ago.
>> >
>> > For the moment I have to assume the behavior is "by design", so fair-warning to everyone.
>>
>> It is subtle if you are expecting something else.
> 
> I'd like to just ignore this and go on, but the semantics here are just too badly broken to serve as an example of what containers should do. Since this is the unofficial "container zone", I ought to say something ...
> 
> To recap; referencing an associative array as an rValue modifies the array itself. That is
> 
> MyClass [char[]]  aa;
> 
> MyClass x = aa ["key"];
> if (x)
>     ...
> 
> actually /adds/ a null entry to the array. Instead of having 0 entries in the AA, you now have 1. And all you did was nonchalantly read the thing.
> 
> If you question the sanity of such side-effects, you'd be well within your rights. For example, this is akin to saying "if I read from a byte in memory, then I expect something else to be modified". Or "if I open a file and read from it, I expect the file length to change". Would you be content with that? I would hope not. Talk about confusing!
> 
> This really appears to defy the very grain of fundamental read/write assumptions. Sure, there are some rather complex and domain-specific scenarios whereby you'd /perhaps/ consider doing something similar regarding side-effects, but this AA stuff is a fundamental language type, built into the language itself ... it's only a hashmap, so there's just no excuse for such, uhhh, vagaries. If you're not convinced, please ask yourself if you'd expect this code to modify the array itself:
> 
> static const char[] ca = "can't touch this";
> 
> char c = ca[0];
> 
> I haven't tried it, but I would certainly hope the array isn't modified! Note that the basic semantics are identical to the AA case, and a major factor for building AA into the language was such that it would be just an array "extension". I'd call it a "stretch" instead <g>
> 
> So, the reason for writing this is twofold:
> 
> 1) being the "container zone", I beg and plead with all container writers to avoid this sorry approach at all cost. After all, even if you don't agree with the lunacy aspect, you can't deny the fact that a value-lookup takes twice as long as it should (first check for existence, then perform a get).
> 
> 2) containers should follow the principle of "least surprise" in all aspects. Please consider that.
> 
> Maybe a third reason would be to try and get the AA semantics changed ... Naaaaaa .... that would be asking for waaaay too much.

Here's the dilemma: if you try to get a value that isn't in the hashmap then what should be returned? No matter what you return the user can't tell if the key is/was in the hashmap or not. So in some sense the right thing to do is throw an exception when the user tries to evaluate aa["key"] and "key" isn't in the map - this solution would be the most correct in terms of principle of least astonishment since it is analogous to asking a dynamic array for an index that is out of bounds (which throws an exception). I'm half-tempted to serious propose that instead of some hack about returning Type.init and maybe or maybe not adding it to the map. That way both STL and Java users will get a surprise!

eh, maybe I need another glass of wine. :-)


July 28, 2004
On Tue, 27 Jul 2004 22:49:39 -0400, Ben Hinkle <bhinkle4@juno.com> wrote:

> Kris wrote:
>
>> "Ben Hinkle" wrote ...
>>> Kris wrote:
>>> > Well, apparently not. Now clearly I did not assign anything to the AA
>>> > in the above code, but an entry is created regardless. That is, AA
>>> > entries are created as a /side-effect/ of doing that lookup. Now, 
>>> I've
>>> > used many hash-table implementations before, but this behavior is 
>>> truly
>>> > a first. Hands-up please, all those who would have expected this ...
>>>
>>> > That aside, one is forced to do a double lookup in such cases. This 
>>> is
>> not
>>> > good at all, and I simply cannot fathom what benefit there is to such
>>> > an approach. In my particular application, the double-lookup (and, of
>> course,
>>> > double hash-computation) is entirely unacceptable. Lookup is the most
>>> > common operation on a hash-table, by far. It has to be efficient, and
>> this
>>> > "side-effect" totally flies in the face of reason. If it were a 
>>> Phobos
>>> > module it would have been replaced long ago.
>>> >
>>> > For the moment I have to assume the behavior is "by design", so
>>> > fair-warning to everyone.
>>>
>>> It is subtle if you are expecting something else.
>>
>> I'd like to just ignore this and go on, but the semantics here are just
>> too badly broken to serve as an example of what containers should do.
>> Since this is the unofficial "container zone", I ought to say something
>> ...
>>
>> To recap; referencing an associative array as an rValue modifies the array
>> itself. That is
>>
>> MyClass [char[]]  aa;
>>
>> MyClass x = aa ["key"];
>> if (x)
>>     ...
>>
>> actually /adds/ a null entry to the array. Instead of having 0 entries in
>> the AA, you now have 1. And all you did was nonchalantly read the thing.
>>
>> If you question the sanity of such side-effects, you'd be well within your
>> rights. For example, this is akin to saying "if I read from a byte in
>> memory, then I expect something else to be modified". Or "if I open a file
>> and read from it, I expect the file length to change". Would you be
>> content with that? I would hope not. Talk about confusing!
>>
>> This really appears to defy the very grain of fundamental read/write
>> assumptions. Sure, there are some rather complex and domain-specific
>> scenarios whereby you'd /perhaps/ consider doing something similar
>> regarding side-effects, but this AA stuff is a fundamental language type,
>> built into the language itself ... it's only a hashmap, so there's just no
>> excuse for such, uhhh, vagaries. If you're not convinced, please ask
>> yourself if you'd expect this code to modify the array itself:
>>
>> static const char[] ca = "can't touch this";
>>
>> char c = ca[0];
>>
>> I haven't tried it, but I would certainly hope the array isn't modified!
>> Note that the basic semantics are identical to the AA case, and a major
>> factor for building AA into the language was such that it would be just an
>> array "extension". I'd call it a "stretch" instead <g>
>>
>> So, the reason for writing this is twofold:
>>
>> 1) being the "container zone", I beg and plead with all container writers
>> to avoid this sorry approach at all cost. After all, even if you don't
>> agree with the lunacy aspect, you can't deny the fact that a value-lookup
>> takes twice as long as it should (first check for existence, then perform
>> a get).
>>
>> 2) containers should follow the principle of "least surprise" in all
>> aspects. Please consider that.
>>
>> Maybe a third reason would be to try and get the AA semantics changed ...
>> Naaaaaa .... that would be asking for waaaay too much.
>
> Here's the dilemma: if you try to get a value that isn't in the hashmap then
> what should be returned? No matter what you return the user can't tell if
> the key is/was in the hashmap or not. So in some sense the right thing to
> do is throw an exception when the user tries to evaluate aa["key"] and
> "key" isn't in the map - this solution would be the most correct in terms
> of principle of least astonishment since it is analogous to asking a
> dynamic array for an index that is out of bounds (which throws an
> exception). I'm half-tempted to serious propose that instead of some hack
> about returning Type.init and maybe or maybe not adding it to the map.

I think this is the best solution. In addition you can supply a method like

if (aa.contains("key",value))

which returns true/false and assigns value if found. So those who do not want to catch an exception can call that one instead. Personally I think I'd always want to use the contains method instead of aa["key"].

If you dont like the name contains you can use anything else i.e. find, get, lookup

> That way both STL and Java users will get a surprise!

At least it's an immediate surprise and not one you don't realise until later, as it was for Kris with the current behaviour.

> eh, maybe I need another glass of wine. :-)

Don't we all..

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
July 28, 2004
"Ben Hinkle"  wrote
> Here's the dilemma: if you try to get a value that isn't in the hashmap
then
> what should be returned? No matter what you return the user can't tell if the key is/was in the hashmap or not. So in some sense the right thing to do is throw an exception when the user tries to evaluate aa["key"] and "key" isn't in the map - this solution would be the most correct in terms of principle of least astonishment since it is analogous to asking a dynamic array for an index that is out of bounds (which throws an exception). I'm half-tempted to serious propose that instead of some hack about returning Type.init and maybe or maybe not adding it to the map.
That
> way both STL and Java users will get a surprise!
>
> eh, maybe I need another glass of wine. :-)

HaHa! Got a fairly reasonable Chateau De Armpit open here too <g>

Doesn't Python throw an exception like that? Or is it Ruby? I forget ...

One reasonable way out of the hole is to stipulate (as part of the API contract) that a "value" of null is illegal. That way, a null return signifies the entry does not exist. You can then happily do a get(), and test the return. That gets around the double-lookup issue (for a non-mutating lookup). A value-type of pointer, array or class would support a null return. From recollection, this is what Java does.

Unfortunately, you can't do that for simple ints and chars (without
wrappers), so there's the rub.

OT: meant to ask if you or Mike were gonna' work on porting D.L's ConcurrentHashMap for your container-set ... I'd pay money for that :-)


July 28, 2004
"Regan Heath"  wrote ..
> I think this is the best solution. In addition you can supply a method
like
>
> if (aa.contains("key",value))
>
> which returns true/false and assigns value if found. So those who do not want to catch an exception can call that one instead. Personally I think I'd always want to use the contains method instead of aa["key"].

Good point Regan;

Unfortunately, the "value" may be expensive to construct. If so, then you may end up constructing it for nothing (because the AA already has an instance). A mutating method/property such as this might get away with a delegate instead, to create the missing "value". Might call it createWhereMissing(), or something equally hideous <g>

For a large percentage of cases, I think one can get away with excluding null as a valid "value", and returning that for failed queries/lookups. There again, the latter wouldn't necessarily work for all AA types (though it would for many other containers).

Here's to that glass that Ben was suggesting!


July 28, 2004
On Tue, 27 Jul 2004 21:15:44 -0700, Kris <someidiot@earthlink.dot.dot.dot.net> wrote:

> "Regan Heath"  wrote ..
>> I think this is the best solution. In addition you can supply a method
> like
>>
>> if (aa.contains("key",value))
>>
>> which returns true/false and assigns value if found. So those who do not
>> want to catch an exception can call that one instead. Personally I think
>> I'd always want to use the contains method instead of aa["key"].
>
> Good point Regan;
>
> Unfortunately, the "value" may be expensive to construct. If so, then you
> may end up constructing it for nothing (because the AA already has an
> instance).

The only case I can think of is a large struct. In which case you should probably be storing pointers to them in the AA. eg.

struct Foo {
  ..lots..
}

Foo*[char[]] aa;
Foo* value;

if (aa.contains("regan",value)) ..

If you're not storing pointers then it's going to be slow/expensive to simply add/assign one to the aa in the first place.

> A mutating method/property such as this might get away with a
> delegate instead, to create the missing "value". Might call it
> createWhereMissing(), or something equally hideous <g>

In the instance where you want to look for something _and_ add it if it does not exist I believe a different 'find' function should be used, one that returns the insertion location eg.

if (aa.contains("regan",at)) ..value exists..
else {
  aa.add("regan",value,at);  //add value at 'at'
}

is a better solution.

> For a large percentage of cases, I think one can get away with excluding
> null as a valid "value", and returning that for failed queries/lookups.

Perhaps, but the fact that it does not handle basic types (without a wrapper class) makes it a poor soln IMO.

> There again, the latter wouldn't necessarily work for all AA types (though it would for many other containers).
> Here's to that glass that Ben was suggesting!

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
July 28, 2004
1) Oh, I misread your initial post. Sorry. So, when you say "returns true/false and assigns a value if found" I guess you mean this is a kind of "replace" function instead? And returns whether the operation took place? If so, then that's very useful. Still leaves the inverse scenario open though, where you want to retrieve the value of something from the container without mutating (egad!) at all: a very common case. With the current syntax you have to check first, then do the retrieval: two lookups.

2) The problem with those "insertion location" style APIs is one of atomicity ... if something throws an exception in-between, then all bets are off. Definately works well for some environments though.


"Regan Heath"  wrote...

> The only case I can think of is a large struct. In which case you should probably be storing pointers to them in the AA. eg.
>
> struct Foo {
>    ..lots..
> }
>
> Foo*[char[]] aa;
> Foo* value;
>
> if (aa.contains("regan",value)) ..
>
> If you're not storing pointers then it's going to be slow/expensive to simply add/assign one to the aa in the first place.
>


July 28, 2004
"Ben Hinkle"  wrote ...
> Here's the dilemma: if you try to get a value that isn't in the hashmap
then
> what should be returned? No matter what you return the user can't tell if the key is/was in the hashmap or not. So in some sense the right thing to do is throw an exception when the user tries to evaluate aa["key"] and "key" isn't in the map - this solution would be the most correct in terms of principle of least astonishment since it is analogous to asking a dynamic array for an index that is out of bounds (which throws an exception). I'm half-tempted to serious propose that instead of some hack about returning Type.init and maybe or maybe not adding it to the map.
That
> way both STL and Java users will get a surprise!
>
> eh, maybe I need another glass of wine. :-)

Perhaps I've had too many recently ... 8~}

How about returning a simple true/false from a lookup instead? One can use an 'out' argument for the "value" if there's one present, yes? Not the most elegant solution, or perhaps the most efficient; but it's a whole lot better than two lookups, or the dreaded (cue music ...) *mutating rValue* ...

bool get (K key, out V value);





July 28, 2004
Kris wrote:

> HaHa! Got a fairly reasonable Chateau De Armpit open here too <g>
> 
> Doesn't Python throw an exception like that? Or is it Ruby? I forget ...
Python throws an exception. Ruby will by default (Hash.new() or {}) return nil (which is the best single behaviour, but not all types support it in D. If you care about existant keys with nil values, call contains()).
Hash.new(0) will create a hash that returns 0 for nonexistant keys. (Think sparse matrices...)
Hash.new { |hash,key|
	# code
}
Passes a the hash and the specified key and executes the code, and returns the result, if the key isn't in the hash.
For example:
Hash.new { |hash,key|
	hash[key]=foo(key)
}
creates a hash that's a caching wrapper of foo. (The value of the last expression is implicitly returned).

I like Ruby by the way :-)

> One reasonable way out of the hole is to stipulate (as part of the API
> contract) that a "value" of null is illegal. That way, a null return
> signifies the entry does not exist. You can then happily do a get(), and
> test the return. That gets around the double-lookup issue (for a
> non-mutating lookup). A value-type of pointer, array or class would support
> a null return. From recollection, this is what Java does.
> 
> Unfortunately, you can't do that for simple ints and chars (without
> wrappers), so there's the rub.
Hmm, here's a D 3.0 feature (or D08, or whatever version is sufficiently complicated anyway that one more feature won't make a difference). I'm in a frivolous mood, so don't bother flaming me :-)

int? value=array["test"]
if(value?)
	printf("%d\n",value);

Type? would be something like
struct ?(T) {
	T value;
	bit present;
	/* implicit */ T opCast() { return value; }
	bit opQuery() { return present; }
}

Actually, I quite like the opQuery, or maybe foo? could be the "foo aint null" operator that's being sought.

Meh.

Sam
July 28, 2004
> I think this is the best solution. In addition you can supply a method
like
>
> if (aa.contains("key",value))
>
> which returns true/false and assigns value if found. So those who do not want to catch an exception can call that one instead. Personally I think I'd always want to use the contains method instead of aa["key"].
>
> If you dont like the name contains you can use anything else i.e. find, get, lookup

Seems reasonable to me - though the name "contains" doesn't imply anything about setting values so maybe something like "trySet" would be better. Also I could see a "tryGet" function that looks up the key and if present sets the output variable and returns true and otherwise returns false.

> > That way both STL and Java users will get a surprise!
>
> At least it's an immediate surprise and not one you don't realise until later, as it was for Kris with the current behaviour.

very true. subtle bugs can be very frustrating.

> > eh, maybe I need another glass of wine. :-)
>
> Don't we all..
>
> Regan
>
> -- 
> Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/


July 28, 2004
On Tue, 27 Jul 2004 22:03:17 -0700, Kris <someidiot@earthlink.dot.dot.dot.net> wrote:
> 1) Oh, I misread your initial post. Sorry. So, when you say "returns
> true/false and assigns a value if found" I guess you mean this is a kind of
> "replace" function instead?

No, no, no.. I'm just not making myself clear methinks.

Two different functions, one:

  bool contains(key_type key, value_type value);

looks for key, returns true/false (existance of the value) and assigns value on true.

The other:

  bool contains(key_type key, index_type at);

looks for key, returns true/false (existance of the value) and gives an index to the value which you can use to:
a) retrieve the value (if it exists)
b) replace the value with a new one (if it exists)
c) insert a value (if it does not exist)

> And returns whether the operation took place? If
> so, then that's very useful. Still leaves the inverse scenario open though,
> where you want to retrieve the value of something from the container without
> mutating (egad!) at all: a very common case. With the current syntax you
> have to check first, then do the retrieval: two lookups.

Not using my idea above :)

> 2) The problem with those "insertion location" style APIs is one of
> atomicity ... if something throws an exception in-between, then all bets are
> off. Definately works well for some environments though.

I realise this, I don't think it should ever mutate on read/lookup operations.

Regan

> "Regan Heath"  wrote...
>
>> The only case I can think of is a large struct. In which case you should
>> probably be storing pointers to them in the AA. eg.
>>
>> struct Foo {
>>    ..lots..
>> }
>>
>> Foo*[char[]] aa;
>> Foo* value;
>>
>> if (aa.contains("regan",value)) ..
>>
>> If you're not storing pointers then it's going to be slow/expensive to
>> simply add/assign one to the aa in the first place.
>>
>
>



-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/