November 06, 2005
On Sun, 06 Nov 2005 11:29:28 -0800, Sean Kelly <sean@f4.ca> wrote:
> Walter Bright wrote:
>>  I liked the previous syntax, and the way it worked, because it was
>> efficient. But nobody, not one, spoke out in favor of it, and all heaped
>> ridicule on it (and not completely without merit, I threw in the towel on it
>> when it was pointed out that javascript didn't do it that way either, though
>> I thought it did). Sorry if I'm a little sensitive about this <g>.
>
> For what it's worth, I liked it too.  And I believe the C++ map worked this way (as justification for the design).
>
>> For most uses of AAs, the lookups of existing entries one expects to be in
>> there far outnumber the test/set style, so while it is a bit slower, it
>> isn't appreciably.
>
> I would have preferred leaving the existing syntax as-is and adding a new method called 'find' or some such that returned a pointer to the element or null if it doesn't exist.

I think we can and should avoid pointers.

I think the types of things we want to do can be broken into categories:

1. 'check' for existence of an item.
2. 'check' for existence of an item and get it.
3. 'get' value, error if not exists.
4. 'set' value, create or replace existing.
[optional]
5. 'set' value if not existing, i.e. create only, don't replace.
6. 'set' value if existing, i.e. replace only, don't create.

I think ideally we want to be able to achieve all of the above without double lookups.
I think we can, or come pretty close without too many changes, here is what I recommend:

#1 - 'key in aa'
leave it as is, or change it back to returning true/false.
(NOCHANGE)

#2 - 'aa.find(key,[out]value)'
returns true/false and gets value if existing.
(ADD)

#3- 'value = aa.get(key)'
  - 'value = aa[key]'
returns value or throws error.
(ADD/NOCHANGE)

#4- 'aa.set(key,value)'
  - 'aa[key] = value'
creates or replaces the value for key with the given value.
(ADD/NOCHANGE)

[optional]
#5- 'aa.create(key,[inout]value)'
returns true and assigns value if the key is non-existent (creating it); otherwise returns false and gets the existing value.
(ADD)

#6- 'aa.replace(key,value)'
return true and assign value if exists (replacing), false otherwise
(ADD)

#5 and #6 can be done with double lookups using 'find' eg.
if (!aa.find(key,cur)) { aa[key] = value; }
if (aa.find(key,cur)) { aa[key] = value; }

I believe we need one get/find method that throws and one that doesn't (get throws, find doesn't) allowing us to make our intentions clear, i.e. you use the one that throws in cases where the item should exist, and not existing is an error.

I think 'find' is the essential component of AA's which we are missing at present.

I really don't mind what the syntax looks like, be it method style i.e. "value = aa.get(key)" or array style "value = aa[key]".

However, I think one consistent style is a good idea, and I don't think it's possible for the array style to represent the different intentions we have, which is why 'find' is essential.
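
Items #2, #5 and #6 above can be sketched as follows. This is a minimal sketch, assuming a current D compiler (which postdates this thread) where `key in aa` yields a pointer to the value or null. The free functions `find`, `create` and `replace` are hypothetical helpers written for illustration; they are not part of the language or its library. They keep the pointer out of the caller's sight, and `out`/`ref` stand in for the [out] and [inout] markers used in the list.

```d
import std.stdio;

// Hypothetical helpers (not part of D or its runtime). Each one wraps the
// pointer returned by `key in aa` so the caller never handles it.

// #2: check and get with a single lookup.
bool find(K, V)(V[K] aa, K key, out V value)
{
    if (auto p = key in aa)
    {
        value = *p;
        return true;
    }
    return false;
}

// #5: create only. On a hit the existing value is handed back in `value`.
// On a miss the insertion costs a second lookup in this sketch.
bool create(K, V)(ref V[K] aa, K key, ref V value)
{
    if (auto p = key in aa)
    {
        value = *p;
        return false;
    }
    aa[key] = value;
    return true;
}

// #6: replace only, with a single lookup; never creates an entry.
bool replace(K, V)(V[K] aa, K key, V value)
{
    if (auto p = key in aa)
    {
        *p = value;
        return true;
    }
    return false;
}

void main()
{
    int[string] aa;
    int v = 1;

    assert(aa.create("a", v));       // "a" was absent and is now 1
    v = 2;
    assert(!aa.create("a", v));      // "a" exists, so nothing is replaced
    assert(v == 1);                  // v now holds the existing value

    assert(aa.replace("a", 7));      // existing entry replaced
    assert(!aa.replace("b", 7));     // "b" absent, nothing created
    assert(("b" in aa) is null);

    int got;
    assert(aa.find("a", got) && got == 7);
    assert(!aa.find("b", got));
    writeln("all checks passed");
}
```

Only `create` still pays for a second lookup, and only on the miss path; `find` and `replace` each touch the table once.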

Regan
November 06, 2005
I apologize for the long post. The salient point is right at the end, so please skip over the point/counterpoint argy-bargy.


Derek Parnell wrote:
> Currently in D, when one attempts to retrieve a non-existent element in an
> array, it causes a run-time error to occur. This applies to all array
> types: fixed-length, dynamic-length, and associative. (And yes, in the
> current D, an associative array is implemented as a hash-table.) The type
> of error depends on whether the -release switch has been used or not. If it
> has been used then a memory access violation occurs (ie. GPF under unix),
> otherwise if -release was not used an ArrayBoundsError exception is thrown.

I see you've bought into that. There is no such thing as an array-bounds error from the API of a hash-table, Derek. It's purely a manufactured idiom of the current API.

> The problem is that you don't like this behaviour for associative arrays.

Really? If there's something I "don't like" here, it is an API that is problematic purely for the sake of using a particular syntax. You've read the tale of the Emperor's New Clothes, haven't you?


> I assume that when trying to fetch a non-existent element you would either
> like the element to be automatically created with .init value(s) and/or to
> return some initialized value, or to always throw an ArrayBoundsError
> regardless of the -release status. Which is it you'd like to see happen?

I can't understand why you feel these are the only options, Derek. I agree those are perhaps the options when using array-syntax, but that's exactly where the problem lies. Neither of your two options is attractive; particularly so when they are purely artificial constraints.


>>I failed miserably to get your drift here. 
> 
> 
> I apologize. Sometimes I'm not as good with words as I think I am.
> 
> I made the assumption that the library managed an AA, and that a public
> function was available that fetches data from that AA based on a supplied
> key in one of the parameters. I was just saying that if this is the case,
> then you'd be wise to validate the key data prior to fetching the AA based
> on the externally supplied key value.

This is hardly on topic, and smells of smoke. I have to remind you that you do not, and should not, require redundant lookups to check if an entry exists before fetching it from a hash-table.


>>Hash tables are *not* like arrays. If they don't contain a key it is surely not a reason to GPF. Is it? 
> 
> 
> D's associative arrays are a specific type of hash table. The entries in
> the table are based on keys. And I agree, a GPF is only one of the possible
> implementation behaviors that are possible in response to a fetch attempt
> for an element that does not exist.

Well, thank goodness. But, "specific type"? It's just a plain old hash-table, with some unwieldy syntax bolted onto it. The latter is the problem, not the former.


>>We're talking about this code causing a GPF:
>>
>>char[char[]] AA;
>>
>>char[] s = AA["unforeseen key"];  // GPF; can't check for a null return
>>
> 
> 
> This is why you might benefit from Walter reestablishing this sort of
> behaviour in D - in addition to the current AA behaviour. Sophisticated
> coders such as yourself can use such facilities. 

Sophisticated coders? Yer arse <g>. Hash tables are supposed to be trivial from the perspective of the user.


>   char[] s = AA.initset("unforeseen key");
> 
> Now you can check for s.length == 0 if that's important to you. Of course,
> that isn't always a perfect way of detecting unforeseen key accesses.
> 

Complexity just for the sake of it. This is entirely unnecessary.


> In spite of the "double lookup" effect, I would still code it thus ...
> 
>   char[] s;
>   if ("unforeseen key" in AA)
>      s = AA["unforeseen key"];
>   else
>      // some error processing if appropriate.
> 
> because it tells the reader of the code that it is possible to get bad keys
> and it implements a way to handle those unambiguously.


I see. Pray explain why this alternate API is so appalling by comparison:

  char[] s;

  if (aa.get("unforeseen key", s))
      // do something with s
  else
     // do something else

Look! No redundant lookups! No pointers! It must be magic! And, to quote you, "it tells the reader of the code that it is possible to get bad keys and it implements a way to handle those unambiguously". Wouldn't you agree?


>>Let's try to stay in the land of reason here. Yes, you can come up with all sort of ways to /make/ it work with /multiple/ lookups. Walter suggested a way to do it with three lookups instead. I know you appreciate optimal code paths, Derek, so can we sidestep this please? 
> 
> 
> You might be misunderstanding me, now. I prize maintainable source code
> over runtime performance any day. If run time performance is really such an
> issue, code in assembler otherwise get back into the land of reason.
> 

Entirely misleading. Look at the example above and reconsider. You appear to be trying to turn this into something unrelated, Derek. Please desist. Yes, there is a performance related aspect here, but only because you insist on applying entirely redundant lookups.

One can write perfectly clear intentions (arguably more so) by using an alternate, and more appropriate, API.



>>The above code has two lookups, where only one should be necessary. I sure hope you avoid multiple lookups within Build?
> 
> 
> Not if I can help it ;-) By the way, Build runs pretty fast in spite of me
> 'wasting' cycles checking for valid AA keys.


Build is a great tool. However, it appears as though Build takes longer to execute than both the compiler plus linker together. It is not a high performance application, because it doesn't really need to be. But that's hardly important since we're talking about API's here. Build is great at what it does ~ it does not represent every application.

Again, your statement vaguely implies that I argue against checking for valid AA keys. That's silly, Derek. I'm claiming that one can clearly and unambiguously both test the existence of, and avoid redundant lookups upon, a hash-table entry by using a more appropriate API.


> Isn't that what I said? Your code is performing a 'get' and not an
> 'enquiry'. 

As far as HT's are concerned, a get is equivalent to a query. You can argue about separating them all you wish, but you're simply arguing for redundant lookups. To avoid this is exactly why Walter is returning a pointer from the 'in' statement. Are you disagreeing with all perspectives?


>>Au contraire, my friend ~ unless you're prepared to perform unnecessary multiple lookups. I don't consider redundant lookups to be relevant, and neither should anyone following this ridiculous saga.
> 
> 
> I must be one of the clowns then. I don't follow your philosophy anymore.
> Cost of the application over time is more important to me than trivial
> optimizations. Trivial in the sense that if it doesn't account for more
> than 5% of a program's execution time, why optimize it to death. My
> philosophy regarding this is more along the lines of code it legibly first,
> and then profile it to locate areas that are worth optimizing.

(the saga is ridiculous because it's years old, whilst perfectly suitable alternatives have existed for decades)

You're welcome to do double lookups all you want, Derek. However, you're insisting that D remain staunchly oblivious to better alternatives.

There's so much spin in your counter that I'm feeling dizzy. You're attempting to suggest I don't care a whit about legibility, and that any quest to avoid redundant code is misguided. That's utter nonsense.


> And you have measured this, right?

Nobody needs to measure it, Derek. If one executes two lookups where one would suffice, then one will expend close to twice the effort/time. It stands to reason. You're trying to argue that a single lookup somehow makes the code less clear (entirely false), therefore we should all use two lookups instead. It's a pointless argument. Please try to keep an open mind about alternative APIs.


>>Please re-read. Array-syntax lookup /by itself/ is borked. It has to be used in conjunction with 'in', and is therefore superfluous (since 'in' supplies the data anyway). 
> 
> 
> Well not actually so. The "in" supplies the key and not the data. The data
> can be something totally different.
> 
>    real[char[]] AA;
>    . . . 
> 
>    real X;
>    . . .
>    if ("Some Key" in AA)
>      X = AA["Some Key"];
>    else
>      // Handle unknown key value.


What is your point there? That statement makes no sense at all. Here's the pointer version of your example:

real[char[]] AA;
...
real* x;
...
x = ("Some Key" in AA);
if (x)
    // do something with *x
else
   // do something else


And here's a robust, simple, efficient API, sans pointers:

real[char[]] aa;
...
real x;
...
if (aa.get("Some Key", x))
    // do something with x
else
   // do something else



>>Sure; you can lookup the AA again if you wish, but your counter-argument is redundant; just like the additional lookup. As noted, the existence of s=AA[], by itself, encourages rather fragile code. D requires pointer-syntax to lookup an AA entry without GPFing (redundant multiple lookups aside).
> 
> 
> D does not *require* pointer syntax. It is optional. But for completeness
> here is the pointer version.


I'll repeat what I said above: "D requires pointer-syntax to lookup an AA entry without GPFing (redundant multiple lookups aside)"

Your counter chooses to ignore the parenthesis. Restated: to avoid multiple lookups, and GPFs, one must use pointer syntax in D. Period.



>>>What would have been nice, instead of Walter getting all upset over people
>>>not liking his implementation, is to provide all four types of access.
>>>
>>>   'enquiry' ::  Key in Array (returns pointer to Value or Null)
>>>   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
>>>                                     error otherwise)
>>>   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
>>>                                     it doesn't exist.)
>>>   'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
>>>                                          otherwise creates an entry with
>>>                                          .init values.)
>>>
>>>Or any other equivalent syntax. The point is that there is no reason for
>>>the old behaviour to be totally removed from the language, just shifted
>>>away from being the default behaviour for 'Value = Array(Key)' syntax.


I agree. But I see you're insisting on force-fitting the [] syntax, resulting in a sub-optimal and overly busy API. All one needs is right here:

bool get(key, inout value);
void put(key, value);

That is simple, robust, intuitive, optimal, proven, succinct. No redundant lookups. No pointers anywhere to be seen.

The [] syntax seriously limits D in the API it can expose for these purposes. Which is why it's messy at this point. And it's why you have chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple methods are perfectly capable instead.
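
The two-method API above can be sketched in a few lines. This is a minimal sketch, assuming a current D compiler, which postdates this thread. The wrapper struct `Table` and its `data` field are hypothetical names chosen for illustration, and `out` is used where the thread writes `inout` (in current D, `inout` means something else). Internally `get` leans on the pointer returned by `in`, but the caller never sees it.

```d
import std.stdio;

// Hypothetical wrapper around a built-in associative array.
// The name Table and its layout are assumptions made for this sketch.
struct Table(K, V)
{
    private V[K] data;

    // Single lookup: returns true and fills `value` when the key exists,
    // returns false (leaving `value` at V.init) when it does not.
    bool get(K key, out V value)
    {
        if (auto p = key in data)
        {
            value = *p;
            return true;
        }
        return false;
    }

    // Creates the entry, or overwrites the value of an existing one.
    void put(K key, V value)
    {
        data[key] = value;
    }
}

void main()
{
    Table!(string, real) aa;
    aa.put("Some Key", 1.5);

    real x;
    if (aa.get("Some Key", x))
        writeln("found ", x);
    else
        writeln("missing");

    // A missing key is an ordinary result here, not an error.
    assert(!aa.get("Other Key", x));
}
```

The caller's code has the same shape as the `if (aa.get("Some Key", x))` example earlier in this post: one call, one branch, no second lookup.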
November 06, 2005
On Sun, 06 Nov 2005 13:23:40 -0800, kris <fu@bar.org> wrote:
> Derek Parnell wrote:
>> Currently in D, when one attempts to retrieve a non-existent element in an
>> array, it causes a run-time error to occur. This applies to all array
>> types: fixed-length, dynamic-length, and associative. (And yes, in the
>> current D, an associative array is implemented as a hash-table.) The type
>> of error depends on whether the -release switch has been used or not. If it
>> has been used then a memory access violation occurs (ie. GPF under unix),
>> otherwise if -release was not used an ArrayBoundsError exception is thrown.
>
> I see you've bought into that. There is no such thing as an array-bounds error from the API of a hash-table, Derek. It's purely a manufactured idiom of the current API.

Sez you! ;)

Seriously though I disagree. I think it depends on what you're using it for. I have found the thrown exception useful for catching bugs in at least one app I have been writing. The code in question assumed a value existed, it was a program error for it not to exist. Thus, the current implementation, the current API was exactly what I desired in this case. "array bounds error" may not be exactly what it is, but whatever you want to call it an error when the item does not exist was a requirement in this case.

However, I agree with your original point. There are cases where it's never an error for the value to be non-existent, in fact I think perhaps it's more common for this to be the case. In which case if you were to reword your statement above to say that an array bounds error was not common in the API of a hash table I would be quite happy to agree.

You guys seem to be arguing about all the wrong things. How about we start with what we want, i.e.

1. ability to code different "use cases" in a clear and simple manner.
2. avoid double lookups if possible, without sacrificing #1.

The problem we all have with the current implementation is that in places #1 destroys #2 and vice-versa.

See my reply to Sean in another branch of this thread, it has the API I would most like to see, essentially the addition of a function to check and get an item without an exception. I believe this API satisfies #1 and #2 above.

Regan
November 06, 2005
kris wrote:
> 
> I agree. But I see you're insisting on force-fitting the [] syntax, resulting in a sub-optimal and overly busy API. All one needs is right here:
> 
> bool get(key, inout value);
> void put(key, value);
> 
> That is simple, robust, intuitive, optimal, proven, succinct. No redundant lookups. No pointers anywhere to be seen.
> 
> The [] syntax seriously limits D in the API it can expose for these purposes. Which is why it's messy at this point. And it's why you have chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple methods are perfectly capable instead.

I like the [] and 'in' syntax as it was originally implemented, as it covered the majority of cases where I typically use dictionaries: either testing for existence or adding/modifying something already there.  I personally have never used the [] syntax, for example, in instances where I did not want a value to be created if one did not exist, assuming it's modifying an lvalue.  i.e.

var[key]++;
var[key] = val;

The only sticky issue with this syntax is how to handle rvalue expressions:

x = var[key];

Does the above insert or merely return the init() value?  I would prefer the latter, but I can see how it would be confusing.  Assuming creation in all cases seems entirely reasonable to me, and it would be consistent with the C++ syntax.

That aside, I would like to see your proposed get/put syntax added as it is both meaningful and relatively succinct.
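
The lvalue/rvalue split described above can be illustrated with a small sketch. It assumes a current D compiler, which postdates this thread, so it does not show the behaviour being debated here. In that later design the lvalue forms create the entry, while the built-in `get` property with an explicit default returns a value without inserting anything. The key names are made up for the example.

```d
import std.stdio;

void main()
{
    int[string] var;

    // Lvalue uses: the entry is created on demand.
    var["hits"]++;        // starts from int.init (0), then increments
    var["size"] = 10;     // plain create-or-replace

    // Rvalue use with a fallback: returns the default, inserts nothing.
    int x = var.get("missing", int.init);

    assert(var["hits"] == 1);
    assert(var["size"] == 10);
    assert(x == 0);
    assert(("missing" in var) is null);   // still no such entry
    assert(var.length == 2);

    writeln("entries: ", var.length);
}
```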


Sean
November 06, 2005
On Mon, 07 Nov 2005 10:13:22 +1300, Regan Heath wrote:


[snip]
> I think the types of things we want to do can be broken into categories:
> 
> 1. 'check' for existence of an item.
> 2. 'check' for existence of an item and get it.
> 3. 'get' value, error if not exists.
> 4. 'set' value, create or replace existing.
> [optional]
> 5. 'set' value if not existing, i.e. create only, don't replace.
> 6. 'set' value if existing, i.e. replace only, don't create.

Well said. I think you are on to a winner here.

[snip]

> I really don't mind what the syntax looks like, be it method style i.e. "value = aa.get(key)" or array style "value = aa[key]".

Totally agree with you. I'm not wedded to either syntax. However, we should really stop calling Associative Arrays, "arrays" if we drop the array syntax ;-)

> However, I think one consistent style is a good idea, and I don't think it's possible for the array style to represent the different intentions we have, which is why 'find' is essential.

Agreed. The array syntax only covers some of the desired behaviours one would want to see in a hash-table (a.k.a AA)

-- 
Derek Parnell
Melbourne, Australia
7/11/2005 9:11:40 AM
November 06, 2005
On Sun, 06 Nov 2005 13:23:40 -0800, kris wrote:

> I apologize for the long post. The salient point is right at the end, so please skip over the point/counterpoint argy-bargy.
> 
> 
> Derek Parnell wrote:
>> Currently in D, when one attempts to retrieve a non-existent element in an array, it causes a run-time error to occur. This applies to all array types: fixed-length, dynamic-length, and associative. (And yes, in the current D, an associative array is implemented as a hash-table.) The type of error depends on whether the -release switch has been used or not. If it has been used then a memory access violation occurs (ie. GPF under unix), otherwise if -release was not used an ArrayBoundsError exception is thrown.
> 
> I see you've bought into that. There is no such thing as an array-bounds error from the API of a hash-table, Derek. It's purely a manufactured idiom of the current API.

That may be true. However I am working with what we've got, knowing that change to D is such an unlikely thing that we'd really be better off building Beeblebrox's Probability Drive.

>> The problem is that you don't like this behaviour for associative arrays.
> 
> Really? If there's something I "don't like" here, it is an API that is problematic purely for the sake of using a particular syntax. You've read the tale of the Emperor's New Clothes, haven't you?

I see where you're coming from now. And I have to agree that the functionality of AAs is being restricted by a strict adherence to the 'array' style of syntax.

>> I assume that when trying to fetch a non-existent element you would either like the element to be automatically created with .init value(s) and/or to return some initialized value, or to always throw an ArrayBoundsError regardless of the -release status. Which is it you'd like to see happen?
> 
> I can't understand why you feel these are the only options, Derek. I agree those are perhaps the options when using array-syntax, but that's exactly where the problem lies. Neither of your two options is attractive; particularly so when they are purely artificial constraints.

Can you help me see another option? If one is trying to access a non-existent element, one either wants to know that it didn't exist or wants a default value returned. What else could there be?

>>>I failed miserably to get your drift here.
>> 
>> 
>> I apologize. Sometimes I'm not as good with words as I think I am.
>> 
>> I made the assumption that the library managed an AA, and that a public function was available that fetches data from that AA based on a supplied key in one of the parameters. I was just saying that if this is the case, then you'd be wise to validate the key data prior to fetching the AA based on the externally supplied key value.
> 
> This is hardly on topic, and smells of smoke. I have to remind you that you do not, and should not, require redundant lookups to check if an entry exists before fetching it from a hash-table.

Of course one does not *require* redundant run time lookups. I still think that I was 'on topic'. I thought the original post came about because you have a library routine that GPFed when presented with a non-existent key. My point was that **GIVEN THE TOOLS WE HAVE** you'd be wise to cater for the possibility of 'bad' keys. If we had other tools (for example, a better AA functionality) then you'd approach this topic differently.

>>>Hash tables are *not* like arrays. If they don't contain a key it is surely not a reason to GPF. Is it?
>> 
>> 
>> D's associative arrays are a specific type of hash table. The entries in the table are based on keys. And I agree, a GPF is only one of the possible implementation behaviors that are possible in response to a fetch attempt for an element that does not exist.
> 
> Well, thank goodness. But, "specific type"? It's just a plain old hash-table, with some unwieldy syntax bolted onto it. The latter is the problem, not the former.

Okay. 'Specific type' in the sense that some hash tables are only used to
detect the presence of the element keys, whereas other types of hash tables
associate non-key data with the elements.

>>>We're talking about this code causing a GPF:
>>>
>>>char[char[]] AA;
>>>
>>>char[] s = AA["unforeseen key"];  // GPF; can't check for a null return
>>>
>> 
>> 
>> This is why you might benefit from Walter reestablishing this sort of behaviour in D - in addition to the current AA behaviour. Sophisticated coders such as yourself can use such facilities.
> 
> Sophisticated coders? Yer arse <g>. Hash tables are supposed to be trivial from the perspective of the user.

Totally agree.

> 
>>   char[] s = AA.initset("unforeseen key");
>> 
>> Now you can check for s.length == 0 if that's important to you. Of course, that isn't always a perfect way of detecting unforeseen key accesses.
>> 
> 
> Complexity just for the sake of it. This is entirely unnecessary.

We agree to differ.

> 
>> In spite of the "double lookup" effect, I would still code it thus ...
>> 
>>   char[] s;
>>   if ("unforeseen key" in AA)
>>      s = AA["unforeseen key"];
>>   else
>>      // some error processing if appropriate.
>> 
>> because it tells the reader of the code that it is possible to get bad keys and it implements a way to handle those unambiguously.
> 
> 
> I see. Pray explain why this alternate API is so appalling by comparison:
> 
>    char[] s;
> 
>    if (aa.get("unforeseen key", s))
>        // do something with s
>    else
>       // do something else

It isn't appalling. Did I say that it was? In fact it is identical to my example, except for the syntax. I'm trying to discuss concepts, and not syntax.

> Look! No redundant lookups! No pointers! It must be magic! And, to quote you, "it tells the reader of the code that it is possible to get bad keys and it implements a way to handle those unambiguously". Wouldn't you agree?

Yes. It is identical to my code (bar the syntax).

>>>Let's try to stay in the land of reason here. Yes, you can come up with all sort of ways to /make/ it work with /multiple/ lookups. Walter suggested a way to do it with three lookups instead. I know you appreciate optimal code paths, Derek, so can we sidestep this please?
>> 
>> 
>> You might be misunderstanding me, now. I prize maintainable source code over runtime performance any day. If run time performance is really such an issue, code in assembler otherwise get back into the land of reason.
>> 
> 
> Entirely misleading. Look at the example above and reconsider. You appear to be trying to turn this into something unrelated, Derek. Please desist. Yes, there is a performance related aspect here, but only because you insist on applying entirely redundant lookups.
> 
> One can write perfectly clear intentions (arguably more so) by using an alternate, and more appropriate, API.

I agree. I didn't know your issue was with the syntax, as your original post was talking about GPFs and not syntax. I admit my mistake in not understanding your point of view regarding syntax.

>>>The above code has two lookups, where only one should be necessary. I sure hope you avoid multiple lookups within Build?
>> 
>> 
>> Not if I can help it ;-) By the way, Build runs pretty fast in spite of me 'wasting' cycles checking for valid AA keys.
> 
> 
> Build is a great tool. However, it appears as though Build takes longer to execute than both the compiler plus linker together.

That would be because it does a shit load of work before calling the bloody compiler and linker!

> It is not a high performance application, because it doesn't really need to be.

Exactly. And if it was, I'd definitely reconsider some of the coding idioms used.

> But that's hardly important since we're talking about API's here. Build is great at what it does ~ it does not represent every application.
> 
> Again, your statement vaguely implies that I argue against checking for valid AA keys. That's silly, Derek. I'm claiming that one can clearly and unambiguously both test the existence of, and avoid redundant lookups upon, a hash-table entry by using a more appropriate API.

That was the part that I didn't get. Sorry for the waste of bandwidth.

>> Isn't that what I said? Your code is performing a 'get' and not an 'enquiry'.
> 
> As far as HT's are concerned, a get is equivalent to a query.

Only for certain types of hash tables. If I want to get the data associated with a key I need to validate the key before getting the data.

> You can argue about separating them all you wish, but you're simply arguing for redundant lookups. To avoid this is exactly why Walter is returning a pointer from the 'in' statement. Are you disagreeing with all perspectives?

No! Where did that come from?! I imagine that a pointer is being returned so that the coder can get access to data (not the key) when a valid key is presented.

> 
>>>Au contraire, my friend ~ unless you're prepared to perform unnecessary multiple lookups. I don't consider redundant lookups to be relevant, and neither should anyone following this ridiculous saga.
>> 
>> 
>> I must be one of the clowns then. I don't follow your philosophy anymore. Cost of the application over time is more important to me than trivial optimizations. Trivial in the sense that if it doesn't account for more than 5% of a program's execution time, why optimize it to death. My philosophy regarding this is more along the lines of code it legibly first, and then profile it to locate areas that are worth optimizing.
> 
> (the saga is ridiculous because it's years old, whilst perfectly suitable alternatives have existed for decades)
> 
> You're welcome to do double lookups all you want, Derek. However, you're insisting that D remain staunchly oblivious to better alternatives.

How *do* you read this into my words? I cannot understand where I have said that the current D syntax is the best available and we should stop looking for better? I'm sure that anyone could discover with a short scan of previous posts, that I'm one of Walter's biggest critics. D is a great language but I'm one of the first to say that some decisions that Walter has made are terrible (IMNSHO), and that some other non-decisions are inexcusable.

> There's so much spin in your counter that I'm feeling dizzy. You're attempting to suggest I don't care a whit about legibility, and that any quest to avoid redundant code is misguided. That's utter nonsense.

As are the words you've placed into my posts ;-)

>> And you have measured this, right?
> 
> Nobody needs to measure it, Derek. If one executes two lookups where one would suffice, then one will expend close to twice the effort/time. It stands to reason. You're trying to argue that a single lookup somehow makes the code less clear (entirely false), therefore we should all use two lookups instead. It's a pointless argument. Please try to keep an open mind about alternative APIs.

My mind is not, and has never been closed (on that issue anyway). Of course two lookups are going to take longer than one lookup! But there are some situations in which it doesn't actually matter.

>>>Please re-read. Array-syntax lookup /by itself/ is borked. It has to be used in conjunction with 'in', and is therefore superfluous (since 'in' supplies the data anyway).
>> 
>> 
>> Well not actually so. The "in" supplies the key and not the data. The data can be something totally different.
>> 
>>    real[char[]] AA;
>>    . . .
>> 
>>    real X;
>>    . . .
>>    if ("Some Key" in AA)
>>      X = AA["Some Key"];
>>    else
>>      // Handle unknown key value.
> 
> 
> What is your point there? That statement makes no sense at all. Here's the pointer version of your example:
> 
> real[char[]] AA;
> ...
> real* x;
> ...
> x = ("Some Key" in AA);
> if (x)
>      // do something with *x
> else
>     // do something else
> 
> 
> And here's a robust, simple, efficient API, sans pointers:
> 
> real[char[]] aa;
> ...
> real x;
> ...
> if (aa.get("Some Key", x))
>      // do something with x
> else
>     // do something else
> 

The only difference is syntax. The concepts are the same.

>>>Sure; you can lookup the AA again if you wish, but your counter-argument is redundant; just like the additional lookup. As noted, the existence of s=AA[], by itself, encourages rather fragile code. D requires pointer-syntax to lookup an AA entry without GPFing (redundant multiple lookups aside).
>> 
>> 
>> D does not *require* pointer syntax. It is optional. But for completeness here is the pointer version.
> 
> 
> I'll repeat what I said above: "D requires pointer-syntax to lookup an AA entry without GPFing (redundant multiple lookups aside)"

Agreed.

> Your counter chooses to ignore the parenthesis. Restated: to avoid multiple lookups, and GPFs, one must use pointer syntax in D. Period.

Agreed.

>>>>What would have been nice, instead of Walter getting all upset over people not liking his implementation, is to provide all four types of access.
>>>>
>>>>   'enquiry' ::  Key in Array (returns pointer to Value or Null)
>>>>   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
>>>>                                     error otherwise)
>>>>   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
>>>>                                     it doesn't exist.)
>>>>   'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
>>>>                                          otherwise creates an entry with
>>>>                                          .init values.)
>>>>
>>>>Or any other equivalent syntax. The point is that there is no reason for the old behaviour to be totally removed from the language, just shifted away from being the default behaviour for 'Value = Array(Key)' syntax.
> 
> 
> I agree. But I see you're insisting on force-fitting the [] syntax, resulting in a sub-optimal and overly busy API.

Well actually it turns out that I was just trying to work within the constraints that Walter has implemented. I didn't know that you were really advocating syntax change. I wish for better functionality in AA's too.

> All one needs is right here:
> 
> bool get(key, inout value);
> void put(key, value);
> 
> That is simple, robust, intuitive, optimal, proven, succinct. No redundant lookups. No pointers anywhere to be seen.

Well, 'inout' implements a pointer, but that's splitting hairs.

> The [] syntax seriously limits D in the API it can expose for these purposes. Which is why it's messy at this point. And it's why you have chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple methods are perfectly capable instead.

Yep. A new syntax for AA would be a wonderful addition to D.
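
To make the two-method idea concrete, here is a minimal sketch of how "bool get(key, inout value)" and "void put(key, value)" might look as free functions over the built-in AA. These are assumptions, not anything Walter has implemented: the lookup is named tryGet here (a made-up name) so it does not clash with the get(key, defaultValue) property that current D compilers already provide, and 'inout' is written as 'ref', which is how current D spells that parameter class.

```d
// Hypothetical helper: returns true and fills 'value' when the key exists.
// Only one lookup is performed; 'value' is left untouched on a miss.
bool tryGet(K, V)(V[K] aa, K key, ref V value)
{
    if (auto p = key in aa)
    {
        value = *p;
        return true;
    }
    return false;
}

// Hypothetical helper: creates or replaces the entry for 'key'.
void put(K, V)(ref V[K] aa, K key, V value)
{
    aa[key] = value;
}

void main()
{
    real[string] aa;
    aa.put("Some Key", 2.5L);

    real x = 0;
    if (aa.tryGet("Some Key", x))
        assert(x == 2.5L);          // found: x was overwritten
    assert(!aa.tryGet("Other Key", x));
    assert(x == 2.5L);              // missing: x left as it was
}
```

The pointer is still there underneath (the helper uses the result of 'in'), but the caller never sees it.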

-- 
Derek Parnell
Melbourne, Australia
7/11/2005 9:15:39 AM
November 06, 2005
Thanks ~ I'm really glad that's cleared up! Some replies inline:

Derek Parnell wrote:
> On Sun, 06 Nov 2005 13:23:40 -0800, kris wrote:

> That may be true. However I am working with what we've got, knowing that
> change to D is such an unlikely thing that we'd really be better off
> building Beeblebrox's Probability Drive.

Yes ~ there is that aspect. We should know better by now <g>

> Can you help me see another option? If one is trying to access a
> non-existent element, one either wants to know that it didn't exist or
> wants a default value returned. What else could there be?

Oh, I think one wants to know the entry does not exist; just via a more suitable API. I believe a "bool get(key, inout value)" style of function resolves those issues. I suspect you'd agree. It's something that various folks were asking for /eons/ ago. Worth revisiting, I felt.


> My point was that **GIVEN THE TOOLS WE HAVE** you'd be wise to cater for
> the possibility of 'bad' keys. If we had other tools (for example, a better
> AA functionality) then you'd approach this topic differently.

Point taken. Sorry for misconstruing your perspective.

> That would be because it does a shit load of work before calling the bloody
> compiler and linker!

Sure, it will take longer ~ I meant it takes about twice as long. Of course, that's neither here nor there since Build is such a great tool.

>>There's so much spin in your counter that I'm feeling dizzy. You're attempting to suggest I don't care a whit about legibility, and that any quest to avoid redundant code is misguided. That's utter nonsense.
> 
> 
> As are the words you've placed into my posts ;-)

Touché <g>

>>The [] syntax seriously limits D in the API it can expose for these purposes. Which is why it's messy at this point. And it's why you have chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple methods are perfectly capable instead.
> 
> 
> Yep. A new syntax for AA would be a wonderful addition to D.

I feel it is actually more essential than that ~ but wholeheartedly agree otherwise. The original syntax could well have stayed intact, if it were bolstered via the addition of a "bool get(key, inout value)" method.
November 07, 2005
Sean Kelly wrote:
> 
> I would have preferred leaving the existing syntax as-is and adding a new method called 'find' or some such that returned a pointer to the element or null if it doesn't exist.
> 
> 
> Sean

If you mean adding an AA method similar to this:

# bool get(key, inout value);

... then I'd fully agree with you. I think adding such a method is a good way to satisfy/resolve so many different requirements, and tastes. That particular signature avoids pointer usage and redundant lookups. An alternative would be to twist the array syntax some more, to do the same thing:

# bool opArray(key, inout value);


I do like how Walter changed 'in' (returning a pointer), since that can be useful for integration with C functions. But the AA[] rvalue, x = AA["foo"], change could be reverted in the presence of that new method.
November 07, 2005
On Sun, 06 Nov 2005 19:57:15 -0800, kris <fu@bar.org> wrote:
> Sean Kelly wrote:
>>  I would have preferred leaving the existing syntax as-is and adding a new method called 'find' or some such that returned a pointer to the element or null if it doesn't exist.
>>   Sean
>
> If you mean adding an AA method similar to this:
>
> # bool get(key, inout value);
>
> ... then I'd fully agree with you. I think adding such a method is a good way to satisfy/resolve so many different requirements, and tastes. That particular signature avoids pointer usage and redundant lookups. An alternative would be to twist the array syntax some more, to do the same thing:
>
> # bool opArray(key, inout value);
>
>
> I do like how Walter changed 'in' (returning a pointer), since that can be useful for integration with C functions.

Good point, I hadn't thought of that. I was seeing this as redundant in the face of a 'get' function as shown above.

> But the AA[] rvalue, x = AA["foo"], change could be reverted in the presence of that new method.

Do you mean reverted all the way back to inserting on lookup? i.e.

x = AA["foo"];

So, this causes the creation and insertion of an item for the key "foo" (assuming none existed prior to the call) into AA?

I can't see what advantage that gives us over returning typeof(v).init and not inserting?
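
To spell out the two behaviours I am comparing, here is a small sketch. It leans on the get(key, defaultValue) and require(key) properties that current D compilers provide for built-in AAs, which is an assumption on my part about the toolchain being used; neither is part of the proposals in this thread.

```d
void main()
{
    int[string] aa;

    // Non-inserting lookup: a missing key yields the supplied default.
    int a = aa.get("foo", int.init);
    assert(a == 0);
    assert("foo" !in aa);   // nothing was added to the AA

    // Inserting lookup: a missing key is created with int.init, then returned.
    int b = aa.require("foo");
    assert(b == 0);
    assert("foo" in aa);    // an entry now exists
}
```

Both calls hand back the same value here; the only difference is whether the AA has grown afterwards.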

Regan
November 07, 2005
I would strongly argue that if you want such checking, you should have two versions of the library: one with -release, and one without.

For example, if I write a C program in release mode, and pass negative coordinates to a function that renders data to the screen (obviously, assuming it took 'int's), I would not be surprised if it crashed.  Nor would I complain to the makers of my compiler or library.  I am passing bad data.

It's clear you disagree; you want to catch every case of the bad data (even though, I'm entirely sure, there are places in your library where your OWN logic might cause bugs/crashes because of bad data.)

Back to the dual library concept, I think this is more of an argument for that than for changing the way associative arrays are handled *again*.  If I could have some way to compile my D program with the contracts-on version of phobos, I'm sure that would be a great gain.

Anyway, your arguments are also flawed, as follows:

1. This is true.

2. This is true, only in the case that you use release mode and want to avoid GPFs when bad data is provided (assuming that a GPF can be stack traced, either by code in the program or a debugger.  That is outside this issue, so we assume reasonable-case.)

3. This is not true.  Having a bike, even if you need to use it to get gas sometimes, does not render your car redundant nor useless.  Even if gas prices are so high that you cannot use the car, that does not mean your wife would appreciate you selling it.  As argued elsewhere, the usage of the in statement does not necessitate using pointers to access the data at all.

4. Obviously, this is a bizarre conclusion to make.  For your uses, surely we might agree that the array-like syntax isn't commonly useful, but for other usage - indeed, for common usage - I really can't see such a wild statement being true.

Furthermore, saying that having an array-style syntax is an invitation to writing bad code is something some can say (and have said) about arrays, pointers, classes, class-less functions, couches, and generally everything else.  Yes, for your uses of it, an inexperienced novice might fall into bad habits with such syntax, but that, again, does not mean it applies everywhere.

-[Unknown]