August 08, 2005
In article <dd3cea$26u7$1@digitaldaemon.com>, Stewart Gordon says...
>
>It is possible to define the opCmp and opEquals methods so that unequal objects can rank equally in order.  This is allowed by the Java specs, and indeed one or two Java API classes do this.
>
>However, when this is done in D, the current implementation of associative arrays falls apart.

Oh, I should mention that all my prior experience with hash maps has been that they use equality for key comparisons, not equivalence.  I was a bit surprised when I learned D's implementation doesn't work this way.  While binary trees might be a tad faster in some cases, I'm not sure I like that anything I hash must be sortable.


Sean


August 08, 2005
In article <dd7sau$2ktr$1@digitaldaemon.com>, Sean Kelly says...
>
>In article <dd3cea$26u7$1@digitaldaemon.com>, Stewart Gordon says...
>
>>1. State in the spec that it is an incorrect practice to define opCmp to return 0 if opEquals returns true.
>
>I'm not sure I like this.  While I generally make equivalence a looser match than equality, it's reasonable that an application might violate this condition.

My response was poorly worded (one of these days I'll learn not to post right after waking up).  But now this has me confused.  Are you saying that the spec should forbid two objects from being equivalent if they are equal?  Or did you mean "if opEquals returns false?"  Either way, I don't like it.


Sean


August 08, 2005
Sean Kelly wrote:

<snip>
>>2. Keep it as an acceptable practice, but warn people properly that such a type may not work correctly as an AA key.
> 
> Why wouldn't it work properly?
<snip>

Try the example in the post I linked to above and you'll see.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on on the 'group where everyone may benefit.
August 08, 2005
In article <dd7udk$2n7b$1@digitaldaemon.com>, Stewart Gordon says...
>
>Sean Kelly wrote:
>
><snip>
>>>2. Keep it as an acceptable practice, but warn people properly that such a type may not work correctly as an AA key.
>> 
>> Why wouldn't it work properly?
><snip>
>
>Try the example in the post I linked to above and you'll see.

I don't have writefln available in my current setup :p.  But it looks like the point you were trying to make is that 'C' and 'c' are considered equivalent keys, which I would consider correct behavior given the current implementation. Whether or not hash maps should be changed to use unordered slists for collisions so opEquals can be used instead is a separate issue IMO.


Sean


August 08, 2005
Stewart Gordon <smjg_1998@yahoo.com> wrote:

[...]
> allocate the optimum number of hash slots and balance the binary trees.

Optimal number of hash slots under which target function? And if you have to press any balancing algorithm into code for the sake of collision resolution, that would imply, that you do not trust the hashing part. Then: why use hashing at all?

[...]
> But what do you mean by jumping, exactly?

If you find that the hash slot provided by toHash is already occupied, you choose the next slot to probe for occupation by increasing the index of the current hash slot by a given amount, i.e. jump to the next hash slot. All you have to guarantee is that if there is an empty hash slot you can find it by jumping.

Up to a fill rate of about 0.9 thoretical results show, that the length of the jump distance is unimportant and the length of the average probing sequnence in search for an empty hash slot can be bound by the reciprocal of 1 minus the fill rate, i.e. 10 in the case of a fill rate of 0.9.

-manfred
August 08, 2005
On Mon, 08 Aug 2005 12:31:59 +0100, Stewart Gordon <smjg_1998@yahoo.com> wrote:
>> A recent thread on the array 'sort' operator has made me think that  perhaps we can apply the same idea to AA's. That idea is the ability to  assign a 'sort'/'opCmp' method so the array.
>
> I personally prefer the idea of having array.sort(cmp) available here. Not only is writing
>
>      array.sort(&cmpByDate);
>
> shorter than
>
>      array.opCmp = &cmpByDate;
>      array.sort;
>
> but it also doesn't bloat memory requirements.

Sure. But in the case of an AA we need to assign an operator/function/delegate for it to use in it's internal ordering on every insert/lookup/etc.

>> So, we pick:
>>
>>> 1. State in the spec that it is an incorrect practice to define opCmp to  return 0 if opEquals returns true.
>>  and instead allow the following...
>>  class IChar
>> {
>>   static int opCmpCase(IChar lhs, IChar rhs) {}
>>   ..etc..
>> }
>>  int[IChar] list;
>> list.opCmp = &IChar.opCmpCase;
>> ..add items to 'list'..
>
> I'm not sure opCmp is the best name - it looks like you're setting the function that will be used to compare this AA with another AA.  Maybe call it comparator?

We can call it whatever.

> That said, what's the point of this?  To allow some AAs to be case-sensitive and others to be case-insensitive?  If you're going to do this, do we gain much over doing assignments and lookups to aa[toupper(key)]?

To allow you to store your weird objects for which opEquals and opCmp disagree.

> If we solved the problem by giving every AA a comparator property, then every AA that uses such a keytype would need to set the comparator anyway.  So why not go the whole hog and let the type define the AA comparator?

Sure, that's exactly what I suggest below.

> <snip>
>> Why can't we have a 3rd operator opRank (or opSort) - definable in IChar,  and make the the array/AA smart. The array/AA could, upon creation, assign  a comparrison delegate to this new operator if present and opCmp otherwise.
> <snip>
>
> But calling it opSort would imply using it for .sort as well.

But that's _exactly_ what it is used for, ranking/sorting the object.

I'm suggesting that opEquals and opCmp should _always_ agree and opSort may disagree with opCmp. AA's would use opCmp, sort would use opSort. This lets you use them in AA's and sort them how you want in normal arrays.

You suggest the opposite below, either way a 3rd operator to handle your unusual type with AA's and arrays automagically picking up and using the new operator when poresent is the solution I like best.

> This is pointless IMO - normal opCmp can be used for this.  If you define opCmp such that unequal objects can rank equally, then you're stating that when sorting, you don't care what order equally-ranking objects end up in.
>
> Maybe call it opStrictCmp.  AAs (and any other applications relying on a comparator consistent with opEquals) can use this, and the default .sort can use plain opCmp.

Regan
August 09, 2005
Sean Kelly wrote:
> In article <dd7udk$2n7b$1@digitaldaemon.com>, Stewart Gordon says...
<snip>
>> Try the example in the post I linked to above and you'll see.
> 
> I don't have writefln available in my current setup :p. 

Update your compiler, or change the example to use printf.

> But it looks like the
> point you were trying to make is that 'C' and 'c' are considered equivalent
> keys, which I would consider correct behavior given the current implementation.

Correct behaviour is determined by the spec, not by some implementaion.

The natural assumption is that keys match by equality, not equivalence.

> Whether or not hash maps should be changed to use unordered slists for
> collisions so opEquals can be used instead is a separate issue IMO.

Indeed, separate from whether they should be fixed to use opEquals as well.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on on the 'group where everyone may benefit.
August 09, 2005
Regan Heath wrote:
<snip>
> I'm suggesting that opEquals and opCmp should _always_ agree and opSort  may disagree with opCmp. AA's would use opCmp, sort would use opSort. This lets you use them in AA's and sort them how you want in normal arrays.
<snip>

Which would the < <= > >= operators use under your proposal?  Since this is akin to sorting, it would make sense to use opSort, but then it becomes confusing that there is a method called opCmp but it doesn't do this.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on on the 'group where everyone may benefit.
August 09, 2005
On Tue, 09 Aug 2005 12:06:53 +0100, Stewart Gordon <smjg_1998@yahoo.com> wrote:
> Regan Heath wrote:
> <snip>
>> I'm suggesting that opEquals and opCmp should _always_ agree and opSort  may disagree with opCmp. AA's would use opCmp, sort would use opSort. This lets you use them in AA's and sort them how you want in normal arrays.
> <snip>
>
> Which would the < <= > >= operators use under your proposal?

opCmp

> Since this is akin to sorting, it would make sense to use opSort

I disagree. You're comparing not sorting thus opCmp makes sense.

More important however is _how_ you want to compare in the general case with this special type, eg.

IChar a = 'a';
IChar b = 'B';

if (a < b) { //true or false? }

Assume for the sake of argument:
  'a' = 0x61 or 91 decimal
  'B' = 0x42 or 66 decimal

Is the statement true or false?

Does the answer depend on context (if so, which is more common case or no case?) or will this type always be compared in one way?

If you generally want to do a case-insensitive compare then you're defining an opCmp that AA's cannot use.
If you want to do case-sensitive compare generally then you're defining an opCmp that AA's can use.

I suspect you want the former, right?

In which case the solution might be to have another operator (as you suggested) opCmpStrict or something used by AA's. This is what I referred to when I said "either way" earlier. This operator can be found and used automagically by the AA without requiring a change to the 'balanced binary tree' part of the implementation.

Regan

p.s. My intention in starting this part of the thread was to explore solutions that don't involve changing the AA implementation. I'm not saying that's not the solution, I'm just exploring...
August 09, 2005
In article <dda28h$21m5$1@digitaldaemon.com>, Stewart Gordon says...
>
>Sean Kelly wrote:
>>
>> But it looks like the
>> point you were trying to make is that 'C' and 'c' are considered equivalent
>> keys, which I would consider correct behavior given the current implementation.
>
>Correct behaviour is determined by the spec, not by some implementaion.

I disagree.  DMD is a reference implementation, so where the spec is vague DMD should serve as an authoritative refernce on expected behavior.


Sean