Jump to page: 1 24  
Page
Thread overview
[Vote] Should defining opCmp to be inconsistent with opEquals be an acceptable practice?
Aug 06, 2005
Stewart Gordon
Aug 07, 2005
Regan Heath
Aug 08, 2005
Stewart Gordon
Aug 08, 2005
Regan Heath
Aug 09, 2005
Stewart Gordon
Aug 09, 2005
Regan Heath
Aug 09, 2005
Stewart Gordon
Aug 09, 2005
Regan Heath
Aug 10, 2005
Stewart Gordon
Aug 10, 2005
Regan Heath
Aug 07, 2005
Ben Hinkle
Aug 08, 2005
Burton Radons
Aug 08, 2005
Ben Hinkle
Aug 07, 2005
Manfred Nowak
Aug 08, 2005
Stewart Gordon
Aug 08, 2005
Manfred Nowak
Aug 08, 2005
Stewart Gordon
Re: [Vote] Should defining opCmp to be inconsistent with opEquals be
Aug 08, 2005
Sean Kelly
Aug 08, 2005
Sean Kelly
Aug 08, 2005
Stewart Gordon
Re: [Vote] Should defining opCmp to be inconsistent with opEquals
Aug 08, 2005
Sean Kelly
Aug 09, 2005
Stewart Gordon
Aug 09, 2005
Sean Kelly
Aug 09, 2005
Manfred Nowak
Aug 09, 2005
Sean Kelly
Aug 09, 2005
Derek Parnell
Aug 09, 2005
Stewart Gordon
Aug 09, 2005
Sean Kelly
Aug 09, 2005
Derek Parnell
Aug 08, 2005
Sean Kelly
Aug 22, 2005
Manfred Nowak
August 06, 2005
It is possible to define the opCmp and opEquals methods so that unequal objects can rank equally in order.  This is allowed by the Java specs, and indeed one or two Java API classes do this.

However, when this is done in D, the current implementation of associative arrays falls apart.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4556

The problem is that the AA implementation assumes that if opCmp returns zero then the keys match.  As far as I can see, the spec doesn't in any way forbid behaviour to the contrary.

But there seems to be room for debate on what the best solution is. Possibilities:

1. State in the spec that it is an incorrect practice to define opCmp to return 0 if opEquals returns true.

2. Keep it as an acceptable practice, but warn people properly that such a type may not work correctly as an AA key.

3. Keep it as an acceptable practice, and fix the AA implementation to work on such types.  Before you dismiss this idea as inefficient, please read

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4710

Cast your votes now!

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on on the 'group where everyone may benefit.
August 07, 2005
On Sat, 06 Aug 2005 23:10:26 +0100, Stewart Gordon <smjg_1998@yahoo.com> wrote:
> It is possible to define the opCmp and opEquals methods so that unequal objects can rank equally in order.

I see in your first example/link you're defining a character class for which opEquals does a case-sensitive compare, but opCmp does a case insensitive comparrison. The goal, I assume, is to have characters/strings that sort/rank in a case insensitive fashion but are only exactly equal if the case is the same. Am I right?

A recent thread on the array 'sort' operator has made me think that perhaps we can apply the same idea to AA's. That idea is the ability to assign a 'sort'/'opCmp' method so the array.

So, we pick:

> 1. State in the spec that it is an incorrect practice to define opCmp to return 0 if opEquals returns true.

and instead allow the following...

class IChar
{
  static int opCmpCase(IChar lhs, IChar rhs) {}
  ..etc..
}

int[IChar] list;
list.opCmp = &IChar.opCmpCase;
..add items to 'list'..

and/or allow assignment of a delegate, eg.

class ICharSort
{
  int[] sortOrder; //by name, then by date, then by ..
  int opCmp(IChar lhs, IChar rhs) {}
}

ICharSort s = new ICharSort();
..define sortOrder in 's'..

int[IChar] list;
list.opCmp = &s.opCmp;

..add items to 'list'..

to allow us to sort/rank in a more complex fashion.

For an AA it could either be illegal, or would cause a rehash to assign the delegate when it contains items. For an array it wouldn't matter (except that it would sort of invalidate the state i.e. if you sort, then assign it's not technically 'sorted' by it's current operator any more)

The advantage here is that we have only one array/AA implementation and there should be little (no?) performance penalty using types that do or don't have this requirement.

Of course one argument against this idea may be that the class ('IChar' in this example) should never be sorted/ranked in any other way. In which case...

Why can't we have a 3rd operator opRank (or opSort) - definable in IChar, and make the the array/AA smart. The array/AA could, upon creation, assign a comparrison delegate to this new operator if present and opCmp otherwise.

In this way there is no additional performance penalty (as it only checks once, upon creation, not every comparrison)

Thoughts?

Regan
August 07, 2005
> 1. State in the spec that it is an incorrect practice to define opCmp to return 0 if opEquals returns true.
>
> 2. Keep it as an acceptable practice, but warn people properly that such a type may not work correctly as an AA key.
>
> 3. Keep it as an acceptable practice, and fix the AA implementation to work on such types.

My first preference would be 1 since I think it would be a maintenance nightmare to allow values such that x<=y and y<=x but x!=y. It takes me back to my TA days in freshman calculus and the algebra abuse that students are capable of. But given that D has the <> and !<> operators presumably to allow users to distinguish those concepts from != and == I'll vote for 2. So either I say go with 1 and look hard at !<> and <> or go with 2 and say that if you design a type to use <> and !<> that you should give big blinking warnings.

ps - what does <>= do? It is a fun way of checking if an opCmp throws?


August 07, 2005
Stewart Gordon <smjg_1998@yahoo.com> wrote:

My vote is:

> 3. Keep it as an acceptable practice, and fix the AA implementation to work on such types.

I already mentioned, that the AA-implementation is broken: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/22170

| Hashing needs at least one hash function, an _equal_ operator | and one of these: an upper bound on the number of elements or a | rehash-operation.

The opCmp is not needed for hashing. The current Implementation uses opCmp for traversing its unbalanced binary trees, which can degenerate to linear lists, depending on the quality of toHash and the insertion sequence.

Degenerated AA's will be terrible slow, because in the average case a lookup will traverse half of the elements in the AA, whereas a pure hash with conflict resolution by jumping can be guaranteed to bind the lookup-time by a constant, i.e. independent of the number of elements stored.

Because only opEqual is needed there is no necessity to restrict or warn.

The drawback is, that if removing must be supported the space requirements are temporarily increased instead of decreased.

So clearly choice 3 with a clear preference for a pure hash.

-manfred
August 08, 2005
2, I don't care what incorrect code does and I don't want any speed penalty from confirming that I'm writing correct code when I always test every line I write.

Object.opCmp should be changed to return float, so that it can use the semantics the compiler actually applies to the return value of comparison operators.  In that case 0 does mean equality - float.nan means unordered.

This change should've been made years ago, and the operator overloading page should be fixed up as well.  Someone send Walter a patch.

Ben Hinkle wrote:
> ps - what does <>= do? It is a fun way of checking if an opCmp throws? 

The throw semantics are irreproducible, and anyway either the standard or the implementation is wrong - it doesn't throw and I don't think there's a Phobos way to enable FP exceptions.

But directly related to your question, it returns whether the values are not unordered.  So "float.nan <>= float.nan" is false, while "4 <>= 4" is true.
August 08, 2005
"Burton Radons" <burton-radons@smocky.com> wrote in message news:dd6alf$17kj$1@digitaldaemon.com...
> 2, I don't care what incorrect code does and I don't want any speed penalty from confirming that I'm writing correct code when I always test every line I write.
>
> Object.opCmp should be changed to return float, so that it can use the semantics the compiler actually applies to the return value of comparison operators.  In that case 0 does mean equality - float.nan means unordered.
>
> This change should've been made years ago, and the operator overloading page should be fixed up as well.  Someone send Walter a patch.
>
> Ben Hinkle wrote:
>> ps - what does <>= do? It is a fun way of checking if an opCmp throws?
>
> The throw semantics are irreproducible, and anyway either the standard or the implementation is wrong - it doesn't throw and I don't think there's a Phobos way to enable FP exceptions.
>
> But directly related to your question, it returns whether the values are not unordered.  So "float.nan <>= float.nan" is false, while "4 <>= 4" is true.

True. I was talking about <>= in the context of opCmp for objects. Since opCmp currently returns an int <>= only tests if the opCmp throws. I agree if opCmp returned a float then we can start giving <>= meaning for objects. Note also that nan behavior is different from opCmp and opEqual implementing different notions of equality. To see what I mean concretely about <>= and the semantics of <> and != try running

class X {
  int val;
  int opCmp(Object y) {
    X y2 = cast(X)y;
    int diff = val - y2.val;
    if (diff < 5 && -diff < 5)
      return 0;
    return diff;
  }
  int opEqual(Object y) {
    X y2 = cast(X)y;
    return val == y2.val;
  }
}
int main() {
  X a = new X;
  X b = new X;
  a.val = 10;
  b.val = 12;
  assert( a <= b );
  assert( a >= b );
  assert( a != b );
  assert( a !<> b );
  assert( a <>= b ); // should always be true
  return 0;
}


August 08, 2005
Stewart Gordon wrote:
<snip>
> 1. State in the spec that it is an incorrect practice to define opCmp to return 0 if opEquals returns true.
<snip>

Oops, I meant if opEquals returns false.  But I'm guessing that anyone who voted for this option misread it as this anyway....

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on on the 'group where everyone may benefit.
August 08, 2005
Regan Heath wrote:

> On Sat, 06 Aug 2005 23:10:26 +0100, Stewart Gordon <smjg_1998@yahoo.com>  wrote:
> 
>> It is possible to define the opCmp and opEquals methods so that unequal  objects can rank equally in order.
> 
> I see in your first example/link you're defining a character class for  which opEquals does a case-sensitive compare, but opCmp does a case  insensitive comparrison. The goal, I assume, is to have characters/strings  that sort/rank in a case insensitive fashion but are only exactly equal if  the case is the same. Am I right?

Yes.  But my example was on the contrived side, but maybe the idea can be put to practical use.

> A recent thread on the array 'sort' operator has made me think that  perhaps we can apply the same idea to AA's. That idea is the ability to  assign a 'sort'/'opCmp' method so the array.

I personally prefer the idea of having array.sort(cmp) available here. Not only is writing

    array.sort(&cmpByDate);

shorter than

    array.opCmp = &cmpByDate;
    array.sort;

but it also doesn't bloat memory requirements.

> So, we pick:
> 
>> 1. State in the spec that it is an incorrect practice to define opCmp to  return 0 if opEquals returns true.
> 
> and instead allow the following...
> 
> class IChar
> {
>   static int opCmpCase(IChar lhs, IChar rhs) {}
>   ..etc..
> }
> 
> int[IChar] list;
> list.opCmp = &IChar.opCmpCase;
> ..add items to 'list'..

I'm not sure opCmp is the best name - it looks like you're setting the function that will be used to compare this AA with another AA.  Maybe call it comparator?

That said, what's the point of this?  To allow some AAs to be case-sensitive and others to be case-insensitive?  If you're going to do this, do we gain much over doing assignments and lookups to aa[toupper(key)]?

If we solved the problem by giving every AA a comparator property, then every AA that uses such a keytype would need to set the comparator anyway.  So why not go the whole hog and let the type define the AA comparator?

<snip>
> Why can't we have a 3rd operator opRank (or opSort) - definable in IChar,  and make the the array/AA smart. The array/AA could, upon creation, assign  a comparrison delegate to this new operator if present and opCmp otherwise.
<snip>

But calling it opSort would imply using it for .sort as well.  This is pointless IMO - normal opCmp can be used for this.  If you define opCmp such that unequal objects can rank equally, then you're stating that when sorting, you don't care what order equally-ranking objects end up in.

Maybe call it opStrictCmp.  AAs (and any other applications relying on a comparator consistent with opEquals) can use this, and the default .sort can use plain opCmp.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on on the 'group where everyone may benefit.
August 08, 2005
Manfred Nowak wrote:

> Stewart Gordon <smjg_1998@yahoo.com> wrote:
> 
> My vote is:
> 
>> 3. Keep it as an acceptable practice, and fix the AA
>> implementation to work on such types.
> 
> I already mentioned, that the AA-implementation is broken:
> http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/22170
> 
> | Hashing needs at least one hash function, an _equal_ operator
> | and one of these: an upper bound on the number of elements or a
> | rehash-operation.
> 
> The opCmp is not needed for hashing. The current Implementation uses opCmp for traversing its unbalanced binary trees, which can degenerate to linear lists, depending on the quality of toHash and the insertion sequence.

That's why it talks about calling .rehash after populating the AA.  But yes, it doesn't specify how thoroughly it should reorganise the array. I guess a sensible approach (under the current AA imp) is to recalculate all the hash values, allocate the optimum number of hash slots and balance the binary trees.

> Degenerated AA's will be terrible slow, because in the average case a lookup will traverse half of the elements in the AA, whereas a pure hash with conflict resolution by jumping can be guaranteed to bind the lookup-time by a constant, i.e. independent of the number of elements stored.
<snip>

You have a point.  But what do you mean by jumping, exactly?

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on on the 'group where everyone may benefit.
August 08, 2005
In article <dd3cea$26u7$1@digitaldaemon.com>, Stewart Gordon says...
>
>It is possible to define the opCmp and opEquals methods so that unequal objects can rank equally in order.  This is allowed by the Java specs, and indeed one or two Java API classes do this.

Yes.  I do this in C++ all the time and consider it an important feature.

>However, when this is done in D, the current implementation of associative arrays falls apart.
>
>http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4556
>
>The problem is that the AA implementation assumes that if opCmp returns zero then the keys match.  As far as I can see, the spec doesn't in any way forbid behaviour to the contrary.

That's because they do match, at least as far as an ordered binary tree are concerned.  The C++ standard defines this as equivalence, ie. two objects have the same sort order.

>But there seems to be room for debate on what the best solution is. Possibilities:
>
>1. State in the spec that it is an incorrect practice to define opCmp to return 0 if opEquals returns true.

I'm not sure I like this.  While I generally make equivalence a looser match than equality, it's reasonable that an application might violate this condition.

>2. Keep it as an acceptable practice, but warn people properly that such a type may not work correctly as an AA key.

Why wouldn't it work properly?

>3. Keep it as an acceptable practice, and fix the AA implementation to work on such types.  Before you dismiss this idea as inefficient, please read
>
>http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4710

If hash maps are going to change to use equality rather than equivalence then it has to happen soon.  But frankly, what I really want is a way to supply a comparison delegate on declaration.


Sean


« First   ‹ Prev
1 2 3 4