WAT: opCmp and opEquals woes (page 10) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » WAT: opCmp and opEquals woes (page 10)

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by Walter Bright
in reply to Regan Heath

Walter Bright

Posted in reply to Regan Heath

On 7/25/2014 4:10 AM, Regan Heath wrote:
> Sure, Andrei makes a valid point .. for a minority of cases.  The majority case
> will be that opEquals and opCmp==0 will agree.  In those minority cases where
> they are intended to disagree the user will have intentionally defined both, to
> be different.  I cannot think of any case where a user will intend for these to
> be different, then not define both to ensure it.

You've agreed with my point, then, that autogenerating opEquals as memberwise equality (not opCmp==0) if one is not supplied will be correct unless the user code is already broken.

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by Ola Fosheim Grøstad
in reply to H. S. Teoh

Ola Fosheim Grøstad

Posted in reply to H. S. Teoh

On Friday, 25 July 2014 at 20:13:57 UTC, H. S. Teoh via Digitalmars-d wrote:
> Nobody said anything
> about making it *impossible* to define non-linear orderings.

But opCmp() already make it impossible to define some binary relations that are order-like for <, <= etc. You cannot return both -1 and 1 at the same time. And >= is defined a the complement of <. What does 0 signify, does it signify equality or that two values cannot be ordered? Take for instance intervals:

Defining [0,1] < [2,3] makes sense, but should [1,2] vs [0,3] give 0, or is that only for [1,2] vs [1,2]?

It makes a lot of sense to provide "sortability traits" for the type, such as total-order etc, but then one shouldn't limit the implementation of operators like this without tying it to a trait of some kind. Either the presence of opCmp is a trait of the type, or the trait has to be provided by some other means.

Related to some other comments in the thread, some sort algorithms designed for scenarios where you have many similar values need equality for proper or efficient sorting. Like quicksort implementations where you partition into 3 sets (left, pivot-equality, right). On most CPUs you get both less-than and equality information from a single compare so how you implement opCmp and sorting is crucial for performance…

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by H. S. Teoh
in reply to Jonathan M Davis

H. S. Teoh

Posted in reply to Jonathan M Davis

On Fri, Jul 25, 2014 at 06:56:40PM +0000, Jonathan M Davis via Digitalmars-d wrote: [...]
> So, if we remove the new check for a user-defined opEquals when opCmp is defined, then you don't have to define opEquals.

This is even worse, since it may silently introduce runtime breakage. The original type may have had opCmp defined just so it can be used with AA's, and now after the upgrade to 2.066, suddenly the custom opCmp code they wrote specifically to work with AA's doesn't get used anymore, yet the compiler silently accepts the code, and the problem won't be found until runtime.

> If we do what H.S. Teoh suggests, then you'll have to define it if you want to avoid the additional checks that opCmp would be doing that opEquals wouldn't do, but if you didn't care, then you wouldn't.

Which IMO fits the D motto of being correct first, and performant if you ask for it.

> If we leave it as it is in git master, then you'd always have to define it if you defined opCmp and wanted to use it as an AA key, and since opCmp was used for AA keys before, that means that _every_ type which didn't define opEquals but was used as an AA key will suddenly have to define opEquals and toHash and will thus now be broken. So, the current situation in git master is the worst all around.
[...]

Yes, it's clear that *something* need to be done about the current situation in git master. It's just a question of what.

Having said that, though, I do agree that the current situation in git master is the most *pedantically* correct, in the sense that it now properly enforces correct usage of opCmp/opEquals/toHash. If we were starting from scratch, I would definitely vote for the current behaviour. The problem, however, is that we have to deal with the historical baggage of existing code that was written to conform to the old buggy AA design, and we'd like to minimize gratuitous code breakage. The current situation in git master is a very poor way of handling this.

If people are opposed to making the default opEquals to be opCmp()==0
(if the user defines opCmp), could it at least be used as a deprecation
path for phasing out the old usage of opCmp for AA keys and migrating to
opEquals? That is, we introduce opEquals = (opCmp()==0) for 2.066 in
order to minimize code breakage, but put a deprecation notice on it, and
then remove it in 2.067 (or whatever the release will be given the
current deprecation cycle).

T

-- 
MASM = Mana Ada Sistem, Man!

July 25, 2014

Re: opCmp and opEquals woes

Posted by Walter Bright
in reply to Manu

Walter Bright

Posted in reply to Manu

On 7/25/2014 5:15 AM, Manu via Digitalmars-d wrote:
> On 25 July 2014 22:06, Jacob Carlborg via Digitalmars-d
>     By Walter and Andrei's definition opCmp is not to be used for equivalent,
>     therefor opCmp does never need to be equal to 0.
> Yes it does, <= and >= are both things that you can type.

Incorrect, as an object may not even have a notion of equality. Nothing requires opCmp to ever return 0.

Of course, such an opCmp would never have worked with AAs anyway.

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by Walter Bright
in reply to Jacob Carlborg

Walter Bright

Posted in reply to Jacob Carlborg

On 7/25/2014 4:18 AM, Jacob Carlborg wrote:
> On 25/07/14 11:46, Jonathan M Davis wrote:
>
>> Code that worked perfectly fine before is now slower, because it's using
>> opCmp for opEquals when it wasn't before.
>
> Who says opCmp need to be slower than opEquals.

Consider:

   struct S { int a,b; }

   int opCmp(S s2) {
        return (a == s.a) ? s.b - b : s.a - a;
   }

   bool opEquals(S s2) {
        return *cast(long*)&this == *cast(long*)&s2;
   }

Because of byte ordering variations, the cast trick wouldn't work reliably for opCmp.

Do people do such things? Yes, since opEquals can very likely be in critical performance loops.

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by Walter Bright
in reply to Marc Schütz

Walter Bright

Posted in reply to Marc Schütz

On 7/25/2014 12:03 PM, "Marc Schütz" <schuetzm@gmx.net>" wrote:
> No, if a type had only defined opCmp because of the previous AA
> (mis)implementation,

It was not a misimplementation. The previous implementation used a hash lookup with a binary tree for collisions, hence it needed cmp. It was perfectly correct. The newer one uses a linear list for collisions, hence it only needs ==.

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by H. S. Teoh
in reply to Marc Schütz

H. S. Teoh

Posted in reply to Marc Schütz

On Fri, Jul 25, 2014 at 07:10:50PM +0000, via Digitalmars-d wrote:
> On Friday, 25 July 2014 at 18:54:15 UTC, Daniel Gibson wrote:
[...]
> >Well, to be fair the documentation, is pretty explicit about it, the headings are "Overloading == and !=" and "Overloading <, <=, <, and <=".
> 
> Whatever the outcome of the discussion will be, it needs to be documented much better. The current documentation doesn't say anything about whether or not, and how opEquals and opCmp relate. I doesn't even mention that they are supposed to be consistent.
> 
> I'm just afraid that it will not be noticed, because it will be "hidden" in the documentation. If the status quo is kept, you just won't know you've written wrong code, even though the compiler has all the means to tell you.
> 
> >The D1 documentation even had a rationale why there's both opEquals and opCmp, no idea why that was dropped for D2.
> >
> >However, I read about opCmp at some time and in the meantime forgot about the "not for ==" part - but this is probably a problem with my brain (or the long timespan) and not with the documentation.
> 
> Well, you're not the only one :-(

Yeah, we definitely need to improve the docs so that the distinction is clearer.


T

-- 
Дерево держится корнями, а человек - друзьями.

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by H. S. Teoh
in reply to Walter Bright

H. S. Teoh

Posted in reply to Walter Bright

On Fri, Jul 25, 2014 at 02:05:13PM -0700, Walter Bright via Digitalmars-d wrote:
> On 7/25/2014 12:03 PM, "Marc Schütz" <schuetzm@gmx.net>" wrote:
> >No, if a type had only defined opCmp because of the previous AA
> >(mis)implementation,
> 
> It was not a misimplementation. The previous implementation used a hash lookup with a binary tree for collisions, hence it needed cmp. It was perfectly correct. The newer one uses a linear list for collisions, hence it only needs ==.

So it sounds like Jonathan's solution is the best after all: get rid of the error message that demands that opEquals be defined when opCmp is defined.  Previous code that defined opCmp for AA keys will then continue to work (except if their opCmp was inconsistent with the default opEquals, in which case they are already buggy and we're not making things worse).

On a related note, it would be nice if AA key types that did define opCmp can have better than linear collision resolution. But that belongs in another discussion.

On another related note, the compiler's treatment of opCmp is inconsistent:

	struct S {
		int opCmp(S s) { return 0; }
	}
	int[S] aa1;	// OK

	struct T {
		int opCmp(T s) const { return 0; }
	}
	int[T] aa2;	// Compile error: must also define opEquals

Worse yet:

	struct S
	{
		int x;
		int opCmp(S s) /*const*/ {
			return s.x - x;
		}
	}

	void main() {
		auto s1 = S(1);
		auto s2 = S(2);

		assert(s1 > s2);
		assert(s2 < s1);

		assert(typeid(s1).compare(&s1, &s2) > 0);
		assert(typeid(s2).compare(&s2, &s1) < 0);
	}

This produces a runtime error:

	object.Error@(0): TypeInfo.compare is not implemented

Uncommenting the const in opCmp fixes the problem.  WAT?

T

-- 
Без труда не выловишь и рыбку из пруда.

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by Jonathan M Davis
in reply to Walter Bright

Jonathan M Davis

Posted in reply to Walter Bright

On Friday, 25 July 2014 at 21:01:49 UTC, Walter Bright wrote:
> On 7/25/2014 4:18 AM, Jacob Carlborg wrote:
>> On 25/07/14 11:46, Jonathan M Davis wrote:
>>
>>> Code that worked perfectly fine before is now slower, because it's using
>>> opCmp for opEquals when it wasn't before.
>>
>> Who says opCmp need to be slower than opEquals.
>
> Consider:
>
>    struct S { int a,b; }
>
>    int opCmp(S s2) {
>         return (a == s.a) ? s.b - b : s.a - a;
>    }
>
>    bool opEquals(S s2) {
>         return *cast(long*)&this == *cast(long*)&s2;
>    }
>
> Because of byte ordering variations, the cast trick wouldn't work reliably for opCmp.
>
> Do people do such things? Yes, since opEquals can very likely be in critical performance loops.

Doesn't the compiler-generated opEquals do a memcmp when it can? Obviously, it can't always, but in the simple POD cases (that don't involve floating point values anyway), it should be able to it.

- Jonathan M Davis

July 25, 2014

Re: WAT: opCmp and opEquals woes

Posted by Jonathan M Davis
in reply to Walter Bright

Jonathan M Davis

Posted in reply to Walter Bright

On Friday, 25 July 2014 at 20:28:23 UTC, Walter Bright wrote:
> Nope. If the user doesn't write opEquals, the compiler will generate one for him, using the same method as it did for ==.

Well, that was the case in 2.065, whereas 2.066 currently gives an error if you use a type as a key in an AA without defining opEquals for it. That change needs to be reverted.

- Jonathan M Davis

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation