Jump to page: 1 2
Thread overview
Ordering comparisons
Mar 07, 2017
Jonathan M Davis
Mar 07, 2017
ketmar
Mar 07, 2017
Jacob Carlborg
Mar 07, 2017
deadalnix
Mar 07, 2017
H. S. Teoh
Mar 07, 2017
Stefan Koch
Mar 07, 2017
H. S. Teoh
March 06, 2017
While reviewing work on array comparisons, Vladimir found an odd issue:

https://issues.dlang.org/show_bug.cgi?id=17244

Investigation reveals that during array comparison for inequality, structs are compared by means of memcmp - even if they don't define opCmp, and regardless of the types of their fields. This has obvious unpleasant consequences:

* Floating point values don't compare well with memcmp

* A struct S { int x; } compares differently on little endian and big endian system (!)

* A struct containing a field that defines opCmp ignores it

* ... and many other nefarious issues

The question is what to do to minimize breakage yet "break the bad code". The most backward-compatible solution is to define opCmp automatically to do a field-by-field lexicographical comparison. The most radical solution is disable ordering comparisons unless explicitly implemented by the user.


Andrei
March 06, 2017
On Monday, March 06, 2017 20:27:56 Andrei Alexandrescu via Digitalmars-d wrote:
> While reviewing work on array comparisons, Vladimir found an odd issue:
>
> https://issues.dlang.org/show_bug.cgi?id=17244
>
> Investigation reveals that during array comparison for inequality, structs are compared by means of memcmp - even if they don't define opCmp, and regardless of the types of their fields. This has obvious unpleasant consequences:

> The question is what to do to minimize breakage yet "break the bad code". The most backward-compatible solution is to define opCmp automatically to do a field-by-field lexicographical comparison. The most radical solution is disable ordering comparisons unless explicitly implemented by the user.

If we can, we should probably make structs that have opCmp use opCmp like they should, and those that don't continue to use memcmp for the moment but get a deprecation message. Then after whatever amount of time we think is appropriate, we disable the comparisons. So, we don't break anything immediately, but ultimately end up with the correct situation of comparisons not working unless opCmp being defined. That being said, depending on how all of that is implemented, I don't know how easy that approach is.

- Jonathan M Davis

March 06, 2017
On 03/06/2017 08:27 PM, Andrei Alexandrescu wrote:
>
> * A struct S { int x; } compares differently on little endian and big
> endian system (!)
>

This one is very surprising. How is that so, if both structs being compared are of the same endian-ness?
March 07, 2017
Nick Sabalausky (Abscissa) wrote:

> On 03/06/2017 08:27 PM, Andrei Alexandrescu wrote:
>>
>> * A struct S { int x; } compares differently on little endian and big
>> endian system (!)
>>
>
> This one is very surprising. How is that so, if both structs being compared are of the same endian-ness?

if we're talking about equality, yes. but opCmp is "less/greater" comparison. so, 0x290a will be:
 0x0a
 0x29
on little-endian, and:
 0x29
 0x0a
on big-endian.

let's take 0x1930, and compare it with 0x290a, using memcmp:

for big-endian, first byte:
0x29:0x19: hit, 0x290a is greater.

for little-endian, first byte:
0x0a:0x30: hit, 0x290a is *lesser* than 0x1930. oops.
March 07, 2017
On 2017-03-07 04:44, Nick Sabalausky (Abscissa) wrote:

> This one is very surprising. How is that so, if both structs being
> compared are of the same endian-ness?

The structs for a given run will be of the same endian-ness. But if you run the same code on two different systems, one with little endian and one with big endian, you will get different results when sorting an array, for example. If I understand this correctly.

-- 
/Jacob Carlborg
March 07, 2017
On Tuesday, 7 March 2017 at 01:27:56 UTC, Andrei Alexandrescu wrote:
> The question is what to do to minimize breakage yet "break the bad code". The most backward-compatible solution is to define opCmp automatically to do a field-by-field lexicographical comparison. The most radical solution is disable ordering comparisons unless explicitly implemented by the user.
>

There should be no assumption that structs are comparable, so the later.

March 07, 2017
On 03/06/2017 10:44 PM, Nick Sabalausky (Abscissa) wrote:
> On 03/06/2017 08:27 PM, Andrei Alexandrescu wrote:
>>
>> * A struct S { int x; } compares differently on little endian and big
>> endian system (!)
>>
>
> This one is very surprising. How is that so, if both structs being
> compared are of the same endian-ness?

On a big endian system, comparing integers with memcmp works correctly because higher-rank bytes are compared before lower-rank bytes. On a little endian system, lower-rank bytes are compared first, which would make e.g. 256 less than 1. Porting code that relies on comparing arrays of structs for ordering is likely to be a nightmare. We really need to fix this. -- Andrei
March 07, 2017
On Tuesday, 7 March 2017 at 02:51:37 UTC, Jonathan M Davis wrote:

> If we can, we should probably make structs that have opCmp use opCmp like they should, and those that don't continue to use memcmp for the moment but get a deprecation message. Then after whatever amount of time we think is appropriate, we disable the comparisons. So, we don't break anything immediately, but ultimately end up with the correct situation of comparisons not working unless opCmp being defined.
+1
This is the only sensible approach.
March 07, 2017
On Mon, Mar 06, 2017 at 08:27:56PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> While reviewing work on array comparisons, Vladimir found an odd issue:
> 
> https://issues.dlang.org/show_bug.cgi?id=17244
> 
> Investigation reveals that during array comparison for inequality, structs are compared by means of memcmp - even if they don't define opCmp, and regardless of the types of their fields. This has obvious unpleasant consequences:

Couple of points:

(1) I may be remembering wrong, but I thought structs had always been intended to be compared field-wise?  I remember when working on AA's that the compiler would emit a default implementation of opEquals that did member-wise comparisons. I had always assumed that something similar was done with inequalities... or was that just unfounded extrapolation?

(2) I didn't realize that arrays allowed inequalities by default, though in retrospect it does make sense (since we do define string inequalities, and string are arrays).  But I would have expected that structs would *not* allow inequalities by default, because it's not clear that a default ordering relation makes sense for every struct. Consider for example, a struct representation of a complex number:

	struct Complex { float re, im; }

It would be wrong to assume a default ordering relation on this struct because the complex numbers do not have a linear ordering.


[...]
> The question is what to do to minimize breakage yet "break the bad code".  The most backward-compatible solution is to define opCmp automatically to do a field-by-field lexicographical comparison. The most radical solution is disable ordering comparisons unless explicitly implemented by the user.
[...]

I wouldn't say disabling ordering comparisons is "radical".  In fact, I think it makes the most sense by assuming the least about the user's intended semantics for the struct.  Lexicographical ordering by default makes sense for arrays (e.g., strings), but I don't think it makes sense for general structs. Without the user stating what his intents are, it seems unfounded to presume lexicographic ordering by default.  If the user has a struct of orthogonal information, e.g., name, phone number, address, there is no reason why changing the order of fields (during code refactoring, for example) should result in a completely different ordering just because the language assumes lexicographical ordering by default.  It's better to make it an error to involve structs in inequalities if the user didn't explicitly define opCmp, than to silently accept such comparisons, which are likely to be buggy, and then have unusual behaviour later when the user reorders fields and a seemingly unrelated part of the code suddenly begins producing different output.

I'd say allowing inequalities on structs by default is surprising behaviour, whereas asking the user to explicitly specify an ordering is more expected.


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
March 07, 2017
On 03/07/2017 12:54 PM, H. S. Teoh via Digitalmars-d wrote:
> (1) I may be remembering wrong, but I thought structs had always been
> intended to be compared field-wise?  I remember when working on AA's
> that the compiler would emit a default implementation of opEquals that
> did member-wise comparisons. I had always assumed that something
> similar was done with inequalities... or was that just unfounded
> extrapolation?

We currently do memcmp.

Equality by memberwise comparison is almost always meaningful; ordering by lexicographic comparison of members is not.


Andrei
« First   ‹ Prev
1 2