Sorting floating-point values, and NaN (page 4)

On Tuesday, 12 November 2013 at 21:03:03 UTC, tn wrote: >> assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c)); >> >> ("<=" on Booleans is actually implication.) > > Shouldn't the implication be to the other direction? Then it becomes > > assert((equiv(a, b) && equiv(b, c)) => equiv(a, c)); <= as in ≤ (less or equal), not ⇐ (reverse implies). <= can be used on booleans as "implies" due to the way it treats booleans as integers (true as 1, false as 0).

On 11/12/13 12:45 PM, Vladimir Panteleev wrote: >> assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c)); > > OK, we're on the same page so far (although you've presented the problem > more eloquently). > >> ("<=" on Booleans is actually implication.) > > (Cool!) (... the disadvantage being b is evaluated even if a is false. Shouldn't; false implies everything.) >> That test will fail with NaNs and should be part of isSorted, sort etc. > > OK, but which a, b and c will be checked? Taking all adjacent triples > will not work with two adjacent NaNs. We can make that work if we insert the tests at the end of a run of equivalent values. That would still miss other cases though in isSorted (I think sort() can actually be much more thorough there). The point here is that we should work together on an innovative solution. Getting bogged in the "there's a problem with the language here" mindset prevents the forming of good ideas. >> We should also check such as: >> >> assert(!less(a, a)); >> assert(less(a, b) <= !less(b, a)); > > Again, for which a/b? In the isSorted case, for all adjacent values inspected. For sort, assertions can be added after each less() that does work. Andrei

On Tuesday, 12 November 2013 at 21:09:34 UTC, Vladimir Panteleev wrote: > On Tuesday, 12 November 2013 at 21:03:03 UTC, tn wrote: >>> assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c)); >>> >>> ("<=" on Booleans is actually implication.) >> >> Shouldn't the implication be to the other direction? Then it becomes >> >> assert((equiv(a, b) && equiv(b, c)) => equiv(a, c)); > > <= as in ≤ (less or equal), not ⇐ (reverse implies). <= can be used on booleans as "implies" due to the way it treats booleans as integers (true as 1, false as 0). Thanks for the explanation. I thought it was pseudocode. That is a horribly confusing trick.

November 12, 2013

Re: Sorting floating-point values, and NaN

Posted by tn
in reply to Andrei Alexandrescu

Permalink

Posted in reply to Andrei Alexandrescu

Permalink

On Tuesday, 12 November 2013 at 21:07:48 UTC, Andrei Alexandrescu wrote:
> On 11/12/13 1:03 PM, tn wrote:
>> On Tuesday, 12 November 2013 at 20:03:37 UTC, Andrei Alexandrescu wrote:
>>> Consider we define equiv(a, b) as (a, b) => !less(a, b) && !less(b, a).
>>>
>>> If at least one of the numbers is NaN, all comparisons return false.
>>> That puts NaN in the same equivalence class with all numbers,
>>> including numbers that are deemed in distinct equivalence classes.
>>> That's where NaNs mess things up, and that's what we need to assert.
>>> Consider three numbers a, b, and c. Then the following must pass:
>>>
>>> assert((equiv(a, b) && equiv(b, c)) <= equiv(a, c));
>>>
>>> ("<=" on Booleans is actually implication.)
>>
>> Shouldn't the implication be to the other direction? Then it becomes
>>
>> assert((equiv(a, b) && equiv(b, c)) => equiv(a, c));
>
> It is in the other direction, but the latter doesn't compile :o). Again, it just turns out "a <= b" has the meaning of "a implies b". It's not a particularly intuitive way to express it.
>
>>> That test will fail with NaNs and should be part of isSorted, sort etc.
>>
>> But it requires that the input contains at least three elements. And it
>> has to test a lot of possible triples (a,b,c), which I think requires at
>> least O(n^2) time. And it does not fail consistently whenever the input
>> contains at least one NaN (consider an input that contains only NaNs).
>
> I agree. Ideas?

If you want the isSorted function to always fail if the input contains NaNs, then I have no other ideas than what I have already mentioned. It is just not possible by using only "<" comparison. Something more is needed.

The other way would be to define isSorted to return true if and only if the _comparable_ elements in the input are sorted (thus ignoring possible NaNs in between). This is actually implementable with only "<" comparison, but probably requires O(n^2) time in the worst case.

On Tuesday, 12 November 2013 at 21:15:39 UTC, Andrei Alexandrescu wrote: > We can make that work if we insert the tests at the end of a run of equivalent values. I've thought about this as well, but then there are cases like... [4, 5, NaN, 3, NaN, 5, 6] "5, NaN, 3, NaN, 5" is a run of equivalent values as long as neighboring pairs are concerned. > The point here is that we should work together on an innovative solution. Getting bogged in the "there's a problem with the language here" mindset prevents the forming of good ideas. Agreed.

On Tuesday, 12 November 2013 at 22:17:10 UTC, Vladimir Panteleev wrote: > On Tuesday, 12 November 2013 at 21:15:39 UTC, Andrei Alexandrescu wrote: >> We can make that work if we insert the tests at the end of a run of equivalent values. > > I've thought about this as well, but then there are cases like... > > [4, 5, NaN, 3, NaN, 5, 6] > > "5, NaN, 3, NaN, 5" is a run of equivalent values as long as neighboring pairs are concerned. > >> The point here is that we should work together on an innovative solution. Getting bogged in the "there's a problem with the language here" mindset prevents the forming of good ideas. > > Agreed. I may be late to the party, but I think it might be a bit unfair for sort to be able to reliably "catch" this issue. Heck, if you call *sort* with NaN, how should it even behave? Error? Exception? With AssumeSorted, I think it should throw an Error, if it can *reliably* do so. -------- That said, we may have been looking at this problem the wrong way. Instead of trying to verify the comparison inside the sort, why not do the check in the comparator itself? It is the comparator that should "know" how to deal with NaN: Either it considers it bigger/smaller than everything, and sorts accordingly, or doesn't know how to deal with it, and throws an Error/Exception, depending on the users' wishes. We could make sort/assumeSorted default comparator simply assert on NaN: This should be able to keep the handling of the NaN relatively customizable, and cheap in release, without having to hack into sort/assumeSorted, to do some weird bidirectional checks. This keeps the default fast and safe, and if users want more speed/customization, they can do whatever the heck they like. My 0.02$

On Tuesday, 12 November 2013 at 20:03:37 UTC, Andrei Alexandrescu wrote: >> 2) Improve documentation and visibility of this problem. For example, >> add this to the documentation of sort: >> >> "The predicate must be transitive (`a<b && b<c` implies `a<c`) in order >> for `sort` to behave properly - otherwise, the program may fail on >> certain inputs (but not others) when not compiled in release mode. Note >> that `a<b` is not transitive for floating points, because the expression >> will always be `false` when either `a` or `b` is `NaN`." > > Great idea, just file an enh request or just write the pull request. https://github.com/D-Programming-Language/phobos/pull/1691

Forums