Jump to page: 1 24  
Page
Thread overview
Sorting floating-point values, and NaN
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
qznc
Nov 12, 2013
tn
Nov 12, 2013
tn
Nov 12, 2013
Walter Bright
Nov 12, 2013
tn
Nov 12, 2013
Jonathan M Davis
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
tn
Nov 12, 2013
Jonathan M Davis
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
Bigsandwich
Nov 12, 2013
Walter Bright
Nov 12, 2013
Walter Bright
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
Walter Bright
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
tn
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
tn
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
Vladimir Panteleev
Nov 13, 2013
monarch_dodra
Nov 12, 2013
tn
Nov 12, 2013
tn
Nov 12, 2013
Vladimir Panteleev
Nov 12, 2013
tn
Nov 13, 2013
Vladimir Panteleev
November 12, 2013
double[] arr = [1, 2, 3, double.nan, 1, 2];
assert(arr.isSorted);
arr.sort();

This code will fail with an assert error, but not the one on the second line. Rather, it will fail inside std.range, when sort calls assumeSorted, which checks elements in an order different than sort and isSorted.

Here's a case where the odd way NaN interacts with generic code messes things up in an ugly way.

This is concerning. It's very easy to overlook this while writing an application, and it can become a hidden vulnerability.

We can't fix the operators... but, perhaps, we could define a safe default comparison predicate (e.g. std.algorithm.less) for use with sorting and related tasks? Aside from bitwise comparison, is it even possible to efficiently compare floating-point values in a way suitable for sorting?
November 12, 2013
On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir Panteleev wrote:
> double[] arr = [1, 2, 3, double.nan, 1, 2];
> assert(arr.isSorted);
> arr.sort();
>
> This code will fail with an assert error, but not the one on the second line. Rather, it will fail inside std.range, when sort calls assumeSorted, which checks elements in an order different than sort and isSorted.
>
> Here's a case where the odd way NaN interacts with generic code messes things up in an ugly way.
>
> This is concerning. It's very easy to overlook this while writing an application, and it can become a hidden vulnerability.
>
> We can't fix the operators... but, perhaps, we could define a safe default comparison predicate (e.g. std.algorithm.less) for use with sorting and related tasks? Aside from bitwise comparison, is it even possible to efficiently compare floating-point values in a way suitable for sorting?

You cannot sort NaNs. A comparison with NaN must always evaluate to false.

One could change isSorted to check for false instead of true, so it would accept NaN. That would probably break too much code, though.

  arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])

Apparently, isSorted contains an antisymmetry check, which is not triggered, because it relies on true results.
November 12, 2013
On Tuesday, November 12, 2013 05:41:55 Vladimir Panteleev wrote:
> double[] arr = [1, 2, 3, double.nan, 1, 2];
> assert(arr.isSorted);
> arr.sort();
> 
> This code will fail with an assert error, but not the one on the second line. Rather, it will fail inside std.range, when sort calls assumeSorted, which checks elements in an order different than sort and isSorted.
> 
> Here's a case where the odd way NaN interacts with generic code messes things up in an ugly way.
> 
> This is concerning. It's very easy to overlook this while writing an application, and it can become a hidden vulnerability.
> 
> We can't fix the operators... but, perhaps, we could define a safe default comparison predicate (e.g. std.algorithm.less) for use with sorting and related tasks? Aside from bitwise comparison, is it even possible to efficiently compare floating-point values in a way suitable for sorting?

The check could be special-cased for floating point values so that it takes NaN into account, but NaN pretty much screws with everything. Once it's there, pretty much anything involving math or comparisons is going to go bad. So, you could argue that it's a bug in the program if sort is passed a NaN and that the assertion in sort is therefore perfectly valid. You can't really sort anything with NaN in it anyway.

- Jonathan M Davis
November 12, 2013
On Tuesday, 12 November 2013 at 07:33:57 UTC, qznc wrote:
> On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir Panteleev wrote:
>> double[] arr = [1, 2, 3, double.nan, 1, 2];
>> assert(arr.isSorted);
>> arr.sort();
>>
>> This code will fail with an assert error, but not the one on the second line. Rather, it will fail inside std.range, when sort calls assumeSorted, which checks elements in an order different than sort and isSorted.
>>
>> Here's a case where the odd way NaN interacts with generic code messes things up in an ugly way.
>>
>> This is concerning. It's very easy to overlook this while writing an application, and it can become a hidden vulnerability.
>>
>> We can't fix the operators... but, perhaps, we could define a safe default comparison predicate (e.g. std.algorithm.less) for use with sorting and related tasks? Aside from bitwise comparison, is it even possible to efficiently compare floating-point values in a way suitable for sorting?
>
> You cannot sort NaNs. A comparison with NaN must always evaluate to false.
>
> One could change isSorted to check for false instead of true, so it would accept NaN. That would probably break too much code, though.
>
>   arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])

But that is exactly what it does.

https://github.com/D-Programming-Language/phobos/blob/8372da44f66f20c67b69b4b8fb39233ce42d493c/std/algorithm.d#L10021

And that is the reason the second line "assert(arr.isSorted);" passes, while it should fail as the array is clearly not sorted. Instead modifying

  !(arr[n+1] < arr[n])  =>  arr[n] <= arr[n+1]

would make the function work correctly (return false if the range contains nans), but then the template parameter "less" should be replaced by "lessOrEqual". An alternative would be to check for nans explicitly.
November 12, 2013
On Tuesday, 12 November 2013 at 08:54:35 UTC, tn wrote:
> On Tuesday, 12 November 2013 at 07:33:57 UTC, qznc wrote:
>> On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir Panteleev wrote:
>>> double[] arr = [1, 2, 3, double.nan, 1, 2];
>>> assert(arr.isSorted);
>>> arr.sort();
>>>
>>> This code will fail with an assert error, but not the one on the second line. Rather, it will fail inside std.range, when sort calls assumeSorted, which checks elements in an order different than sort and isSorted.
>>>
>>> Here's a case where the odd way NaN interacts with generic code messes things up in an ugly way.
>>>
>>> This is concerning. It's very easy to overlook this while writing an application, and it can become a hidden vulnerability.
>>>
>>> We can't fix the operators... but, perhaps, we could define a safe default comparison predicate (e.g. std.algorithm.less) for use with sorting and related tasks? Aside from bitwise comparison, is it even possible to efficiently compare floating-point values in a way suitable for sorting?
>>
>> You cannot sort NaNs. A comparison with NaN must always evaluate to false.
>>
>> One could change isSorted to check for false instead of true, so it would accept NaN. That would probably break too much code, though.
>>
>>  arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])
>
> But that is exactly what it does.
>
> https://github.com/D-Programming-Language/phobos/blob/8372da44f66f20c67b69b4b8fb39233ce42d493c/std/algorithm.d#L10021
>
> And that is the reason the second line "assert(arr.isSorted);" passes, while it should fail as the array is clearly not sorted. Instead modifying
>
>   !(arr[n+1] < arr[n])  =>  arr[n] <= arr[n+1]
>
> would make the function work correctly (return false if the range contains nans), but then the template parameter "less" should be replaced by "lessOrEqual". An alternative would be to check for nans explicitly.

Actually, you could also use "!>=" instead of "<" so that the compasison would become:

  !(arr[n+1] !>= arr[n])

This is also correct for floating point numbers. However, it might be problematic for user defined types that also implement something similar to nan (eg. bigfloat). I could not find any documentation on how the unordered comparison operators (<>, !<>=, !<=, !<, !>=, !>, !<>) translate into opCmp calls.
November 12, 2013
On 11/12/2013 1:33 AM, tn wrote:
> I could not find any documentation on how the unordered
> comparison operators (<>, !<>=, !<=, !<, !>=, !>, !<>) translate into opCmp calls.

That's because they just don't translate to opCmp calls. At one point I was going to set up an opExtendedCmp or something like that for those operators, but then we went ahead and deprecated those operators, rendering the idea moot.
November 12, 2013
On Tuesday, November 12, 2013 10:33:26 tn wrote:
>  I could not find any
> documentation on how the unordered comparison operators (<>,
> !<>=, !<=, !<, !>=, !>, !<>) translate into opCmp calls.

As I understand it, they only work with the built-in floating point types and are supposed to be deprecated. So, writing new code which uses them - particularly generic code - wouldn't make sense.

- Jonathan M Davis
November 12, 2013
On Tuesday, 12 November 2013 at 09:40:52 UTC, Walter Bright wrote:
> On 11/12/2013 1:33 AM, tn wrote:
>> I could not find any documentation on how the unordered
>> comparison operators (<>, !<>=, !<=, !<, !>=, !>, !<>) translate into opCmp calls.
>
> That's because they just don't translate to opCmp calls.

Well, they seem to translate to something, because this works as expected:

  struct S {
      private double v;
      auto opCmp(S rhs) { return v - rhs.v; }
  }

  S v1 = S(1);
  S v2 = S(2);
  S vn = S(double.nan);
  assert((v2 < v1) == false);
  assert((v1 < v2) == true);
  assert((v1 < v1) == false);
  assert((v1 < vn) == false);
  assert((v2 !>= v1) == false);
  assert((v1 !>= v2) == true);
  assert((v1 !>= v1) == false);
  assert((v1 !>= vn) == true);


> but then we went ahead and deprecated those operators

Maybe the spec then needs to be updated:

http://dlang.org/expression.html#RelExpression
http://dlang.org/expression.html#floating_point_comparisons
November 12, 2013
On 11/12/13 12:54 AM, tn wrote:
> On Tuesday, 12 November 2013 at 07:33:57 UTC, qznc wrote:
>> On Tuesday, 12 November 2013 at 04:41:56 UTC, Vladimir Panteleev wrote:
>>> double[] arr = [1, 2, 3, double.nan, 1, 2];
>>> assert(arr.isSorted);
>>> arr.sort();
>>>
>>> This code will fail with an assert error, but not the one on the
>>> second line. Rather, it will fail inside std.range, when sort calls
>>> assumeSorted, which checks elements in an order different than sort
>>> and isSorted.
>>>
>>> Here's a case where the odd way NaN interacts with generic code
>>> messes things up in an ugly way.
>>>
>>> This is concerning. It's very easy to overlook this while writing an
>>> application, and it can become a hidden vulnerability.
>>>
>>> We can't fix the operators... but, perhaps, we could define a safe
>>> default comparison predicate (e.g. std.algorithm.less) for use with
>>> sorting and related tasks? Aside from bitwise comparison, is it even
>>> possible to efficiently compare floating-point values in a way
>>> suitable for sorting?
>>
>> You cannot sort NaNs. A comparison with NaN must always evaluate to
>> false.
>>
>> One could change isSorted to check for false instead of true, so it
>> would accept NaN. That would probably break too much code, though.
>>
>>   arr[n] < arr[n+1]  => !(arr[n+1] < arr[n])
>
> But that is exactly what it does.
>
> https://github.com/D-Programming-Language/phobos/blob/8372da44f66f20c67b69b4b8fb39233ce42d493c/std/algorithm.d#L10021
>
>
> And that is the reason the second line "assert(arr.isSorted);" passes,
> while it should fail as the array is clearly not sorted. Instead modifying
>
>    !(arr[n+1] < arr[n])  =>  arr[n] <= arr[n+1]
>
> would make the function work correctly (return false if the range
> contains nans), but then the template parameter "less" should be
> replaced by "lessOrEqual". An alternative would be to check for nans
> explicitly.

I think NaNs are singular unordered values that make invalid inputs for either sort or isSorted. We should simply add an assert to isSorted.

Andrei

November 12, 2013
On Tuesday, 12 November 2013 at 08:10:16 UTC, Jonathan M Davis wrote:
> On Tuesday, November 12, 2013 05:41:55 Vladimir Panteleev wrote:
>> double[] arr = [1, 2, 3, double.nan, 1, 2];
>> assert(arr.isSorted);
>> arr.sort();
>> 
>> This code will fail with an assert error, but not the one on the
>> second line. Rather, it will fail inside std.range, when sort
>> calls assumeSorted, which checks elements in an order different
>> than sort and isSorted.
>> 
>> Here's a case where the odd way NaN interacts with generic code
>> messes things up in an ugly way.
>> 
>> This is concerning. It's very easy to overlook this while writing
>> an application, and it can become a hidden vulnerability.
>> 
>> We can't fix the operators... but, perhaps, we could define a
>> safe default comparison predicate (e.g. std.algorithm.less) for
>> use with sorting and related tasks? Aside from bitwise
>> comparison, is it even possible to efficiently compare
>> floating-point values in a way suitable for sorting?
>
> The check could be special-cased for floating point values so that it takes NaN
> into account, but NaN pretty much screws with everything. Once it's there,
> pretty much anything involving math or comparisons is going to go bad. So, you
> could argue that it's a bug in the program if sort is passed a NaN and that
> the assertion in sort is therefore perfectly valid. You can't really sort
> anything with NaN in it anyway.

Here is the problem:

1) NaN values are accepted as input almost anywhere where doubles are. std.conv recognizes the string "nan" and converts it to double.nan,
2) It is unreasonable to expect every D user out there dealing with floating point numbers to be forced to check every source of floating point values in their program against NaNs.
3) If the program is compiled in debug mode, it will crash quasi-randomly, since the assumeSorted assert does not check every pair.

For these reasons, this problem can become a sort of time bomb in their application: even with 100% test coverage, one NaN in the wrong place inputted by a malicious end user can cause a situation the program's authors have not foreseen.

It can be argued that it is a flaw in the language in that it allowed such a situation to arise in the first place.

P.S. Please enable RFC 2646 (format=flowed) support in your email client. It is emitting unreflowable text, which causes jagged quotes as seen above.
« First   ‹ Prev
1 2 3 4