August 11, 2012
On Friday, August 10, 2012 15:10:47 Walter Bright wrote:
> What can I say? I run across this repeatedly, and that's exactly why Phobos (with Don's help) has its own implementations, rather than simply calling the corresponding C ones.

I think it's pretty typical for programmers to assume that something like a standard library function is essentially bug-free - especially for an older language like C. And unless you see results that are clearly wrong, or someone else points out the problem, I don't know why you'd ever suspect that there was a bug. I certainly had no clue that C implementations had issues with floating point arithmetic before it was pointed out here. Regardless, it's great that D gets it right.

- Jonathan M Davis
August 11, 2012
On Friday, 10 August 2012 at 22:11:23 UTC, Walter Bright wrote:
> On 8/10/2012 8:31 AM, TJB wrote:
>> On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:
>>> On 8/9/2012 10:40 AM, dsimcha wrote:
>>>> I'd emphasize the following:
>>>
>>> I'd like to add to that:
>>>
>>> 1. Proper support for 80 bit floating point types. Many compilers' libraries
>>> have inaccurate 80 bit math functions, or don't implement 80 bit floats at
>>> all. 80 bit floats reduce the incidence of creeping roundoff error.
>>
>> How unique to D is this feature?  Does this imply that things like BLAS and
>> LAPACK, random number generators, statistical distribution functions, and other
>> numerical software should be rewritten in pure D rather than calling out to
>> external C or Fortran codes?
>
> I attended a talk given by a physicist a few months ago where he was using C transcendental functions. I pointed out to him that those functions were unreliable, producing wrong bits in a manner that suggested to me that they were internally truncating to double precision.
>
> He expressed astonishment and told me I must be mistaken.
>
> What can I say? I run across this repeatedly, and that's exactly why Phobos (with Don's help) has its own implementations, rather than simply calling the corresponding C ones.
>
> I encourage you to run your own tests, and draw your own conclusions.

Hopefully this will help make the case that D is the best choice for numerical programmers. I want to do my part to convince economists.
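
Taking the "run your own tests" advice: even a toy accumulation shows what 80-bit intermediates buy you (a minimal sketch; the loop and constants are mine, purely illustrative):

    import std.stdio : writefln;

    void main()
    {
        // Add 0.1 ten million times; the exact answer is 1,000,000.
        // real (80-bit on x86) accumulates far less creeping roundoff
        // than double does.
        double d = 0.0;
        real   r = 0.0L;
        foreach (i; 0 .. 10_000_000)
        {
            d += 0.1;
            r += 0.1L;
        }
        writefln("double: %.10f", d);
        writefln("real:   %.10f", r);
    }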

Another reason to implement BLAS and LAPACK in pure D is that the old routines like dgemm, cgemm, sgemm, and zgemm (all defined for different types) seem ripe for templatization.
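
For instance, a single template could stand in for all four (a naive, purely illustrative sketch -- the name and triple loop are mine; a real port would keep the blocked, tuned kernels):

    /// C = alpha*A*B + beta*C over row-major slices; one generic
    /// definition instead of four type-specific routines.
    void gemm(T)(T alpha, const T[][] a, const T[][] b, T beta, T[][] c)
    {
        foreach (i; 0 .. c.length)
            foreach (j; 0 .. c[i].length)
            {
                T sum = 0;
                foreach (k; 0 .. b.length)
                    sum += a[i][k] * b[k][j];
                c[i][j] = alpha * sum + beta * c[i][j];
            }
    }

    // gemm!float and gemm!double cover sgemm/dgemm; the complex variants
    // would instantiate with std.complex types in exactly the same way.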

Almost thou convinceth me ...

TJB

August 11, 2012
Walter Bright wrote:
> It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one:
>
>     float z;
>     if (condition1)
>          z = 5;
>     ... lotsa code ...
>     if (condition2)
>          z++;
>
> To diagnose this correctly, the static analyzer would have to determine that condition1 produces the same result as condition2, or not. This is impossible to prove. So the static analyzer either gives up and lets it pass, or issues an incorrect diagnostic. So our intrepid programmer is forced to write:
>
>     float z = 0;
>     if (condition1)
>          z = 5;
>     ... lotsa code ...
>     if (condition2)
>          z++;

Yes, but that's not really an issue since the compiler informs the coder of its limitation. You're simply forced to initialize the variable in this situation.


> Now, as it may turn out, for your algorithm the value "0" is an out-of-range, incorrect value. Not a problem as it is a dead assignment, right?
>
> But then the maintenance programmer comes along and changes condition1 so it is not always the same as condition2, and now the z++ sees the invalid "0" value sometimes, and a silent bug is introduced.
>
> This bug will not remain undetected with the default NaN initialization.

I had a debate on here a few months ago about the merits of default-to-NaN and others brought up similar situations. But since we can write:

    float z = float.nan;
    ...

explicitly, then this could be thought of as a debugging feature available to the programmer. The problem I've always had with defaulting to NaN is that it's inconsistent with integer types, and while there may be merit to the idea of defaulting all types to NaN/null, it's simply unavailable for half of the number spectrum. I can only speak for myself, but I much prefer consistency over anything else, because it means there are fewer discrepancies I need to remember when hacking things together. The inconsistency also steepens the learning curve.
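
The inconsistency in a nutshell (a trivial sketch):

    import std.stdio : writeln;

    void main()
    {
        int   i;  // default-initialized to 0, a perfectly usable value
        float f;  // default-initialized to float.nan
        writeln(i + 1);  // 1
        writeln(f + 1);  // nan
    }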

More importantly, what we have now is code where bugs -- like the one you mentioned above -- are still possible with ints, but also easy to miss, since "the other number type" behaves differently and programmers may accidentally assume a NaN will propagate where it will not.


> This is incorrect, as the optimizer is perfectly capable of removing dead assignments like:
>
>    f = nan;
>    f = 0.0f;
>
> The first assignment is optimized away.

I thought there was some optimization by avoiding assignment, but IDK enough about memory at that level. Now I'm confused as to the point of 'float x = void' type annotations. :-\


> Whether you agree with it being a good feature or not, it is a feature unique to D and merits discussion when talking about D's suitability for numerical programming.

True, and I misspoke by saying it wasn't a "selling point". I only meant to take issue with a feature that has been more of an annoyance than a boon to me personally. That said, I also agree that this thread was the wrong place to raise the issue.
August 11, 2012
On 8/10/2012 9:01 PM, F i L wrote:
> I had a debate on here a few months ago about the merits of default-to-NaN and
> others brought up similar situations. But since we can write:
>
>      float z = float.nan;
>      ...

That is a good solution, but in my experience programmers just throw in an =0, as it is simple and fast, and they don't normally think about NaNs.

> explicitly, then this could be thought of as a debugging feature available to
> the programmer. The problem I've always had with defaulting to NaN is that it's
> inconsistent with integer types, and while there may be merit to the idea of
> defaulting all types to NaN/null, it's simply unavailable for half of the number
> spectrum. I can only speak for myself, but I much prefer consistency over
> anything else, because it means there are fewer discrepancies I need to remember
> when hacking things together. The inconsistency also steepens the learning curve.

It's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.


> More importantly, what we have now is code where bugs -- like the one you
> mentioned above -- are still possible with ints, but also easy to miss, since
> "the other number type" behaves differently and programmers may accidentally
> assume a NaN will propagate where it will not.

Sadly, D has to map onto imperfect hardware :-(

We do have NaN values for chars (0xFF) and pointers (the vilified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.
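
Concretely, a quick check of those defaults (a minimal sketch):

    void main()
    {
        char  c;  // defaults to 0xFF, an invalid UTF-8 code unit
        int*  p;  // defaults to null
        float f;  // defaults to float.nan
        assert(c == 0xFF);
        assert(p is null);
        assert(f != f);  // NaN is the one value that isn't equal to itself
    }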

> I thought there was some optimization by avoiding assignment, but IDK enough
> about memory at that level. Now I'm confused as to the point of 'float x = void'
> type annotations. :-\

It would be used where the static analysis is not able to detect that the initializer is dead.
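
For example (a minimal sketch; the function is mine, and 'condition1' echoes the earlier example):

    float next(bool condition1)
    {
        float z = void;  // explicitly uninitialized: skips the nan default
        if (condition1)
            z = 5;
        else
            z = 6;  // every path assigns z before it's read, so any
                    // default initializer would be a dead store anyway
        return z + 1;
    }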
August 11, 2012
On 8/10/2012 9:32 PM, Walter Bright wrote:
> On 8/10/2012 9:01 PM, F i L wrote:
>> I had a debate on here a few months ago about the merits of default-to-NaN and
>> others brought up similar situations. But since we can write:
>>
>>      float z = float.nan;
>>      ...
>
> That is a good solution, but in my experience programmers just throw in an =0,
> as it is simple and fast, and they don't normally think about NaNs.

Let me amend that. I've never seen anyone use float.nan, or whatever NaN is in the language they were using. They always use =0. I doubt that yelling at them will change anything.
August 11, 2012
F i L wrote:
> Walter Bright wrote:
>> It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one:
>>
>>    float z;
>>    if (condition1)
>>         z = 5;
>>    ... lotsa code ...
>>    if (condition2)
>>         z++;
>> 
>> [...]
>
> Yes, but that's not really an issue since the compiler informs the coder of its limitation. You're simply forced to initialize the variable in this situation.

I just want to clarify something here. In C#, only class/struct fields are defaulted to a usable value. Locals have to be explicitly set before they're used. So, expanding on your example above:

    float z;
    if (condition1)
        z = 5;
    else
        z = 6; // 'else' required

    ... lotsa code ...
    if (condition2)
        z++;

Without an 'else z = ...' on the first condition, or if the condition was removed at a later time, you'll get a compiler error and be forced to explicitly assign 'z' somewhere before using it. So C# and D work in "similar" ways in this respect, except that C# catches these issues at compile-time, whereas in D you need to:

  1. run the program
  2. get bad result
  3. hunt down bug

NaNs in C# are "mostly" (citation needed) used to ensure fields are initialized in a constructor:

    class Foo
    {
        float f = float.NaN; // Can't use 'f' unless Foo is
                             // properly constructed.
    }
August 11, 2012
Walter Bright wrote:
> Sadly, D has to map onto imperfect hardware :-(
>
> We do have NaN values for chars (0xFF) and pointers (the vilified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.

Yes, if 'int' had a NaN state it would be great. (Though I remember hearing about hardware that did support it... somewhere.)


August 11, 2012
On 8/10/2012 9:55 PM, F i L wrote:
> Without an 'else z = ...' on the first condition, or if the condition was
> removed at a later time, you'll get a compiler error and be forced to
> explicitly assign 'z' somewhere before using it. So C# and D work in "similar"
> ways in this respect, except that C# catches these issues at compile-time,
> whereas in D you need to:
>
>    1. run the program
>    2. get bad result
>    3. hunt down bug

However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.
August 11, 2012
Walter Bright wrote:
> That is a good solution, but in my experience programmers just throw in an =0, as it is simple and fast, and they don't normally think about NaNs.

See! Programmers just want usable default values :-P


> It's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.

I heard somewhere before that there's actually an (Intel?) CPU which supports NaN ints... but maybe that's just hearsay.


> Sadly, D has to map onto imperfect hardware :-(
>
> We do have NaN values for chars (0xFF) and pointers (the vilified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.

Ya, but I don't think pointers/refs and floats are comparable, because one has copy semantics and the other does not. Conceptually, pointers are only references to data, while numbers are actual data. It makes sense that they would default to different things. Though if int did have a NaN value, I'm not sure which way I would side on this issue. I still think I would prefer having some level of compile-time indication of my errors, simply because it saves time when you're making something.


> It would be used where the static analysis is not able to detect that the initializer is dead.

Good to know.


> However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.

Maybe the perfect solution is to have the compiler initialize the value to NaN, but also do a bit of static analysis and, for the sake of productivity, give a compiler error when it can determine that a variable is being used before being assigned.

In fact, for the sake of consistency, you could always enforce that (compiler error) rule on every local variable, so even ints would be required to have explicit initialization before use.
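
In other words, something like this would be rejected (hypothetical behavior under that rule; today D happily uses the defaults):

    import std.stdio : writeln;

    void main()
    {
        int   i;
        float f;
        writeln(i);  // error under the proposed rule: 'i' read before set
        writeln(f);  // likewise, instead of quietly printing nan
        i = 42;      // explicit assignment; reads after this are fine
        writeln(i);
    }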

I still prefer float class members to be defaulted to a usable value, for the sake of consistency with ints.
August 11, 2012
On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:
> It's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.

The compiler could always maintain flags specifying whether each variable has been set, and if a flag is false the variable is as good as NaN. The only downside is a performance hit, unless you mark it as a release binary. It really comes down to whether it's worth implementing, or is considered too big a change (unless it's a flag you have to specially turn on).

example:

  int a;

  writeln(a++); // compile-time error, or throws an exception at runtime (read access before being set)

internally translated as:
  int a;
  bool _a_is_set = false;

  if (!_a_is_set)
    throw new Exception("a not initialized before use!");
    // passing to functions would also throw,
    // unless the parameter is 'out'
  writeln(a);

  ++a;
  _a_is_set = true;


> Sadly, D has to map onto imperfect hardware :-(

 Not so much imperfect hardware, just the imperfect 'human' variable.

> We do have NaN values for chars (0xFF) and pointers (the vilified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.