August 11, 2012
On Friday, 10 August 2012 at 22:01:46 UTC, Walter Bright wrote:
> It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one:
>
>     float z;
>     if (condition1)
>          z = 5;
>     ... lotsa code ...
>     if (condition2)
>          z++;
>
> To diagnose this correctly, the static analyzer would have to determine that condition1 produces the same result as condition2, or not. This is impossible to prove. So the static analyzer either gives up and lets it pass, or issues an incorrect diagnostic. So our intrepid programmer is forced to write:
>
>     float z = 0;
>     if (condition1)
>          z = 5;
>     ... lotsa code ...
>     if (condition2)
>          z++;
>
> Now, as it may turn out, for your algorithm the value "0" is an out-of-range, incorrect value. Not a problem as it is a dead assignment, right?
>
> But then the maintenance programmer comes along and changes condition1 so it is not always the same as condition2, and now the z++ sees the invalid "0" value sometimes, and a silent bug is introduced.
>
> This bug will not remain undetected with the default NaN initialization.

The compiler in languages like C# doesn't try to prove that the variable is NOT set and then emit an error. It tries to prove that the variable IS set, and if it can't prove that, it's an error.

It's not an incorrect diagnostic; it does exactly what it's supposed to do, and the programmer has to be explicit when taking on the responsibility of initialization. I don't see anybody complaining about this feature in C#; most experienced C# programmers I've talked to love it (I much prefer it too).

Leaving a local variable initially uninitialized (or rather, not explicitly initialized) is a good way to portray the intention that it's going to be conditionally initialized later. In C#, if your program compiles, your variable is guaranteed to be initialized later but before use. This is a useful guarantee when reading/maintaining code.
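
To illustrate, here's a rough sketch of how the rule would read in D syntax (hypothetical - this is not how current D behaves, where both functions compile because of default initialization):

int ok(bool condition)
{
    int x;          // not explicitly initialized
    if (condition)
        x = 1;
    else
        x = 2;
    return x;       // accepted: x is definitely assigned on every path
}

int rejected(bool condition)
{
    int y;
    if (condition)
        y = 1;
    return y;       // would be an error: y may be read before assignment
}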

In D, on the other hand, it's possible to write D code like:

for(size_t i; i < length; ++i)
{
    ...
}

And I've actually seen this kind of code a lot in the wild. It boggles my mind that you think that this code should be legal. I think it's lazy - the intention is not clear. Is the default initializer being intentionally relied on, or was it unintentional? I've seen both cases. The for-loop example is an extreme one for demonstrative purposes, most examples are less obvious.
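
For comparison, the explicit form mirrors the snippet above and leaves no doubt that starting at zero is intentional:

for(size_t i = 0; i < length; ++i)
{
    ...
}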

Saying that most programmers will explicitly initialize floating point numbers to 0 instead of NaN when taking on initialization responsibility is a cop-out - float.init and float.nan are obviously the values to reach for. The benefit is easy for programmers to understand, especially if they already understand why float.init is NaN. You say yelling at them probably won't help - why not? I personally use float.init/double.init etc. in my own code, and I'm sure other informed programmers do too. I can understand why people don't do it in, say, C, where NaN is less well defined as far as I know. D promotes NaN actively, and programmers should be eager to leverage NaN explicitly too.
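
For what it's worth, a minimal sketch of what leveraging NaN explicitly looks like (the variable names are made up for illustration):

void example()
{
    float accumulator = 0;        // deliberately starting a running sum at zero
    float reading = float.init;   // deliberately "no value yet" - NaN
    double ratio = double.nan;    // same intent, spelled via the .nan property
}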

It's also important to note that C# works the same as D for non-local variables - they all have a defined default initializer (the C# equivalent of T.init is default(T)). Another point is that the local-variable analysis is limited to the scope of a single function body; it does not do inter-procedural analysis.
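
For illustration, a small sketch of the aggregate case in D (the struct is made up):

struct Sample
{
    int count;     // field: default-initialized to 0
    float value;   // field: default-initialized to float.nan (float.init)
}

unittest
{
    import std.math : isNaN;

    Sample s;                  // equivalent to Sample.init
    assert(s.count == 0);
    assert(s.value.isNaN);
}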

I think this would be a great thing for D, and I believe that all code this change breaks is actually broken to begin with.

August 11, 2012
On 8/11/2012 1:30 AM, Era Scarecrow wrote:
> On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:
>> It's too bad that ints don't have a NaN value, but interestingly enough,
>> valgrind does default initialize them to some internal NaN, making it a most
>> excellent bug detector.
>
>   The compiler could always have flags specifying if variables were used, and if
> they are false they are as good as NaN. Only downside is a performance hit
> unless you Mark it as a release binary. It really comes down to if it's worth
> implementing or considered a big change (unless it's a flag you have to
> specially turn on)

Not so easy. Suppose you pass a pointer to the variable to another function. Does that function set it?
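
A sketch of why such tracking breaks down (the functions are made up): once the address escapes, the caller cannot know locally whether the callee wrote through it.

import std.random : uniform;

void maybeFill(int* p)
{
    if (uniform(0, 2))   // the callee decides at run time
        *p = 42;
}

void caller()
{
    int x;           // would be flagged "not yet set"
    maybeFill(&x);   // set or not? Unknown without whole-program analysis.
    int y = x + 1;   // so is this a use of an unset variable or not?
}

void main() { caller(); }
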
August 11, 2012
On 8/11/2012 1:57 AM, Jakob Ovrum wrote:
> The compiler in languages like C# doesn't try to prove that the variable is NOT
> set and then emits an error. It tries to prove that the variable IS set, and if
> it can't prove that, it's an error.
>
> It's not an incorrect diagnostic, it does exactly what it's supposed to do

Of course it is doing what the language requires, but it is an incorrect diagnostic because a dead assignment is required.

And being a dead assignment, it can lead to errors when the code is later modified, as I explained. I also dislike on aesthetic grounds meaningless code being required.

> In D, on the other hand, it's possible to write D code like:
>
> for(size_t i; i < length; ++i)
> {
>      ...
> }
>
> And I've actually seen this kind of code a lot in the wild. It boggles my mind
> that you think that this code should be legal. I think it's lazy - the intention
> is not clear. Is the default initializer being intentionally relied on, or was
> it unintentional? I've seen both cases. The for-loop example is an extreme one
> for demonstrative purposes, most examples are less obvious.

That perhaps is your experience with other languages (that do not default initialize) showing. I don't think that default initialization is so awful. In fact, C++ enables one to specify default initialization for user defined types. Are you against that, too?


> Saying that most programmers will explicitly initialize floating point numbers
> to 0 instead of NaN when taking on initialization responsibility is a cop-out -

You can certainly say it's a copout, but it's what I see them do. I've never seen them initialize to NaN, but I've seen the "just throw in a 0" many times.


> float.init and float.nan are obviously the values you should be going for. The
> benefit is easy for programmers to understand, especially if they already
> understand why float.init is NaN. You say yelling at them probably won't help -
> why not?

Because experience shows that even the yellers tend to do the short, convenient one rather than the longer, correct one. Bruce Eckel wrote an article about this years ago in reference to why Java exception specifications were a failure and actually caused people to write bad code, including those who knew better.



August 11, 2012
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
> On 8/11/2012 1:57 AM, Jakob Ovrum wrote:
> Because experience shows that even the yellers tend to do the short, convenient one rather than the longer, correct one. Bruce Eckel wrote an article about this years ago in reference to why Java exception specifications were a failure and actually caused people to write bad code, including those who knew better.

I have to agree here.

I split my work time between JVM- and .NET-based languages, and
checked exceptions are on my top 5 list of what went wrong with Java.

You see lots of

try {
    ...
} catch (Exception e) {
    e.printStackTrace();
}

in enterprise code.

--
Paulo
August 11, 2012
On 8/11/12 3:11 AM, F i L wrote:
> I still prefer float class members to be defaulted to a usable value,
> for the sake of consistency with ints.

Actually, there's something that happened to me just two days ago that's relevant to this, particularly because it's in a different language (SQL) and a different domain (Machine Learning).

I was working with an iterative algorithm implemented in SQL, which performs some aggregate computation on some 30 billion samples. The algorithm is rather intricate, and each iteration takes the previous one's result as input.

Somehow at the end there were NaNs in the sample data I was looking at (there weren't supposed to be any). So I started investigating; the NaNs could appear only in a rare data corruption case. And indeed before long I found 4 (four) samples out of 30 billion that were corrupt. After one iteration, there were 300K NaNs. After two iterations, a few million. After four, 800M samples were messed up. NaNs did save the day.

Although this case is not about default values but about the result of a computation (in this case 0.0/0.0), I think it still reveals the usefulness of having a singular value in the floating point realm.
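
A tiny D sketch of the same propagation effect (with made-up numbers):

import std.math : isNaN;
import std.stdio : writeln;

void main()
{
    double zero = 0.0;
    double corrupt = zero / zero;                 // 0.0/0.0 -> NaN
    double[] samples = [1.0, 2.0, corrupt, 4.0];  // one bad sample

    double sum = 0;
    foreach (s; samples)
        sum += s;          // any aggregate touching the NaN becomes NaN

    writeln(sum);          // nan
    writeln(sum.isNaN);    // true
}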


Andrei
August 11, 2012
On Saturday, 11 August 2012 at 09:26:42 UTC, Walter Bright wrote:
> On 8/11/2012 1:30 AM, Era Scarecrow wrote:

>> The compiler could always have flags specifying if variables were used, and if they are false they are as good as NaN. Only downside is a performance hit unless you Mark it as a release binary. It really comes down to if it's worth implementing or considered a big change (unless it's a flag you have to specially turn on)
>
> Not so easy. Suppose you pass a pointer to the variable to another function. Does that function set it?

 I suppose there could be a second hidden pointer/bool as part of calls, but then it's completely incompatible with any C calling convention, meaning that is probably out of the question.

 Either a) pointers are low-level enough, like casting, that it's all up to the programmer; or b) same as before: unless an 'out' parameter is specified, it would likely throw an exception at that point (since attempting to read or pass the address of an uninitialized variable is the same as accessing it directly). After all, a false positive is better than no check at all, right?

 Of course, with that in mind, specifying a variable to begin as void (uninitialized) could be its own form of initialization? (Meaning the compiler wouldn't check those, even though they hold known garbage.)
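
 Something like D's existing '= void' syntax, I suppose - a minimal sketch:

void example()
{
    float checked;           // default-initialized to float.nan
    float unchecked = void;  // explicitly uninitialized: holds garbage, and the
                             // programmer has opted out of any checking for it
    unchecked = 3.14f;       // the responsibility to assign before use is now ours
}
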
August 11, 2012
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
> Of course it is doing what the language requires, but it is an incorrect diagnostic because a dead assignment is required.
>
> And being a dead assignment, it can lead to errors when the code is later modified, as I explained. I also dislike on aesthetic grounds meaningless code being required.

It is not meaningless, it's declarative. The same resulting code as now would be generated, but it's easier for the maintainer to understand what's being meant.

> That perhaps is your experience with other languages (that do not default initialize) showing. I don't think that default initialization is so awful. In fact, C++ enables one to specify default initialization for user defined types. Are you against that, too?

No, because user-defined types can have explicitly initialized members. I do think that member fields relying on the default initializer are ambiguous and should be explicit, but flow analysis on aggregate members is not going to work in any current programming language. D already works similarly to C# on this point.

And for the record, I have more experience with D than C#. I barely use C#, but I'm not afraid to point out its good parts even though D is my personal favourite.

> You can certainly say it's a copout, but it's what I see them do. I've never seen them initialize to NaN, but I've seen the "just throw in a 0" many times.

Again, I agree with this - except the examples are not from D, and certainly not from the future D that is being proposed. I don't blame anyone for steering away from NaN in other C-style languages.

I do, however, believe that D programmers are perfectly capable of doing the right thing if informed. And let's face it - there's a lot that relies on education in D, like whether to receive a string parameter as const or immutable, and using scope on a subset of callback parameters. Both of these examples require more typing than the intuitive, straightforward choice (always receive `string`, and no `scope` on delegates), but informed D programmers still choose the lengthier, correct version.

Consider `pure` member functions - it turns out most of them are actually pure, because the implicit `this` parameter is allowed to be mutated and it's rare for a member function to mutate global state, yet we all strive to correctly decorate our methods `pure` when applicable.
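
For instance, a minimal sketch (the type is made up) of a weakly pure member function that still gets decorated:

import std.stdio : writeln;

struct Counter
{
    int count;

    // Marked pure even though it mutates the object: D's "weak purity"
    // permits mutation through the implicit `this` reference, as long as
    // no global or static mutable state is touched.
    void increment() pure
    {
        ++count;
    }
}

void main()
{
    Counter c;
    c.increment();
    writeln(c.count); // 1
}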

> Because experience shows that even the yellers tend to do the short, convenient one rather than the longer, correct one. Bruce Eckel wrote an article about this years ago in reference to why Java exception specifications were a failure and actually caused people to write bad code, including those who knew better.

I don't think the comparison is fair.

Compared to Java exception specifications, the difference between '0' and 'float.nan'/'float.init' is negligible, especially in generic functions where the desired initializer would typically be 'T.init'.

Java exception specifications have widespread implications for the entire codebase, while the difference between '0' and 'float.nan' is constant and entirely a local improvement.
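
A small generic sketch (the function is made up) of where 'T.init' is the natural spelling - it is NaN for floating point, 0 for integers, null for references:

T firstOr(T)(T[] arr)
{
    return arr.length ? arr[0] : T.init;
}

unittest
{
    import std.math : isNaN;
    assert(firstOr!int([]) == 0);
    assert(firstOr!float([]).isNaN);
}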


August 11, 2012
Andrei Alexandrescu wrote:
> [ ... ]
>
> Although this case is not about default values but about the result of a computation (in this case 0.0/0.0), I think it still reveals the usefulness of having a singular value in the floating point realm.

My argument was never against the usefulness of NaN for debugging... only that it should be considered a debugging feature and explicitly defined, rather than intruding on convenience and consistency (with int) by being the default.

I completely agree NaNs are important for debugging floating point math; in fact, D's default-to-NaN has caught a couple of my construction mistakes before. The problem is that this sort of construction mistake is bigger than just floating point and NaN. You can mis-set a variable, float or not, or you can fail to set an int when you should have.

So the question becomes not what benefit NaN has for debugging, but what a person's thought process is when creating/debugging code, and herein lies the heart of my qualm. In D we have a bit of a conceptual double standard within the number community. I have to remember these rules when I'm creating something, not just when I'm debugging it. As often as D may have caught a construction mistake specifically related to floats in my code, ten times more often it's produced NaNs where I intended a number, because I forgot about the double standard when adding a field or creating a variable.

A C++ guy might not think twice about this because he's used to having to supply default values all the time (IDK, I'm not that guy), but to a C# guy, D's approach feels more like a regression, and that's a paper cut on someone's opinion of the language.

August 11, 2012
On 8/11/2012 12:33 PM, F i L wrote:
> In D we have a bit of a conceptual double standard within the
> number community. I have to remember these rules when I'm creating something,
> not just when I'm debugging it. As often as D may have caught a construction
> mistake specifically related to floats in my code, 10x more so it's produced
> NaN's where I intended a number, because I forgot about the double standard when
> adding a field or creating a variable.

I'd rather have 100 easy-to-find bugs than 1 unnoticed one that went out into the field.


> A C++ guy might not think twice about this because he's used to having to
> default values all the time (IDK, I'm not that guy),

Only if a default constructor is defined for the type, which often it is not; otherwise you'll get garbage for a default initialization.


August 11, 2012
F i L:

> Walter Bright wrote:
>> 3. Floating point values are default initialized to NaN.
>
> This isn't a good feature, IMO. C# handles this much more conveniently

An alternative possibility is to:
1) Default initialize variables just as currently done in D, with 0s, NaNs, etc;
2) Where the compiler is certain a variable is read before any possible initialization, it generates a compile-time error;
3) Warnings for unused variables and unused last assignments.

Where the compiler is not able to tell, or sees there is at least one path on which the variable is initialized, it gives no error, and the code will use the default-initialized values, as currently done in D.


The D compiler is already doing this a little, if you compile this with -O:

class Foo {
  void bar() {}
}
void main() {
  Foo f;
  f.bar();
}

You get at compile-time:
temp.d(6): Error: null dereference in function _Dmain


A side effect of those rules is that this code doesn't compile, and similarly a lot of current D code:

class Foo {}
void main() {
  Foo f;
  assert(f is null);
}


Bye,
bearophile