August 11, 2012
Re: Which D features to emphasize for academic review article
On Friday, 10 August 2012 at 22:01:46 UTC, Walter Bright wrote:
> It catches only a subset of these at compile time. I can craft 
> any number of ways of getting it to miss diagnosing it. 
> Consider this one:
>
>     float z;
>     if (condition1)
>          z = 5;
>     ... lotsa code ...
>     if (condition2)
>          z++;
>
> To diagnose this correctly, the static analyzer would have to 
> determine that condition1 produces the same result as 
> condition2, or not. This is impossible to prove. So the static 
> analyzer either gives up and lets it pass, or issues an 
> incorrect diagnostic. So our intrepid programmer is forced to 
> write:
>
>     float z = 0;
>     if (condition1)
>          z = 5;
>     ... lotsa code ...
>     if (condition2)
>          z++;
>
> Now, as it may turn out, for your algorithm the value "0" is an 
> out-of-range, incorrect value. Not a problem as it is a dead 
> assignment, right?
>
> But then the maintenance programmer comes along and changes 
> condition1 so it is not always the same as condition2, and now 
> the z++ sees the invalid "0" value sometimes, and a silent bug 
> is introduced.
>
> This bug will not remain undetected with the default NaN 
> initialization.

The compiler in languages like C# doesn't try to prove that the 
variable is NOT set and then emit an error. It tries to prove 
that the variable IS set, and if it can't prove that, it's an 
error.

It's not an incorrect diagnostic; it does exactly what it's 
supposed to do, and the programmer has to be explicit when 
taking on the responsibility of initialization. I don't see 
anybody complaining about this feature in C#; most experienced C# 
programmers I've talked to love it (I much prefer it too).

Leaving a local variable initially uninitialized (or rather, not 
explicitly initialized) is a good way to portray the intention 
that it's going to be conditionally initialized later. In C#, if 
your program compiles, your variable is guaranteed to be 
initialized later but before use. This is a useful guarantee when 
reading/maintaining code.

In D, on the other hand, it's possible to write D code like:

for(size_t i; i < length; ++i)
{
    ...
}

And I've actually seen this kind of code a lot in the wild. It 
boggles my mind that you think that this code should be legal. I 
think it's lazy - the intention is not clear. Is the default 
initializer being intentionally relied on, or was it 
unintentional? I've seen both cases. The for-loop example is an 
extreme one for demonstrative purposes, most examples are less 
obvious.

Saying that most programmers will explicitly initialize floating 
point numbers to 0 instead of NaN when taking on initialization 
responsibility is a cop-out - float.init and float.nan are 
obviously the values you should be going for. The benefit is easy 
for programmers to understand, especially if they already 
understand why float.init is NaN. You say yelling at them 
probably won't help - why not? I personally use 
float.init/double.init etc. in my own code, and I'm sure other 
informed programmers do too. I can understand why people don't do 
it in, say, C, with NaN being less defined there afaik. D 
promotes NaN actively and programmers should be eager to leverage 
NaN explicitly too.
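
To make the point concrete, here is a minimal sketch of taking on initialization responsibility with `double.init` rather than `0`. The function `firstAbove` is a made-up name for this example only.

```d
import std.math : isNaN;

/// Returns the first element greater than `limit`, or NaN if there is none.
/// (`firstAbove` is a hypothetical helper, not a library function.)
double firstAbove(const(double)[] xs, double limit)
{
    double found = double.init; // explicit "not set yet"; NaN, not 0
    foreach (x; xs)
    {
        if (x > limit)
        {
            found = x;
            break;
        }
    }
    return found;
}

void main()
{
    assert(isNaN(float.init));  // floating point types default to NaN
    assert(int.init == 0);      // integers default to 0

    assert(firstAbove([1.0, 20.0], 10.0) == 20.0);
    assert(isNaN(firstAbove([1.0, 2.0], 10.0))); // "no result" stays visible
}
```

Had `found` been initialized to `0`, the "no match" case would be indistinguishable from a legitimate result of zero.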

It's also important to note that C# works the same as D for 
non-local variables - they all have a defined default initializer 
(the C# equivalent of T.init is default(T)). Another point is 
that the local-variable analysis is limited to the scope of a 
single function body; it does not do inter-procedural analysis.

I think this would be a great thing for D, and I believe that all 
code this change breaks is actually broken to begin with.
August 11, 2012
On 8/11/2012 1:30 AM, Era Scarecrow wrote:
> On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:
>> It's too bad that ints don't have a NaN value, but interestingly enough,
>> valgrind does default initialize them to some internal NaN, making it a most
>> excellent bug detector.
>
>   The compiler could always have flags specifying if variables were used, and if
> they are false they are as good as NaN. Only downside is a performance hit
> unless you mark it as a release binary. It really comes down to if it's worth
> implementing or considered a big change (unless it's a flag you have to
> specially turn on)

Not so easy. Suppose you pass a pointer to the variable to another function. 
Does that function set it?
August 11, 2012
On 8/11/2012 1:57 AM, Jakob Ovrum wrote:
> The compiler in languages like C# doesn't try to prove that the variable is NOT
> set and then emits an error. It tries to prove that the variable IS set, and if
> it can't prove that, it's an error.
>
> It's not an incorrect diagnostic, it does exactly what it's supposed to do

Of course it is doing what the language requires, but it is an incorrect 
diagnostic because a dead assignment is required.

And being a dead assignment, it can lead to errors when the code is later 
modified, as I explained. I also dislike on aesthetic grounds meaningless code 
being required.
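
The example from earlier in the thread can be written out as a runnable sketch. The two conditions are passed in as plain `bool` parameters, and the function names are invented for this illustration.

```d
import std.math : isNaN;

// Relies on D's default initialization: z starts out as NaN.
float withDefaultInit(bool condition1, bool condition2)
{
    float z;
    if (condition1)
        z = 5;
    // ... lotsa code ...
    if (condition2)
        z++;
    return z;
}

// The variant forced by a "must be provably initialized" rule.
float withDeadZero(bool condition1, bool condition2)
{
    float z = 0; // dead assignment as long as condition1 == condition2
    if (condition1)
        z = 5;
    // ... lotsa code ...
    if (condition2)
        z++;
    return z;
}

void main()
{
    // Both versions agree while the conditions match.
    assert(withDefaultInit(true, true) == 6);
    assert(withDeadZero(true, true) == 6);

    // Once the conditions diverge, NaN makes the mistake visible...
    assert(isNaN(withDefaultInit(false, true)));
    // ...while the dead zero silently yields a plausible-looking number.
    assert(withDeadZero(false, true) == 1);
}
```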

> In D, on the other hand, it's possible to write D code like:
>
> for(size_t i; i < length; ++i)
> {
>      ...
> }
>
> And I've actually seen this kind of code a lot in the wild. It boggles my mind
> that you think that this code should be legal. I think it's lazy - the intention
> is not clear. Is the default initializer being intentionally relied on, or was
> it unintentional? I've seen both cases. The for-loop example is an extreme one
> for demonstrative purposes, most examples are less obvious.

That perhaps is your experience with other languages (that do not default 
initialize) showing. I don't think that default initialization is so awful. In 
fact, C++ enables one to specify default initialization for user defined types. 
Are you against that, too?


> Saying that most programmers will explicitly initialize floating point numbers
> to 0 instead of NaN when taking on initialization responsibility is a cop-out -

You can certainly say it's a copout, but it's what I see them do. I've never 
seen them initialize to NaN, but I've seen the "just throw in a 0" many times.


> float.init and float.nan are obviously the values you should be going for. The
> benefit is easy for programmers to understand, especially if they already
> understand why float.init is NaN. You say yelling at them probably won't help -
> why not?

Because experience shows that even the yellers tend to do the short, convenient 
one rather than the longer, correct one. Bruce Eckel wrote an article about this 
years ago in reference to why Java exception specifications were a failure and 
actually caused people to write bad code, including those who knew better.
August 11, 2012
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
> On 8/11/2012 1:57 AM, Jakob Ovrum wrote:
> Because experience shows that even the yellers tend to do the 
> short, convenient one rather than the longer, correct one. 
> Bruce Eckel wrote an article about this years ago in reference 
> to why Java exception specifications were a failure and 
> actually caused people to write bad code, including those who 
> knew better.

I have to agree here.

I spend my work time between JVM and .NET based languages, and
checked exceptions are on my top 5 list of what went wrong with 
Java.

You see lots of

try {
    ...
} catch (Exception e) {
    e.printStackTrace();
}

in enterprise code.

--
Paulo
August 11, 2012
On 8/11/12 3:11 AM, F i L wrote:
> I still prefer float class members to be defaulted to a usable value,
> for the sake of consistency with ints.

Actually there's something that just happened two days ago to me that's 
relevant to this, particularly because it's in a different language 
(SQL) and different domain (Machine Learning).

I was working with an iterative algorithm implemented in SQL, which 
performs some aggregate computation on some 30 billion samples. The 
algorithm is rather intricate, and each iteration takes the previous 
one's result as input.

Somehow at the end there were NaNs in the sample data I was looking at 
(there weren't supposed to be any). So I started investigating; the NaNs 
could appear only in a rare data corruption case. And indeed before long I 
found 4 (four) samples out of 30 billion that were corrupt. After one 
iteration, there were 300K NaNs. After two iterations, a few million. 
After four, 800M samples were messed up. NaNs did save the day.

Although this case is not about default values but about the result of a 
computation (in this case 0.0/0.0), I think it still reveals the 
usefulness of having a singular value in the floating point realm.
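
The spreading effect is easy to reproduce on a toy scale. The sketch below is in D rather than SQL and is not the algorithm described above; `smooth` and `countNaNs` are made-up helpers, and the neighbour-averaging step is only an assumed stand-in for "each iteration takes the previous one's result as input".

```d
import std.math : isNaN;

/// One pass: each element becomes the mean of itself and its neighbours
/// (edge elements reuse themselves in place of the missing neighbour).
double[] smooth(const(double)[] xs)
{
    auto result = new double[](xs.length);
    foreach (i; 0 .. xs.length)
    {
        immutable lo = i == 0 ? 0 : i - 1;
        immutable hi = i + 1 == xs.length ? i : i + 1;
        result[i] = (xs[lo] + xs[i] + xs[hi]) / 3.0;
    }
    return result;
}

/// Counts the NaN entries in a slice.
size_t countNaNs(const(double)[] xs)
{
    size_t n = 0;
    foreach (x; xs)
        if (isNaN(x))
            ++n;
    return n;
}

void main()
{
    auto samples = new double[](10);
    samples[] = 1.0;

    double zero = 0.0;
    samples[5] = zero / zero; // one corrupt sample: 0.0/0.0 is NaN

    assert(countNaNs(samples) == 1);
    auto once = smooth(samples);
    assert(countNaNs(once) == 3);          // the NaN reached both neighbours
    assert(countNaNs(smooth(once)) == 5);  // and keeps spreading
}
```

Because any arithmetic involving NaN yields NaN, the corruption cannot be averaged away into a plausible-looking number.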


Andrei
August 11, 2012
On Saturday, 11 August 2012 at 09:26:42 UTC, Walter Bright wrote:
> On 8/11/2012 1:30 AM, Era Scarecrow wrote:

>> The compiler could always have flags specifying if variables 
>> were used, and if they are false they are as good as NaN. Only 
>> downside is a performance hit unless you mark it as a release 
>> binary. It really comes down to if it's worth implementing or 
>> considered a big change (unless it's a flag you have to 
>> specially turn on)
>
> Not so easy. Suppose you pass a pointer to the variable to 
> another function. Does that function set it?

 I suppose there could be a second hidden pointer/bool as part of 
calls, but then it's completely incompatible with any C calling 
convention, meaning that is probably out of the question.

 Either a) pointers are low-level enough to be treated like casts, 
in which case it's all up to the programmer, or b) same as before: 
unless an 'out' parameter is specified, it would likely throw an 
exception at that point (since attempting to read or pass the 
address of an uninitialized variable is the same as accessing it 
directly). After all, a false positive is better than no check 
at all, right?

 Of course, with that in mind, specifying that a variable begins 
as void (uninitialized) could be its own form of initialization? 
(Meaning it wouldn't check those even though they hold known 
garbage.)
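
 As a rough library-level sketch of the "hidden flag" idea (not a compiler feature, and not something D provides): a wrapper that remembers whether it was ever assigned and throws on an early read. The name `Tracked` is invented for this example.

```d
import std.exception : assertThrown, enforce;

/// Hypothetical wrapper: pairs a value with an "assigned" flag.
struct Tracked(T)
{
    private T payload;
    private bool assigned = false;

    /// Assigning a plain value marks the variable as set.
    void opAssign(T value)
    {
        payload = value;
        assigned = true;
    }

    /// Reading before any assignment throws instead of yielding garbage.
    T get() const
    {
        enforce(assigned, "variable read before it was assigned");
        return payload;
    }
}

void main()
{
    Tracked!float z;
    assertThrown(z.get()); // never assigned: the read is rejected

    z = 5;
    assert(z.get() == 5);  // assigned: the read goes through
}
```

 The flag costs an extra bool per variable and a check per read, and it does nothing for code that writes through a raw pointer to the payload, which is the objection raised above.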
August 11, 2012
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
> Of course it is doing what the language requires, but it is an 
> incorrect diagnostic because a dead assignment is required.
>
> And being a dead assignment, it can lead to errors when the 
> code is later modified, as I explained. I also dislike on 
> aesthetic grounds meaningless code being required.

It is not meaningless, it's declarative. The same resulting code 
as now would be generated, but it's easier for the maintainer to 
understand what's being meant.

> That perhaps is your experience with other languages (that do 
> not default initialize) showing. I don't think that default 
> initialization is so awful. In fact, C++ enables one to specify 
> default initialization for user defined types. Are you against 
> that, too?

No, because user-defined types can have explicitly initialized 
members. I do think that member fields relying on the default 
initializer are ambiguous and should be explicit, but flow 
analysis on aggregate members is not going to work in any current 
programming language. D already works similarly to C# on this 
point.

And for the record, I have more experience with D than C#. I 
barely use C#, but I'm not afraid to point out its good parts 
even though D is my personal favourite.

> You can certainly say it's a copout, but it's what I see them 
> do. I've never seen them initialize to NaN, but I've seen the 
> "just throw in a 0" many times.

Again, I agree with this - except the examples are not from D, 
and certainly not from the future D that is being proposed. I 
don't blame anyone for steering away from NaN in other C-style 
languages.

I do, however, believe that D programmers are perfectly capable 
of doing the right thing if informed. And let's face it - there's 
a lot that relies on education in D, like whether to receive a 
string parameter as const or immutable, and using scope on a 
subset of callback parameters. Both of these examples require 
more typing than the intuitive/straight-forward choice (always 
receive `string` and no `scope` on delegates), but informed D 
programmers still choose the more lengthy, correct version.
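
For readers unfamiliar with those two idioms, a small sketch follows; `countSpaces` and `applyTwice` are made-up names for this example.

```d
/// Takes const(char)[] instead of string, so both immutable strings
/// and mutable char[] buffers can be passed without copying.
size_t countSpaces(const(char)[] text)
{
    size_t n = 0;
    foreach (c; text)
        if (c == ' ')
            ++n;
    return n;
}

/// `scope` states that the delegate is not stored beyond this call.
int applyTwice(scope int delegate(int) dg, int x)
{
    return dg(dg(x));
}

void main()
{
    string literal = "a b c";
    char[] buffer = "d e".dup;
    assert(countSpaces(literal) == 2); // immutable(char)[] converts to const
    assert(countSpaces(buffer) == 1);  // so does mutable char[]

    int offset = 3;
    assert(applyTwice(n => n + offset, 1) == 7);
}
```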

Consider `pure` member functions - turns out most of them are 
actually pure because the implicit `this` parameter is allowed to 
be mutated and it's rare for a member function to mutate global 
state, yet we all strive to correctly decorate our methods `pure` 
when applicable.
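
A short sketch of what that looks like in practice; `Accumulator` and `sumUpTo` are invented for this illustration.

```d
struct Accumulator
{
    private int total = 0;

    // Mutates `this` but touches no global state, so `pure` is accepted.
    void add(int n) pure
    {
        total += n;
    }

    int value() const pure
    {
        return total;
    }
}

// A pure free function may call the mutating member on its own local.
int sumUpTo(int n) pure
{
    Accumulator acc;
    foreach (i; 1 .. n + 1)
        acc.add(i);
    return acc.value();
}

void main()
{
    assert(sumUpTo(4) == 10); // 1 + 2 + 3 + 4
}
```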

> Because experience shows that even the yellers tend to do the 
> short, convenient one rather than the longer, correct one. 
> Bruce Eckel wrote an article about this years ago in reference 
> to why Java exception specifications were a failure and 
> actually caused people to write bad code, including those who 
> knew better.

I don't think the comparison is fair.

Compared to Java exception specifications, the difference between 
'0' and 'float.nan'/'float.init' is negligible, especially in 
generic functions when the desired initializer would typically be 
'T.init'.

Java exception specifications have widespread implications for 
the entire codebase, while the difference between '0' and 
'float.nan' is constant and entirely a local improvement.
August 11, 2012
Andrei Alexandrescu wrote:
> [ ... ]
>
> Although this case is not about default values but about the 
> result of a computation (in this case 0.0/0.0), I think it 
> still reveals the usefulness of having a singular value in the 
> floating point realm.

My argument was never against the usefulness of NaN for 
debugging... only that it should be considered a debugging 
feature and explicitly defined, rather than intruding on 
convenience and consistency (with int) by being the default.

I completely agree NaNs are important for debugging floating 
point math; in fact, D's default-to-NaN has caught a couple of my 
construction mistakes before. The problem is that this sort of 
construction mistake is bigger than just floating point and NaN. 
You can mis-set a variable, float or not, or you can not set an 
int when you should have.

So the question becomes not what benefit NaN is for debugging, 
but what a person's thought process is when creating/debugging 
code, and herein lies the heart of my qualm. In D we have a bit 
of a conceptual double standard within the number community. I 
have to remember these rules when I'm creating something, not 
just when I'm debugging it. As often as D may have caught a 
construction mistake specifically related to floats in my code, 
10x more so it's produced NaNs where I intended a number, 
because I forgot about the double standard when adding a field or 
creating a variable.

A C++ guy might not think twice about this because he's used to 
having to default values all the time (IDK, I'm not that guy), 
but to a C# guy, D's approach feels more like a regression, and 
that's a paper-cut on someone's opinion of the language.
August 11, 2012
On 8/11/2012 12:33 PM, F i L wrote:
> In D we have a bit of a conceptual double standard within the
> number community. I have to remember these rules when I'm creating something,
> not just when I'm debugging it. As often as D may have caught a construction
> mistake specifically related to floats in my code, 10x more so it's produced
> NaNs where I intended a number, because I forgot about the double standard when
> adding a field or creating a variable.

I'd rather have 100 easy-to-find bugs than 1 unnoticed one that went out in 
the field.


> A C++ guy might not think twice about this because he's used to having to
> default values all the time (IDK, I'm not that guy),

Only if a default constructor is defined for the type, which it often is not, 
and you'll get garbage for a default initialization.
August 11, 2012
F i L:

> Walter Bright wrote:
>> 3. Floating point values are default initialized to NaN.
>
> This isn't a good feature, IMO. C# handles this much more 
> conveniently

An alternative possibility is to:
1) Default initialize variables just as currently done in D, with 
0s, NaNs, etc;
2) Where the compiler is certain a variable is read before any 
possible initialization, it generates a compile-time error;
3) Warnings for unused variables and unused last assignments.

Where the compiler is not sure, not able to tell, or sees there 
is one or more paths where the variable is initialized, it gives 
no errors, and eventually the code will use the default 
initialized values, as currently done in D.


The D compiler is already doing this a little, if you compile 
this with -O:

class Foo {
  void bar() {}
}
void main() {
  Foo f;
  f.bar();
}

You get at compile-time:
temp.d(6): Error: null dereference in function _Dmain


A side effect of those rules is that this code doesn't compile, 
and neither does a lot of current D code:

class Foo {}
void main() {
  Foo f;
  assert(f is null);
}


Bye,
bearophile