September 27, 2009
Jeremie Pelletier wrote:
> What if using 'Object obj;' raises an "uninitialized variable" warning and makes everyone wanting non-null references happy, while 'Object obj = null;' raises no warning and makes everyone wanting to keep the current system (all two of us!) happy?
> 
> I believe it's a fair compromise.

It's a large improvement, but only for local variables. If your segfault has to do with a local variable, unless your function is monstrously large, it should be easy to fix, without changing the type system.

The larger use case is when you have an aggregate member that cannot be null. This can be solved via contracts, but those are tedious to write and would have to be ubiquitous.
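For illustration, here is a sketch of what the contract approach costs (Widget is an invented name); every type with a non-null member has to repeat this pattern:

class Widget
{
    Object member; // intended to never be null

    invariant()
    {
        assert(member !is null, "Widget.member must not be null");
    }

    this(Object m)
    {
        member = m;
    }
}

And since invariants are only checked at the boundaries of public member functions, a null can still sneak in between checks.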
September 27, 2009
Walter Bright wrote:
> ...

Admittedly I didn't read the whole thread.  It is hueg liek xbox.

I'll try to explain this non-nullable-by-default thing in my own way.

Consider a programmer wanting to define a variable.  I will draw a decision tree that they would use in a language that has non-nullable (and nullable) references:

          Programmer needs to declare reference variable...
                               |
                               |
                               |
                      Do they know how to
       yes <--------    initialize it?     --------> no
        |                                            |
        |                                            |
        |                                            |
        v                                            |
 Type t = someExpression();                          |
                                                     v
                                    yes <--------- Brains? ---> no
                                     |                          |
                                     |                          |
                                     v                          v
                                 Type? t;               Type t = dummy;
                         (Explicitly declare)         (Why would anyone)
                         (it to be nullable)             (do this?!?)


So having both kinds of reference types works out like that.

Working with nulls as in current D is as easy as using a nullable type. When you need to pass a nullable type to a non-nullable variable or as a non-nullable function argument, you just manually check for the null as you should anyway:

Type? t;
... code ...
// If you're lazy.
assert(t);
func(t);

OR, better yet:

Type? t;
... code ...
if ( t )
    func(t);
else
{
    // Explicitly handle the null value,
    // attempting error recovery if appropriate.
}

I actually don't know if the syntax would be that nice, but I can dream.

But I still haven't addressed the second part of this:
Which is the default, nullable or non-nullable?
Currently nullable is the default.

Let's consult a table.

+---------------------+--------------+--------------+
|                     |  default is  |  default is  |
|                     | non-nullable |   nullable   |
+---------------------+--------------+--------------+
| Programmer DOESN'T  |   Compiler   | Segfault in  |
| initialize the var. |    error.    | distant file |
| ((s)he forgets)     |   Fast fix.  |      *       |
+---------------------+--------------+--------------+
| Programmer DOES     |  Everything  |  Everything  |
| initialize the var. |   is fine.   |   is fine.   |
+---------------------+--------------+--------------+
| Programmer uses     |  They don't. |  They don't. |
|    dummy variable.  |Nullable used.| Segfault in  |
|                     |  segfault**  | distant file*|
+---------------------+--------------+--------------+

* They will have hours of good fun finding where the segfault-causing null came from.  If the project is non-trivial, the null may have crossed hands over a number of function calls, ditched the police by hiding in a static variable or some class until the heat dies down, or slipped away through aliasing.  Sometimes stack traces help, sometimes they don't.  We don't even have stack traces without hacking our D installs :/

** Same as *, but less likely since functions are more likely to reject possibly null values, and thus head off the null's escape routes at compile time.


I can see a couple of issues with non-nullable by default:
- This:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=96834
- It complicates the language just a bit more.  I'm willing to
grudgingly honor this as a reason for not implementing the feature.
September 27, 2009
Christopher Wright wrote:
> Jeremie Pelletier wrote:
>> What if using 'Object obj;' raises an "uninitialized variable" warning and makes everyone wanting non-null references happy, while 'Object obj = null;' raises no warning and makes everyone wanting to keep the current system (all two of us!) happy?
>>
>> I believe it's a fair compromise.
> 
> It's a large improvement, but only for local variables. If your segfault has to do with a local variable, unless your function is monstrously large, it should be easy to fix, without changing the type system.
> 
> The larger use case is when you have an aggregate member that cannot be null. This can be solved via contracts, but those are tedious to write and would have to be ubiquitous.

But how would you enforce a non-null type on an aggregate member in the first place? If you could, you could also apply the same initializer semantics I suggested earlier.

Look at this for example:

struct A {
	Object cannotBeNull; // supposedly non-null
}

void main() {
	A* a = new A; // the struct's memory is zeroed: cannotBeNull is null
}

Memory gets initialized to zero, and you have a broken non-null type. You could have the compiler raise an error here, but the compiler cannot possibly know about every way of creating data, such as malloc, calloc or any other external allocator.

You could even do something like:

Object* foo = cast(Object*) calloc(1, Object.sizeof);

and the compiler would let you dereference foo, resulting in yet another broken non-null variable.

Non-nulls are a cute idea when you have a type system that is much stricter than D's, but in D there are just too many ways to work around the guarantee and end up with a crash.
September 27, 2009
On 27/09/2009 05:45, Michel Fortin wrote:
> On 2009-09-26 23:28:30 -0400, Michel Fortin <michel.fortin@michelf.com>
> said:
>
>> On 2009-09-26 22:07:00 -0400, Walter Bright
>> <newshound1@digitalmars.com> said:
>>
>>> [...] The facilities in D enable one to construct a non-nullable
>>> type, and they are appropriate for many designs. I just don't see
>>> them as a replacement for *all* reference types.
>>
>> As far as I understand this thread, no one here is arguing that
>> non-nullable references/pointers should replace *all*
>> reference/pointer types. The argument made is that non-nullable should
>> be the default and nullable can be specified explicitly any time you
>> need it.
>>
>> So if you need a reference you use "Object" as the type, and if you
>> want that reference to be nullable you write "Object?". The static
>> analysis can then assert that your code properly checks for null prior
>> to dereferencing a nullable type, and issues a compilation error if not.
>
> I just want to add: some people here are suggesting the compiler adds
> code to check for null and throw exceptions... I believe, like you, that
> this is the wrong approach because, like you said, it makes people add
> dummy try/catch statements to ignore the error. What you want a
> programmer to do is check for null and properly handle the situation
> before the error occurs, and this is exactly what the static analysis
> approach I suggest forces.
>
> Take this example where "a" is non-nullable and "b" is nullable:
>
> string test(Object a, Object? b)
> {
>     auto x = a.toString();
>     auto y = b.toString();
>
>     return x ~ y;
> }
>
> This should result in a compiler error on line 4 with a message telling
> you that "b" needs to be checked for null prior to use. The programmer must
> then fix his error with an if (or some other control structure), like this:
>
> string test(Object a, Object? b)
> {
>     auto result = a.toString();
>     if (b)
>         result ~= b.toString();
>
>     return result;
> }
>
> And now the compiler will let it pass. This is what I'd like to see.
> What do you think?
>
> I'm not totally against throwing exceptions in some cases, but the above
> approach would be much more useful. Unfortunately, throwing exceptions is
> the best you can do with a library-type approach.
>

If you are referring to my posts, then I want to clarify:
I fully agree with you that the above can and should be compile-time checked. This is a stricter approach and might seem annoying to some programmers, but it is far safer.
Non-null references by default would also restrict these checks to only the places where they are actually needed.

In my posts I was simply answering Walter's claim.
Walter was saying that returning null is a valid, and in fact better, way to indicate errors than returning some "default" value which would cause the program to generate bad output.
My response to that was that if there's an error, the function should instead throw an exception which provides more information and better error handling.

null is a bad way to indicate errors precisely because of the point you make: the compiler does not force the programmer to explicitly handle the null case, unlike the Option type in FP languages.
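To illustrate, here is a rough sketch of such a type in D (the names and details are invented, not an existing library type):

// An Option-style wrapper: the payload cannot be reached without
// going through code that handles the empty case.
struct Option(T)
{
    private T value;
    private bool hasValue;

    static Option some(T v)
    {
        Option o;
        o.value = v;
        o.hasValue = true;
        return o;
    }

    static Option none()
    {
        return Option.init;
    }

    // The only accessor: the caller must supply a fallback.
    T getOrElse(lazy T fallback)
    {
        return hasValue ? value : fallback;
    }
}

A function that can fail would return Option!(T) instead of a possibly-null T, so forgetting the check becomes a type error instead of a distant segfault.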



September 27, 2009
Jason House wrote:
>> Also, by "safe" I presume you mean "memory safe" which means free
>> of memory corruption. Null pointer exceptions are memory safe. A
>> null pointer could be caused by memory corruption, but it cannot
>> *cause* memory corruption.
> 
> I reject this argument too :( To me, code isn't safe if it crashes.

Well, we can't discuss this if we cannot agree on terms. The conventional definition of memory safe means no memory corruption. A null pointer dereference is not memory corruption. You can call it something else, but if you call it "unsafe" then people will misunderstand you.


> Did Boeing avoid checking for fault modes that were easily and
> reliably detectable? It seems stupid to argue that it's ok for an
> altimeter to send bogus data as long as it's easy to detect. All you
> have to do is turn off autopilot. Who cares, right?

Errors in incorrectly initialized data are not easily and reliably detectable. A null pointer, on the other hand, *is* reliably detectable by the hardware.

Boeing's philosophy is that if the airplane cannot tolerate a particular system failing abruptly and completely, then the design is faulty. That's also the FAA regulations. Safety is achieved NOT by designing systems that cannot fail, but by designing systems that can survive failure.

In particular, if the airplane cannot handle turning off the autopilot, it will be rejected by both Boeing and the FAA. Name any single part or system on a Boeing airliner, and if it vanishes abruptly in a puff of smoke, the airliner will survive it.

There is no "the autopilot is receiving corrupted data, but what the hell, we'll keep it turned on anyway". It's inconceivable.

The only reasonable thing a program can do if it discovers it is in an unknown state is to stop immediately. The only reasonable way to use a program is to be able to tolerate its complete failure.


> Why should I use D for production code if it's designed to segfault?
> Software isn't used for important things like autopilot, controlling
> the brakes in my car, or dispensing medicine in hospitals. There's no
> problem allowing that stuff to crash. You can always recover the core
> file, and it's always trivial to reproduce the scenario...

It's not designed to segfault. It's designed to expose errors, not hide them. The system that uses the autopilot is designed to survive total failure of the autopilot. The same goes for the brakes in your car (ever wonder why there are dual brake systems, and why you can still use the brakes if the power assist fails?). I don't know how the ABS works, but I would bet you plenty that if the computer controlling it fails, the brakes will still function. And you can bet your life (literally) that a computer dispensing radiation or medicine into your body had better stop immediately if it detects it is in an unknown state.

Do you *really* want the radiation machine to continue operating if it has self-detected a program bug? Do you really want to BET YOUR LIFE that the software in it is perfect? Do you think that requiring the software be literally perfect is a reasonable, achievable, and safe requirement?

I don't. Not for a minute. And NOTHING Boeing designs relies on perfection for safety, either. In fact, the opposite is true, the designs are all based on "what if this fails?" If the answer is "people die" then the engineers are sent back to the trenches.

Hospitals are way, way behind on this approach. Even adding simple checklists (pilots started using them 70 years ago) has reduced accidental deaths in hospitals by 30%, a staggering improvement.

> Mix in other things like malfunctioning debug data, and I wonder why
> I even use D.

The debug data is a serious problem, and I think I've got it corrected now.
September 27, 2009
Nick Sabalausky wrote:

I agree with you that if the compiler can detect null dereferences at compile time, it should.


>> Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.
> 
> No, he's using the real meaning of "safe", not the misleadingly-limited "SafeD" version of "safe" (which I'm still convinced is going to get some poor soul into serious trouble from mistakenly thinking their SafeD program is much safer than it really is). Out here in reality, "safe" also means a lack of ability to crash, or at least some level of protection against it.

Memory safety is something that can be guaranteed (presuming the compiler is correctly implemented). There is no way to guarantee that a non-trivial program cannot crash. It's the old halting problem.

> You seem to be under the impression that nothing can be made uncrashable without introducing the possibility of corrupted state. That's hogwash.

I read that statement several times and I still don't understand what it means.

BTW, hardware null pointer checking is a safety feature, just like array bounds checking is.
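For example (a minimal sketch; the function names are invented):

// Two safety checks with the same net effect: the program halts at the
// faulting operation instead of running on with corrupted state.
void boundsFault()
{
    int[] a = new int[4];
    a[10] = 1;    // software check inserted by the compiler: range error
}

void nullFault()
{
    Object o;     // references default to null
    o.toString(); // hardware check: the MMU faults on the null dereference
}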
September 27, 2009
On 27/09/2009 03:35, Walter Bright wrote:
> Yigal Chripun wrote:
>> An exception trace is *far* better than a segfault and that does not
>> require null values.
>
> Seg faults are exceptions, too. You can even catch them (on windows)!

No, segfaults are *NOT* exceptions. The setup you mention is Windows-only, as Andrei said, and is irrelevant on *nix. I develop on Unix (Solaris), and segfaults are a pain to deal with.

Furthermore, even *IF* segfaults were transformed into exceptions in D, that still wouldn't make them proper exceptions, because true exceptions are thrown at the place of the error, which is not true for segfaults.


T foo() {
  T t;
  // ... logic ...
  if (error) return null;
  return t;
}

Now, foo is buried deep in a library.

user code has:

T t = someLib.foo();
// ... logic ...

t.fubar = 4; // segfault: t is null

How is it better to segfault at t.fubar than to throw an exception inside foo?
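For comparison, a sketch of the throwing version (the message text is illustrative):

T foo() {
  T t;
  // ... logic ...
  if (error)
    throw new Exception("foo failed: <describe the error here>");
  return t;
}

The failure is now reported at its source, with a message and a catchable exception, instead of a segfault at some unrelated line in user code.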




September 27, 2009
Jeremie Pelletier wrote:
> This may be a good time to ask about how these variables which can be declared anywhere in the function scope are implemented.
> 
> void bar(bool foo) {
>     if(foo) {
>         int a = 1;
>         ...
>     }
>     else {
>         int a = 2;
>         ...
>     }
> 
> }
> 
> is the stack frame using two ints, or is the compiler seeing only one? I never bothered to check it out and just declared 'int a = void;' at the beginning of the routine to keep the stack frames as small as possible.

They are completely independent variables. One may get assigned to a register, and not the other.
September 27, 2009
Jeremie Pelletier wrote:
> void bar(bool foo) {
>     if(foo) {
>         int a = 1;
>         ...
>     }
>     else {
>         int a = 2;
>         ...
>     }
> 
> }
> 
> is the stack frame using two ints, or is the compiler seeing only one? I never bothered to check it out and just declared 'int a = void;' at the beginning of the routine to keep the stack frames as small as possible.

OT, but declaring the variable at the top of the function can increase stack size.

Example with changed variable names:

  void bar(bool foo) {
    if (foo) {
      int a = 1;
    } else {
      int b = 2;
    }
    int c = 3;
  }

In this example, there are clearly three different (and differently named) variables, but their lifetimes do not overlap.  Only one variable can exist at a time, therefore the compiler only needs to allocate space for one variable.  Now, if you move your declaration to the top:

  void bar(bool foo) {
    int a = void;
    if (foo) {
      a = 1;
    } else {
      a = 2; // Reuse variable.
    }
    int c = 3;
  }

You now only have two variables, but both of them coexist at the end of the function.  Unless it applies a clever optimization, the compiler is now forced to allocate stack space for two variables.


-- 
Rainer Deyke - rainerd@eldwood.com
September 27, 2009
Walter Bright wrote:
> Nick Sabalausky wrote:
> 
> I agree with you that if the compiler can detect null dereferences at compile time, it should.
> 
> 
>>> Also, by "safe" I presume you mean "memory safe" which means free of memory corruption. Null pointer exceptions are memory safe. A null pointer could be caused by memory corruption, but it cannot *cause* memory corruption.
>>
>> No, he's using the real meaning of "safe", not the misleadingly-limited "SafeD" version of "safe" (which I'm still convinced is going to get some poor soul into serious trouble from mistakenly thinking their SafeD program is much safer than it really is). Out here in reality, "safe" also means a lack of ability to crash, or at least some level of protection against it.
> 
> Memory safety is something that can be guaranteed (presuming the compiler is correctly implemented). There is no way to guarantee that a non-trivial program cannot crash. It's the old halting problem.
> 

Okay, I'm gonna have to call you out on this one because it's simply incorrect.

The halting problem deals with a valid program state: halting.

We cannot check if every program halts because halting is an instruction that must be allowed at almost any point in the program.

Why do crashes have to be allowed? They're not an allowed instruction!

A language can be Turing-complete and still not allow crashes. There is nothing wrong with this, and it has *nothing* to do with the halting problem.

>> You seem to be under the impression that nothing can be made uncrashable without introducing the possibility of corrupted state. That's hogwash.
> 
> I read that statement several times and I still don't understand what it means.
> 
> BTW, hardware null pointer checking is a safety feature, just like array bounds checking is.

PS: You can't convert segfaults into exceptions under Linux, as far as I know.