March 06, 2012
On Tuesday, March 06, 2012 05:11:30 Martin Nowak wrote:
> There are two independent discussions being conflated here. One about
> getting more
> information out of crashes even in release mode and the other about
> adding runtime checks to prevent crashing merely in debug builds.

A segfault should _always_ terminate a program - as should dereferencing a null pointer. Those are fatal errors. If we had extra checks, they would have to result in NullPointerErrors, not NullPointerExceptions. It's horribly broken to try and recover from dereferencing a null pointer. So, the question then becomes whether adding the checks and getting an Error thrown is worth doing as opposed to simply detecting it and printing out a stack trace. And throwing an Error is arguably _worse_, because it means that you can't get a useful core dump.

Really, I think that checking for null when dereferencing is out of the question. What we need is to detect it and print out a stacktrace. That will maximize the debug information without costing performance.
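
For illustration, here is a minimal sketch of the "detect it and print a stack trace" approach on Linux, using a plain SIGSEGV handler. The backtrace calls are Linux-specific and doing real work in a signal handler is not async-signal-safe, so treat this as a last-resort diagnostic rather than a robust implementation - note that it still lets the OS produce a core dump afterwards:

import core.stdc.signal : SIGSEGV, SIG_DFL, signal;
import core.stdc.stdio : fprintf, stderr;
import core.sys.linux.execinfo : backtrace, backtrace_symbols_fd;

extern (C) void onSegfault(int sig) nothrow @nogc
{
    fprintf(stderr, "Caught signal %d, backtrace follows:\n", sig);

    // Collect and print raw return addresses to stderr. Without
    // exported symbols the output is addresses only.
    void*[32] frames;
    int depth = backtrace(frames.ptr, cast(int) frames.length);
    backtrace_symbols_fd(frames.ptr, depth, 2 /* stderr's fd */);

    // Restore the default disposition; returning re-executes the
    // faulting instruction, which faults again and lets the OS kill
    // the process and produce a core dump as usual.
    signal(SIGSEGV, SIG_DFL);
}

void installSegfaultHandler()
{
    signal(SIGSEGV, &onSegfault);
}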

- Jonathan M Davis
March 06, 2012
On 03/05/2012 11:27 PM, Jonathan M Davis wrote:
> On Tuesday, March 06, 2012 05:11:30 Martin Nowak wrote:
>> There are two independent discussions being conflated here. One about
>> getting more
>> information out of crashes even in release mode and the other about
>> adding runtime checks to prevent crashing merely in debug builds.
>
> A segfault should _always_ terminate a program - as should dereferencing a
> null pointer. Those are fatal errors. If we had extra checks, they would have
> to result in NullPointerErrors, not NullPointerExceptions. It's horribly
> broken to try and recover from dereferencing a null pointer. So, the question
> then becomes whether adding the checks and getting an Error thrown is worth
> doing as opposed to simply detecting it and printing out a stack trace. And
> throwing an Error is arguably _worse_, because it means that you can't get a
> useful core dump.
>
> Really, I think that checking for null when dereferencing is out of the
> question. What we need is to detect it and print out a stacktrace. That will
> maximize the debug information without costing performance.
>
> - Jonathan M Davis

Why is it fatal?

I'd like to be able to catch these.  I tend to run into a lot of fairly benign sources of these, and they should be try-caught so that the user doesn't get the boot unnecessarily.  Unnecessary crashing can lose user data.  Maybe a warning message is sufficient: "hey that last thing you did didn't turn out so well; please don't do that again." followed by some automatic emailing of admins.  And the email would contain a nice stack trace with line numbers and stack values and... I can dream huh.

I might be convinced that things like segfaults in the /general case/ are fatal.  It could be writing to memory outside the bounds of an array which is both not bounds-checked and may or may not live on the stack. Yuck, huh.  But this is not the same as a null-dereference:

Foo f = null;
f.bar = 4;  // This is exception worthy, yes,
            // but how does it affect unrelated parts of the program?

March 06, 2012
On Monday, March 05, 2012 23:58:48 Chad J wrote:
> On 03/05/2012 11:27 PM, Jonathan M Davis wrote:
> > On Tuesday, March 06, 2012 05:11:30 Martin Nowak wrote:
> >> There are two independent discussions being conflated here. One about
> >> getting more
> >> information out of crashes even in release mode and the other about
> >> adding runtime checks to prevent crashing merely in debug builds.
> > 
> > A segfault should _always_ terminate a program - as should dereferencing a null pointer. Those are fatal errors. If we had extra checks, they would have to result in NullPointerErrors, not NullPointerExceptions. It's horribly broken to try and recover from dereferencing a null pointer. So, the question then becomes whether adding the checks and getting an Error thrown is worth doing as opposed to simply detecting it and printing out a stack trace. And throwing an Error is arguably _worse_, because it means that you can't get a useful core dump.
> > 
> > Really, I think that checking for null when dereferencing is out of the question. What we need is to detect it and print out a stacktrace. That will maximize the debug information without costing performance.
> > 
> > - Jonathan M Davis
> 
> Why is it fatal?
> 
> I'd like to be able to catch these.  I tend to run into a lot of fairly benign sources of these, and they should be try-caught so that the user doesn't get the boot unnecessarily.  Unnecessary crashing can lose user data.  Maybe a warning message is sufficient: "hey that last thing you did didn't turn out so well; please don't do that again." followed by some automatic emailing of admins.  And the email would contain a nice stack trace with line numbers and stack values and... I can dream huh.
> 
> I might be convinced that things like segfaults in the /general case/ are fatal.  It could be writing to memory outside the bounds of an array which is both not bounds-checked and may or may not live on the stack. Yuck, huh.  But this is not the same as a null-dereference:
> 
> Foo f = null;
> f.bar = 4;  // This is exception worthy, yes,
>              // but how does it affect unrelated parts of the program?

If you dereference a null pointer, there is a serious bug in your program. Continuing is unwise. And if it actually goes so far as to be a segfault (since the hardware caught it rather than the program), it is beyond a doubt unsafe to continue. On rare occasion, it might make sense to try and recover from dereferencing a null pointer, but it's like catching an AssertError. It's rarely a good idea. Continuing would mean trying to recover from a logic error in your program. Your program obviously already assumed that the variable wasn't null, or it would have checked for null. So from the point of view of your program's logic, you are by definition in an undefined state, and continuing will have unexpected and potentially deadly behavior.
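
To illustrate the distinction with a hypothetical example: a null check is ordinary control flow with a defined recovery, while an unchecked dereference encodes the assumption that null is impossible - and catching after the fact doesn't tell you which other assumptions fell with it:

class Config
{
    string path;
}

string configPath(Config c)
{
    // Anticipated case: null has a defined meaning here, so this is
    // ordinary, recoverable control flow.
    if (c is null)
        return "app.conf"; // hypothetical default
    return c.path;
}

string configPathUnchecked(Config c)
{
    // If c is null here, the code's own assumption has already been
    // violated. A catch block around the caller can't reconstruct
    // what state the rest of the logic is in.
    return c.path;
}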

- Jonathan M Davis
March 06, 2012
On 03/06/2012 12:07 AM, Jonathan M Davis wrote:
>
> If you dereference a null pointer, there is a serious bug in your program.
> Continuing is unwise. And if it actually goes so far as to be a segfault
> (since the hardware caught it rather than the program), it is beyond a doubt
> unsafe to continue. On rare occasion, it might make sense to try and recover
> from dereferencing a null pointer, but it's like catching an AssertError. It's
> rarely a good idea. Continuing would mean trying to recover from a logic error
> in your program. Your program obviously already assumed that the variable
> wasn't null, or it would have checked for null. So from the point of view of
> your program's logic, you are by definition in an undefined state, and
> continuing will have unexpected and potentially deadly behavior.
>
> - Jonathan M Davis

This could be said for a lot of things: array-out-of-bounds exceptions, file-not-found exceptions, conversion exceptions, etc.  If the programmer had thought about it, they would have checked the array length, checked for file existence before opening it, been more careful about converting things, etc.

To me, the useful difference between fatal and non-fatal things is how well isolated the failure is.  Out of memory errors and writes into unexpected parts of memory are very bad things and can corrupt completely unrelated sections of code.  The other things I've mentioned, null-dereference included, cannot do this.

Null-dereferences and such can be isolated to sections of code.  A section of code might become compromised by the dereference, but the code outside of that section is still fine and can continue working.

Example:

// riskyShenanigans does some dubious things with nullable references.
// It was probably written late at night after one too many caffeine
//   pills and alcoholic beverages.  This guy is operating under much
//   worse conditions and easier objectives than the guy writing
//   someFunc().
//
// Thankfully, it can be isolated.
//
int riskyShenanigans()
{
	Foo f = new Foo();
	// ... blah blah blah ...
	f = null; // surprise!
	// ... etc etc ...

	// Once this happens, we can't count on 'f' or
	//   anything else in this function to be valid
	//   anymore.
	return f.bar;
}

// The author of someFunc() is trying to be a bit more careful.
// In fact, they'll even go so far as to make this thing nothrow.
// Maybe it's a server process and it's not allowed to die.
//
nothrow void someFunc()
{
	int cheesecake = 7;
	int donut = 0;

	// Here we will make sure that riskyShenanigans() is
	//   well isolated from everything else.
	try
	{
		// All statefulness inside this scope cannot be
		//   trusted when the null dereference happens.
		donut = riskyShenanigans();
	}
	catch( NullDereferenceException e )
	{
		// donut can be recovered if we are very
		//   explicit about it.
		// It MUST be restored to some known state
		//   before we consider using it again.
		donut = 0; // (so we restore it.)
	}

	// At this point, we HAVE accounted for null-dereferences.
	// donut is either a valid value, or it is zero.
	// We know what it will behave like.
	omnom(donut);

	// An even stronger case:
	// cheesecake had nothing to do with riskyShenanigans.
	// It is completely impossible for that null-dereference
	//   to have touched the cheesecake in this code.
	omnom(cheesecake);
}

And if riskyShenanigans were to modify global state... well, it's no longer so well isolated.  This is just a disadvantage of global state, and it will be true of many other possible exceptions too.
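
For example (names here are hypothetical), a single mutation of module-level state is enough to break the isolation:

int requestsStarted; // module-level state (hypothetical)

void riskyWithGlobals()
{
    requestsStarted++; // escapes the try/catch boundary...
    Object o = null;
    o.toString();      // ...before the null dereference fires, so the
                       // counter no longer matches what the rest of
                       // the program believes happened.
}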

Long story short: I don't see how an unexpected behavior in one part of a program will necessarily create unexpected behavior in all parts of the program, especially when good encapsulation is practiced.

Thoughts?
March 06, 2012
On Friday, 2 March 2012 at 04:53:02 UTC, Jonathan M Davis wrote:
> It's defined. The operating system protects you. You get a segfault on *nix and
> an access violation on Windows. Walter's take on it is that there is no point
> in checking for what the operating system is already checking for - especially
> when it adds additional overhead. Plenty of folks disagree, but that's the way
> it is.
> - Jonathan M Davis

One thing we must consider is that this violates scope safety.

This scope(failure) doesn't execute:

import std.stdio;

void main() {
    Object o = null;
    scope(failure) writeln("error");
    o.opCmp(new Object());
}

That's _very_ inconsistent with the scope(failure) guarantee of _always_ executing.

NMS
March 06, 2012
On Tuesday, March 06, 2012 07:16:52 Nathan M. Swan wrote:
> On Friday, 2 March 2012 at 04:53:02 UTC, Jonathan M Davis wrote:
> > It's defined. The operating system protects you. You get a
> > segfault on *nix and
> > an access violation on Windows. Walter's take on it is that
> > there is no point
> > in checking for what the operating system is already checking
> > for - especially
> > when it adds additional overhead. Plenty of folks disagree, but
> > that's the way
> > it is.
> > - Jonathan M Davis
> 
> One thing we must consider is that this violates scope safety.
> 
> This scope(failure) doesn't execute:
> 
> import std.stdio;
> 
> void main() {
>      Object o = null;
>      scope(failure) writeln("error");
>      o.opCmp(new Object());
> }
> 
> That's _very_ inconsistent with the scope(failure) guarantee of
> _always_ executing.

scope(failure) is _not_ guaranteed to always execute on failure. It is _only_ guaranteed to run when an Exception is thrown. Any other Throwable - Errors included - skip all finally blocks, scope statements, and destructors. That's one of the reasons why it's so horrible to try and catch an Error.

If dereferencing null pointers were checked for, it would result in an Error just like RangeError, which skips all destructors, finally blocks, and scope statements. Such problems are considered unrecoverable. If they occur, your program is in an invalid state, and it's better to kill it than to continue. If you want to recover from attempting to dereference a null object, then you need to check for null before you dereference it.
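
A small sketch of the semantics as described - whether a given runtime also runs cleanup for Errors is not something the language promises, even if it happens to in practice:

import std.stdio;

void main()
{
    try
    {
        scope(failure) writeln("cleanup ran: an Exception escaped this scope");
        throw new Exception("ordinary, recoverable failure");
    }
    catch (Exception e)
    {
        writeln("caught: ", e.msg);
    }
}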

- Jonathan M Davis
March 06, 2012
On Tuesday, 6 March 2012 at 06:27:31 UTC, Jonathan M Davis wrote:
> scope(failure) is _not_ guaranteed to always execute on failure. It is _only_
> guaranteed to run when an Exception is thrown. Any other Throwable - Errors
> included - skip all finally blocks, scope statements, and destructors. That's
> one of the reasons why it's so horrible to try and catch an Error.

Maybe not guaranteed, but this happens:

code:
import std.stdio;

void main() {
    scope(failure) writeln("bad things just happened");
    int[] x = new int[4_000_000_000_000_000_000];
}

output:
bad things just happened
core.exception.OutOfMemoryError
March 06, 2012
On 2012-03-06 02:17, Michel Fortin wrote:
> On 2012-03-05 22:31:34 +0000, "Steven Schveighoffer"
> <schveiguy@yahoo.com> said:
>
>> On Mon, 05 Mar 2012 05:38:20 -0500, Walter Bright
>> <newshound2@digitalmars.com> wrote:
>>
>>> I don't get this at all. I find it trivial to run the program with a
>>> debugger:
>>>
>>> gdb foo
>>> >run
>>>
>>> that's it.
>>
>> This argument continually irks me to no end. It seems like the trusty
>> (rusty?) sword you always pull out when defending the current
>> behavior, but it falls flat on its face when a programmer is faced
>> with a Seg Fault that has occurred on a program that was running for
>> several days/weeks, possibly not in his development environment, and
>> now he must run it via a debugger to wait another several days/weeks
>> to (hopefully) get the same error.
>>
>> Please stop using this argument, it's only valid on trivial bugs that
>> crash immediately during development.
>
> Walter's argument about using gdb doesn't make sense in many scenarios.
> He's probably a little too used to programs which are short-lived
> and have easily reproducible inputs (like compilers).
>
> That said, throwing an exception might not be a better response all the
> time. On my operating system (Mac OS X) when a program crashes I get a
> nice crash log with the date, a stack trace for each thread with named
> functions, the list of all loaded libraries, and the list of VM regions
> dumped into ~/Library/Logs/CrashReporter/. That's very useful when you
> have a customer experiencing a crash with your software, as you can ask
> for the crash log. Can't you do the same on other operating systems?
>
> Whereas if an exception is thrown without being caught, I get a stack
> trace on the console and nothing else, which is both less informative and
> easier to lose than a crash log sitting there on the disk.

If possible, it would be nice to have both. If I have a tool that is short-lived and I'm actively developing it, I don't want to have to look up the exception in the log files.

-- 
/Jacob Carlborg
March 06, 2012
On 2012-03-06 03:04, Steven Schveighoffer wrote:
> Certainly for Mac OS X, it should do the most informative appropriate
> thing for the OS it's running on. Does the above happen for D programs
> currently on Mac OS X?

When an exception is thrown and uncaught, it will print the stack trace in the terminal (if run in the terminal). If the program ends with a segmentation fault, the stack trace will be written to a log file.

-- 
/Jacob Carlborg
March 06, 2012
On 2012-03-06 08:53, Jacob Carlborg wrote:
> On 2012-03-06 03:04, Steven Schveighoffer wrote:
>> Certainly for Mac OS X, it should do the most informative appropriate
>> thing for the OS it's running on. Does the above happen for D programs
>> currently on Mac OS X?
>
> When an exception is thrown and uncaught, it will print the stack trace
> in the terminal (if run in the terminal). If the program ends with a
> segmentation fault, the stack trace will be written to a log file.
>

Outputting to a log file is handled by the OS and not by druntime.

-- 
/Jacob Carlborg