March 07, 2012
On 03/07/2012 02:40 AM, Chad J wrote:
> But to initialize non-null fields, I suspect we would need to be able to
> do stuff like this:
>
> class Foo
> {
> int dummy;
> }
>
> class Bar
> {
> Foo foo = new Foo();
>
> this() { foo.dummy = 5; }
> }
>
> Which would be lowered by the compiler into this:
>
> class Bar
> {
> // Assume we've already checked for bogus assignments.
> // It is now safe to make this nullable.
> Nullable!(Foo) foo;
>
> this()
> {
> // Member initialization is done first.
> foo = new Foo();
>
> // Then programmer-supplied ctor code runs after.
> foo.dummy = 5;
> }
> }
>
> I remember C# being able to do this. I never understood why D doesn't
> allow this. Without it, I have to repeat myself a lot, and that is just
> wrong ;).

It is not sufficient.

class Bar{
    Foo foo = new Foo(this);
    void method(){...}
}
class Foo{
    this(Bar bar){bar.foo.method();}
}
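
To make the objection concrete, here is a minimal sketch with the initializer hand-lowered into the constructor the way the proposal describes. The `sawNull` flag is a made-up name, used only to record what `Foo`'s constructor observes. While `new Foo(this)` is still running, `bar.foo` has not been assigned yet, so the supposedly non-null field is visibly null:

```d
class Foo
{
    bool sawNull; // hypothetical flag, for illustration only

    this(Bar bar)
    {
        // Bar.foo is assigned only after this constructor returns,
        // so it still holds its default value (null) here.
        sawNull = bar.foo is null;
    }
}

class Bar
{
    Foo foo;

    this()
    {
        // Hand-written version of the proposed lowering of
        // `Foo foo = new Foo(this);`
        foo = new Foo(this);
    }
}

void main()
{
    auto bar = new Bar;
    assert(bar.foo !is null); // fine once construction has finished
    assert(bar.foo.sawNull);  // but Foo's constructor saw a null field
}
```

Any scheme that lets `this` escape before all fields are assigned runs into the same problem, whichever side does the lowering.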
March 07, 2012
On Mon, 05 Mar 2012 22:51:28 -0500, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Monday, March 05, 2012 21:04:20 Steven Schveighoffer wrote:
>> On Mon, 05 Mar 2012 20:17:32 -0500, Michel Fortin
>> <michel.fortin@michelf.com> wrote:
>> > That said, throwing an exception might not be a better response all the
>> > time. On my operating system (Mac OS X) when a program crashes I get a
>> > nice crash log with the date, a stack trace for each thread with named
>> > functions, the list of all loaded libraries, and the list of VM regions
>> > dumped into ~/Library/Logs/CrashReporter/. That's very useful when you
>> > have a customer experiencing a crash with your software, as you can ask
>> > for the crash log. Can't you do the same on other operating systems?
>>
>> It depends on the OS facilities and the installed libraries for such
>> features.  It's eminently possible, and I think on Windows, you can catch
>> such exceptions too in external programs to do the same sort of dumping.
>> On Linux, you get a "Segmentation Fault" message (or nothing if you have
>> no terminal showing the output), and the program goes away.  That's the
>> default behavior.  I think it's better in any case to do *something* other
>> than just print "Segmentation Fault" by default.  If someone has a way to
>> hook this in a better fashion, we can include that, but I hazard to guess
>> it will not be on stock Linux boxes.
>
> All you have to do is add a signal handler which handles SIGSEGV and have it
> print out a stacktrace. It's pretty easy to do. It _is_ the sort of thing that
> programs may want to override (to handle other signals), so I'm not quite sure
> what the best way to handle that is without causing problems for them (e.g.
> initialization order could affect which handler is added last and is therefore
> the one used). Maybe a function should be added to druntime which wraps the
> glibc function so that programs can add their signal handler through _it_, and
> if that happens, the default one won't be used.

Install the default (stack-trace printing) handler before calling any of the static constructors.  Any call to signal after that will override the installed handler.
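
A rough sketch of that ordering, using only `signal` from `core.stdc.signal`. Both handler names are made up, and the first `signal` call merely stands in for what druntime would do before running static constructors; a real default handler would print a stack trace before aborting. Since `signal` returns the handler it replaces, the override is easy to observe:

```d
import core.stdc.signal : signal, SIGSEGV;
import core.stdc.stdlib : abort;

// Stand-in for a runtime-provided default handler (hypothetical name).
// A real one would print a stack trace before terminating.
extern (C) void runtimeDefaultHandler(int sig) nothrow @nogc
{
    abort();
}

// Stand-in for a handler that the application installs later on.
extern (C) void applicationHandler(int sig) nothrow @nogc
{
    abort();
}

void main()
{
    // Step 1: what the runtime would do during early startup.
    signal(SIGSEGV, &runtimeDefaultHandler);

    // Step 2: the program installs its own handler afterwards.
    // signal() returns the handler that was in place before the call.
    auto previous = signal(SIGSEGV, &applicationHandler);

    // The later call replaced the default; nothing else was needed.
    assert(previous == &runtimeDefaultHandler);
}
```

So no extra wrapper function is required for the override case: whoever calls `signal` last wins.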

-Steve
March 07, 2012
On Tue, 06 Mar 2012 23:07:24 -0500, Walter Bright <newshound2@digitalmars.com> wrote:

> On 3/6/2012 8:05 PM, Walter Bright wrote:
>> What I'm talking about is the idea that one can recover from seg faults
>> resulting from program bugs.
>
> I've written about this before, but I want to emphasize that attempting to recover from program BUGS is the absolutely WRONG way to go about writing fail-safe, critical, fault-tolerant software.

100% agree.  I just want as much information about the bug as possible before the program exits.

-Steve
March 07, 2012
On Mon, 05 Mar 2012 23:58:48 -0500, Chad J <chadjoan@__spam.is.bad__gmail.com> wrote:

> On 03/05/2012 11:27 PM, Jonathan M Davis wrote:
>> On Tuesday, March 06, 2012 05:11:30 Martin Nowak wrote:
>>> There are two independent discussions being conflated here. One about
>>> getting more
>>> information out of crashes even in release mode and the other about
>>> adding runtime checks to prevent crashing merely in debug builds.
>>
>> A segfault should _always_ terminate a program - as should dereferencing a
>> null pointer. Those are fatal errors. If we had extra checks, they would have
>> to result in NullPointerErrors, not NullPointerExceptions. It's horribly
>> broken to try and recover from dereferencing a null pointer. So, the question
>> then becomes whether adding the checks and getting an Error thrown is worth
>> doing as opposed to simply detecting it and printing out a stack trace. And
>> throwing an Error is arguably _worse_, because it means that you can't get a
>> useful core dump.
>>
>> Really, I think that checking for null when dereferencing is out of the
>> question. What we need is to detect it and print out a stacktrace. That will
>> maximize the debug information without costing performance.
>>
>> - Jonathan M Davis
>
> Why is it fatal?

A segmentation fault indicates that a program tried to access memory that is not available.  Since the 0 page is never allocated, any null pointer dereferencing results in a seg fault.

However, there are several causes of seg faults:

1. You forgot to initialize a variable.
2. Your memory has been corrupted, and some corrupted pointer now points into no-mem land.
3. You are accessing memory that has been deallocated.

Only 1 is benign.  2 and 3 are fatal.  Since you cannot know which of these three happened, the only valid choice is to terminate.

I think the correct option is to print a stack trace, and abort the program.

> I'd like to be able to catch these.  I tend to run into a lot of fairly benign sources of these, and they should be try-caught so that the user doesn't get the boot unnecessarily.  Unnecessary crashing can lose user data.  Maybe a warning message is sufficient: "hey that last thing you did didn't turn out so well; please don't do that again." followed by some automatic emailing of admins.  And the email would contain a nice stack trace with line numbers and stack values and... I can dream huh.

You cannot be sure if your program is in a sane state.

> I might be convinced that things like segfaults in the /general case/ are fatal.  It could be writing to memory outside the bounds of an array which is both not bounds-checked and may or may not live on the stack. Yuck, huh.  But this is not the same as a null-dereference:
>
> Foo f = null;
> f.bar = 4;  // This is exception worthy, yes,
>              // but how does it affect unrelated parts of the program?

Again, this is a simple case.  There is also this case:

Foo f = new Foo();
... // some code that corrupts f so that it is now null
f.bar = 4;

This is not a "continue execution" case, and cannot be distinguished from the simple case by compiler or library code.

Philosophically, any null pointer access is a program error, not a user error, and should not be considered for "normal" execution.  Terminating execution is the only right choice.

-Steve
March 07, 2012
On 03/07/2012 07:57 AM, Steven Schveighoffer wrote:
> On Mon, 05 Mar 2012 23:58:48 -0500, Chad J
> <chadjoan@__spam.is.bad__gmail.com> wrote:
>
>>
>> Why is it fatal?
>
> A segmentation fault indicates that a program tried to access memory
> that is not available. Since the 0 page is never allocated, any null
> pointer dereferencing results in a seg fault.
>
> However, there are several causes of seg faults:
>
> 1. You forgot to initialize a variable.
> 2. Your memory has been corrupted, and some corrupted pointer now points
> into no-mem land.
> 3. You are accessing memory that has been deallocated.
>
> Only 1 is benign. 2 and 3 are fatal. Since you cannot know which of
> these three happened, the only valid choice is to terminate.
>
> I think the correct option is to print a stack trace, and abort the
> program.
>

Alright, I think I see where the misunderstanding is coming from.

I have only ever encountered (1).  And I've encountered it a lot.

I didn't even consider (2) and (3) as possibilities.  Those are far from my mind.

I still have a nagging doubt though: since the dereference in question is null, then there is no way for that particular dereference to corrupt other memory.  The only way this happens in (2) and (3) is that related code tries to write to invalid memory.  But if we have other measures in place to prevent that (bounds checking, other hardware signals, etc), then how is it still possible to corrupt memory?

>
> [...]
>
> -Steve

March 07, 2012
On Wed, 07 Mar 2012 09:22:27 -0500, Chad J <chadjoan@__spam.is.bad__gmail.com> wrote:

> On 03/07/2012 07:57 AM, Steven Schveighoffer wrote:
>> On Mon, 05 Mar 2012 23:58:48 -0500, Chad J
>> <chadjoan@__spam.is.bad__gmail.com> wrote:
>>
>>>
>>> Why is it fatal?
>>
>> A segmentation fault indicates that a program tried to access memory
>> that is not available. Since the 0 page is never allocated, any null
>> pointer dereferencing results in a seg fault.
>>
>> However, there are several causes of seg faults:
>>
>> 1. You forgot to initialize a variable.
>> 2. Your memory has been corrupted, and some corrupted pointer now points
>> into no-mem land.
>> 3. You are accessing memory that has been deallocated.
>>
>> Only 1 is benign. 2 and 3 are fatal. Since you cannot know which of
>> these three happened, the only valid choice is to terminate.
>>
>> I think the correct option is to print a stack trace, and abort the
>> program.
>>
>
> Alright, I think I see where the misunderstanding is coming from.
>
> I have only ever encountered (1).  And I've encountered it a lot.

(1) occurs a lot, and in most cases, happens reliably.  Most QA cycles should find them.  There should be no case in which this is not a program error, to be fixed.
(2) and (3) are sinister because errors that occur are generally far away from the root cause, and the memory you are using is compromised.  For example, a memory corruption can cause an error several hours later when you try to use the corrupted memory.

If allowed to continue, programs with corrupted memory can cause lots of problems, e.g. corrupt your saved data, or run malicious code (buffer overflow attack).  It's not worth saving anything.

> I didn't even consider (2) and (3) as possibilities.  Those are far from my mind.
>
> I still have a nagging doubt though: since the dereference in question is null, then there is no way for that particular dereference to corrupt other memory.  The only way this happens in (2) and (3) is that related code tries to write to invalid memory.  But if we have other measures in place to prevent that (bounds checking, other hardware signals, etc), then how is it still possible to corrupt memory?

The null dereference may be a *result* of memory corruption.

example:

class Foo {void foo(){}}

void main()
{
   int[2] x = [1, 2];
   Foo f = new Foo;

   x.ptr[2] = 0; // oops killed f
   f.foo(); // segfault
}

Again, this one is benign, but it doesn't have to be.  I could have just nullified my return stack pointer, etc. along with f.

The larger point is, a SEGV means memory is not as it is expected.  Once you don't trust your memory, you might as well stop.

-Steve
March 07, 2012
On Wednesday, 7 March 2012 at 14:23:18 UTC, Chad J wrote:
> On 03/07/2012 07:57 AM, Steven Schveighoffer wrote:
>> On Mon, 05 Mar 2012 23:58:48 -0500, Chad J
>> <chadjoan@__spam.is.bad__gmail.com> wrote:
>>
>>>
>>> Why is it fatal?
>>
>> A segmentation fault indicates that a program tried to access memory
>> that is not available. Since the 0 page is never allocated, any null
>> pointer dereferencing results in a seg fault.
>>
>> However, there are several causes of seg faults:
>>
>> 1. You forgot to initialize a variable.
>> 2. Your memory has been corrupted, and some corrupted pointer now points
>> into no-mem land.
>> 3. You are accessing memory that has been deallocated.
>>
>> Only 1 is benign. 2 and 3 are fatal. Since you cannot know which of
>> these three happened, the only valid choice is to terminate.
>>
>> I think the correct option is to print a stack trace, and abort the
>> program.
>>
>
> Alright, I think I see where the misunderstanding is coming from.
>
> I have only ever encountered (1).  And I've encountered it a lot.
>
> I didn't even consider (2) and (3) as possibilities.  Those are far from my mind.
>
> I still have a nagging doubt though: since the dereference in question is null, then there is no way for that particular dereference to corrupt other memory.  The only way this happens in (2) and (3) is that related code tries to write to invalid memory.  But if we have other measures in place to prevent that (bounds checking, other hardware signals, etc), then how is it still possible to corrupt memory?
>
>>
>> [...]
>>
>> -Steve

I spoke too soon!
We missed one:

1. You forgot to initialize a variable.
2. Your memory has been corrupted, and some corrupted pointer
 now points into no-mem land.
3. You are accessing memory that has been deallocated.
4. null was being used as a sentinel value, and it snuck into
 a place where the value should not be a sentinel anymore.

I will now change what I said to reflect this:

I think I see where the misunderstanding is coming from.

I encounter (1) from time to time.  It isn't a huge problem because usually if I declare something the next thing on my mind is initializing it.  Even if I forget, I'll catch it in early testing.  It tends to never make it to anyone else's desk, unless it's a regression.  Regressions like this aren't terribly common though.  If you make my program crash from (1), I'll live.

I didn't even consider (2) and (3) as possibilities.  Those are far from my mind.  I think I'm used to VM languages at this point (C#, Java, Actionscript 3, Haxe, Synergy/DE|DBL, etc).  In the VM, (2) and (3) can't happen.  I never worry about those.  Feel free to crash these in D.

I encounter (4) a lot.  I really don't want my programs crashed when (4) happens.  Such crashes would be super annoying, and they can happen at very bad times.

------

Now then, I have 2 things to say about this:

- Why can't we distinguish between these?  As I said in my previous thoughts, we should have ways of ruling out (2) and (3), thus ensuring that our NullDerefException was caused by only (1) or (4).  It's possible in VM languages, but given that the VM is merely a cheesy abstraction, I believe that it's always possible to accomplish the same things in D 100% of the time.  Usually this requires isolating the system bits from the abstractions.  Saying it can't be done would be giving up way too easily, and you can miss the hidden treasure that way.

- If I'm given some sensible way of handling sentinel values then (4) will become a non-issue.  Then that leaves (1-3), and I am OK if those cause mandatory crashing.  I know I'm probably opening an old can of worms, but D is quite powerful and I think we should be able to solve this stuff.  My instincts tell me that managing sentinel values with special patterns in memory (ex: null values or separate boolean flags) all have pitfalls (null-derefs or SSOT violations that lead to desync).  Perhaps D's uber-powerful type system can rescue us?

The only other problem with this is... what if our list is not exhaustive, and (5) exists?
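
One possible direction for the type-system idea, sketched with `std.typecons.Nullable`. The `User` type and both functions are hypothetical, and this is only an illustration of the pattern, not a proposal anyone in the thread has made. The "no value" state lives in the type, so callers have to test `isNull` before reaching for the payload instead of relying on a null reference:

```d
import std.typecons : Nullable, nullable;

struct User
{
    string name;
}

// Hypothetical lookup: the "not found" case is part of the return type
// rather than being signalled through a null reference.
Nullable!User findUser(User[] users, string name)
{
    foreach (u; users)
        if (u.name == name)
            return nullable(u);
    return Nullable!User.init; // the empty state
}

// The caller decides explicitly what the empty case means.
string greeting(Nullable!User user)
{
    return user.isNull ? "Hello, guest" : "Hello, " ~ user.get.name;
}

void main()
{
    auto users = [User("chad"), User("steve")];
    assert(greeting(findUser(users, "chad")) == "Hello, chad");
    assert(greeting(findUser(users, "walter")) == "Hello, guest");
}
```

Note that `Nullable` does not force the `isNull` check at compile time; calling `get` on an empty value is still a program error. It just makes the sentinel visible in signatures.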


March 07, 2012
On Wed, 07 Mar 2012 10:10:32 -0500, Chad J <chadjoan@__spam.is.bad__gmail.com> wrote:

> On Wednesday, 7 March 2012 at 14:23:18 UTC, Chad J wrote:
>> On 03/07/2012 07:57 AM, Steven Schveighoffer wrote:
>>> On Mon, 05 Mar 2012 23:58:48 -0500, Chad J
>>> <chadjoan@__spam.is.bad__gmail.com> wrote:
>>>
>>>>
>>>> Why is it fatal?
>>>
>>> A segmentation fault indicates that a program tried to access memory
>>> that is not available. Since the 0 page is never allocated, any null
>>> pointer dereferencing results in a seg fault.
>>>
>>> However, there are several causes of seg faults:
>>>
>>> 1. You forgot to initialize a variable.
>>> 2. Your memory has been corrupted, and some corrupted pointer now points
>>> into no-mem land.
>>> 3. You are accessing memory that has been deallocated.
>>>
>>> Only 1 is benign. 2 and 3 are fatal. Since you cannot know which of
>>> these three happened, the only valid choice is to terminate.
>>>
>>> I think the correct option is to print a stack trace, and abort the
>>> program.
>>>
>>
>> Alright, I think I see where the misunderstanding is coming from.
>>
>> I have only ever encountered (1).  And I've encountered it a lot.
>>
>> I didn't even consider (2) and (3) as possibilities.  Those are far from my mind.
>>
>> I still have a nagging doubt though: since the dereference in question is null, then there is no way for that particular dereference to corrupt other memory.  The only way this happens in (2) and (3) is that related code tries to write to invalid memory.  But if we have other measures in place to prevent that (bounds checking, other hardware signals, etc), then how is it still possible to corrupt memory?
>>
>>>
>>> [...]
>>>
>>> -Steve
>
> I spoke too soon!
> We missed one:
>
> 1. You forgot to initialize a variable.
> 2. Your memory has been corrupted, and some corrupted pointer
>   now points into no-mem land.
> 3. You are accessing memory that has been deallocated.
> 4. null was being used as a sentinel value, and it snuck into
>   a place where the value should not be a sentinel anymore.
>
> I will now change what I said to reflect this:
>
> I think I see where the misunderstanding is coming from.
>
> I encounter (1) from time to time.  It isn't a huge problem because usually if I declare something the next thing on my mind is initializing it.  Even if I forget, I'll catch it in early testing.  It tends to never make it to anyone else's desk, unless it's a regression.  Regressions like this aren't terribly common though.  If you make my program crash from (1), I'll live.
>
> I didn't even consider (2) and (3) as possibilities.  Those are far from my mind.  I think I'm used to VM languages at this point (C#, Java, Actionscript 3, Haxe, Synergy/DE|DBL, etc).  In the VM, (2) and (3) can't happen.  I never worry about those.  Feel free to crash these in D.
>
> I encounter (4) a lot.  I really don't want my programs crashed when (4) happens.  Such crashes would be super annoying, and they can happen at very bad times.

You can use sentinels other than null.
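
For example, a minimal sketch of a linked list whose end marker is a real object instead of null. The `Node` class and `Node.end` are made-up names, just to show the shape of it:

```d
final class Node
{
    int value;
    Node next;

    this(int value, Node next)
    {
        this.value = value;
        this.next = next;
    }

    // Hypothetical end-of-list marker: a real object rather than null.
    static Node end;

    static this()
    {
        end = new Node(0, null);
        end.next = end; // following `next` from the marker stays on the marker
    }
}

// Walks the list until it reaches the marker object.
int sum(Node list)
{
    int total = 0;
    for (auto n = list; n !is Node.end; n = n.next)
        total += n.value;
    return total;
}

void main()
{
    auto list = new Node(1, new Node(2, new Node(3, Node.end)));
    assert(sum(list) == 6);
    assert(sum(Node.end) == 0);

    // Stepping past the end by mistake does not dereference null.
    assert(Node.end.next.next is Node.end);
}
```

If the sentinel leaks somewhere it shouldn't, you get a wrong-but-defined value to debug rather than a segfault, and a genuine null still means a bug.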

-Steve
March 07, 2012
On Wed, Mar 07, 2012 at 09:22:27AM -0500, Chad J wrote:
> On 03/07/2012 07:57 AM, Steven Schveighoffer wrote:
[...]
> >However, there are several causes of seg faults:
> >
> >1. You forgot to initialize a variable.
> >2. Your memory has been corrupted, and some corrupted pointer now
> >points into no-mem land.
> >3. You are accessing memory that has been deallocated.
> >
> >Only 1 is benign. 2 and 3 are fatal. Since you cannot know which of these three happened, the only valid choice is to terminate.
> >
> >I think the correct option is to print a stack trace, and abort the program.
> >
> 
> Alright, I think I see where the misunderstanding is coming from.
> 
> I have only ever encountered (1).  And I've encountered it a lot.
> 
> I didn't even consider (2) and (3) as possibilities.  Those are far
> from my mind.
> 
> I still have a nagging doubt though: since the dereference in question is null, then there is no way for that particular dereference to corrupt other memory.  The only way this happens in (2) and (3) is that related code tries to write to invalid memory. But if we have other measures in place to prevent that (bounds checking, other hardware signals, etc), then how is it still possible to corrupt memory?
[...]

It's not that the null pointer itself corrupts memory. It's that the null pointer is a sign that something may have corrupted memory *before* you got to that point.

The point is, it's impossible to tell whether the null pointer was merely the result of forgetting to initialize something, or it's a symptom of a far more sinister problem. The source of the problem could potentially be very far away, in unrelated code, and only when you tried to access the pointer, you discover that something is wrong.

At that point, it may very well be the case that the null pointer isn't just a benign uninitialized pointer, but the result of a memory corruption, perhaps an exploit in the process of taking over your application, or some internal consistency error that is in the process of destroying user data. Trying to continue is a bad idea, since you'd be letting the exploit take over, or allowing user data to get even more corrupted than it already is.


T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.