March 07, 2012
On Wednesday, March 07, 2012 07:55:35 H. S. Teoh wrote:
> It's not that the null pointer itself corrupts memory. It's that the null pointer is a sign that something may have corrupted memory *before* you got to that point.
> 
> The point is, it's impossible to tell whether the null pointer was merely the result of forgetting to initialize something, or it's a symptom of a far more sinister problem. The source of the problem could potentially be very far away, in unrelated code, and only when you tried to access the pointer, you discover that something is wrong.
> 
> At that point, it may very well be the case that the null pointer isn't just a benign uninitialized pointer, but the result of a memory corruption, perhaps an exploit in the process of taking over your application, or some internal consistency error that is in the process of destroying user data. Trying to continue is a bad idea, since you'd be letting the exploit take over, or allowing user data to get even more corrupted than it already is.

Also, while D does much more to protect you from stuff like memory corruption than C/C++ does, it's still a systems language. Stuff like that can definitely happen. If you're writing primarily in SafeD, then it's very much minimized, but it's not necessarily eliminated. All it takes is a bug in @system code which could corrupt memory, and voila, you have corrupted memory, and an @safe function could get a segfault even though it's correct code. It's likely to be a very rare occurrence, but it's possible. And since when you get a segfault, you can't know what caused it, you have to assume that it could have been caused by one of the nastier possibilities rather than a relatively benign one.

And since ultimately, your program should be checking for null before dereferencing a variable in any case where it could be null, segfaulting due to dereferencing a null pointer is a program bug which should be caught in testing - like assertions in general are - rather than having the program attempt to recover from it. And if you do _that_, the odds of a segfault being due to something very nasty just go up, making it that much more of a bad idea to try and recover from one.
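
As a minimal sketch of the "check before dereferencing" rule described above: the names `Widget`, `findWidget`, and `valueOrDefault` are made up for illustration and do not come from any library.

```d
import std.stdio;

class Widget
{
    int value = 42;
}

// Hypothetical lookup that may legitimately return null.
Widget findWidget(bool exists)
{
    return exists ? new Widget : null;
}

// Returns the widget's value, or a fallback when there is no widget.
int valueOrDefault(Widget w, int fallback)
{
    // `is null` compares the reference itself and never dereferences it.
    if (w is null)
        return fallback;
    return w.value;
}

void main()
{
    writeln(valueOrDefault(findWidget(true), -1));
    writeln(valueOrDefault(findWidget(false), -1));
}
```

With the check in place, a null reaching `w.value` anyway would point at a bug elsewhere rather than at an expected "no widget" case.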

- Jonathan M Davis
March 08, 2012
On 03/07/2012 04:41 AM, Timon Gehr wrote:
> On 03/07/2012 02:40 AM, Chad J wrote:
>> But to initialize non-null fields, I suspect we would need to be able to
>> do stuff like this:
>>
>> class Foo
>> {
>> int dummy;
>> }
>>
>> class Bar
>> {
>> Foo foo = new Foo();
>>
>> this() { foo.dummy = 5; }
>> }
>>
>> Which would be lowered by the compiler into this:
>>
>> class Bar
>> {
>> // Assume we've already checked for bogus assignments.
>> // It is now safe to make this nullable.
>> Nullable!(Foo) foo;
>>
>> this()
>> {
>> // Member initialization is done first.
>> foo = new Foo();
>>
>> // Then programmer-supplied ctor code runs after.
>> foo.dummy = 5;
>> }
>> }
>>
>> I remember C# being able to do this. I never understood why D doesn't
>> allow this. Without it, I have to repeat myself a lot, and that is just
>> wrong ;).
>
> It is not sufficient.
>
> class Bar{
> Foo foo = new Foo(this);
> void method(){...}
> }
> class Foo{
> this(Bar bar){bar.foo.method();}
> }

Lowered it a bit to try to compile, because it seems Foo doesn't have a method() :

import std.stdio;

class Bar{
  Foo foo;
  this()
  {
    foo = new Foo(this);
  }
  void method(){ writefln("poo"); }
}
class Foo{
  this(Bar bar){bar.foo.method();}
}

void main()
{
}

And, it doesn't:
main.d(12): Error: no property 'method' for type 'main.Foo'

Though, more to the point:
I would probably forbid "Foo foo = new Foo(this);".  The design that leads to this is creating circular dependencies, which is usually bad to begin with.  Would we lose much of value?
March 08, 2012
On 03/08/2012 01:24 AM, Chad J wrote:
> On 03/07/2012 04:41 AM, Timon Gehr wrote:
>> On 03/07/2012 02:40 AM, Chad J wrote:
>>> But to initialize non-null fields, I suspect we would need to be able to
>>> do stuff like this:
>>>
>>> class Foo
>>> {
>>> int dummy;
>>> }
>>>
>>> class Bar
>>> {
>>> Foo foo = new Foo();
>>>
>>> this() { foo.dummy = 5; }
>>> }
>>>
>>> Which would be lowered by the compiler into this:
>>>
>>> class Bar
>>> {
>>> // Assume we've already checked for bogus assignments.
>>> // It is now safe to make this nullable.
>>> Nullable!(Foo) foo;
>>>
>>> this()
>>> {
>>> // Member initialization is done first.
>>> foo = new Foo();
>>>
>>> // Then programmer-supplied ctor code runs after.
>>> foo.dummy = 5;
>>> }
>>> }
>>>
>>> I remember C# being able to do this. I never understood why D doesn't
>>> allow this. Without it, I have to repeat myself a lot, and that is just
>>> wrong ;).
>>
>> It is not sufficient.
>>
>> class Bar{
>> Foo foo = new Foo(this);
>> void method(){...}
>> }
>> class Foo{
>> this(Bar bar){bar.foo.method();}
>> }
>
> Lowered it a bit to try to compile, because it seems Foo doesn't have a
> method() :
>
> import std.stdio;
>
> class Bar{
> Foo foo;
> this()
> {
> foo = new Foo(this);
> }
> void method(){ writefln("poo"); }
> }
> class Foo{
> this(Bar bar){bar.foo.method();}
> }
>
> void main()
> {
> }
>
> And, it doesn't:
> main.d(12): Error: no property 'method' for type 'main.Foo'
>

Just move the method from Bar to Foo.

import std.stdio;

class Bar{
    Foo foo;
    this()
    {
        foo = new Foo(this);
    }
}
class Foo{
    this(Bar bar){bar.foo.method();}
    void method(){ writefln("poo"); }
}

void main()
{
    auto bar = new Bar;
}



> Though, more to the point:
> I would probably forbid "Foo foo = new Foo(this);". The design that
> leads to this is creating circular dependencies, which is usually bad to
> begin with.

Circular object references are often justified.

> Would we lose much of value?

Well this would amount to forbidding escaping an object from its constructor, as well as forbidding calling any member functions from the constructor. Also, if you *need* to create a circular structure, you'd have to use sentinel objects. Those are worse than null.
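
To make the escaping problem concrete without crashing, here is a small sketch that records what Foo's constructor sees instead of calling through the field. The `sawNull` member is an addition for illustration only.

```d
import std.stdio;

class Bar
{
    Foo foo;

    this()
    {
        // `this` escapes to Foo's constructor before `foo` has been assigned.
        foo = new Foo(this);
    }
}

class Foo
{
    bool sawNull;

    this(Bar bar)
    {
        // At this point bar.foo still holds its default value, null.
        sawNull = bar.foo is null;
    }
}

void main()
{
    auto bar = new Bar;
    writeln(bar.foo.sawNull); // the half-constructed Bar had a null field
}
```

So even with a field initializer, code that receives `this` during construction can observe the field before it is set, which is why the initializer syntax alone is not sufficient for non-null fields.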

March 08, 2012
On 03/07/2012 10:21 AM, Steven Schveighoffer wrote:
> On Wed, 07 Mar 2012 10:10:32 -0500, Chad J
> <chadjoan@__spam.is.bad__gmail.com> wrote:
>
>> On Wednesday, 7 March 2012 at 14:23:18 UTC, Chad J wrote:
>>
>> I spoke too soon!
>> We missed one:
>>
>> 1. You forgot to initialize a variable.
>> 2. Your memory has been corrupted, and some corrupted pointer
>> now points into no-mem land.
>> 3. You are accessing memory that has been deallocated.
>> 4. null was being used as a sentinel value, and it snuck into
>> a place where the value should not be a sentinel anymore.
>>
>> I will now change what I said to reflect this:
>>
>> I think I see where the misunderstanding is coming from.
>>
>> I encounter (1) from time to time. It isn't a huge problem because
>> usually if I declare something the next thing on my mind is
>> initializing it. Even if I forget, I'll catch it in early testing. It
>> tends to never make it to anyone else's desk, unless it's a
>> regression. Regressions like this aren't terribly common though. If
>> you make my program crash from (1), I'll live.
>>
>> I didn't even consider (2) and (3) as possibilities. Those are far
>> from my mind. I think I'm used to VM languages at this point (C#,
>> Java, Actionscript 3, Haxe, Synergy/DE|DBL, etc). In the VM, (2) and
>> (3) can't happen. I never worry about those. Feel free to crash these
>> in D.
>>
>> I encounter (4) a lot. I really don't want my programs crashed when
>> (4) happens. Such crashes would be super annoying, and they can happen
>> at very bad times.
>
> You can use sentinels other than null.
>
> -Steve

Example?

Here, if you want, I'll start with a typical case.  Please make it right.

class UnreliableResource
{
	this(string sourceFile) {...}
	this(uint userId) {...}
	void doTheThing() {...}
}

void main()
{
	// Set this to a sentinel value for cases where the source does
	//   not exist, thus preventing proper initialization of res.
	UnreliableResource res = null;

	// The point here is that obtaining this unreliable resource
	//   is tricky business, and therefore complicated.
	//
	if ( std.file.exists("some_special_file") )
	{
		res = new UnreliableResource("some_special_file");
	}
	else
	{
		uint uid = getUserIdSomehow();
		if ( isValidUserId(uid) )
		{
			res = new UnreliableResource(uid);
		}
	}

	// Do some other stuff.
	...
	
	// Now use the resource.
	try
	{
		thisCouldBreakButItWont(res);
	}
	// Fairly safe if we were in a reasonable VM.
	catch ( NullDerefException e )
	{
		writefln("This shouldn't happen, but it did.");
	}
}

void thisCouldBreakButItWont(UnreliableResource res)
{
	if ( res !is null )
	{
		res.doTheThing();
	}
	else
	{
		doSomethingUsefulThatCanHappenWhenResIsNotAvailable();
		writefln("Couldn't find the resource thingy.");
		writefln("Resetting the m-rotor.  (NOOoooo!)");
	}
}

Please follow these constraints:

- Do not use a separate boolean variable for determining whether or not 'res' could be created.  This violates a kind of SSOT (http://en.wikipedia.org/wiki/Single_Source_of_Truth) because it allows cases where the hypothetical "resIsInitialized" variable is true but res isn't actually initialized, or where "resIsInitialized" is false but res is actually initialized.  It also doesn't throw catchable exceptions when the uninitialized class has methods called on it.  In my pansy VM-based languages I always prefer to risk the null sentinel.

- Do not modify the implementation of UnreliableResource.  It's not always possible.

- Try to make the solution something that could, in principle, be placed into Phobos and reused without a lot of refactoring in the original code.

...

Now I will think about this a bit...

This reminds me a lot of algebraic data types.  I kind of want to say something like:
auto res = empty | UnreliableResource;

and then unwrap it:

	...
	thisCantBreakAnymore(res);
}

void thisCantBreakAnymore(UnreliableResource res)
{
	res.doTheThing();
}

void thisCantBreakAnymore(empty)
{
	doSomethingUsefulThatCanHappenWhenResIsNotAvailable();
	writefln("Couldn't find the resource thingy.");
	writefln("Resetting the m-rotor.  (NOOoooo!)");
}


I'm not absolutely sure I'd want to go that path though, and since D is unlikely to do any of those things, I just want to be able to catch an exception if the sentinel value tries to have the "doTheThing()" method called on it.
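
For what it's worth, later Phobos releases ship std.sumtype, which gets fairly close to the `empty | UnreliableResource` idea. A rough sketch, assuming a compiler recent enough to have that module; `Empty`, `MaybeResource`, and `use` are names invented here, and `doTheThing` returns a string only so the result is easy to show:

```d
import std.stdio;
import std.sumtype;

// Marker type standing in for "no resource".
struct Empty {}

class UnreliableResource
{
    string doTheThing() { return "did the thing"; }
}

alias MaybeResource = SumType!(Empty, UnreliableResource);

// Both cases must be handled, or the match fails to compile.
string use(MaybeResource res)
{
    return res.match!(
        (UnreliableResource r) => r.doTheThing(),
        (Empty _) => "couldn't find the resource thingy"
    );
}

void main()
{
    MaybeResource res = Empty();
    writeln(use(res));

    res = new UnreliableResource;
    writeln(use(res));
}
```

One caveat: the class alternative is still a reference, so nothing here stops someone from storing a null UnreliableResource in the sum type.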

I can maybe see invariants being used for this:

class UnreliableResource
{
	bool initialized = false;

	invariant
	{
		if (!initialized)
			throw new Exception("Not initialized.");
	}

	void initialize(string sourceFile)
	{
		...
	}

	void initialize(uint userId)
	{
		...
	}

	void doTheThing() {...}
}

But as I think about it, this approach already has a lot of problems:

- It violates the condition that UnreliableResource shouldn't be modified to solve the problem.  Sometimes the class in question is upstream or otherwise not available for modification.

- I have to add this stupid boilerplate to every class.

- There could be a mixin template to ease the boilerplate, but the D spec states that there can be only one invariant in a class.  Using such a mixin would nix my ability to have an invariant for other things.

- Calling initialize(...) would violate the invariant.  It can't be initialized in the constructor because we need to be able to have the instance exist temporarily in a state where it is constructed from a nullary do-nothing constructor and remains uninitialized until a beneficial codepath initializes it properly.

- It will not be present in release mode.  This could be a deal-breaker in some cases.

- Using this means that instances of UnreliableResource should just never be null, and thus I am required to do an allocation even when the program will take codepaths that don't actually use the class.  I'm usually not concerned too much with premature optimization, but allocations are probably a nasty thing to sprinkle about unnecessarily.



Maybe a proxy struct with opDispatch and such could be used to get around these limitations?
Ex usage: Initializable!(UnreliableResource) res;
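
A rough sketch of what such a proxy might look like. `Initializable` and `UninitializedAccess` are hypothetical names and nothing like this is in Phobos. It only forwards method calls (not field access), and `doTheThing` returns an int here just to have something to print:

```d
import std.stdio;

class UninitializedAccess : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Hypothetical wrapper: null inside means "not initialized".
struct Initializable(T) if (is(T == class))
{
    private T payload;

    bool isInitialized() const { return payload !is null; }

    void opAssign(T value) { payload = value; }

    // Forward any other method call to the payload, throwing if it is absent.
    auto opDispatch(string name, Args...)(auto ref Args args)
    {
        if (payload is null)
            throw new UninitializedAccess(
                "call to " ~ name ~ " on uninitialized " ~ T.stringof);
        return mixin("payload." ~ name ~ "(args)");
    }
}

// Unmodified stand-in for the class from the example above.
class UnreliableResource
{
    int doTheThing() { return 7; }
}

void main()
{
    Initializable!UnreliableResource res;
    try
    {
        res.doTheThing();
    }
    catch (UninitializedAccess e)
    {
        writeln("caught: ", e.msg);
    }

    res = new UnreliableResource;
    writeln(res.doTheThing());
}
```

This keeps the single source of truth (the reference itself), leaves UnreliableResource alone, and needs no allocation until the resource actually exists. The check costs a branch per call, and it stays in release builds.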
March 08, 2012
On 03/07/2012 02:09 PM, Jonathan M Davis wrote:
> On Wednesday, March 07, 2012 07:55:35 H. S. Teoh wrote:
>> It's not that the null pointer itself corrupts memory. It's that the
>> null pointer is a sign that something may have corrupted memory *before*
>> you got to that point.
>>
>> The point is, it's impossible to tell whether the null pointer was
>> merely the result of forgetting to initialize something, or it's a
>> symptom of a far more sinister problem. The source of the problem could
>> potentially be very far away, in unrelated code, and only when you tried
>> to access the pointer, you discover that something is wrong.
>>
>> At that point, it may very well be the case that the null pointer isn't
>> just a benign uninitialized pointer, but the result of a memory
>> corruption, perhaps an exploit in the process of taking over your
>> application, or some internal consistency error that is in the process
>> of destroying user data. Trying to continue is a bad idea, since you'd
>> be letting the exploit take over, or allowing user data to get even more
>> corrupted than it already is.
>
> Also, while D does much more to protect you from stuff like memory corruption
> than C/C++ does, it's still a systems language. Stuff like that can definitely
> happen. If you're writing primarily in SafeD, then it's very much minimized,
> but it's not necessarily eliminated. All it takes is a bug in @system code
> which could corrupt memory, and voila, you have corrupted memory, and an @safe
> function could get a segfault even though it's correct code. It's likely to be
> a very rare occurrence, but it's possible. And since when you get a segfault,
> you can't know what caused it, you have to assume that it could have been
> caused by one of the nastier possibilities rather than a relatively benign one.
>
> And since ultimately, your program should be checking for null before
> dereferencing a variable in any case where it could be null, segfaulting due to
> dereferencing a null pointer is a program bug which should be caught in
> testing - like assertions in general are - rather than having the program
> attempt to recover from it. And if you do _that_, the odds of a segfault being
> due to something very nasty just go up, making it that much more of a bad idea
> to try and recover from one.
>
> - Jonathan M Davis

I can see where you're coming from now.

As I mentioned in another post, my lack of consideration for this indicator of memory corruption is probably a reflection of my bias towards VM'd languages.

I still don't buy the whole "it's a program bug that should be caught in testing".  I mean... true, but sometimes it isn't.  Especially since testing and assertions can never be 100% thorough.  What then?  Sorry, enjoy your suffering?

At that point I would like to have a better way to do sentinel values. I'd at least like to get an exception of some kind if I try to access a value that /shouldn't/ be there (as opposed to something that /should/ be there but /isn't/).

Combine that with sandboxing and I might just be satisfied for the time being.

See my reply to Steve for more details.  It's the one that starts like this:

> Example?
>
> Here, if you want, I'll start with a typical case.  Please make it right.
>
> class UnreliableResource
> {
>     this(string sourceFile) {...}
>     this(uint userId) {...}
>     void doTheThing() {...}
> }
March 08, 2012
On 03/07/2012 07:39 PM, Timon Gehr wrote:
> On 03/08/2012 01:24 AM, Chad J wrote:
>
>> Though, more to the point:
>> I would probably forbid "Foo foo = new Foo(this);". The design that
>> leads to this is creating circular dependencies, which is usually bad to
>> begin with.
>
> Circular object references are often justified.
>
>> Would we lose much of value?
>
> Well this would amount to forbidding escaping an object from its
> constructor, as well as forbidding calling any member functions from the
> constructor. Also, if you *need* to create a circular structure, you'd
> have to use sentinel objects. Those are worse than null.
>

OK, that does sound unusually harsh.
March 08, 2012
On Wednesday, March 07, 2012 20:44:59 Chad J wrote:
> On 03/07/2012 10:21 AM, Steven Schveighoffer wrote:
> > You can use sentinels other than null.
> > 
> > -Steve
> 
> Example?

Create an instance of the class which is immutable and represents an invalid value. You could check whether something is that value with the is operator, since there's only one of it. You could even make it a derived class and have all of its functions throw a particular exception if someone tries to call them.
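
A minimal sketch of that idea. All names here (`Resource`, `InvalidResource`, `InvalidResourceException`, `invalidResource`, `tryUse`) are made up for illustration, and for simplicity the sentinel is a plain mutable instance rather than an immutable one, so that it can be stored in an ordinary `Resource` reference:

```d
import std.stdio;

class Resource
{
    void doTheThing() { writeln("doing the thing"); }
}

class InvalidResourceException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Sentinel subclass: every overridden method throws instead of doing work.
final class InvalidResource : Resource
{
    override void doTheThing()
    {
        throw new InvalidResourceException("resource was never obtained");
    }
}

// The one sentinel instance (per thread), created before main runs.
Resource invalidResource;

static this()
{
    invalidResource = new InvalidResource;
}

// Returns true if the resource could be used, false if it was the sentinel.
bool tryUse(Resource res)
{
    try
    {
        res.doTheThing();
        return true;
    }
    catch (InvalidResourceException e)
    {
        writeln("caught: ", e.msg);
        return false;
    }
}

void main()
{
    Resource res = invalidResource;
    writeln(res is invalidResource); // identity check with `is`
    tryUse(res);
    tryUse(new Resource);
}
```

The base class does not have to be edited, but it does have to be subclassable, and every method needs an override.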

- Jonathan M Davis
March 08, 2012
On 03/07/2012 10:08 PM, Jonathan M Davis wrote:
> On Wednesday, March 07, 2012 20:44:59 Chad J wrote:
>> On 03/07/2012 10:21 AM, Steven Schveighoffer wrote:
>>> You can use sentinels other than null.
>>>
>>> -Steve
>>
>> Example?
>
> Create an instance of the class which is immutable and represents an invalid
> value. You could check whether something is that value with the is operator,
> since there's only one of it. You could even make it a derived class and have
> all of its functions throw a particular exception if someone tries to call
> them.
>
> - Jonathan M Davis

Makes sense.  Awfully labor-intensive though.  Doesn't work well on classes that can't be easily altered.  That is, it violates this:
> - Do not modify the implementation of UnreliableResource.  It's not always possible.

But, maybe it can be turned into a template and made to work for arrays too...
March 08, 2012
On Wednesday, March 07, 2012 22:36:50 Chad J wrote:
> On 03/07/2012 10:08 PM, Jonathan M Davis wrote:
> > On Wednesday, March 07, 2012 20:44:59 Chad J wrote:
> >> On 03/07/2012 10:21 AM, Steven Schveighoffer wrote:
> >>> You can use sentinels other than null.
> >>> 
> >>> -Steve
> >> 
> >> Example?
> > 
> > Create an instance of the class which is immutable and represents an invalid value. You could check whether something is that value with the is operator, since there's only one of it. You could even make it a derived class and have all of its functions throw a particular exception if someone tries to call them.
> > 
> > - Jonathan M Davis
> 
> Makes sense.  Awfully labor-intensive though.  Doesn't work well on
> 
> classes that can't be easily altered.  That is, it violates this:
> > - Do not modify the implementation of UnreliableResource.  It's not always possible.
> But, maybe it can be turned into a template and made to work for arrays too...

Personally, I'd probably just use null. But if you want a sentinel other than null, it's quite feasible.

- Jonathan M Davis
March 08, 2012
On 03/07/2012 10:40 PM, Jonathan M Davis wrote:
> On Wednesday, March 07, 2012 22:36:50 Chad J wrote:
>> On 03/07/2012 10:08 PM, Jonathan M Davis wrote:
>>> On Wednesday, March 07, 2012 20:44:59 Chad J wrote:
>>>> On 03/07/2012 10:21 AM, Steven Schveighoffer wrote:
>>>>> You can use sentinels other than null.
>>>>>
>>>>> -Steve
>>>>
>>>> Example?
>>>
>>> Create an instance of the class which is immutable and represents an
>>> invalid value. You could check whether something is that value with the
>>> is operator, since there's only one of it. You could even make it a
>>> derived class and have all of its functions throw a particular exception
>>> if someone tries to call them.
>>>
>>> - Jonathan M Davis
>>
>> Makes sense.  Awfully labor-intensive though.  Doesn't work well on
>>
>> classes that can't be easily altered.  That is, it violates this:
>>> - Do not modify the implementation of UnreliableResource.  It's not always
>>> possible.
>> But, maybe it can be turned into a template and made to work for
>> arrays too...
>
> Personally, I'd probably just use null. But if you want a sentinel other than
> null, it's quite feasible.
>
> - Jonathan M Davis

Wait, so you'd use null and then have the program unconditionally crash whenever you (inevitably) mess up sentinel logic?