August 21, 2010
On 08/18/2010 05:49 PM, David Simcha wrote:
> While I was debugging std.range and std.algorithm, I noticed that there is a lot of inconsistency about when enforce() vs. assert() is used for checking input to range primitives. I think this needs to be made consistent, and there are good arguments for both.

As mentioned, I've studied this exchange, thought a lot about it, and also discussed it with Walter.

Clearly right now Phobos is inconsistent. Walter and others used asserts, contracts, and other means. My approach since I started has been to sprinkle enforce() everywhere and worry about it later, and that later has come.

One problem with that approach is that, as Lars mentioned, there are 3-4 checks on a simple operation due to the use of higher-order ranges that wrap other ranges, like Stride!Chain!Whatever. This issue is not difficult to tackle: we can introduce in Phobos the notion that a range wrapping another range defers the checking to the wrapped range. That way the wrapper only uses assert() and essentially trusts that the other range protects its own integrity. Consider as an example Take's implementation of popFront:

     void popFront()
     {
         enforce(_maxAvailable > 0,
             "Attempting to popFront() past the end of a "
             ~ Take.stringof);
         original.popFront();
         --_maxAvailable;
     }

The enforce is redundant - we could decide that Take defers checking to original. So this code should be allowed:

     void popFront()
     {
         assert(_maxAvailable > 0, ...);
         original.popFront();
         --_maxAvailable;
     }

But Take should still (very carefully) protect its _own_ integrity, assuming original also does. So the following code would be incorrect:

     void popFront()
     {
         assert(_maxAvailable > 0, ...);
         --_maxAvailable;
         original.popFront();
     }

If original.popFront throws, _maxAvailable has already been decremented so Take is in an inconsistent state. So rule #1 could be something like:

Rule #1: A higher-order entity in Phobos can generally assume that the lower-order entities it builds on protect their own integrity, and protect its own under that premise.

I am sure there would be fuzzy corners to this rule, but by and large it should cut down on a lot of enforce calls. (We could and should replace them with asserts.)

The second rule concerns safety. To my dismay, this program still compiles and runs "successfully" with -release:

import std.stdio;
void main(string[] args)
{
     writeln(args[10]);
}

It should throw, and it should only run with -noboundscheck enabled. That flag was added by Walter at my urging, and I think it's misnamed - it should really be -safe.

Safety is one matter I am extremely rigid about. There is a point where we must draw a line in the sand regarding checks, and safety is it. Since there is a simple and clear definition of it, we can't redefine it as we'd prefer. So I'd say:

Rule #2: All checks protecting memory safety must be done with enforce() or rely on built-in array indexing (which in turn obeys -noboundscheck).
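
To make Rule #2 concrete, here is a sketch (not actual Phobos code; the Buffer type is made up for illustration) of a user-facing indexing operation whose safety check survives -release, because it uses enforce() rather than assert():

```d
import std.exception : enforce;

// Hypothetical illustration of Rule #2: the bounds check uses enforce(),
// so it remains compiled in even with -release (unlike assert, which
// disappears in release builds).
struct Buffer
{
    int[] data;

    int opIndex(size_t i)
    {
        enforce(i < data.length, "Buffer index out of bounds");
        return data[i];
    }
}
```

An out-of-range access then always throws instead of silently reading past the slice, regardless of build flags.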

Finally, we have several places where we call APIs such as the I/O code etc. For those I think it's fine to mandate use of enforce() for several reasons, a simple one being that such calls would not be slowed down considerably by the extra test. So:

Rule #3: All external API calls must be protected by enforce().
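
As an illustration of Rule #3 (a sketch only; openOrThrow is a made-up helper, not a Phobos function), a wrapper around an external C API call would convert the failure return into an exception via enforce():

```d
import core.stdc.stdio : FILE, fopen;
import std.exception : enforce;
import std.string : toStringz;

// Hypothetical example of Rule #3: the external fopen() call is checked
// with enforce(); the cost of one test is dwarfed by the I/O call itself.
FILE* openOrThrow(string path)
{
    auto fp = fopen(path.toStringz, "r");
    enforce(fp !is null, "Cannot open file: " ~ path);
    return fp;
}
```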

Of course there are many case-by-case judgments, but I think having such guidelines in mind should be helpful. Thoughts?


Andrei
August 21, 2010
  God damn it, for some strange reason, whenever I reply to Andrei's posts (and only to Andrei's, not anyone else's) using Thunderbird (but not the gmail web interface), the results only go to Andrei. Here's my reply. Andrei, I apologize for yet another double send.

  On 8/21/2010 11:18 PM, Andrei Alexandrescu wrote:
> On 08/18/2010 05:49 PM, David Simcha wrote:
>> While I was debugging std.range and std.algorithm, I noticed that there is a lot of inconsistency about when enforce() vs. assert() is used for checking input to range primitives. I think this needs to be made consistent, and there are good arguments for both.
>
> As mentioned, I've studied this exchange, thought a lot about it, and also discussed it with Walter.
>
> Clearly right now Phobos is inconsistent. Walter and others used asserts, contracts, and other means. My approach since I started has been to sprinkle enforce() everywhere and worry about it later, and that later has come.
>
> One problem with that approach is that, as Lars mentioned, there are 3-4 checks on a simple operation due to the use of higher-order ranges that wrap other ranges, like Stride!Chain!Whatever. This issue is not difficult to tackle: we can introduce in Phobos the notion that a range wrapping another range defers the checking to the wrapped range. That way the wrapper only uses assert() and essentially trusts that the other range protects its own integrity. Consider as an example Take's implementation of popFront:
>
>     void popFront()
>     {
>         enforce(_maxAvailable > 0,
>             "Attempting to popFront() past the end of a "
>             ~ Take.stringof);
>         original.popFront();
>         --_maxAvailable;
>     }
>
> The enforce is redundant - we could decide that Take defers checking to original. So this code should be allowed:
>
>     void popFront()
>     {
>         assert(_maxAvailable > 0, ...);
>         original.popFront();
>         --_maxAvailable;
>     }
>
> But Take should still (very carefully) protect its _own_ integrity, assuming original also does. So the following code would be incorrect:
>
>     void popFront()
>     {
>         assert(_maxAvailable > 0, ...);
>         --_maxAvailable;
>         original.popFront();
>     }
>
> If original.popFront throws, _maxAvailable has already been decremented so Take is in an inconsistent state. So rule #1 could be something like:
>
> Rule #1: A higher-order entity in Phobos can generally assume that the lower-order entities it builds on protect their own integrity, and protect its own under that premise.
>
> I am sure there would be fuzzy corners to this rule, but by and large it should cut down on a lot of enforce calls. (We could and should replace them with asserts.)

Sounds mostly good.  One thing that still bugs me, though, is the idea of using enforce() in ranges like Iota that are supposed to be super cheap and don't risk memory corruption even if they're in some crazy invalid state.  Here's a benchmark:

import std.range, std.perf, std.stdio;

void main() {
     auto pc = new PerformanceCounter;
     pc.start;

     uint num;

     foreach(elem; iota(100_000_000)) {
         num += elem;
     }

     pc.stop;
     writeln(pc.milliseconds, " milliseconds");

     // Produce an externally visible effect to keep everything from being
     // optimized away.
     writeln("Ignore this:  ", num);
}

Iota currently doesn't do enforce(!empty) in front and popFront.  The benchmark takes ~264 milliseconds on my computer.  If I add enforce(!empty) to front and popFront, this goes up to 977 milliseconds.  If, instead of enforce(), I use if(empty) throw new Exception(""), it still goes up to 683 milliseconds.  If I replace Iota with the following range (which uses virtual functions), I get 681 milliseconds.

class Foo {
     uint num;

     @property bool empty() {
         return num >= 100_000_000;
     }

     @property uint front() {
         return num;
     }

     void popFront() {
         num++;
     }
}

All numbers are for -O -inline -release builds.

>
> The second rule concerns safety. To my dismay, this program still compiles and runs "successfully" with -release:
>
> import std.stdio;
> void main(string[] args)
> {
>     writeln(args[10]);
> }
>
> It should throw, and it should only run with -noboundscheck enabled. That flag was added by Walter at my urging, and I think it's misnamed - it should really be -safe.
>
> Safety is one matter I am extremely rigid about. There is a point where we must draw a line in the sand regarding checks, and safety is it. Since there is a simple and clear definition of it, we can't redefine it as we'd prefer. So I'd say:
>
> Rule #2: All checks protecting memory safety must be done with enforce() or rely on built-in array indexing (which in turn obeys -noboundscheck).

I tend to agree.  Memory corruption issues are exceptionally hard to debug and most things will use array indexing in their implementation rather than raw pointers anyhow, so the checks can be disabled if necessary.  Does dereferencing a null pointer count as a memory safety issue?

>
> Finally, we have several places where we call APIs such as the I/O code etc. For those I think it's fine to mandate use of enforce() for several reasons, a simple one being that such calls would not be slowed down considerably by the extra test. So:
>
> Rule #3: All external API calls must be protected by enforce().
>
> Of course there are many case-by-case judgments, but I think having such guidelines in mind should be helpful. Thoughts?

Agreed.  Once you're bringing in some expensive external API, small efficiencies aren't very important.
August 22, 2010
On 08/21/2010 10:54 PM, David Simcha wrote:
> Sounds mostly good. One thing that still bugs me, though, is the idea of using enforce() in ranges like Iota that are supposed to be super cheap and don't risk memory corruption even if they're in some crazy invalid state.
[snip]

Thanks for taking the time to do measurements. Rules #1-#3 leave cases like this uncovered, particularly because Rule #1 does not define "integrity". Clearly integrity should involve memory safety (which as you mentioned is not an issue with Iota).

I think this is where case-by-case judgment could help. In the case of Iota it's not difficult to define behavior even in case its invariant is broken. Testing also confirms that compulsive checking affects performance significantly. So then assert() is fine there. Generally, whenever loss of performance is significant due to checking, we could relegate checking to assert().
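
For instance, a cheap iota-like range could relegate its checks to assert() so that -release builds pay nothing per element; a minimal sketch (not the actual std.range implementation):

```d
// Sketch of a cheap range that checks only via assert(), so release
// builds incur no per-element cost (not the real std.range iota code).
struct SimpleIota
{
    int current, limit;

    @property bool empty() const { return current >= limit; }

    @property int front() const
    {
        assert(!empty, "front() on an empty SimpleIota");
        return current;
    }

    void popFront()
    {
        assert(!empty, "popFront() past the end of SimpleIota");
        ++current;
    }
}
```

In a debug build the asserts catch misuse; in -release the loop body compiles down to a bare comparison and increment.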

Regarding the null pointer, I don't think it's a memory safety issue on systems with memory protection - it's a hard error.


Andrei
August 22, 2010
On 2010-08-22 at 9:42, Andrei Alexandrescu wrote:

> Regarding the null pointer, I don't think it's a memory safety issue on systems with memory protection - it's a hard error.

That's generally correct. Though there's still a loophole:

	<http://d.puremagic.com/issues/show_bug.cgi?id=3677>

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



August 22, 2010
I think it depends on how the compiler dereferences fields of an object. With GCC on OS X, if I have a null pointer to a struct and I try to access a field through it, I'll get the data at 0x4 or wherever, and the code will keep on running.  Actual segfaults happen depressingly rarely.

Sent from my iPhone

On Aug 22, 2010, at 9:42 AM, Andrei Alexandrescu <andrei at erdani.com> wrote:

> On 08/21/2010 10:54 PM, David Simcha wrote:
>> Sounds mostly good. One thing that still bugs me, though, is the idea of using enforce() in ranges like Iota that are supposed to be super cheap and don't risk memory corruption even if they're in some crazy invalid state.
> [snip]
> 
> Thanks for taking the time to do measurements. Rules #1-#3 leave cases like this uncovered, particularly because Rule #1 does not define "integrity". Clearly integrity should involve memory safety (which as you mentioned is not an issue with Iota).
> 
> I think this is where case-by-case judgment could help. In the case of Iota it's not difficult to define behavior even in case its invariant is broken. Testing also confirms that compulsive checking affects performance significantly. So then assert() is fine there. Generally, whenever loss of performance is significant due to checking, we could relegate checking to assert().
> 
> Regarding the null pointer, I don't think it's a memory safety issue on systems with memory protection - it's a hard error.
> 
> 
> Andrei
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
August 22, 2010
Java solves that problem by limiting object sizes to 64KB. I suggest we do the same.

Andrei

On 08/22/2010 10:03 AM, Michel Fortin wrote:
> On 2010-08-22 at 9:42, Andrei Alexandrescu wrote:
>
>> Regarding the null pointer, I don't think it's a memory safety issue on systems with memory protection - it's a hard error.
>
> That's generally correct. Though there's still a loophole:
>
> 	<http://d.puremagic.com/issues/show_bug.cgi?id=3677>
>
August 22, 2010
Yeah, I've been meaning to do that but haven't gotten around to it yet.

Andrei Alexandrescu wrote:
> Java solves that problem by limiting object sizes to 64KB. I suggest we do the same.
>
> Andrei
>
> On 08/22/2010 10:03 AM, Michel Fortin wrote:
>> On 2010-08-22 at 9:42, Andrei Alexandrescu wrote:
>>
>>> Regarding the null pointer, I don't think it's a memory safety issue on systems with memory protection - it's a hard error.
>>
>> That's generally correct. Though there's still a loophole:
>>
>>     <http://d.puremagic.com/issues/show_bug.cgi?id=3677>
>>
August 22, 2010

Sean Kelly wrote:
> I think it depends on how the compiler dereferences fields of an object. With GCC on OS X, if I have a null pointer to a struct and I try to access a field through it, I'll get the data at 0x4 or wherever, and the code will keep on running.  Actual segfaults happen depressingly rarely.

On Windows, the entire first 64K of address space is reserved by the OS, and any attempt to access it results in a seg fault. I'm very surprised that OS X would not do the same.
August 22, 2010
On 2010-08-22 at 17:07, Walter Bright wrote:

> Sean Kelly wrote:
>> I think it depends on how the compiler dereferences fields of an object. With GCC on OS X, if I have a null pointer to a struct and I try to access a field through it, I'll get the data at 0x4 or wherever, and the code will keep on running.  Actual segfaults happen depressingly rarely.
> 
> On Windows, the entire first 64K of address space is reserved by the OS, and any attempt to access it results in a seg fault. I'm very surprised that OS X would not do the same.

A quick test with DMD on OS X reveals that reading anything below the first 4K of the address space causes a segfault.

For instance, reading member b of a null pointer to this struct will not cause a segfault:

	struct S {
		ubyte[4*1024] a;
		ubyte b;
	}

	S* s = null;
	ubyte c;

	@safe void main() {
		c = s.b;
	}

Remove one byte from array a and you have a segfault.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



August 22, 2010