@safe and null dereferencing
July 27, 2017
Inside the thread for adding @safe/@trusted attributes to OS functions, it has come to light that @safe has conflicting rules.

For the definition of safe, it says:

"Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior."

In the definition of @trusted, it says:

"Trusted functions are guaranteed by the programmer to not exhibit any undefined behavior if called by a safe function."

Yet, safe functions allow dereferencing of null pointers. Example:

void foo() @safe
{
   int *x;   // x is null here: pointers are default-initialized to null
   *x = 5;   // dereferencing null, yet this compiles as @safe
}

There are various places on the forum where Walter argues that null pointer dereferencing should cause a segmentation fault (or crash) and is checked by the hardware/OS. Therefore, checking for null pointers before any dereferencing would be a waste of cycles.

However, there do exist places where dereferencing null may NOT cause a segmentation fault. For example, see this post by Moritz Maxeiner: https://forum.dlang.org/post/udkdqogtrvanhbotdoik@forum.dlang.org

In such cases, the compiled program can have no knowledge that the zero page is mapped. There is no way to prevent the dereference, or to guarantee that it crashes, at compile time.

It's also worth noting that C and C++ define null dereferencing as undefined behavior. So if we are being completely pedantic, we could say that no C/C++ code could be marked @safe if there is any possibility that a null pointer would be dereferenced.

The way I see it, we have two options. The first is to disallow null pointer dereferencing in @safe code. This would be hugely disruptive. We might not have to instrument all @safe code with null checks; we could rely on flow analysis and assume that all pointers passed into a @safe function are non-null. But it would likely disallow a lot of existing @safe code.
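
To make the first option concrete, here's a rough sketch of how such a rule might behave (the function names are invented and the rejection is hypothetical; nothing like this is implemented today):

void f(int* p) @safe
{
   *p = 1;   // still allowed: parameters would be assumed non-null
}

void g() @safe
{
   int* p;   // default-initialized to null
   *p = 1;   // compiles today; under this option, flow analysis would reject it
}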

The other option is to explicitly state what happens in such cases. I would opt for this second option, as the likelihood of these situations is very low.

If we were to update the spec to take this into account, how would it look?

A possibility:

"@safe D does not support platforms or processes where dereferencing a null pointer does not crash the program. In such situations, dereferencing null is not defined, and @safe code will not prevent this from happening."

In terms of not marking C/C++ code safe, I am not convinced we need to go that far, but it's not as horrible a prospect as having to unmark D @safe code that might dereference null.

Thoughts?

-Steve
July 27, 2017
On Thursday, 27 July 2017 at 15:03:02 UTC, Steven Schveighoffer wrote:
> [...]

Why can't we just make the compiler insert null checks in @safe code? We can afford bounds checking even in @system -O -release. C++ can afford a null check when invoking an std::function. The pointer would most likely be in a register anyway, and the conditional branch would almost always not be taken, so the cost of that check would be barely measurable. Moreover, the compiler can elide the check, e.g. if the access via pointer is made in a loop in which the pointer doesn't change. And if you prove that this tiny little check ruins the performance of your code, there's @trusted to help you.
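
To illustrate what I mean, a rough sketch of the lowering (the names and the exact check are invented; no D compiler emits this today):

void writeThrough(int* x) @safe
{
   *x = 5;
}

// Roughly what the compiler could emit instead:
void writeThroughChecked(int* x) @safe
{
   if (x is null)
      assert(0, "null pointer dereference");   // halt deterministically instead of relying on the MMU
   *x = 5;
}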

July 27, 2017
On Thu, Jul 27, 2017 at 05:33:22PM +0000, Adrian Matoga via Digitalmars-d wrote: [...]
> Why can't we just make the compiler insert null checks in @safe code?

Because not inserting null checks is a sacred cow we inherited from the C/C++ days of POOP (premature optimization oriented programming), and we are loath to slaughter it.  :-P  We should seriously take some measurements of this in a large D project to determine whether or not inserting null checks actually makes a significant difference in performance.


> We can afford bounds checking even in @system -O -release. C++ can afford null check upon executing an std::function. The pointer would most likely be in a register anyway, and the conditional branch would almost always not be taken, so the cost of that check would be barely measurable. Moreover, the compiler can elide the check e.g. if the access via pointer is made in a loop in which the pointer doesn't change. And if you prove that this tiny little check ruins performance of your code, there's @trusted to help you.

The compiler can (and should, if it doesn't already) also propagate
non-nullness (ala VRP) as part of its dataflow analysis, so that once a
pointer has been established to be non-null, all subsequent checks of
that pointer can be elided (until the next assignment to the pointer, of
course).
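
For illustration (purely hypothetical; this is not a claim about what any
current D compiler does):

void bar(int* p) @safe
{
   if (p is null)
      return;
   *p = 1;   // p is known non-null here, so an inserted check could be elided
   *p = 2;   // still elidable: p hasn't been reassigned since the test
}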


T

-- 
Public parking: euphemism for paid parking. -- Flora
July 27, 2017
On Thu, Jul 27, 2017 at 11:03:02AM -0400, Steven Schveighoffer via Digitalmars-d wrote: [...]
> However, there do exist places where dereferencing null may NOT cause a segmentation fault. For example, see this post by Moritz Maxeiner: https://forum.dlang.org/post/udkdqogtrvanhbotdoik@forum.dlang.org
> 
> In such cases, the compiled program can have no knowledge that the zero page is mapped somehow. There is no way to prevent it, or guarantee it during compilation.
[...]

There is one flaw with Moritz's example: if the zero page is mapped somehow, that means 0 is potentially a valid address of a variable, and therefore checking for null is basically not only useless but wrong: a null check of the address of this variable will fail, yet the pointer is actually pointing at a valid address that just happens to be 0.  IOW, if the zero page is mapped, we're *already* screwed anyway, might as well just give up now.

One workaround for this is to redefine a null pointer as size_t.max (i.e., all bits set) instead of 0.  It's far less likely for a valid address to be size_t.max than for 0 to be a valid address in a system where the zero page is mappable (due to alignment issues, the only possibility is if you have a ubyte* pointing to data stored at the address size_t.max, whereas address 0 can be a valid address for any data type).  However, this will break basically *all* code out there in C/C++/D land, so I don't see it ever happening in this lifetime.


T

-- 
Those who've learned LaTeX swear by it. Those who are learning LaTeX swear at it. -- Pete Bleackley
July 27, 2017
On Thursday, 27 July 2017 at 17:43:17 UTC, H. S. Teoh wrote:
> On Thu, Jul 27, 2017 at 05:33:22PM +0000, Adrian Matoga via Digitalmars-d wrote: [...]
>> Why can't we just make the compiler insert null checks in @safe code?
>
> Because not inserting null checks is a sacred cow we inherited from the C/C++ days of POOP (premature optimization oriented programming), and we are loath to slaughter it.  :-P  We should seriously take some measurements of this in a large D project to determine whether or not inserting null checks actually makes a significant difference in performance.

That's exactly what I thought.

July 27, 2017
On 7/27/17 1:33 PM, Adrian Matoga wrote:
> 
> Why can't we just make the compiler insert null checks in @safe code? We can afford bounds checking even in @system -O -release. C++ can afford a null check when invoking an std::function. The pointer would most likely be in a register anyway, and the conditional branch would almost always not be taken, so the cost of that check would be barely measurable. Moreover, the compiler can elide the check, e.g. if the access via pointer is made in a loop in which the pointer doesn't change. And if you prove that this tiny little check ruins the performance of your code, there's @trusted to help you.

The rationale from Walter has always been that the hardware is already doing this for us. I was always under the assumption that D only supported environments/systems where this happens. But technically there's nothing in the spec to require it. And it does seem apparent that we handle this situation.

This thread is asking whether we should amend the spec with (what I think is) Walter's view, or change the compiler to insert the checks.

-Steve
July 27, 2017
On 7/27/17 1:52 PM, H. S. Teoh via Digitalmars-d wrote:
> On Thu, Jul 27, 2017 at 11:03:02AM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> [...]
>> However, there do exist places where dereferencing null may NOT cause
>> a segmentation fault. For example, see this post by Moritz Maxeiner:
>> https://forum.dlang.org/post/udkdqogtrvanhbotdoik@forum.dlang.org
>>
>> In such cases, the compiled program can have no knowledge that the
>> zero page is mapped somehow. There is no way to prevent it, or
>> guarantee it during compilation.
> [...]
> 
> There is one flaw with Moritz's example: if the zero page is mapped
> somehow, that means 0 is potentially a valid address of a variable, and
> therefore checking for null is basically not only useless but wrong: a
> null check of the address of this variable will fail, yet the pointer is
> actually pointing at a valid address that just happens to be 0.  IOW, if
> the zero page is mapped, we're *already* screwed anyway, might as well
> just give up now.

Very true. You wouldn't want to store anything there as any @safe code could easily get a pointer to that data at any time!

Either way, the guarantees of @safe go out the window if dereferencing null is not a crashing error.

-Steve
July 27, 2017
On 7/27/17 2:09 PM, Steven Schveighoffer wrote:
> there's nothing in the spec to require it. And it does seem apparent that we handle this situation.

that we *should* handle this situation.

-Steve
July 27, 2017
On Thursday, July 27, 2017 11:03:02 Steven Schveighoffer via Digitalmars-d wrote:
> A possibility:
>
> "@safe D does not support platforms or processes where dereferencing a null pointer does not crash the program. In such situations, dereferencing null is not defined, and @safe code will not prevent this from happening."
>
> In terms of not marking C/C++ code safe, I am not convinced we need to go that far, but it's not as horrible a prospect as having to unmark D @safe code that might dereference null.

I see no problem whatsoever requiring that the platform segfaults when you dereference null. Anything even vaguely modern will do that. Adding extra null checks is therefore redundant and complicates the compiler for no gain whatsoever.

However, one issue that has been brought up from time to time and AFAIK has never really been addressed is that apparently if an object is large enough, when you access one of its members when the pointer is null, you won't get a segfault (I think that it was something like if the object was greater than a page in size). So, as I understand it, ludicrously large objects _could_ result in @safety problems with null pointers. This would not happen in normal code, but it can happen. And if we want @safe to make the guarantees that it claims, we really should either disallow such objects or insert null checks for them. For smaller objects though, what's the point? It buys us nothing if the hardware is already doing it, and the only hardware that wouldn't do it should be too old to matter at this point.
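
Something along these lines (the size is arbitrary and the names are invented; whether the access actually faults depends on the OS and how much low memory is left unmapped):

class Huge
{
   ubyte[1_000_000] padding;   // pushes the next field far past the first page
   int field;
}

void poke(Huge h) @safe
{
   h.field = 42;   // with h null, this writes roughly one megabyte above address zero, which may well be mapped
}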

So, I say that we need to deal with the problem with ludicrously large objects, but beyond that, we should just change the spec, because inserting the checks buys us nothing.

- Jonathan M Davis

July 27, 2017
On 7/27/17 2:46 PM, Jonathan M Davis via Digitalmars-d wrote:
> 
> However, one issue that has been brought up from time to time and AFAIK has
> never really been addressed is that apparently if an object is large enough,
> when you access one of its members when the pointer is null, you won't get a
> segfault (I think that it was something like if the object was greater than
> a page in size). So, as I understand it, ludicrously large objects _could_
> result in @safety problems with null pointers. This would not happen in
> normal code, but it can happen. And if we want @safe to make the guarantees
> that it claims, we really should either disallow such objects or insert null
> checks for them. For smaller objects though, what's the point? It buys us
> nothing if the hardware is already doing it, and the only hardware that
> wouldn't do it should be too old to matter at this point.


Yes: https://issues.dlang.org/show_bug.cgi?id=5176

There is a way to "fix" this: any time you access an object field whose offset is beyond the page size, insert a null check on the base pointer.
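
Roughly like this (a hypothetical lowering, assuming a 4 KB page and using an invented oversized class for illustration; not an actual compiler transformation):

class Huge
{
   ubyte[1_000_000] padding;
   int field;
}

static assert(Huge.field.offsetof > 4096);   // the field lies beyond the first page

// What the compiler could generate for 'h.field = 42' in @safe code:
void pokeChecked(Huge h) @safe
{
   if (h is null)
      assert(0, "null dereference of oversized object");   // explicit check, since the MMU won't catch this one
   h.field = 42;
}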

-Steve