Disallow null references in safe code? (page 2) - D Programming Language Discussion Forum

February 01, 2014

Re: Disallow null references in safe code?

Posted by deadalnix
in reply to Andrei Alexandrescu

On Saturday, 1 February 2014 at 20:09:13 UTC, Andrei Alexandrescu wrote:
> This has been discussed to death a number of times. A field access obj.field will use addressing with a constant offset. If that offset is larger than the lowest address allowed to the application, unsafety may occur.
>

That is one point. The other point is that the optimizer can remove a null check, and then a load, causing undefined behavior.

The solution to that is to prevent the optimizer from removing any load unless it can prove the load has no side effect (that is, cannot trap), which is certainly something we don't want to do: for manpower reasons we probably don't want to ditch existing optimizers, and it would also imply a performance hit.

February 01, 2014

Re: Disallow null references in safe code?

Posted by Adam D. Ruppe
in reply to Andrei Alexandrescu

On Saturday, 1 February 2014 at 18:58:11 UTC, Andrei Alexandrescu wrote:
>     Widget w = fetchWidget();
>     if (w)
>     {
>         ... here w is automatically inferred as non-null ...
>     }

A library solution to this exists already:

Widget wn = fetchWidget();
if(auto w = wn.checkNull) {
   // w implicitly converts to NotNull!Widget
}

I've had some difficulty with const correctness in my implementation... but const correctness is an entirely separate issue anyway.

It isn't quite the same as if(w) but meh, does that matter? The point of the static check is to make you think about it, and that's achieved here.

If we do want to get the if(w) to work, I'd really prefer to do that as a library solution too, since then we might be able to use it elsewhere as well. Maybe some kind of template that lets you do a scoped transformation of the type. idk really.
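For readers who have not seen such a library type, here is a minimal sketch of how a `checkNull` helper along these lines could be put together. Only the names `NotNull` and `checkNull` come from the post; `Checked`, `Widget.id`, `useWidget` and every implementation detail are assumptions made for illustration, and this is not the poster's actual implementation.

```d
// Illustrative sketch only; all implementation details are assumptions.
class Widget { int id = 7; }

// Wraps a class reference that has been verified to be non-null.
struct NotNull(T) if (is(T == class))
{
    private T payload;

    @disable this();    // no default construction, so no "empty" NotNull

    // Private: in D this means only code in this module can build one.
    private this(T value)
    {
        assert(value !is null);
        payload = value;
    }

    // Lets a NotNull!T be used wherever a plain T is expected.
    inout(T) get() inout { return payload; }
    alias get this;
}

// Result of checkNull: truthy when a reference is held, and implicitly
// convertible to NotNull!T.
struct Checked(T) if (is(T == class))
{
    private T payload;

    // Used by `if (auto w = ...)` and by `!` to test for null.
    bool opCast(U : bool)() const { return payload !is null; }

    NotNull!T notNull() { return NotNull!T(payload); }
    alias notNull this;
}

Checked!T checkNull(T)(T value) if (is(T == class))
{
    return Checked!T(value);
}

// A function that only accepts references known to be non-null.
int useWidget(NotNull!Widget w) { return w.id; }

unittest
{
    Widget present = new Widget;
    Widget absent = null;

    if (auto w = present.checkNull)
        assert(useWidget(w) == 7);   // w converts to NotNull!Widget
    else
        assert(false);

    assert(!absent.checkNull);
}
```

In this sketch the conversion to `NotNull!Widget` is only guarded by an `assert`, so it is a run-time check outside the `if`, not a compile-time guarantee.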

February 01, 2014

Re: Disallow null references in safe code?

Posted by Meta
in reply to Andrei Alexandrescu

On Saturday, 1 February 2014 at 18:58:11 UTC, Andrei Alexandrescu wrote:
> On 1/31/14, 5:39 PM, Jonathan M Davis wrote:
>> Regardless, we're not adding anything with regards to non-nullable references
>> to the language itself [...]
>
> I think the sea is changing here. I've collected hard data that reveals null pointer dereference is a top problem for at least certain important categories of applications. Upon discussing that data with Walter, it became clear that no amount of belief to the contrary, personal anecdote, or rhetoric can stand against the data.
>
> It also became clear that a library solution would improve things but cannot compete with a language solution. The latter can do local flow-sensitive inference and require notations only for function signatures. Consider:
>
> class Widget { ... }
>
> void fun()
> {
>     // assume fetchWidget() may return null
>     Widget w = fetchWidget();
>     if (w)
>     {
>         ... here w is automatically inferred as non-null ...
>     }
> }
>
> Bottom line: a language change for non-null references is on the table. It will be an important focus of 2014.
>
>
> Andrei

That is excellent news.

February 02, 2014

Re: Disallow null references in safe code?

Posted by deadalnix
in reply to Jonathan M Davis

On Saturday, 1 February 2014 at 20:03:40 UTC, Jonathan M Davis wrote:
> In the general case, you can only catch it at compile time if you disallow it
> completely, which is unnecessarily restrictive.

That is not accurate. The proposal here is to make it @system instead of disallowing it completely. Even looser, I propose making it @system to pass a reference that may be null through an interface (mostly function calls and returns). So you can use null locally, where the compiler can check that you do not dereference it, and the compiler ensures that data coming from somewhere else is not null unless specified as such.

> Sure, some basic cases can be
> caught, but unless the code where the pointer/reference is defined is right
> next to the code where it's dereferenced, there's no way for the compiler to
> have any clue whether it's null or not. And yes, there's certainly code where
> it would make sense to use non-nullable references or pointers, because
> there's no need for them to be nullable, and having them be non-nullable
> avoids any risk of forgetting to initialize them, but that doesn't mean that
> nullable pointers and references aren't useful or that you can catch all
> instances of a null pointer or reference being dereferenced at compile time.
>
> - Jonathan M Davis

February 02, 2014

Re: Disallow null references in safe code?

Posted by Timon Gehr
in reply to Adam D. Ruppe

On 02/01/2014 11:05 PM, Adam D. Ruppe wrote:
>
> A library solution to this exists already:
>
> Widget wn = fetchWidget();
> if(auto w = wn.checkNull) {
>     // w implicitly converts to NotNull!Widget
> }
>
> I've had some difficulty in const correctness with my implementation...
> but const correct is an entirely separate issue anyway.
>
> It isn't quite the same as if(w) but meh, does that matter? The point of
> the static check is to make you think about it,  and that's achieved here.

The following illustrates what's not achieved here:

if(auto w = wn.checkNull){
    // ...
}else w.foo();

February 02, 2014

Re: Disallow null references in safe code?

Posted by Andrei Alexandrescu
in reply to deadalnix

On 2/1/14, 1:40 PM, deadalnix wrote:
> On Saturday, 1 February 2014 at 20:09:13 UTC, Andrei Alexandrescu wrote:
>> This has been discussed to death a number of times. A field access
>> obj.field will use addressing with a constant offset. If that offset
>> is larger than the lowest address allowed to the application, unsafety
>> may occur.
>>
>
> That is one point. The other point is that the optimizer can remove a
> null check, and then a load, causing undefined behavior.

I don't understand this. Program crash is defined behavior.

Andrei

February 02, 2014

Re: Disallow null references in safe code?

Posted by deadalnix
in reply to Andrei Alexandrescu

On Sunday, 2 February 2014 at 01:01:25 UTC, Andrei Alexandrescu wrote:
> On 2/1/14, 1:40 PM, deadalnix wrote:
>> On Saturday, 1 February 2014 at 20:09:13 UTC, Andrei Alexandrescu wrote:
>>> This has been discussed to death a number of times. A field access
>>> obj.field will use addressing with a constant offset. If that offset
>>> is larger than the lowest address allowed to the application, unsafety
>>> may occur.
>>>
>>
>> That is one point. The other point is that the optimizer can remove a
>> null check, and then a load, causing undefined behavior.
>
> I don't understand this. Program crash is defined behavior.
>
> Andrei

This has also been discussed. Let's consider the buggy code below:

void foo(int* ptr) {
  *ptr;
  if (ptr !is null) {
    // do stuff
  }

  // do other stuff
}

Note that the code above looks quite stupid, but it is typically what you end up with after inlining when you call two functions, one that does a null check and one that doesn't.

You would expect the program to segfault at the first line, but it is in fact undefined behavior. The optimizer can decide to remove the null check, since ptr was dereferenced earlier and so can't be null, and a later pass can then remove the first dereference because it is a dead load. Both the GCC and LLVM optimizers can exhibit such behavior.

Dereferencing null is not guaranteed to segfault unless we impose restrictions on the optimizer, such as "do not optimize a load away unless you can prove it won't trap", which is almost impossible for the compiler to know. As a result, you won't be able to optimize most loads away.

Unless we are willing to impose such restrictions on the optimizer (that is, recode several passes of existing optimizers or stop relying on them, which is a huge manpower cost, and accept poorer performance), dereferencing null is undefined behavior and can't be guaranteed to crash.
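To make the inlining scenario concrete, the sketch below splits the example into two functions, one that loads without a check and one that checks first. The names `peek` and `update`, and the `bool` return value of `foo`, are hypothetical additions for illustration. The comments describe the chain of reasoning an optimizer is permitted to follow; the sketch does not claim that any particular compiler version performs it.

```d
// Hypothetical helpers; peek, update and foo's return value are
// illustrative additions, not part of the original example.

// Reads through the pointer without checking it.
int peek(int* ptr)
{
    return *ptr;
}

// Checks the pointer before touching it.
bool update(int* ptr)
{
    if (ptr !is null)
    {
        *ptr += 1;      // "do stuff"
        return true;
    }
    return false;
}

// After inlining, the body is a load whose result is unused, followed
// by a null check. An optimizer that treats a null dereference as
// undefined behavior may reason in two steps:
//   1. ptr was already dereferenced, so the check is always true: drop it.
//   2. the value loaded by peek is never used, so the load is dead: drop it.
// What remains runs the "do stuff" branch unconditionally, with no load
// left at the top of the function to trap.
bool foo(int* ptr)
{
    peek(ptr);          // result discarded
    return update(ptr);
}

unittest
{
    int x = 41;
    assert(foo(&x));    // with a valid pointer, behavior is well defined
    assert(x == 42);
}
```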

February 02, 2014

Re: Disallow null references in safe code?

Posted by Jonathan M Davis
in reply to deadalnix

On Sunday, February 02, 2014 00:40:16 deadalnix wrote:
> On Saturday, 1 February 2014 at 20:03:40 UTC, Jonathan M Davis
> 
> wrote:
> > In the general case, you can only catch it at compile time if
> > you disallow it
> > completely, which is unnecessarily restrictive.
> 
> That is not accurate. The proposal here is to make it @system instead of disallowing it completely. Even looser, I propose making it @system to pass a reference that may be null through an interface (mostly function calls and returns). So you can use null locally, where the compiler can check that you do not dereference it, and the compiler ensures that data coming from somewhere else is not null unless specified as such.

Yes, and making pointers @system is too restrictive. For instance, AAs _rely_ on the ability to have nullable pointers. That's how their in operator works. The same is likely to go for any in operator that's looking to be efficient. Do you propose that the in operator be @system?
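As a reminder of the idiom being referred to, the built-in `in` operator on an associative array yields a pointer to the stored value, or null when the key is absent. The short sketch below shows that pattern inside an @safe function; the function name `lookup` and its `fallback` parameter are hypothetical.

```d
// Returns the value stored under key, or fallback if the key is absent.
// `key in table` evaluates to a pointer to the value, or null.
int lookup(int[string] table, string key, int fallback) @safe
{
    if (auto p = key in table)
        return *p;          // p is known to be non-null in this branch
    return fallback;
}

unittest
{
    int[string] ages = ["ada": 36];
    assert(lookup(ages, "ada", -1) == 36);
    assert(lookup(ages, "bob", -1) == -1);
}
```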

It's too restrictive for pointers or references to be @system and making them @system under any kind of normal circumstances would be a huge blow to @safe. We protect @safe code from problems with pointers and references by disallowing unsafe values from being assigned to them and by disallowing situations which can result in their valid, safe values becoming invalid, and thus unsafe. Passing a pointer to a function should not suddenly make it @system. And on top of the considerations for what @safe is supposed to be and do, it would be a real pain if an @safe function which initialized a pointer with a valid value then ended up with the function it passed that pointer to being unsafe just because that function doesn't know whether it was given a valid pointer or not, because @safe functions can't call @system functions. If we want to make pointers and references safe, it needs to be when their values are set, and that needs to include null as an @safe value.

Andrei raised _one_ case where it's possible for a null pointer to access memory that it shouldn't - where the object is over 4K (which is ridiculously large). If that can cause problems, then maybe having a pointer to an object like that could be considered @system, but having int* suddenly be @system because of potential null pointers pretty much completely shoots @safe in the foot with regards to pointers.

- Jonathan M Davis

February 02, 2014

Re: Disallow null references in safe code?

Posted by Andrei Alexandrescu
in reply to deadalnix

On 2/1/14, 5:17 PM, deadalnix wrote:
> On Sunday, 2 February 2014 at 01:01:25 UTC, Andrei Alexandrescu wrote:
>> On 2/1/14, 1:40 PM, deadalnix wrote:
>>> On Saturday, 1 February 2014 at 20:09:13 UTC, Andrei Alexandrescu wrote:
>>>> This has been discussed to death a number of times. A field access
>>>> obj.field will use addressing with a constant offset. If that offset
>>>> is larger than the lowest address allowed to the application, unsafety
>>>> may occur.
>>>>
>>>
>>> That is one point. The other point is that the optimizer can remove a
>>> null check, and then a load, causing undefined behavior.
>>
>> I don't understand this. Program crash is defined behavior.
>>
>> Andrei
>
> This has also been discussed. Let's consider the buggy code below:
>
> void foo(int* ptr) {
>    *ptr;
>    if (ptr !is null) {
>      // do stuff
>    }
>
>    // do other stuff
> }
>
> Note that the code above looks quite stupid, but it is typically what
> you end up with after inlining when you call two functions, one that
> does a null check and one that doesn't.
>
> You would expect the program to segfault at the first line, but it is
> in fact undefined behavior. The optimizer can decide to remove the null
> check, since ptr was dereferenced earlier and so can't be null, and a
> later pass can then remove the first dereference because it is a dead
> load. Both the GCC and LLVM optimizers can exhibit such behavior.

Do you have any pointers to substantiate that? I find such a behavior rather bizarre.

Andrei

February 02, 2014

Re: Disallow null references in safe code?

Posted by Jonathan M Davis
in reply to Andrei Alexandrescu

On Saturday, February 01, 2014 12:09:10 Andrei Alexandrescu wrote:
> On 2/1/14, 2:14 AM, Jonathan M Davis wrote:
> > On Saturday, February 01, 2014 04:01:50 deadalnix wrote:
> >> Dereferencing it is unsafe unless you put runtime check.
> > 
> > How is it unsafe? It will segfault and kill your program, not corrupt memory. It can't even read any memory. It's a bug to dereference a null pointer or reference, but it's not unsafe, because it can't access _any_ memory, let alone memory that it's not supposed to be accessing, which is precisely what @safe is all about.
> 
> This has been discussed to death a number of times. A field access obj.field will use addressing with a constant offset. If that offset is larger than the lowest address allowed to the application, unsafety may occur.
> 
> The amount of low-address memory protected is OS-dependent. 4KB can virtually always be counted on. For fields placed beyond that limit, a runtime test must be inserted. There are few enough 4KB objects out there to make this practically a non-issue. But the checks must be there.

Hmmm. I forgot about that. So, in essence, dereferencing null pointers is almost always perfectly safe, but in rare corner cases it can be unsafe. At that point, we could either always insert runtime checks for pointers to such large types or mark all pointers to such types @system (that's not even vaguely acceptable in the general case, but it might be acceptable in a rare corner case like this). Or we could just disallow such types entirely, though it wouldn't surprise me if someone screamed over that. Runtime checks are probably the best solution, though with any of those solutions I'd be a bit worried about bugs in the implementation, since we then end up with a rare special case which is not well tested in real environments.
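The large-offset case can be illustrated with a small sketch. It assumes the 4KB figure quoted above as the size of the protected low-address range; `guardSize`, `Small`, `Huge`, `needsNullCheck` and `readField` are hypothetical names, and the explicit test in `readField` stands in for the check a compiler would have to insert.

```d
// Assumed size of the protected low-address range (the 4KB figure above).
enum size_t guardSize = 4096;

struct Small { int a; int field; }                 // field at offset 4
struct Huge  { ubyte[8192] padding; int field; }   // field at offset 8192

// A field access through a null pointer touches address 0 + offset.
// Only offsets inside the protected range are certain to trap.
bool needsNullCheck(size_t fieldOffset)
{
    return fieldOffset >= guardSize;
}

static assert(!needsNullCheck(Small.field.offsetof));
static assert( needsNullCheck(Huge.field.offsetof));

// What an inserted runtime test for such an access might amount to.
int readField(Huge* p)
{
    if (p is null)
        throw new Exception("null pointer dereference");
    return p.field;
}

unittest
{
    auto h = new Huge;
    h.field = 3;
    assert(readField(h) == 3);

    bool caught = false;
    try { readField(null); }
    catch (Exception) { caught = true; }
    assert(caught);
}
```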

> >>   Which is stupid for something that can be verified at compile time.
> > 
> > In the general case, you can only catch it at compile time if you disallow it completely, which is unnecessarily restrictive. Sure, some basic cases can be caught, but unless the code where the pointer/reference is defined is right next to the code where it's dereferenced, there's no way for the compiler to have any clue whether it's null or not. And yes, there's certainly code where it would make sense to use non-nullable references or pointers, because there's no need for them to be nullable, and having them be non-nullable avoids any risk of forgetting to initialize them, but that doesn't mean that nullable pointers and references aren't useful or that you can catch all instances of a null pointer or reference being dereferenced at compile time.
> 
> The Java community has a good experience with @Nullable: http://stackoverflow.com/questions/14076296/nullable-annotation-usage

Sure, and there are other things that the compiler can do to catch null dereferences (e.g. look at the first dereferencing of the pointer in the function that it's declared in and make sure that it was initialized or assigned a non-null value first), but the only way to catch all null dereferences at compile time would be to always know at compile time whether the pointer was null at the point that it's dereferenced, and that can't be done.

AFAIK, the only solution that guarantees that it catches all dereferences of null at compile time is a solution that disallows a pointer/reference from ever being null in the first place.

- Jonathan M Davis
