February 02, 2014
On Sunday, 2 February 2014 at 03:27:21 UTC, Andrei Alexandrescu wrote:
> On 2/1/14, 5:17 PM, deadalnix wrote:
>> On Sunday, 2 February 2014 at 01:01:25 UTC, Andrei Alexandrescu wrote:
>>> On 2/1/14, 1:40 PM, deadalnix wrote:
>>>> On Saturday, 1 February 2014 at 20:09:13 UTC, Andrei Alexandrescu wrote:
>>>>> This has been discussed to death a number of times. A field access
>>>>> obj.field will use addressing with a constant offset. If that offset
>>>>> is larger than the lowest address allowed to the application, unsafety
>>>>> may occur.
>>>>>
>>>>
>>>> That is one point. The other point is that the optimizer can remove a
>>>> null check, and then a load, causing undefined behavior.
>>>
>>> I don't understand this. Program crash is defined behavior.
>>>
>>> Andrei
>>
>> This has also been discussed. Let's consider the buggy code below:
>>
>> void foo(int* ptr) {
>>   *ptr;
>>   if (ptr !is null) {
>>     // do stuff
>>   }
>>
>>   // do other stuff
>> }
>>
>> Note that the code presented above looks quite stupid, but this is
>> typically what you end up with after inlining when you call two
>> functions, one that does a null check and one that doesn't.
>>
>> You would expect the program to segfault at the first line. But it is
>> in fact undefined behavior. The optimizer can decide to remove the null
>> check, as ptr is dereferenced before it and so can't be null, and a
>> later pass can remove the first dereference as it is a dead load. Both
>> the GCC and LLVM optimizers can exhibit such behavior.
>
> Do you have any pointers to substantiate that? I find such a behavior rather bizarre.
>
> Andrei

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
February 02, 2014
On Sunday, 2 February 2014 at 03:35:25 UTC, deadalnix wrote:
>> Do you have any pointers to substantiate that? I find such a behavior rather bizarre.
>>
>> Andrei
>
> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html

It also has to be noted that this very phenomenon caused a security flaw in the Linux kernel recently, but I can't find the link. Anyway, this isn't just a theoretical possibility. However rare, it happens in practice.
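
The pattern deadalnix describes can be sketched in compilable C (the names are illustrative). Whether a given compiler actually performs the transformation depends on the optimization level; GCC, for instance, gates it behind `-fdelete-null-pointer-checks`. Called with a valid pointer, the function is perfectly well defined; the hazard only materializes when a null pointer reaches it and the compiler has already erased both the load and the check:

```c
#include <stddef.h>

/* The buggy pattern: ptr is dereferenced *before* it is checked.  An
 * optimizer may conclude from the first load that ptr cannot be null,
 * delete the check as redundant, and then delete the first load itself
 * as dead -- leaving no trace of either the check or the crash. */
int foo(int *ptr)
{
    int first = *ptr;       /* dereference before the check */

    if (ptr != NULL)        /* "redundant" check a compiler may remove */
        return first + 1;   /* do stuff */

    return -1;              /* do other stuff */
}
```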
February 02, 2014
On Saturday, February 01, 2014 10:58:12 Andrei Alexandrescu wrote:
> On 1/31/14, 5:39 PM, Jonathan M Davis wrote:
> > Regardless, we're not adding anything with regards to non-nullable references to the language itself [...]
> 
> I think the sea is changing here. I've collected hard data that reveals null pointer dereference is a top problem for at least certain important categories of applications. Upon discussing that data with Walter, it became clear that no amount of belief to the contrary, personal anecdote, or rhetoric can stand against the data.

I'm not sure how I feel about that, particularly since I haven't seen such data myself. My natural reaction when people complain about null pointer problems is that they're sloppy programmers (which isn't necessarily fair, but that's my natural reaction). I pretty much never have problems with null pointers and have never understood why so many people complain about them. Maybe I'm just more organized in how I deal with null than many folks are. I'm not against having non-nullable pointers/references so long as we still have the nullable ones, and a lot of people (or at least a number of very vocal people) want them, but I'm not particularly enthusiastic about the idea either.

> It also became clear that a library solution would improve things but cannot compete with a language solution. The latter can do local flow-sensitive inference and require notations only for function signatures. Consider:
> 
> class Widget { ... }
> 
> void fun()
> {
>      // assume fetchWidget() may return null
>      Widget w = fetchWidget();
>      if (w)
>      {
>          ... here w is automatically inferred as non-null ...
>      }
> }

Yeah, I think that it's always been clear that a library solution would be inferior to language one. It's just that we could get a large portion of the benefits with a library solution without having to make any language changes. It essentially comes down to a question of whether the additional benefits of having non-nullable references in the language are worth the additional costs that that incurs.

> Bottom line: a language change for non-null references is on the table. It will be an important focus of 2014.

Well, I guess that I'll just have to wait and see what gets proposed.

- Jonathan M Davis
February 02, 2014
On 2/1/14, 7:29 PM, Jonathan M Davis wrote:
> On Saturday, February 01, 2014 12:09:10 Andrei Alexandrescu wrote:
>> On 2/1/14, 2:14 AM, Jonathan M Davis wrote:
>>> On Saturday, February 01, 2014 04:01:50 deadalnix wrote:
>>>> Dereferencing it is unsafe unless you put runtime check.
>>>
>>> How is it unsafe? It will segfault and kill your program, not corrupt
>>> memory. It can't even read any memory. It's a bug to dereference a null
>>> pointer or reference, but it's not unsafe, because it can't access _any_
>>> memory, let alone memory that it's not supposed to be accessing, which is
>>> precisely what @safe is all about.
>>
>> This has been discussed to death a number of times. A field access
>> obj.field will use addressing with a constant offset. If that offset is
>> larger than the lowest address allowed to the application, unsafety may
>> occur.
>>
>> The amount of low-address memory protected is OS-dependent. 4KB can
>> virtually always be counted on. For fields placed beyond that limit, a
>> runtime test must be inserted. There are few enough objects larger
>> than 4KB out there to make this practically a non-issue. But the
>> checks must be there.
>
> Hmmm. I forgot about that. So, in essence, dereferencing null pointers is
> almost always perfectly safe but in rare, corner cases can be unsafe. At that
> point, we could either always insert runtime checks for pointers to such large
> types or we could mark all pointers of such types @system (that's not even
> vaguely acceptable in the general case, but it might be acceptable in a rare
> corner case like this). Or we could just disallow such types entirely, though
> it wouldn't surprise me if someone screamed over that. Runtime checks are
> probably the best solution, though with any of those solutions, I'd be a bit
> worried about there being bugs with the implementation, since we then end up
> with a rare, special case which is not well tested in real environments.
>
>>>>    Which is stupid for something that can be verified at compile time.
>>>
>>> In the general case, you can only catch it at compile time if you disallow
>>> it completely, which is unnecessarily restrictive. Sure, some basic cases
>>> can be caught, but unless the code where the pointer/reference is defined
>>> is right next to the code where it's dereferenced, there's no way for the
>>> compiler to have any clue whether it's null or not. And yes, there's
>>> certainly code where it would make sense to use non-nullable references
>>> or pointers, because there's no need for them to be nullable, and having
>>> them be non-nullable avoids any risk of forgetting to initialize them,
>>> but that doesn't mean that nullable pointers and references aren't useful
>>> or that you can catch all instances of a null pointer or reference being
>>> dereferenced at compile time.
>> The Java community has a good experience with @Nullable:
>> http://stackoverflow.com/questions/14076296/nullable-annotation-usage
>
> Sure, and there are other things that the compiler can do to catch null
> dereferences (e.g. look at the first dereferencing of the pointer in the
> function that it's declared in and make sure that it was initialized or
> assigned a non-null value first), but the only way to catch all null
> dereferences at compile time would be to always know at compile time whether
> the pointer was null at the point that it's dereferenced, and that can't be
> done.

What are you talking about? That has been done.

> AFAIK, the only solution that guarantees that it catches all dereferences of
> null at compile time is a solution that disallows a pointer/reference from
> ever being null in the first place.

Have you read through that link?


Andrei


February 02, 2014
On 2/1/14, 7:35 PM, deadalnix wrote:
> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html

Whoa, thanks. So the compiler figures null pointer dereference in C is undefined behavior, which means the entire program could do whatever if that does happen.

Andrei

February 02, 2014
On Sunday, 2 February 2014 at 00:50:28 UTC, Timon Gehr wrote:
> if(auto w = wn.checkNull){
>     // ...
> }else w.foo();

(presumably you meant wn.foo, as w would be out of scope in the else branch)

This is more a question of the default than of the library, though: all other things being equal, changing the language so that w changes type inside if(w) { ... } would still let w.foo compile.

I have a radical idea about Foo, not sure if people would like it though... I'd like the naked type to mean "borrowed, not null".

Then, Nullable!Foo is semi-magical in implementation, but semantically is basically struct Nullable(T) { bool isNull; T object; }.

It would *not* be a specialization of Foo; it does not implicitly convert and is not usable out of the box. You'd be forced to convert by checking with the if(w) kind of thing.

The magic is that Nullable!Foo is implemented as a plain old pointer whenever possible instead of a pointer+bool pair.
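
A rough C rendering of that representation trick (all names hypothetical): semantically the wrapper is an (isNull, object) pair, but when the payload is a pointer, the null state can ride in the pointer itself:

```c
#include <stddef.h>

typedef struct Widget { int id; } Widget;

/* Semantically { bool isNull; Widget object; }, but represented as a
 * single pointer: NULL plays the role of isNull, so no flag is stored. */
typedef struct NullableWidget {
    Widget *rep;            /* NULL means "no widget" */
} NullableWidget;

/* The only way to get the payload out: the caller receives a plain
 * pointer and must test it before use, mirroring the if(w) check. */
Widget *check_null(NullableWidget n)
{
    return n.rep;
}

int id_or(NullableWidget n, int fallback)
{
    Widget *w = check_null(n);
    if (w)                  /* usable only after the explicit check */
        return w->id;
    return fallback;
}
```

Here sizeof(NullableWidget) == sizeof(Widget *), so the wrapper costs nothing over a raw pointer. C obviously can't stop you from smuggling the pointer out; the enforcement half needs the type system, and this only illustrates the representation.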


Probably nothing special there, all stuff you've heard of before and it is ground that C# IMO has covered fairly well.



The other thing I want though is to make it borrowed by default. A borrowed non-immutable object cannot be stored except in the scope of the owner. To store it, you have to assert ownership somehow: is it owned by the GC? Reference counted? Scoped, RAII style? Or DIY C style? You have to mark it one way or another. (This also applies to slices btw)


I say non-immutable since immutability means it never changes, which means it is never freed, which means ownership is irrelevant. In practice, this means all immutable objects are either references to static data or managed by the GC (which provides the illusion of an infinite lifetime).



This would be a radical change by default... but then again, so would not-null by default, so hey do it together I say.
February 02, 2014
On Saturday, 1 February 2014 at 22:05:21 UTC, Adam D. Ruppe wrote:
> On Saturday, 1 February 2014 at 18:58:11 UTC, Andrei Alexandrescu wrote:
>>    Widget w = fetchWidget();
>>    if (w)
>>    {
>>        ... here w is automatically inferred as non-null ...
>>    }
>
> A library solution to this exists already:
>
> Widget wn = fetchWidget();
> if(auto w = wn.checkNull) {
>    // w implicitly converts to NotNull!Widget
> }
>
> I've had some difficulty in const correctness with my implementation... but const correctness is an entirely separate issue anyway.
>
> It isn't quite the same as if(w) but meh, does that matter? The point of the static check is to make you think about it, and that's achieved here.
>
>
> If we do want to get the if(w) to work, I'd really prefer to do that as a library solution too, since then we might be able to use it elsewhere as well. Maybe some kind of template that lets you do a scoped transformation of the type. idk really.

This is a common staple of languages with more advanced type systems that support flow-sensitive typing. Ceylon, an upcoming language that I'm quite excited about, features this.

"Typesafe null and flow-sensitive typing" section
http://www.ceylon-lang.org/documentation/1.0/introduction/

February 02, 2014
On Saturday, February 01, 2014 19:40:26 Andrei Alexandrescu wrote:
> On 2/1/14, 7:29 PM, Jonathan M Davis wrote:
> > Sure, and there are other things that the compiler can do to catch null dereferences (e.g. look at the first dereferencing of the pointer in the function that it's declared in and make sure that it was initialized or assigned a non-null value first), but the only way to catch all null dereferences at compile time would be to always know at compile time whether the pointer was null at the point that it's dereferenced, and that can't be done.
> 
> What are you talking about? That has been done.

How so? All that's required is something like

auto ptr = foo();

if(bar()) //runtime dependent result
    ptr = null;

ptr.baz();

and there's no way that the compiler could know whether this code is going to dereference null or not. Sure, it could flag the dereference as possibly null and thus buggy, but it can't know for sure. And an even simpler case is something like

auto ptr = foo();
ptr.baz();

The compiler isn't going to know whether foo returns null or not, and whether it returns null could depend on the state at runtime, making it so that it can't possibly know whether foo will return null or not.

And even if foo were as simple as

Bar* foo() { return null; }

the compiler would have to actually look at the body of foo to know that its return value was going to be null, and compilers that use the C compilation model don't normally do that, because they often don't have the body of foo available.
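
The separate-compilation point can be made concrete (the function bodies here are hypothetical). Picture foo's definition living in another translation unit, so its callers are compiled against the prototype alone:

```c
#include <stddef.h>

typedef struct Bar { int x; } Bar;

/* Pretend this body sits in a separately compiled file.  Under the C
 * compilation model, a caller's compiler sees only `Bar *foo(void);`
 * and has no way to know that the result is always NULL. */
Bar *foo(void)
{
    return NULL;
}

int baz(Bar *p)
{
    /* A defensively checked use.  An unchecked `p->x` here would be
     * exactly the dereference no local analysis can prove safe. */
    return p ? p->x : -1;
}
```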

So, maybe I'm missing something here, and maybe we're just not quite talking about the same thing, but it's my understanding that it is not possible for the compiler to always know whether a pointer or reference is going to be null when it's dereferenced unless it's actually illegal for that pointer or reference type to be null (be it because its type is non-nullable or because an annotation marks it as such - which effectively then changes its type to non-nullable). As soon as you're dealing with an actual, nullable pointer or reference, it's trivial to make it so that the compiler doesn't have any idea what the pointer's value could be and thus can't know whether it's null or not.

Sections of code which include the initialization or assignment of pointers which are given either new or null certainly can be examined by the compiler so that it can determine whether any dereferencing of that pointer that occurs could be dereferencing null, but something as simple as initializing the pointer from the return value of a function makes it so that it can't.

> > AFAIK, the only solution that guarantees that it catches all dereferences of null at compile time is a solution that disallows a pointer/reference from ever being null in the first place.
> 
> Have you read through that link?

Yes. And all I see is that @Nullable makes it so that some frameworks won't accept null without @Nullable, which effectively turns a normal Java reference into a non-nullable reference. I don't see anything indicating that the compiler will have any clue whether an @Nullable reference is null or not at compile time.

My point is that when a nullable pointer or reference is dereferenced, it's impossible to _always_ be able to determine at compile time whether it's going to dereference null. Some of the time you can, and using non-nullable references certainly makes it so that you can, because then it's illegal for them to be null, but once a reference is nullable, unless the compiler knows the entire code path that that pointer's value goes through, and that code path is guaranteed to result in a null pointer or it's guaranteed to _not_ result in a null pointer, the compiler can't know whether the pointer is going to be null or not at the point that it's dereferenced. And you can't even necessarily do that with full program analysis due to the values potentially depending on runtime state. You could determine whether it _could_ be null if you had full program analysis, but you can't determine for certain that it will or won't be - not in all circumstances. And without full program analysis, you can't even do that in most cases, not with nullable references.

But maybe I'm just totally missing something here. I suspect though that we're just not communicating clearly enough for us to quite get what the other is saying.

- Jonathan M Davis
February 02, 2014
On Saturday, February 01, 2014 19:44:44 Andrei Alexandrescu wrote:
> On 2/1/14, 7:35 PM, deadalnix wrote:
> > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
> 
> Whoa, thanks. So the compiler figures null pointer dereference in C is undefined behavior, which means the entire program could do whatever if that does happen.

I think that article clearly illustrates that some of Walter's decisions in D with regards to fully defining some stuff that C didn't define were indeed correct. Undefined behavior is your enemy, and clearly, it gets even worse when the optimizer gets involved. *shudder*

- Jonathan M Davis
February 02, 2014
On Sunday, 2 February 2014 at 07:54:26 UTC, Jonathan M Davis wrote:
> On Saturday, February 01, 2014 19:44:44 Andrei Alexandrescu wrote:
>> On 2/1/14, 7:35 PM, deadalnix wrote:
>> > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
>> 
>> Whoa, thanks. So the compiler figures null pointer dereference in C is
>> undefined behavior, which means the entire program could do whatever if
>> that does happen.
>
> I think that article clearly illustrates that some of Walter's decisions in D
> with regards to fully defining some stuff that C didn't define were indeed
> correct. Undefined behavior is your enemy, and clearly, it gets even worse
> when the optimizer gets involved. *shudder*

Even without undefined behaviour, i.e. a guarantee that null-dereference leads to a segfault, the optimizer can deduce the pointer to be non-null after the dereference. Otherwise the code there could never be reached, because the program would have aborted. This in turn can cause the dereference to be optimized away, if its result is never used any more (dead store):

auto x = *p;
if(!p) {
    do_something(x);
}

In the first step, the if-block will be removed, because its condition is "known" to be false. After that, the value stored into x is unused, and the dereference can get removed too.
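
The two steps described above can be annotated directly on a compilable C version of the snippet (the helper names are made up for the sketch). Note that the reasoning does not need undefined behavior at all, only the guarantee that control never reaches the check with a null p:

```c
/* observed stays 0 on every valid execution: if `*p` succeeds, p was
 * non-null, so the if-block below is dead even under fully defined
 * "null dereference segfaults" semantics. */
int observed = 0;

void do_something(int x) { observed = x; }

void f(int *p)
{
    int x = *p;         /* step 2: once the block below is gone, this
                           load is dead and can be removed as well */
    if (!p) {           /* step 1: provably false on any path that
                           survives the dereference above */
        do_something(x);
    }
}
```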