February 28, 2018
Hi,

Andrei said in 2014 that not-null-references should be the priority of 2014's language design, with consideration to make not-null the default. In case the code breakage is too high, this can be an opt-in compiler flag.

Discussion here: https://forum.dlang.org/post/lcq2il$2emp$1@digitalmars.com

Everybody in the 2014 thread was hyped, but has anything ever happened in the language? In November 2017, the D forum discussed C#'s non-null warnings. Has anybody thought about this again since?

In D, to prevent immense breakage, non-nullable class references need to be opt-in. I would love to see them and don't mind adapting my 25,000-line D-using project during a weekend.

Are there counter-arguments that explain why non-nullable references/pointers haven't made it into D yet? Feel free to attack my answers below.

* * *

Argument: If A denotes a non-null reference to class A, it can't have an init value.
Answer: Both A?.init and A.init shall be null; then use code-flow analysis.

This would match D's immutable: In a class constructor, you may assign the value 5 to a field of type immutable(int) that has init value 0. The compiler is happy as long as it can prove that we never write a second time during this constructor, and that we never read before the first assignment.
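
A minimal sketch of that immutable behaviour in today's D (class and field names are made up for illustration):

    class Config
    {
        immutable(int) limit;   // .init is 0

        this(int n)
        {
            limit = n;          // the first assignment inside the constructor is fine
            // limit = n + 1;   // a second assignment here is rejected at compile time
        }
    }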

Likewise, it should be legal to assign to a non-null A from another A expression such as new A(), and the compiler is happy as long as the reference is assigned eventually and never read before assignment. (I haven't contributed to the compiler; I can't testify that it's that easy.)

To allow hacks, it should remain legal to cast A? (nullable reference) to A (non-nullable). This should pass compilation (because a cast shifts all responsibility from the compiler to the programmer) and then segfault at runtime, like any null dereference today.

* * *

Argument: I shall express non-null with contracts.
Answer: That's indeed the best solution without any language change. But it's bloated and doesn't check anything at compile time.

    class A { }
    void f1(A a) in { assert(a); } do { f2(a); }
    void f2(A a) in { assert(a); } do { f3(a); }
    void f3(A a) in { assert(a); } do { ...; }
    void g(A a) { if (a) ...; else ...; }

Sturdy D code must look like this today. Some functions handle the nulls, others request non-null refs from their callers. The function signature should express this, and a contract is part of the signature.

But several maintenance problems arise from non-null via contract.

First issue: We now rely on unit-testing to ensure our types are correct. You would do that in dynamic languages where the type system can't give you meaningful diagnostic errors otherwise. I'd rather not fall back to this in D. It's easy to forget such tests, and coverage analysis doesn't help here.

Second issue: Introducing new fields requires updating all methods that use the fields, and not necessarily only the methods of the class itself. If you have this code:

    class B {
        A a1;
        void f1() in { assert(a1); } do { ... }
        void f2() in { assert(a1); } do { ... }
    }

When you introduce more fields, you must update every method. This is bug-prone; we have final switch (a full-blown language feature) just to solve similar issues:

    class B {
        A a1;
        A a2;
        void f1() in { assert(a1); assert(a2); } do { ... }
        void f2() in { assert(a1); /+ forgot +/ } do { ... }
    }
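
For comparison, this is the kind of omission that final switch catches at compile time (standard D; the enum is made up):

    enum Speed { slow, fast }

    string describe(Speed s)
    {
        final switch (s)   // adding a third Speed member turns any unhandled case into a compile error
        {
            case Speed.slow: return "slow";
            case Speed.fast: return "fast";
        }
    }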

Third issue: Most references in a program aren't null. In particular, class references that are fields of another class are often initialized once in the constructor and never reassigned. This is the predominant use of references. In D, the default, implicit case should do the Right Thing; it's fine to require the nonstandard case (allowing null) to be explicit.

Assuming that A means non-null A, I would love this instead:

    class A { }
    void f1(A a) { f2(a); }
    void f2(A a) { f3(a); }
    void f3(A a) { ...; }
    void g(A? a) { if (a) ...; else ...; }
Or:
    void g(A @nullable a) { if (a) ...; else ...; }

Code-flow analysis can already statically check that we initialize immutable values only once. Likewise, it should check that we only pass A? to f1 after we have tested it for non-null, and that we only call methods on A? after checking for its non-null-ness (and the type of `a' inside the `if' block should probably still be A?, not A.)

* * *

Argument: null refs aren't a problem, they're memory-safe.
Answer: Memory safety is not the concern here. Readability of code is, and preventing at compile time what safely explodes at runtime.

* * *

Argument: Roll your own non-null type as a wrapper around D's nullable class reference.
Answer: That will look ugly, is an abstraction inversion, and checks at runtime only.

    class A { }

    struct NotNull(T)
        if (is(T == class))
    {
        T payload;
        @disable this();        // forbid default construction, so payload can't silently stay null
        this(T t) {
            assert(t !is null); // checked at runtime only
            payload = t;
        }
        alias payload this;     // usable wherever a T is expected
    }

    NotNull!A a = NotNull!A(new A());

The non-nullable type is the type with the simpler behavior: I can call all methods without a segfault. The nullable type is the more complex type: I must check it for null before calling methods on it. My NotNull implements the simple type in terms of the more complex type. Such abstraction inversion is dubious design.

And this solution would only assert at runtime again, not at compile time.
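
For illustration, with the NotNull above and a made-up function that may return null, the mistake only surfaces when the code runs:

    A fetch(bool ok) { return ok ? new A() : null; }   // may return null

    void use()
    {
        auto a = NotNull!A(fetch(false));   // compiles fine, but asserts at runtime
        // with compile-time-checked A / A?, this line would not even compile
    }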

Microsoft's C++ Guidelines Support Library has not_null<T>. That attacks the right problem, but becomes boilerplate when it appears everywhere in your codebase.

* * *

Argument: If A is going to denote non-null-A, then this will break huge amounts of code.
Answer: Like @safe, any such massive break must be opt-in.

The biggest downside of opt-in is that few projects will use it, and the feature will be buggy for a long time.

For example, associative arrays in opt-in @safe code, combined with opEquals overrides carrying @safe/nothrow/... annotations, can subtly fail when you mix them in complicated ways. Sometimes you end up ripping the good annotations out of your project to please the compiler instead of reducing the problem with dustmite.

* * *

Argument: It's not worth it.

I firmly believe it's worth it, but I accept that others deem other things more important.

I merely happen to love OOP and use D classes almost everywhere; thus I have references everywhere, and methods everywhere that accept references as parameters.

-- Simon

I'll be happy to discuss this in person at DConf 2018. :-)
February 28, 2018
On Wednesday, February 28, 2018 13:43:37 SimonN via Digitalmars-d wrote:
> Answer: Both A?.init and A.init shall be null, then use code-flow analysis.

I expect that pretty much anything you propose that requires code flow analysis is DOA. Walter is almost always against features that require it, because it's so hard to get right, and the places that D does use it tend to have problems (e.g. it's actually quite trivial to use a const or immutable member variable before it's initialized). In fact, IIRC, in the most recent discussion on having the compiler give an error when it can detect that a null pointer or reference is being dereferenced, Walter was arguing against it precisely because code-flow analysis is so hard to get right, and encoding it in the spec is particularly bad (which would be required for anything involving errors). If non-nullable references were added to D, I expect that they would have to be like structs marked with

@disable this();

with all of the cons that go with that.

> Argument: It's not worth it.

I'm very much in that camp. I've never understood why some folks have so many problems with null pointers. Personally, about the worst that I normally have to deal with is forgetting to initialize a class reference or pointer, and that blows up quite quickly such that it's fixed quite quickly. And that's in C++, D, Java, or any other language that I've used. Null pointers/references are simply not something that I've ever seen much of a problem with, even when I use pointers heavily.

And as idiomatic D code tends to use classes rarely, it's that much less useful for D than it would be for many other languages. I know that some folks think that null is a huge problem, and some folks use classes or pointers much more than idiomatic D typically does, but I definitely don't think that adding a new type of pointer or reference to the language to try to deal with null pointers/references is worth the extra complication. It's a huge complication for what I personally believe is a small problem, though obviously, not everyone agrees on that point. I have no idea what Walter and Andrei's current stances on such an idea are other than the fact that Walter is very much against using code-flow analysis for something like verifying that a pointer or reference has been initialized before it's dereferenced.

- Jonathan M Davis

February 28, 2018
On Wednesday, 28 February 2018 at 14:05:19 UTC, Jonathan M Davis wrote:
> I expect that pretty much anything you propose that requires code flow analysis is DOA.
> Walter was arguing against precisely because code-flow analysis is so hard to get right,

Thanks, that's an important judgement. I've read the three threads that I found around this issue, but hadn't noticed before this sentiment that code-flow analysis is so problematic.

Yeah, non-null class fields hinge on code-flow analysis. And I'll accept that pushing non-null refs won't lead to anything if the necessary code-flow analysis is too tricky for the benefit.

> I've never understood why some folks have so many problems with null pointers.

My gripe is that the necessarily-nullable class reference doesn't express the intent.
Either a codebase must rely on silent conventions, or it must guard every function with asserts.

> and that blows up quite quickly such that it's fixed quite quickly.

Yeah, I admit that most null crashes surface adequately quickly even when you have to run the program first.

It's merely sad to see D, with all its powerful static inspection, rely on runtime tests for nulls while other languages (Kotlin, Zig, and 2017 C#) rule null out at compile-time, as if it's the most natural thing in the world.

-- Simon
February 28, 2018
On Wednesday, 28 February 2018 at 13:43:37 UTC, SimonN wrote:
> Hi,
>
> Andrei said in 2014 that not-null-references should be the priority of 2014's language design, with consideration to make not-null the default. In case the code breakage is too high, this can be an opt-in compiler flag.

You might be interested in this little experiment:  https://github.com/dlang/dmd/pull/7375

Mike

February 28, 2018
On Wednesday, 28 February 2018 at 15:29:17 UTC, Mike Franklin wrote:
> You might be interested in this little experiment:  https://github.com/dlang/dmd/pull/7375

Indeed, this looks extremely useful, at the very least in a linter. I probably rely on ints getting initialized to 0 throughout the program, and only rarely make that explicit.

With null references, the problem is not forgetting the initialization; it's expressing the intent of the variable. Usually I want non-null, but sometimes I want a nullable reference and would like to require the using code to test the reference for null. Merely verifying initialization doesn't help here; it may well be intended that null will be assigned to the reference later.

-- Simon
February 28, 2018
On Wednesday, 28 February 2018 at 14:05:19 UTC, Jonathan M Davis wrote:
> On Wednesday, February 28, 2018 13:43:37 SimonN via Digitalmars-d wrote:
>> [...]
>
> I expect that pretty much anything you propose that requires code flow analysis is DOA. Walter is almost always against features that require it, because it's so hard to get right, and the places that D does use it tend to have problems (e.g. it's actually quite trivial to use a const or immutable member variable before it's initialized). In fact, IIRC, in the most recent discussion on having the compiler give an error when it can detect that a null pointer or reference is being dereferenced, Walter was arguing against precisely because code-flow analysis is so hard to get right, and encoding it in the spec is particularly bad (which would be required for anything involving errors). If non-nullable references were added to D, I expect that they would have to be like structs marked with
>
> [...]

I don't understand the problems with null either - my program segfaults, I look at the core dump and initialise whatever it is that was T.init. And this in the rare case I actually use a pointer or a class instance to begin with.

I also declare nearly every variable with `const var = <expr>;`, so I guess that makes it even more unlikely for me.

Atila
February 28, 2018
On Wednesday, 28 February 2018 at 13:43:37 UTC, SimonN wrote:
> Hi,
>
> Andrei said in 2014 that not-null-references should be the priority of 2014's language design, with consideration to make not-null the default. In case the code breakage is too high, this can be an opt-in compiler flag.
>
> [...]

I've slowly come around to supporting this idea. I'd rather avoid segfaults in the first place and avoid extra effort checking for null if possible.

It also sets clearer expectations for a user. For example, D now:

    Class func(T param); // user always needs to worry about whether the return value is null; there may be that edge case where it is

D with non-nullable references:

    Class func(T param); // user knows that the return value will not be null, no need to check
    Nullable!Class func(T param); // user knows they need to check for null and handle it.
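
A caller-side sketch of that difference, using std.typecons.Nullable as a stand-in (class and function names are made up for illustration):

    import std.typecons : Nullable, nullable;

    class Session { void greet() { } }

    Session openSession()                      // never returns null by contract
    {
        return new Session();
    }

    Nullable!Session findSession(string id)    // may legitimately have no result
    {
        if (id.length)
            return nullable(new Session());
        return Nullable!Session();             // "no result"
    }

    void demo()
    {
        openSession().greet();                 // no check needed
        auto s = findSession("");
        if (!s.isNull)
            s.get.greet();                     // must check before use
    }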

That's my two cents anyways
February 28, 2018
On Wednesday, 28 February 2018 at 14:05:19 UTC, Jonathan M Davis wrote:
> Walter is almost always against features that require it, because it's so hard to get right

Doesn't the difficulty depend on what exactly there is to get right? It's not a spherical problem in a vacuum.
February 28, 2018
On Wednesday, February 28, 2018 19:43:07 Kagamin via Digitalmars-d wrote:
> On Wednesday, 28 February 2018 at 14:05:19 UTC, Jonathan M Davis
>
> wrote:
> > Walter is almost always against features that require it, because it's so hard to get right
>
> Doesn't the difficulty depend on what exactly there is to get right? It's not a spherical problem in a vacuum.

Feel free to discuss any code-flow analysis issues with Walter, but he's consistently been against using it for much of anything whenever I've seen him discuss it. Not even stuff like VRP covers multiple lines of code, because doing so would require code-flow analysis. And he's specifically said that he's against using it for detecting when a null pointer is dereferenced. IIRC, he's stated that dmd's optimizer uses code-flow analysis for some stuff, but for anything that involves putting it in the frontend where the behavior would have to be encoded in the spec, he's been against it.
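
A small illustration of that VRP limitation (standard D; the function name is made up):

    void vrpDemo(int i)
    {
        ubyte a = i & 0xFF;            // compiles: the expression's value range provably fits in ubyte
        int masked = i & 0xFF;
        // ubyte b = masked;           // error: VRP does not carry the range of `masked` across statements
        ubyte b = cast(ubyte) masked;  // an explicit cast is required instead
    }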

- Jonathan M Davis

March 01, 2018
On Wednesday, 28 February 2018 at 23:58:44 UTC, Jonathan M Davis wrote:
> Feel free to discuss any code-flow analysis issues with Walter, but he's consistently been against using it for much of anything whenever I've seen him discuss it.

I'd say massive breakage is what precludes it. Well, I like the way default initialization works in D.

> Not even stuff like VRP covers multiple lines of code, because doing so would require code-flow analysis.

It feels like handling VRP for linear code can be simple enough; tracking VRP across branching is overkill.