Null-checked reference types (page 2)

August 07

Re: Null-checked reference types

Posted by Quirin Schroll
in reply to IchorDev

On Wednesday, 7 August 2024 at 14:34:00 UTC, IchorDev wrote:

On Wednesday, 7 August 2024 at 10:13:05 UTC, Quirin Schroll wrote:

Reference types are nullable, yet most of the time, actual references aren’t null and are expected to be non-null.

Well that’s why associative arrays implicitly allocate themselves. I don’t think that would work for classes though…

Prime example: ref. Can be null, never is expected to be in practice, and when it happens to be null, it’s a bug. A bug not by the language semantics, but in practical code. 100% of the time.

Uh yeah… we should be preventing that.

Instead of a contract or documentation saying they have to be non-null, the best way is to have the type system enforce it at compile-time. Just to mention two, Kotlin and Zig default to non-nullable references / pointers. You have to annotate nullable ones and handle the null case.

Interesting. Do people who use those languages actually like that?

I don’t know, I’m neither a Zig nor Kotlin Dev.

I did some C# and like its direction with non-nullable types, but unfortunately, they have to be backwards compatible, which means ? types are treated as suggestions. There was a C# proposal to add !! parameters which makes them run-time checked for null. It was rejected, but I don’t remember why.

C#’s nullability stuff has one downside: T? means two different things depending on whether T is a value or reference type, and because C# has generics, not templates, in a generic context, using T? means: Nullable if T is a reference type, but non-nullable if T is a value type. However, if T is restricted to value types, it means nullable T.

Having nullability for value types might be nice too, but again it’s something you can already achieve in other ways.

Yes, optionals, which aren’t great to use. Having worked with C#, which has core-language nullable value types, I can tell you, it makes it really nice to work with them. If an indexOf function returns size_t? (or even better: some index_t? which hooks into the null semantics so that it reserves size_t.max for its null state), it’s clear that the case of whatever you’re seeking not being there has to be accounted for.
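For comparison, here is a rough sketch of what is already expressible in library code today, assuming Phobos' `std.typecons.Nullable` with a sentinel value. The names `NullableIndex` and `findIndex` are made up for illustration; this is not the proposed `size_t?` syntax, only an approximation of the "reserve `size_t.max` for the null state" idea.

```d
import std.typecons : Nullable;

// Hypothetical alias: a nullable index that reserves size_t.max as its null
// state, so it takes no more space than a plain size_t.
alias NullableIndex = Nullable!(size_t, size_t.max);

// Hypothetical helper, not a Phobos function.
NullableIndex findIndex(const int[] haystack, int needle)
{
    foreach (i, value; haystack)
        if (value == needle)
            return NullableIndex(i);
    return NullableIndex.init; // null state: not found
}

unittest
{
    auto hit = findIndex([10, 20, 30], 20);
    assert(!hit.isNull && hit.get == 1);

    auto miss = findIndex([10, 20, 30], 99);
    assert(miss.isNull); // the caller has to account for this case

    static assert(NullableIndex.sizeof == size_t.sizeof);
}
```

Unlike a core-language nullable, nothing here forces the caller to check `isNull` before calling `get`; that is the part the compiler would have to take over.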

Well there you go, maybe for value types we need something like the range interface but for nullability? And then for reference types can just do is null.

The whole goal of nullable annotations / types is to make the compiler find the places where is null is needed and where it’s not needed.

August 07

Re: Null-checked reference types

Posted by Sebastiaan Koppe
in reply to IchorDev

On Wednesday, 7 August 2024 at 14:34:00 UTC, IchorDev wrote:

Interesting. Do people who use those languages actually like that?

Absolutely. It's often praised as an important improvement over Java.

August 12

Re: Null-checked reference types

Posted by Quirin Schroll
in reply to Richard (Rikki) Andrew Cattermole

On Wednesday, 7 August 2024 at 11:30:02 UTC, Richard (Rikki) Andrew Cattermole wrote:

On 07/08/2024 11:22 PM, Quirin Schroll wrote:

On Wednesday, 7 August 2024 at 01:39:29 UTC, Richard (Rikki) Andrew Cattermole wrote:

This allows you to do both loads and stores and do something if it failed transitively.

if (var1.var2?.var3?.field = 3) {
    // success
} else {
    // failure
}

I somehow don’t like if (… = …) when it’s not a declaration. At first sight, I thought you intended … == 3.

It's going to be valid regardless, due to AssignExpression.

Currently, assignments are not valid for conversion to bool. (Error: assignment cannot be used as a condition, perhaps `==` was meant?)
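A small sketch of the current behaviour, using only existing D features (the variable names are arbitrary): a bare assignment is rejected as a condition, while a declaration in the condition or an explicitly compared assignment is accepted.

```d
unittest
{
    int x;

    // A bare assignment is rejected as a condition in current D.
    static assert(!__traits(compiles, { if (x = 3) {} }));

    // A declaration in the condition is accepted...
    static assert(__traits(compiles, { if (auto y = x) {} }));

    // ...and so is an assignment whose result is compared explicitly.
    if ((x = 3) != 0)
        assert(x == 3);
}
```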

No data flow analysis is proposed. Null checking is local and done by tracking ? and ! by the type system.

DFA is only required if you want the type state to change as the function is interpreted. So that's fine. That is a me thing to figure out.

If I understand correctly, by “type state” you mean something like value range propagation. It basically is value range propagation, however the ranges in question are null and all non-null values. You don’t suggest that the typeof type of a variable or expression changes, correct? (I think that would be very weird.)

No, I meant type state.

https://en.wikipedia.org/wiki/Typestate_analysis

unreachable < reachable < initialized < default-initialized < non-null < user

I didn’t read the Wikipedia article in detail, but it contains no “null,” so I’m wondering how it’s related. A variable of non-nullable type must be initialized. If we’re talking @system code, fine, it need not be, it could even be void initialized. IIUC, typestate analysis could be used to make void initialization @safe by proving that a void initialized value has definitely been initialized whenever it’s read (i.e. no uninitialized read).

IIUC, what you’re suggesting is allowing variables of non-null type to be initialized by null, but that reading one requires them to be initialized.

However, you do not need to annotate function body variables with this approach.

Look at the initializer of a function variable declaration, it'll tell you if it has the non-null type state.

int* ptr1;
int* ptr2 = ptr1;

The only issue is, just because e.g. a pointer is initialized with something non-null (e.g. the address of a variable), that doesn’t mean some logic later won’t assign null to it.

Right, that would have to be disallowed without DFA, since the type state must not change throughout a function body.

Why wouldn’t it be able to?
It might make sense to the programmer to initialize a variable with a definite non-null value, but later, e.g. on some error-like case, reassign null.

If you use inference, it may (depending on implementation) infer a non-nullable type. The right course of action is to use an explicit wider type. This is similar to how auto x = new Derived gives you x typed as Derived, and that bars you from assigning it some other Base type object. The right course of action is to declare x via Base x = new Derived.
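The class analogy can be checked in current D. This is just an illustration with two throwaway classes named `Base` and `Derived`:

```d
class Base {}
class Derived : Base {}

unittest
{
    auto x = new Derived; // inferred as Derived, the narrowest type
    static assert(is(typeof(x) == Derived));
    // Base does not implicitly convert to Derived, so this is rejected.
    static assert(!__traits(compiles, x = new Base));

    Base y = new Derived; // explicit wider type
    y = new Base;         // fine
    assert(cast(Derived) y is null);
}
```

An inferred non-nullable type would behave the same way: later assigning null needs the wider (nullable) type to be spelled out at the declaration.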

However the problem which caused me some problems in the past is on tracking variables outside of a function. You cannot do it.

Variables outside a function change type state during their lifespan. They have the full life cycle, starting at reachable, into non-null and then back to reachable. If you tried to force it to be non-null, the language would force you to have an .init value that is non-null. This is a known issue with classes already. It WILL produce logic errors that are undetectable.

I don’t care much about tracking. Probably, with if (auto) ..., you can just rename the variable, but typed non-nullable:

void f(int*? p)
{
     if (int* q = p) ... else return;
     int v = *q; // no error, q isn’t nullable, not by analysis, just by type
}
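For reference, the closest thing that compiles in current D, where there is no `int*?` and every pointer is nullable. The function name `valueOr` is made up. As far as I know, the variable declared in the condition is only in scope inside the then-branch today, so the dereference has to live there rather than after the `if`.

```d
// Current D: plain int* is nullable; the check happens in the condition.
int valueOr(int* p, int fallback)
{
    if (auto q = p) // the branch is only entered when p is non-null
        return *q;
    return fallback;
}

unittest
{
    int n = 42;
    assert(valueOr(&n, 0) == 42);
    assert(valueOr(null, -1) == -1);
}
```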

What matters here is that you do not need to add annotation to the type itself. It only needs to exist within the function signature. Anywhere else it's useless information.

I don’t understand. To me, Object! and Object? are related but different types. You can have arrays of them, etc., how else would the information of nullableness be retained?

Maybe I need some info dump on type state analysis and what you mean exactly, because as I understand, TSA would only give you an implicit cast from T? to T! in some cases, similar to how uniqueness gives you an implicit cast from T to immutable(T) in some cases.

August 13

Re: Null-checked reference types

Posted by Richard (Rikki) Andrew Cattermole
in reply to Quirin Schroll

On 12/08/2024 10:02 PM, Quirin Schroll wrote:
>>>>> No data flow analysis is proposed. Null checking is local and done by tracking ? and ! by the type system.
>>>>
>>>> DFA is only required if you want the type state to change as the function is interpreted. So that's fine. That is a me thing to figure out.
>>>
>>> If I understand correctly, by “type state” you mean something like value range propagation. It basically *is* value range propagation, however the ranges in question are `null` and all non-null values. You don’t suggest that the `typeof` type of a variable or expression changes, correct? (I think that would be very weird.)
>>
>> No, I meant type state.
>>
>> https://en.wikipedia.org/wiki/Typestate_analysis
>>
>> unreachable < reachable < initialized < default-initialized < non-null < user
> 
> I didn’t read the Wikipedia article in detail, but it contains no “null,” so I’m wondering how it’s related. A variable of non-nullable type must be initialized. If we’re talking `@system` code, fine, it need not be, it could even be void initialized. IIUC, typestate analysis could be used to make void initialization `@safe` by proving that a void initialized value has definitely been initialized whenever it’s read (i.e. no uninitialized read).
> 
> IIUC, what you’re suggesting is allowing variables of non-null type to be initialized by `null`, but that reading one requires them to be initialized.

No.

Initialized, just means it has been initialized. The value, has no guarantees beyond this.

It may be read, it may be mutated.

A non-null type state, means that it has been initialized AND its value isn't the sentinel value null.

If it is non-null it may be dereferenced. An initialized pointer may not be dereferenced as it is lower than non-null.

>>>> However, you do not need to annotate function body variables with this approach.
>>>>
>>>> Look at the initializer of a function variable declaration, it'll tell you if it has the non-null type state.
>>>>
>>>> ```d
>>>> int* ptr1;
>>>> int* ptr2 = ptr1;
>>>> ```
>>>
>>> The only issue is, just because e.g. a pointer is initialized with something non-null (e.g. the address of a variable), that doesn’t mean some logic later won’t assign `null` to it.
>>
>> Right, that would have to be disallowed without DFA, since the type state must not change throughout a function body.
> 
> Why wouldn’t it be able to?

You need the DFA to be able to prove the guarantees in the type system hold.

Remove the ability for the type state to change, and you don't need the DFA.

>>>> However the problem which caused me some problems in the past is on tracking variables outside of a function. You cannot do it.
>>>>
>>>> Variables outside a function change type state during their lifespan. They have the full life cycle, starting at reachable, into non-null and then back to reachable. If you tried to force it to be non-null, the language would force you to have an .init value that is non-null. This is a known issue with classes already. It WILL produce logic errors that are undetectable.
>>>
>>> I don’t care much about tracking. Probably, with `if (auto) ...`, you can just rename the variable, but typed non-nullable:
>>>
>>> ```d
>>> void f(int*? p)
>>> {
>>>      if (int* q = p) ... else return;
>>>      int v = *q; // no error, q isn’t nullable, not by analysis, just by type
>>> }
>>> ```
>>
>> What matters here is that you do not need to add annotation to the type itself. It only needs to exist within the function signature. Anywhere else it's useless information.
> 
> I don’t understand. To me, `Object!` and `Object?` are related but different types. You can have arrays of them, etc., how else would the information of nullableness be retained?
> 
> Maybe I need some info dump on type state analysis and what you mean exactly, because as I understand, TSA would only give you an implicit cast from `T?` to `T!` in some cases, similar to how uniqueness gives you an implicit cast from `T` to `immutable(T)` in some cases.

No, it goes in both directions.

Type state analysis is based upon a scale, that has a transfer function to go up and down it.

You start with unreachable, meaning you cannot read or mutate it. Any access is an error.

Next is reachable, you can write to it, but cannot read it. This is void initialized (uninitialized). When a variable declaration is seen this is the default prior to handling the initializer expression.

Initialized can be both read and mutated. It is the default in D. For pointers this is the sentinel value null. Aka it's nullable. It must not be dereferenced.

Non-null is a pointer proven to not be the sentinel value null. It may be dereferenced as well as read/mutated.

As you increase in the scale, you get more guarantees, and therefore safety to perform otherwise potentially wrong logic.

In such analysis, with a DFA you do it as part of the variables, not the types.

```d
int* var;
// type state initialized

if (var !is null) {
	// type state non-null
} // type state min(initialized, non-null)

var = new int;
// type state non-null
```

August 13

Re: Null-checked reference types

Posted by Quirin Schroll
in reply to Richard (Rikki) Andrew Cattermole

On Monday, 12 August 2024 at 12:02:52 UTC, Richard (Rikki) Andrew Cattermole wrote:

On 12/08/2024 10:02 PM, Quirin Schroll wrote:

No data flow analysis is proposed. Null checking is local and done by tracking ? and ! by the type system.

DFA is only required if you want the type state to change as the function is interpreted. So that's fine. That is a me thing to figure out.

No, I meant type state.

https://en.wikipedia.org/wiki/Typestate_analysis

unreachable < reachable < initialized < default-initialized < non-null < user

IIUC, what you’re suggesting is allowing variables of non-null type to be initialized by null, but that reading one requires them to be initialized.

No.

Initialized, just means it has been initialized. The value, has no guarantees beyond this.

It may be read, it may be mutated.

A non-null type state, means that it has been initialized AND its value isn't the sentinel value null.

If it is non-null it may be dereferenced. An initialized pointer may not be dereferenced as it is lower than non-null.

However, you do not need to annotate function body variables with this approach.

Look at the initializer of a function variable declaration, it'll tell you if it has the non-null type state.

int* ptr1;
int* ptr2 = ptr1;

The only issue is, just because e.g. a pointer is initialized with something non-null (e.g. the address of a variable), that doesn’t mean some logic later won’t assign null to it.

Right, that would have to be disallowed without DFA, since the type state must not change throughout a function body.

Why wouldn’t it be able to?

You need the DFA to be able to prove the guarantees in the type system hold.

Remove the ability for the type state to change, and you don't need the DFA.

However the problem which caused me some problems in the past is on tracking variables outside of a function. You cannot do it.

I don’t care much about tracking. Probably, with if (auto) ..., you can just rename the variable, but typed non-nullable:

void f(int*? p)
{
     if (int* q = p) ... else return;
     int v = *q; // no error, q isn’t nullable, not by analysis, just by type
}

What matters here is that you do not need to add annotation to the type itself. It only needs to exist within the function signature. Anywhere else it's useless information.

I don’t understand. To me, Object! and Object? are related but different types. You can have arrays of them, etc., how else would the information of nullableness be retained?

No, it goes in both directions.

Type state analysis is based upon a scale, that has a transfer function to go up and down it.

You start with unreachable, meaning you cannot read or mutate it. Any access is an error.

Next is reachable, you can write to it, but cannot read it. This is void initialized (uninitialized). When a variable declaration is seen this is the default prior to handling the initializer expression.

Initialized can be both read and mutated. It is the default in D. For pointers this is the sentinel value null. Aka it's nullable. It must not be dereferenced.

Non-null is a pointer proven to not be the sentinel value null. It may be dereferenced as well as read/mutated.

As you increase in the scale, you get more guarantees, and therefore safety to perform otherwise potentially wrong logic.

In such analysis, with a DFA you do it as part of the variables, not the types.

int* var;
// type state initialized

if (var !is null) {
	// type state non-null
} // type state min(initialized, non-null)

var = new int;
// type state non-null

So, yes, basically, if TSA can prove a (nullable) pointer definitely isn’t null at some point, then at that point it may be treated like (including converted to) a non-nullable pointer (e.g. copied to one, dereferenced, etc.).

I see two concerns:

- The guarantees might be really weak, i.e. TSA might not be able to prove much in practice when it comes to non-null.
- It might be hard to explain why a variable is possibly null at some point. If we don’t even have TSA and the error is “x is of nullable type” that’s understandable. I have to copy x to a variable that’s of non-null type using a language construct that incurs an assertion or check. On the other hand, with TSA, the compiler must assume the programmer expected TSA to prove something non-null, but it couldn’t, and explaining why might be not very insightful and thus not very actionable.

Illustrating the first concern:

int** global;

void remember(ref int* p) @system { global = &p; }

void setNull() @system { *global = null; }

void main() @system
{
    int* p = new int;
    // TSA: p is not null here
    remember(p);
    // TSA: p is not null here(?)
    setNull();
    // TSA: ???
}

How would TSA “know” that p changed after setNull? D allows for a lot of action at a distance (mostly because D has pointers).

My suspicion is that, unfortunately, because TSA has to make conservative assumptions, it’ll have to give us rather weak guarantees after innocuous things happen, like a function call.

I have some experience with C#’s non-nullable types. If you hover over a variable of reference type, it’ll tell you if the variable can be null (initially surprisingly, even if the variable is typed non-null, but that’s because C#’s non-null annotations are more of a suggestion than a guarantee). I don’t know how many people code D with an editor that has some equivalent of IntelliSense. I don’t.

Drawing from C#, it also does null analysis for properties. You speak of variables, but what about properties?

The likely explanation for why that is, is that the non-null state is fragile. An initialized variable won’t ever become uninitialized, not because that’s logically impossible, but the language has no operation that would do that.

A similar issue is with structs’ init. I hate it. C++ has it right here using default constructors. A struct with invariants may have its invariants violated by init. It must have a constructor run over it to be valid. Here, I’d assume TSA could do some work, but again, action at a distance. Only if we disallow resetting a struct with invariants to init do we get way. But then, what about moves?

August 14

Re: Null-checked reference types

Posted by Richard (Rikki) Andrew Cattermole
in reply to Quirin Schroll

On 13/08/2024 10:33 PM, Quirin Schroll wrote:
> On Monday, 12 August 2024 at 12:02:52 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 12/08/2024 10:02 PM, Quirin Schroll wrote:
>>>>>>> No data flow analysis is proposed. Null checking is local and done by tracking ? and ! by the type system.
>>>>>>
>>>>>> DFA is only required if you want the type state to change as the function is interpreted. So that's fine. That is a me thing to figure out.
>>>>>
>>>>> If I understand correctly, by “type state” you mean something like value range propagation. It basically *is* value range propagation, however the ranges in question are `null` and all non-null values. You don’t suggest that the `typeof` type of a variable or expression changes, correct? (I think that would be very weird.)
>>>>
>>>> No, I meant type state.
>>>>
>>>> https://en.wikipedia.org/wiki/Typestate_analysis
>>>>
>>>> unreachable < reachable < initialized < default-initialized < non-null < user
>>>
>>> I didn’t read the Wikipedia article in detail, but it contains no “null,” so I’m wondering how it’s related. A variable of non-nullable type must be initialized. If we’re talking `@system` code, fine, it need not be, it could even be void initialized. IIUC, typestate analysis could be used to make void initialization `@safe` by proving that a void initialized value has definitely been initialized whenever it’s read (i.e. no uninitialized read).
>>>
>>> IIUC, what you’re suggesting is allowing variables of non-null type to be initialized by `null`, but that reading one requires them to be initialized.
>>
>> No.
>>
>> Initialized, just means it has been initialized. The value, has no guarantees beyond this.
>>
>> It may be read, it may be mutated.
>>
>> A non-null type state, means that it has been initialized AND its value isn't the sentinel value null.
>>
>> If it is non-null it may be dereferenced. An initialized pointer may not be dereferenced as it is lower than non-null.
>>
>>>>>> However, you do not need to annotate function body variables with this approach.
>>>>>>
>>>>>> Look at the initializer of a function variable declaration, it'll tell you if it has the non-null type state.
>>>>>>
>>>>>> ```d
>>>>>> int* ptr1;
>>>>>> int* ptr2 = ptr1;
>>>>>> ```
>>>>>
>>>>> The only issue is, just because e.g. a pointer is initialized with something non-null (e.g. the address of a variable), that doesn’t mean some logic later won’t assign `null` to it.
>>>>
>>>> Right, that would have to be disallowed without DFA, since the type state must not change throughout a function body.
>>>
>>> Why wouldn’t it be able to?
>>
>> You need the DFA to be able to prove the guarantees in the type system hold.
>>
>> Remove the ability for the type state to change, and you don't need the DFA.
>>
>>>>>> However the problem which caused me some problems in the past is on tracking variables outside of a function. You cannot do it.
>>>>>>
>>>>>> Variables outside a function change type state during their lifespan. They have the full life cycle, starting at reachable, into non-null and then back to reachable. If you tried to force it to be non-null, the language would force you to have an .init value that is non-null. This is a known issue with classes already. It WILL produce logic errors that are undetectable.
>>>>>
>>>>> I don’t care much about tracking. Probably, with `if (auto) ...`, you can just rename the variable, but typed non-nullable:
>>>>>
>>>>> ```d
>>>>> void f(int*? p)
>>>>> {
>>>>>      if (int* q = p) ... else return;
>>>>>      int v = *q; // no error, q isn’t nullable, not by analysis, just by type
>>>>> }
>>>>> ```
>>>>
>>>> What matters here is that you do not need to add annotation to the type itself. It only needs to exist within the function signature. Anywhere else it's useless information.
>>>
>>> I don’t understand. To me, `Object!` and `Object?` are related but different types. You can have arrays of them, etc., how else would the information of nullableness be retained?
>>>
>>> Maybe I need some info dump on type state analysis and what you mean exactly, because as I understand, TSA would only give you an implicit cast from `T?` to `T!` in some cases, similar to how uniqueness gives you an implicit cast from `T` to `immutable(T)` in some cases.
>>
>> No, it goes in both directions.
>>
>> Type state analysis is based upon a scale, that has a transfer function to go up and down it.
>>
>> You start with unreachable, meaning you cannot read or mutate it. Any access is an error.
>>
>> Next is reachable, you can write to it, but cannot read it. This is void initialized (uninitialized). When a variable declaration is seen this is the default prior to handling the initializer expression.
>>
>> Initialized can be both read and mutated. It is the default in D. For pointers this is the sentinel value null. Aka it's nullable. It must not be dereferenced.
>>
>> Non-null is a pointer proven to not be the sentinel value null. It may be dereferenced as well as read/mutated.
>>
>> As you increase in the scale, you get more guarantees, and therefore safety to perform otherwise potentially wrong logic.
>>
>> In such analysis, with a DFA you do it as part of the variables, not the types.
>>
>> ```d
>> int* var;
>> // type state initialized
>>
>> if (var !is null) {
>>     // type state non-null
>> } // type state min(initialized, non-null)
>>
>> var = new int;
>> // type state non-null
>> ```
> 
> So, yes, basically, if TSA can prove a (nullable) pointer definitely isn’t null at some point, then at that point it may be treated like (including converted to) a non-nullable pointer (e.g. copied to one, dereferenced, etc.).
> 
> I see two concerns:
> - The guarantees might be really weak, i.e. TSA might not be able to prove much in practice when it comes to non-null.
> - It might be hard to explain why a variable is possibly null at some point. If we don’t even have TSA and the error is “`x` is of nullable type” that’s understandable. I have to copy `x` to a variable that’s of non-null type using a language construct that incurs an assertion or check. On the other hand, with TSA, the compiler must assume the programmer expected TSA to prove something non-null, but it couldn’t, and explaining why might be not very insightful and thus not very actionable.

You only need to store converge points to improve the error message significantly. Anything with multiple scopes, such as switch statements, loops, etc. Do that two or three times and you should be able to produce a pretty nice error message. However, I won't be implementing that. Somebody else can do it; it needs a different skill set that I don't currently have.

> Illustrating the first concern:
> ```d
> int** global;
> 
> void remember(ref int* p) @system { global = &p; }
> 
> void setNull() @system { *global = null; }
> 
> void main() @system
> {
>      int* p = new int;
>      // TSA: p is not null here
>      remember(p);
>      // TSA: p is not null here(?)
>      setNull();
>      // TSA: ???
> }
> ```

Right, to do this, you had to drop out of `@safe`. Making `@trusted` and `@system` safe, is not a design goal of D.

With `@safe` escape analysis will mark the by-ref parameter as `scope` and won't let you escape it, preventing this situation.
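A sketch of that difference, reusing `global` and `remember` from the quoted example. The assumption is that a current compiler rejects the `@safe` variant; which diagnostic it reports (taking the address of the parameter, or letting a scoped pointer escape to a global) may depend on the compiler version and preview switches.

```d
int** global;

// The @system version from the post compiles as written.
void remember(ref int* p) @system { global = &p; }

// The same body is rejected once the function literal is marked @safe,
// so the address of the by-ref parameter cannot be stashed in a global.
static assert(!__traits(compiles, (ref int* p) @safe { global = &p; }));

@system unittest
{
    int* p = new int;
    remember(p);
    *global = null;    // action at a distance, only possible via @system code
    assert(p is null);
}
```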

> How would TSA “know” that `p` changed after `setNull`? D allows for a lot of action at a distance (mostly because D has pointers).
> 
> My suspicion is that, unfortunately, because TSA has to make conservative assumptions, it’ll have to give us rather weak guarantees after innocuous things happen, like a function call.

No, only when `@system` functions are called. It'll be enforced in `@safe` and `@trusted` should hopefully detect it for when it is called.

> I have some experience with C#’s non-nullable types. If you hover over a variable of reference type, it’ll tell you if the variable can be null (initially surprisingly, even if the variable is typed non-null, but that’s because C#’s non-null annotations are more of a suggestion than a guarantee). I don’t know how many people code D with an editor that has some equivalent of IntelliSense. I don’t.
> 
> Drawing from C#, it also does null analysis for properties. You speak of variables, but what about properties?

Fields, and globals are not supported due to temporal safety. You must perform the load into a function variable before access/mutation.

Even if it was supported, another thread can mutate it from under you after a check (or a known analysis state) and at CT you wouldn't know about it.

> The likely explanation for why that is, is that the non-null state is fragile. An initialized variable won’t ever become uninitialized, not because that’s logically impossible, but the language has no operation that would do that.

In D it would have the ability to: move constructors.

Same with unreachable, loops & goto reset reachability of variables.

> A similar issue is with structs’ `init`. I hate it. C++ has it right here using default constructors. A struct with invariants may have its invariants violated by `init`. It must have a constructor run over it to be valid. Here, I’d assume TSA could do some work, but again, action at a distance. Only if we disallow resetting a struct with invariants to `init` do we get way. But then, what about moves?

I'm not touching invariants. I view them as good as is.
