Richard (Rikki) Andrew Cattermole
Posted in reply to Quirin Schroll
On 13/08/2024 10:33 PM, Quirin Schroll wrote:
> On Monday, 12 August 2024 at 12:02:52 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 12/08/2024 10:02 PM, Quirin Schroll wrote:
>>>>>> > No data flow analysis is proposed. Null checking is local and done by tracking ? and ! by the type system.
>>>>>>
>>>>>> DFA is only required if you want the type state to change as the function is interpreted. So that's fine. That is a me thing to figure out.
>>>>>
>>>>> If I understand correctly, by “type state” you mean something like value range propagation. It basically *is* value range propagation, however the ranges in question are `null` and all non-null values. You don’t suggest `typeof` type of a variable or expression changes, correct? (I think that would be very weird.)
>>>>
>>>> No, I meant type state.
>>>>
>>>> https://en.wikipedia.org/wiki/Typestate_analysis
>>>>
>>>> unreachable < reachable < initialized < default-initialized < non-null < user
>>>
>>> I didn’t read the Wikipedia article in detail, but it contains no “null,” so I’m wondering how it’s related. A variable of non-nullable type must be initialized. If we’re talking `@system` code, fine, it need not be, it could even be void initialized. IIUC, typestate analysis could be used to make void initialization `@safe` by proving that a void initialized value has definitely been initialized whenever it’s read (i.e. no uninitialized read).
>>>
>>> IIUC, what you’re suggesting is allowing variables of non-null type to be initialized by `null`, but that reading one requires them to be initialized.
>>
>> No.
>>
>> Initialized, just means it has been initialized. The value, has no guarantees beyond this.
>>
>> It may be read, it may be mutated.
>>
>> A non-null type state, means that it has been initialized AND its value isn't the sentinel value null.
>>
>> If it is non-null it may be dereferenced. An initialized pointer may not be dereferenced as it is lower than non-null.
>>
>>>>>> However, you do not need to annotate function body variables with this approach.
>>>>>>
>>>>>> Look at the initializer of a function variable declaration, it'll tell you if it has the non-null type state.
>>>>>>
>>>>>> ```d
>>>>>> int* ptr1;
>>>>>> int* ptr2 = ptr1;
>>>>>> ```
>>>>>
>>>>> The only issue is, just because e.g. a pointer is initialized with something non-null (e.g. the address of a variable), that doesn’t mean some logic later won’t assign `null` to it.
>>>>
>>>> Right, that would have to be disallowed without DFA, since the type state must not change throughout a function body.
>>>
>>> Why wouldn’t it be able to?
>>
>> You need the DFA to be able to prove the guarantees in the type system hold.
>>
>> Remove the ability for the type state to change, and you don't need the DFA.
>>
>>>>>> However the problem which caused me some problems in the past is on tracking variables outside of a function. You cannot do it.
>>>>>>
>>>>>> Variables outside a function change type state during their lifespan. They have the full life cycle, starting at reachable, into non-null and then back to reachable. If you tried to force it to be non-null, the language would force you to have an .init value that is non-null. This is a known issue with classes already. It WILL produce logic errors that are undetectable.
>>>>>
>>>>> I don’t care much about tracking. Probably, with `if (auto) ...`, you can just rename the variable, but typed non-nullable:
>>>>>
>>>>> ```d
>>>>> void f(int*? p)
>>>>> {
>>>>> if (int* q = p) ... else return;
>>>>> int v = *q; // no error, q isn’t nullable, not by analysis, just by type
>>>>> }
>>>>> ```
>>>>
>>>> What matters here is that you do not need to add annotation to the type itself. It only needs to exist within the function signature. Anywhere else its useless information.
>>>
>>> I don’t understand. To me, `Object!` and `Object?` are related but different types. You can have arrays of them, etc., how else would the information of nullableness be retained?
>>>
>>> Maybe I need some info dump on type state analysis and what you mean exactly, because as I understand, TSA would only give you an implicit cast from `T?` to `T!` in some cases, similar to how uniqueness gives you an implicit cast from `T` to `immutable(T)` in some cases.
>>
>> No, it goes in both direction.
>>
>> Type state analysis is based upon a scale, that has a transfer function to go up and down it.
>>
>> You start with unreachable, meaning you cannot read or mutate it. Any access is an error.
>>
>> Next is reachable, you can write to it, but cannot read it. This is void initialized (uninitialized). When a variable declaration is seen this is the default prior to handling the initializer expression.
>>
>> Initialized can be both read and mutated. It is the default in D. For pointers this is the sentinel value null. Aka its nullable. It must not be dereferenced.
>>
>> Non-null is a pointer proven to not be the sentinel value null. It may be dereferenced as well as read/mutated.
>>
>> As you increase in the scale, you get more guarantees, and therefore safety to perform otherwise potentially wrong logic.
>>
>> In such analysis, with a DFA you do it as part of the variables, not the types.
>>
>> ```d
>> int* var;
>> // type state initialized
>>
>> if (var !is null) {
>> // type state non-null
>> } // type state min(initialized, non-null)
>>
>> var = new int;
>> // type state non-null
>> ```
>
> So, yes, basically, if TSA can prove a (nullable) pointer definitely isn’t null at some point, at this point, it may be treated like (including converted to) a non-nullable pointer (e.g. copied to one, be dereferenced, etc.).
>
> I see two concerns:
> - The guarantees might be really weak, i.e. TSA might not be able to prove much in practice when it comes to non-null.
> - It might be hard to explain why a variable is possibly null at some point. If we don’t even have TSA and the error is “`x` is of nullable type” that’s understandable. I have to copy `x` to a variable that’s of non-null type using a language construct that incurs an assertion or check. On the other hand, with TSA, the compiler must assume the programmer expected TSA to prove something non-null, but it couldn’t, and explaining why might be not very insightful and thus not very actionable.
You only need to store convergence points to improve the error message significantly: anything with multiple scopes, such as switch statements, loops, etc. Do that two or three times and you should be able to produce a pretty nice error message. However, I won't be implementing that. Somebody else can do it; it needs a different skill set that I don't currently have.
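
As a rough illustration of what storing convergence points could look like, here is a minimal sketch. Everything in it (``TypeState``, ``ConvergePoint``, ``converge``, ``explain``, and the line number used below) is hypothetical and made up for this post; it is not compiler code, just one way the bookkeeping might be arranged.

```d
import std.format : format;

/// The scale from earlier in this thread, weakest guarantee first.
enum TypeState { unreachable, reachable, initialized, nonNull }

/// Hypothetical record of one place where branches rejoin
/// (end of an if, a switch, a loop).
struct ConvergePoint
{
    string construct;     // e.g. "if", "switch", "loop"
    size_t line;          // where the branches rejoin
    TypeState[] incoming; // state of the variable on each incoming branch
    TypeState result;     // state after the join
}

/// Join the incoming branch states by keeping the weakest guarantee.
ConvergePoint converge(string construct, size_t line, TypeState[] incoming...)
{
    assert(incoming.length > 0);
    TypeState result = incoming[0];
    foreach (s; incoming[1 .. $])
        if (s < result)
            result = s;
    // .dup because the variadic slice may live on the caller's stack
    return ConvergePoint(construct, line, incoming.dup, result);
}

/// Build diagnostic notes from the last `keep` convergence points.
string explain(string var, const ConvergePoint[] history, size_t keep = 3)
{
    string msg;
    const start = history.length > keep ? history.length - keep : 0;
    foreach (cp; history[start .. $])
        msg ~= format("  %s ending at line %s left `%s` as %s\n",
                      cp.construct, cp.line, var, cp.result);
    return msg;
}
```

For example, ``converge("if", 5, TypeState.initialized, TypeState.nonNull)`` records that the variable dropped back to ``initialized`` after the ``if``, and ``explain`` turns the last few such records into notes that could be attached to the error.
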
> Illustrating the first concern:
> ```d
> int** global;
>
> void remember(ref int* p) @system { global = &p; }
>
> void setNull() @system { *global = null; }
>
> void main() @system
> {
> int* p = new int;
> // TSA: p is not null here
> remember(p);
> // TSA: p is not null here(?)
> setNull();
> // TSA: ???
> }
> ```
Right, to do this, you had to drop out of ``@safe``. Making ``@trusted`` and ``@system`` code safe is not a design goal of D.
With ``@safe``, escape analysis will mark the by-ref parameter as ``scope`` and won't let you escape it, preventing this situation.
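
A small sketch of that point, assuming a reasonably current compiler: the ``@safe`` counterpart of ``remember`` should simply not compile. The name ``safeRememberCompiles`` is made up for illustration, and the exact diagnostic depends on the compiler version and on whether ``-preview=dip1000`` is enabled.

```d
int** global;

// Hypothetical @safe counterpart of `remember` from the quoted example.
// It is expected to be rejected: either taking the address of the ref
// parameter is refused outright, or (with -preview=dip1000) the resulting
// pointer is `scope` and may not be stored in a global.
enum safeRememberCompiles =
    __traits(compiles, (ref int* p) @safe { global = &p; });

static assert(!safeRememberCompiles);
```
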
> How would TSA “know” that `p` changed after `setNull`? D allows for a lot of action at a distance (mostly because D has pointers).
>
> My suspicion is that, unfortunately, because TSA has to make conservative assumptions, it’ll have to give us rather weak guarantees after innocuous things happen, like a function call.
No, only when ``@system`` functions are called. It'll be enforced in ``@safe``, and ``@trusted`` should hopefully detect it when it is called.
> I have some experience with C#’s non-nullable types. If you hover over a variable of reference type, it’ll tell you if the variable can be null (initially surprisingly, even if the variable is typed non-null, but that’s because C#’s non-null annotations are more of a suggestion than a guarantee). I don’t know how many people code D with an editor that has some equivalent of IntelliSense. I don’t.
>
> Drawing from C#, it also does null analysis for properties. You speak of variables, but what about properties?
Fields and globals are not supported due to temporal safety. You must load the value into a function variable before access or mutation.
Even if it were supported, another thread could mutate it from under you after a check (or a known analysis state), and at compile time you wouldn't know about it.
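
The load-then-check pattern would look roughly like this. ``Node`` and ``readValue`` are hypothetical names used only for the sketch; the point is that the check and the dereference both go through the same local copy.

```d
struct Node
{
    int* value; // a field: its null-ness is not tracked across statements
}

int readValue(ref Node node, int fallback)
{
    // Load the field once into a function-local variable...
    int* local = node.value;

    // ...then check and dereference only the local copy, so a later
    // change to node.value cannot invalidate the check.
    if (local !is null)
        return *local;
    return fallback;
}
```
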
> The likely explanation for why that is, is that the non-null state is fragile. An initialized variable won’t ever become uninitialized, not because that’s logically impossible, but the language has no operation that would do that.
In D it would be able to, via move constructors.
Same with unreachable: loops and goto reset the reachability of variables.
> A similar issue is with structs’ `init`. I hate it. C++ has it right here using default constructors. A struct with invariants may have its invariants violated by `init`. It must have a constructor ran over it to be valid. Here, I’d assume TSA could do some work, but again, action at a distance. Only if we disallow resetting a struct with invariants to `init` do we get way. But then, what about moves?
I'm not touching invariants. I view them as good as is.