November 22, 2018
On 11/20/18 6:14 PM, Johan Engelen wrote:
> On Tuesday, 20 November 2018 at 19:11:46 UTC, Steven Schveighoffer wrote:
>> On 11/20/18 1:04 PM, Johan Engelen wrote:
>>>
>>> D does not make dereferencing on class objects explicit, which makes it harder to see where the dereference is happening.
>>
>> Again, the terms are confusing. You just said the dereference happens at a.foo(), right? I would consider the dereference to happen when the object's data is used. i.e. when you read or write what the pointer points at.
> 
> But `a.foo()` is already using the object's data: it is accessing a function of the object and calling it. Whether it is a virtual function or a final function shouldn't matter. There are different ways of implementing class function calls, but here people often seem to pin things down to one specific way. I feel I stand alone in the D community in treating the language in this abstract sense (like C and C++ do; other languages I don't know). It's similar to how people think that local variables and the function return address are put on a stack, even though that is just an implementation detail that is free to be changed (and does often change: local variables are regularly _not_ stored on the stack [*]).
> 
> Optimization isn't allowed to change behavior of a program, yet already simple dead-code-elimination would when null dereference is not treated as UB or when it is not guarded by a null check. Here is an example of code that also does what you call a "dereference" (read object data member):
> ```
> class A {
>      int i;
>      final void foo() {
>          int a = i; // no crash with -O
>      }
> }
> 
> void main() {
>      A a;
>      a.foo();  // dereference happens
> }
> ```

I get what you are saying. But in terms of memory safety *both results* are safe. The one where the code is eliminated is safe, and the one where the segfault happens is safe.

This is a tricky area, because D depends on a hardware feature for language correctness. In other words, it's perfectly possible for a null read or write to not result in a segfault, which would make D's allowance of dereferencing a null object without checking for null actually unsafe (now it's just another dangling pointer).
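A minimal sketch of why the hardware check is not airtight: the segfault relies on the page around address 0 being unmapped, so a field at a sufficiently large offset inside a "null object" can land on a mapped page. The class layout below is hypothetical, and whether the access actually faults depends on the OS's page protection.

```d
class Huge {
    ubyte[1024 * 1024] pad; // pushes `tail` ~1 MiB past the object base
    int tail;
}

void main() {
    Huge h;  // class references default to null
    // Reading h.tail would compute the address null + offset-of-tail,
    // i.e. roughly 1 MiB. Only the pages near 0 are reliably unmapped,
    // so this read is NOT guaranteed to segfault; it may silently read
    // whatever happens to live at that address.
    // int x = h.tail; // left commented out: executing it is unsafe
}
```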

In terms of language semantics, I don't know what the right answer is. If we want to say that if an optimizer changes program behavior, the code must be UB, then this would have to be UB.

But I would prefer saying something like -- if a segfault occurs and the program continues, the system is in UB-land, but otherwise, it's fine. If this means an optimized program runs and a non-optimized one crashes, then that's what it means. I'd be OK with that result. It's like Schrödinger's segfault!

I don't know what it means in terms of compiler assumptions, so that's where my ignorance will likely get me in trouble :)

> These discussions are hard to do on a mailinglist, so I'll stop here. Until next time at DConf, I suppose... ;-)

Maybe that is a good time to discuss it and learn how things work. But clearly people would like to at least have a say here.

I still feel like using the hardware to deal with null access is OK, and a hard-crash is the best result for something that clearly would be UB otherwise.

-Steve
November 22, 2018
On Monday, 19 November 2018 at 21:23:31 UTC, Jordi Gutiérrez Hermoso wrote:
> When I was first playing with D, I managed to create a segfault by writing `SomeClass c;` and then trying to do something with the object I thought I had default-constructed, by analogy with C++ syntax. Seasoned D programmers will recognise that I did nothing of the sort: `c` was null, and my program ended up dereferencing a null pointer.
>
> I'm not the only one who has done this. I can't find it right now, but I've seen at least one person open a bug report because they misunderstood this as a bug in dmd.
>
> I have been told a couple of times that this isn't something that needs to be patched in the language, but I don't understand. It seems like a very easy way to generate a segfault (and not a NullPointerException or whatever).
>
> What's the reasoning for allowing this?

The natural way forward for D is to add static analysis in the compiler that tracks use of possibly uninitialized classes (and perhaps also pointers). This has been discussed many times on the forums. The important thing with such an extra warning is to incrementally add it without triggering any false positives. Otherwise programmers aren't gonna use it.
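As an illustration (a hypothetical checker, not an existing compiler flag), here is the kind of code such data-flow analysis would have to distinguish:

```d
class C { int x; }

C maybeNull() { return null; } // stand-in for an unknown source

void main() {
    C c;            // default-initialized to null
    // c.x = 1;     // a checker should reject this: `c` is definitely null here

    c = new C();    // after an unconditional assignment...
    c.x = 1;        // ...this use is provably safe, so no warning

    C d = maybeNull();
    if (d !is null)
        d.x = 1;    // guarded use: also fine
}
```

The hard part, as noted above, is that anything cleverer than these obvious cases risks either false positives (rejecting valid code) or slow compilation.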
November 22, 2018
On Thursday, 22 November 2018 at 15:38:18 UTC, Per Nordlöw wrote:

> The natural way forward for D is to add static analysis in the compiler that tracks use of possibly uninitialized classes (and perhaps also pointers). This has been discussed many times on the forums. The important thing with such an extra warning is to incrementally add it without triggering any false positives. Otherwise programmers aren't gonna use it.

I'd say the problem here is not just false positives, but false negatives!
November 22, 2018
On Thu, 22 Nov 2018 15:50:01 +0000, Stefan Koch wrote:
> I'd say the problem here is not just false positives, but false negatives!

False negatives are a small problem. The compiler fails to catch some errors some of the time, and that's not surprising. False positives are highly vexing because it means the compiler rejects valid code, and that sometimes requires ugly circumlocutions to make it work.
November 22, 2018
On Wednesday, November 21, 2018 3:24:06 PM MST Johan Engelen via Digitalmars-d-learn wrote:
> On Wednesday, 21 November 2018 at 07:47:14 UTC, Jonathan M Davis
>
> wrote:
> > IMHO, requiring something in the spec like "it must segfault when dereferencing null" as has been suggested before is probably not a good idea, since it's really getting too specific (especially considering that some folks have argued that not all architectures segfault like x86 does), but ultimately, the question needs to be discussed with Walter. I did briefly discuss it with him at this last dconf, but I don't recall exactly what he had to say about the ldc optimization stuff. I _think_ that he was hoping that there was a way to tell the optimizer to just not do that kind of optimization, but I don't remember for sure.
>
> The issue is not specific to LDC at all. DMD also does optimizations that assume that dereferencing [*] null is UB. The example I gave is dead-code-elimination of a dead read of a member variable inside a class method, which can only be done either if the spec says that `a.foo()` is UB when `a` is null, or if `this.a` is UB when `this` is null.
>
> [*] I notice you also use "dereference" for an execution machine [**] reading from a memory address, instead of the language doing a dereference (which may not necessarily mean a read from memory). [**] intentional weird name for the CPU? Yes. We also have D code running as webassembly...

Skipping a dereference of null shouldn't be a problem as far as memory safety goes. The issue is if the compiler decides that UB allows it do to absolutely anything, and it rearranges the code in such a way that invalid memory is accessed. That cannot be allowed in @safe code in any D compiler. The code doesn't need to actually segfault, but it absolutely cannot access invalid memory even when optimized.

Whether dmd's dead code elimination algorithm is able to make @safe code unsafe, I don't know. I'm not familiar with dmd's internals, and in general, while I have a basic understanding of the stuff at the various levels of a compiler, once the discussion gets to stuff like machine instructions and how the optimizer works, my understanding definitely isn't deep.

After we discussed this issue with regards to ldc at dconf, I brought it up with Walter, and he didn't seem to think that dmd had such a problem, but I didn't think to raise that particular possibility either. It wouldn't surprise me if dmd also had issues in its optimizer that made @safe not @safe, and it wouldn't surprise me if it didn't. It's the sort of area where I'd expect ldc's more aggressive optimizations to be much more likely to run into trouble, since they're more likely to do things that Walter isn't familiar with, but that doesn't mean that Walter didn't miss anything in dmd either.

After all, he does seem to like the idea of allowing the optimizer to assume that assertions are true, and as far as I can tell based on discussions on that topic, he doesn't seem to have understood (or maybe just didn't agree) that if we did that, the optimizer can't be allowed to make that assumption if there's any possibility of the code not being memory safe when the assumption is wrong (at least not without violating the guarantees that @safe is supposed to provide). If the assumption turns out to be wrong (which is quite possible, even if it's not likely in well-tested code), then @safe would violate memory safety.

As I understand it, by definition, @safe code is supposed to not have undefined behavior in it, and certainly, if any compiler's optimizer takes undefined behavior as meaning that it can do whatever it wants at that point with no restrictions (which is what I gathered from our discussion at dconf), then I don't see how any D compiler's optimizer can be allowed to think that anything is UB in @safe code. That may be why Walter was updating various parts of the spec a while back to talk about compiler-defined as opposed to undefined, since there are certainly areas where the compiler can have leeway with what it does, but there are places (at least in @safe code), where there must be restrictions on what it can assume and do even when the implementation is given leeway, or @safe's memory safety guarantees won't actually be properly guaranteed.

In any case, clearly this needs to be sorted out with Walter, and the D spec needs to be updated in whatever manner best fixes the problem. Null pointers / references need to be guaranteed to be @safe in @safe code. Whether that's going to require that the compiler insert additional null checks in at least some places, I don't know. I simply don't know enough about how things work with stuff like the optimizers, but it wouldn't surprise me if in at least some cases, the compiler is ultimately going to be forced to insert null checks. Certainly, at minimum, I think that it's quite clear that if a platform doesn't segfault like x86 does, then it would have to.
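For platforms (or object layouts) where the hardware trap can't be relied on, the inserted check mentioned above could be as simple as the lowering sketched here. This is purely illustrative; no D compiler is claimed to emit exactly this:

```d
class A {
    int i;
    final void foo() { int a = i; }
}

void callFoo(A a) {
    // Hypothetical compiler-inserted guard before the implicit
    // dereference, turning would-be UB into a defined, @safe halt:
    if (a is null)
        assert(0, "null dereference");
    a.foo();
}
```

The cost of such a check is one compare-and-branch per dereference, which is exactly what the current design avoids by leaning on the MMU.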

- Jonathan M Davis



November 22, 2018
On Thursday, 22 November 2018 at 15:50:01 UTC, Stefan Koch wrote:
> I'd say the problem here is not just false positives, but false negatives!

With emphasis on _incremental_ additions to the compiler, covering more and more positives without introducing any _false_ positives whatsoever, and without losing compilation performance.

I recall Walter saying this is challenging to get right but a very interesting task.

This would make D even more competitive against languages such as Rust.
November 23, 2018
On Thursday, 22 November 2018 at 23:10:06 UTC, Per Nordlöw wrote:
> With emphasis on _incremental_ additions to the compiler, covering more and more positives without introducing any _false_ positives whatsoever, and without losing compilation performance.

BTW, should such compiler checking in D include pointers besides the mandatory class checking?
November 29, 2018
On Monday, 19 November 2018 at 21:23:31 UTC, Jordi Gutiérrez Hermoso wrote:
> When I was first playing with D, I managed to create a segfault

> What's the reasoning for allowing this?

100% agree that there should be non-nullable class references; they're the feature I miss most in D. I'm astonished that so few D users wish for them.

I understand that it's very hard to get this right in @safe code without the code-flow analysis that Walter prefers to keep to a minimum throughout D.

I'm concerned about the clarity of usercode. I would like to ensure in my function signatures that only non-null class references are accepted as input, or that only non-null class references will be returned. All possibilities in current D have drawbacks:

a) Add in/out contracts for over 90 % of the class variables?
This is nasty boilerplate.

b) Check all arguments for null, check all returned values for null?
This is against the philosophy that null should be cost-free. Also boilerplate.

c) Declare the function as if it accepts null, but segfault on receiving null?
This looks like a bug in the program. Even if c) becomes a convention in the codebase, then when the function segfaults in the future, it's not clear to maintainers whether the function or the caller has the bug.
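For concreteness, option (a) looks roughly like this for a single function, and it has to be repeated for every non-null parameter and return value (the names here are made up for illustration):

```d
class Widget {}

// Boilerplate contracts guaranteeing non-null in and out:
Widget resize(Widget w)
in  { assert(w !is null, "resize: caller passed null"); }
out (result) { assert(result !is null, "resize: returned null"); }
do
{
    return w; // real work elided
}
```

And contracts are typically compiled out of release builds, so this gives no guarantee in production anyway.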

I discussed some ideas in 2018-03:
https://forum.dlang.org/post/epjwwtstyphqknavycxt@forum.dlang.org

-- Simon
November 30, 2018
On Monday, 19 November 2018 at 21:23:31 UTC, Jordi Gutiérrez Hermoso wrote:
>
> I'm not the only one who has done this. I can't find it right now, but I've seen at least one person open a bug report because they misunderstood this as a bug in dmd.
>
> I have been told a couple of times that this isn't something that needs to be patched in the language, but I don't understand. It seems like a very easy way to generate a segfault (and not a NullPointerException or whatever).
>

I like that an empty class variable is null, and I use this very often in my code. It simplifies a lot.

What would be a better way? (practical not theoretical)

Regards Ozan
November 30, 2018
On Thursday, 29 November 2018 at 18:31:41 UTC, SimonN wrote:
> On Monday, 19 November 2018 at 21:23:31 UTC, Jordi Gutiérrez Hermoso wrote:
>> When I was first playing with D, I managed to create a segfault
>
>> What's the reasoning for allowing this?
>
> 100 % agree that there should be non-nullable class references, they're my main missing feature in D. Likewise, I'm astonished that only few D users wish for them.

https://github.com/aliak00/optional/blob/master/source/optional/notnull.d

"But I don't like the verbosity!"

alias MyClass = NotNullable!MyClassImpl;
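
The idea can be sketched in a few lines. This is an illustrative hand-rolled version under my own assumptions, not the linked library's actual implementation:

```d
struct NotNullable(T) if (is(T == class)) {
    private T payload;

    @disable this();                  // no default construction, so never null
    this(T t) {
        assert(t !is null, "NotNullable constructed from null");
        payload = t;
    }
    alias payload this;               // behaves like the wrapped reference
}

class MyClassImpl { void hello() {} }
alias MyClass = NotNullable!MyClassImpl;

void main() {
    auto m = MyClass(new MyClassImpl());
    m.hello();                        // use it like the underlying class
    // MyClass n;                     // compile error: default construction disabled
}
```

The `@disable this();` line is what moves the null check from every use site to the single construction site.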