2 days ago

On Wednesday, 30 April 2025 at 07:17:37 UTC, Kagamin wrote:

> Finally I had my chance to cope with nullable types in one of our C# codebases, and the experience wasn't nice.
>
> I had a prejudice that nullable types give some kind of promise that you will have only a few of them, but now that I think about it I can't remember anyone making this promise, and reality is quick to shatter this prejudice in a very ugly way.
> If you have a nullable type, all code now sees it as nullable, and you must insert null checks everywhere, even if you know it's not null by that point, and the volume of this null check spam is uncomfortably large.

I don't really understand: why is your type nullable if you know for sure it's not null at this point? If it's a function parameter, you'll have to think and ask yourself: "Does my function make any sense for a null object?" If not, just remove the nullable type there.

If your type has a nullable field, ask yourself: "When will this be null?" If you know that after some action this can no longer be null, just create a new type to encode this knowledge. Just keep in mind that having a nullable field basically means that the object is in a valid state even when the field is null. An example of bad design is when a lot of functions dealing with that object type have guards/asserts at the beginning. It probably means that the object doesn't provide enough guarantees about what it's storing.
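
As a rough sketch of "create a new type to encode this knowledge" (in C++ here, to match the g++ example used elsewhere in this thread; the idea carries over to C# or D): all type and function names below are invented for illustration. The null check happens once, at the conversion, and code that takes the second type needs no guard.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical domain types, invented for this illustration.
struct Address { std::string street; };

// Early in its life the order may legitimately have no address.
struct DraftOrder {
    const Address* address = nullptr;  // null is a valid state here
};

// After validation the address is known to exist, so the type
// stores a reference instead of a nullable pointer.
struct ReadyOrder {
    const Address& address;            // cannot be null by construction
};

// The single place where the null check happens.
ReadyOrder validate(const DraftOrder& draft) {
    if (draft.address == nullptr)
        throw std::invalid_argument("order has no address");
    return ReadyOrder{*draft.address};
}

// Downstream code takes ReadyOrder and needs no guard at the top.
std::string shippingLabel(const ReadyOrder& order) {
    return "Ship to: " + order.address.street;
}
```

Functions that really can handle a missing address keep taking `DraftOrder`; everything else takes `ReadyOrder`.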

2 days ago
On 5/3/2025 3:10 PM, Timon Gehr wrote:
> No, it will not. It's UB in modern compiler backends, and there are increasingly important targets such as WASM where you can just write through a null pointer without any page protection. It also does not work in real mode, as well as some bare-metal/embedded systems.

On WASM, the null check would not be redundant, and so can be easily added by the code generator.

As for real mode, that's 16 bit architecture which D does not (officially) support.


> Furthermore, there are no null checks when you pass a dereferenced null pointer to a `ref` parameter.

That's right. And if you access such a ref parameter, you get a seg fault. Not UB.

> In practice (outside your small DMC compiler backend box), you can only dereference a non-null pointer, so dereferencing a nullable pointer is actually explicitly one of the cases where you convert it to a non-nullable one...

```
int main()
{
    int* p = (int*)0;
    int& q = *p;
    return q;
}
```
```
g++ x.c
./a.out
Segmentation fault (core dumped)
```


> Also, a segfault on some user's machine is a lot less useful than even a stack trace,

That's why the D runtime intercepts the seg fault and gives a stack trace. gdb will also give a stack trace.

> and being able to collect some crash info in a `scope(failure)` or similar is even more useful. It can reduce a month-long heisenbug hunt into a 15 minute fix.

I have a *lot* of experience chasing down null pointer violations in real mode DOS. None of them took me a month. I think the worst one took 3 days. As soon as I got my hands on a protected mode 286 system, I immediately switched all development to it, because of the null pointer hardware protection. Big productivity increase! After that, I only compiled the real mode version as the very last thing after all tests passed.

(The worst heisenbug problems I had were the result of uninitialized variables. Any changes at all to the code caused the problems to shift around. This is why D initializes all variables.)


> `assert(0)` in druntime is a similarly frustrating experience.

Why? You can set assert(0) to do one of several things, including calling your own custom handler. By default it'll tell you the file and line of the failure. I use assert(0) all the time as a debugging tool.


> If there were a flag to enable null checks on any nullable pointer dereference, I would enable it immediately in all my projects.

I would be very curious how much larger and slower the resulting executables would be.

P.S. Why would CPU designers put null checks in the hardware if it was so useless?

P.P.S. `scope(failure)` is not intended for catching `Error`, although it does. There's a PR to fix it to only catch `Exception`, but that failed in the test suite.
2 days ago
On 5/3/2025 6:19 PM, Walter Bright wrote:
> ```
> int main()
> {
>      int* p = (int*)0;
>      int& q = *p;
>      return q;
> }
> ```
> ```
> g++ x.c
> ./a.out
> Segmentation fault (core dumped)
> ```


The D version with the small DMD compiler:

```
int main()
{
    int* p = null;
    ref int q = *p;
    return q;
}
```
```
dmd x.d -O
x.d(6): Error: null dereference in function _Dmain
```
2 days ago
On 5/4/25 03:19, Walter Bright wrote:
>> In practice (outside your small DMC compiler backend box), you can only dereference a non-null pointer, so dereferencing a nullable pointer is actually explicitly one of the cases where you convert it to a non-nullable one...
> 
> ```
> int main()
> {
>     int* p = (int*)0;
>     int& q = *p;
>     return q;
> }
> ```
> ```
> g++ x.c
> ./a.out
> Segmentation fault (core dumped)
> ```

$ clang++ -O2 x.c
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
$ ./a.out
$ echo $?
48

This is not only the case in C. Translate the example to D and compile with LDC with optimizations enabled. You will most likely see zero segfaults.

I am sorry, but your position is simply untenable.

> 
>> Furthermore, there are no null checks when you pass a dereferenced null pointer to a `ref` parameter.
> 
> That's right. And if you access such a ref parameter, you get a seg fault. Not UB.
> 

You may or may not get a seg fault, because it is indeed UB. Potentially on some random non-computer person's Windows machine, at a location that is even further away from where the problem was than the original faulty dereference. It also may or may not crash at the location where you dereference the invalid `ref` parameter.

On a somewhat related note: in D you can currently create invalid non-null pointers in `@safe` code by exploiting the lack of checks.

> 
>> Also, a segfault on some user's machine is a lot less useful than even a stack trace,
> 
> That's why the D runtime intercepts the seg fault and gives a stack trace. gdb will also give a stack trace.
> ...

All I am seeing is that sometimes users' programs close and they are not giving me any further information because they don't really know how to use a computer and it happens infrequently enough that they don't care enough to try to figure out how to help me out. This may be a null dereference, or it could be something else. Maybe it even prints a stack trace on Windows, but the console window closes on them immediately or something. Who knows what actually is happening. I am not seeing this on my machine...

I can probably try to set up some other way to debug that then gets flagged as malware even harder. However, none of this is necessary in the first place. If my `scope(exit)` ran they would be able to immediately provide all the info I need, as was the case with a couple thrown Exceptions and Errors in the past.


>> and being able to collect some crash info in a `scope(failure)` or similar is even more useful. It can reduce a month-long heisenbug hunt into a 15 minute fix.
> 
> I have a *lot* of experience chasing down null pointer violations in real mode DOS. None of them took me a month. I think the worst one took 3 days. As soon as I got my hands on a protected mode 286 system, I immediately switched all development to it, because of the null pointer hardware protection. Big productivity increase! After that, I only compiled the real mode version as the very last thing after all tests passed.
> ...

Well congrats, seems your bugs were reproducible on your own machine and at a frequency of more than once every couple of months. I don't even know if the crashes are my fault or not. Maybe they even happen in some dependency, which I would know if D's error handling was more useful.

> (The worst heisenbug problems I had were the result of uninitialized variables. Any changes at all to the code caused the problems to shift around. This is why D initializes all variables.)
> 
> 
>> `assert(0)` in druntime is a similarly frustrating experience.
> 
> Why? You can set assert(0) to do one of several things, including calling your own custom handler. By default it'll tell you the file and line of the failure. I use assert(0) all the time as a debugging tool.
> ...

AFAIU the default druntime ships with `-release`.

> 
>> If there were a flag to enable null checks on any nullable pointer dereference, I would enable it immediately in all my projects.
> 
> I would be very curious how much larger and slower the resulting executables would be.
> 
> P.S. Why would CPU designers put null checks in the hardware if it was so useless?
> ...

It's not useless for everything, it's just useless for some things. Your anecdotes are unfortunately not universal and also things changed.


> P.P.S. `scope(failure)` is not intended for catching `Error`, although it does. There's a PR to fix it to only catch `Exception`, but that failed in the test suite. 

This is heart-shattering news. If this ever gets pulled without any option to revert I will no longer be able to justify using the official D releases for anything.
2 days ago
On 5/4/25 03:36, Walter Bright wrote:
> 
> The D version with the small DMD compiler:
> 
> ```
> int main()
> {
>      int* p = null;
>      ref int q = *p;
>      return q;
> }
> ```
> ```
> dmd x.d -O
> x.d(6): Error: null dereference in function _Dmain
> ```

I don't understand why you think demonstrating something with your own backend disproves a point about the world at large excluding your own backend. You did also run a test with GCC for the C version, but UB is simply not reliable.

You can go to d.godbolt.org, select "ldc latest CI", pass arguments "-O" and confirm for yourself that `_Dmain` is simply this:


```
_Dmain:
        ret
```

This is in accordance with the C standard.
2 days ago
On 5/3/2025 8:55 PM, Timon Gehr wrote:
> I don't understand why you think demonstrating something with your own backend disproves a point about the world at large excluding your own backend. You did also run a test with GCC for the C version, but UB is simply not reliable.

It shows that dmd *does* do data flow analysis that can statically detect null dereferences at compile time.

> You can go to d.godbolt.org, select "ldc latest CI", pass arguments "-O" and confirm for yourself that `_Dmain` is simply this:
> This is in accordance with the C standard.

ldc should disable that particular behavior, as it has negative utility. No wonder you're having difficulties with it.
2 days ago
On 5/4/25 06:36, Walter Bright wrote:
> 
>> You can go to d.godbolt.org, select "ldc latest CI", pass arguments "-O" and confirm for yourself that `_Dmain` is simply this:
>> This is in accordance with the C standard.
> 
> ldc should disable that particular behavior, as it has negative utility. No wonder you're having difficulties with it.

For the record, even if my application would not run very sluggishly when compiled with DMD, in this particular case it does not matter how accurate the segfault location is as I am not getting any information in the first place.

My understanding is that null pointer dereference being UB is a widespread assumption in the LLVM and GCC optimizers. Simply "disabling the behavior" is not practical.
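
A commonly cited illustration of this assumption, sketched in C++ (the function name is made up for the example): because the pointer is dereferenced first, the optimizer is allowed to assume it is non-null and may delete the later check. GCC documents this optimization under `-fdelete-null-pointer-checks`.

```cpp
// Hypothetical function, for illustration only.
int loadOrDefault(int* p) {
    int value = *p;        // UB if p is null, so the optimizer may assume it is not
    if (p == nullptr)      // ...which makes this check removable as dead code
        return -1;
    return value;
}
```

Whether the branch survives depends on the compiler, the optimization level and the target, which is why the observed behavior cannot be relied upon.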

Perhaps I could add `-fsanitize=null` to add null checks, but that would not really solve the main problem as it is not integrated with D scope guards. It also would not lead to CPU virtual memory built-in features being used for null checks.
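
For illustration, an explicit check that does cooperate with cleanup code might look like the following C++ sketch. `checkedDeref`, `NullDereference` and `ExitNote` are hypothetical names, not part of any toolchain or of druntime, and the RAII guard only stands in for a D scope guard. Unlike a segfault, the thrown exception unwinds the stack, so the guard gets to run.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical error type raised instead of relying on a hardware trap.
struct NullDereference : std::logic_error {
    NullDereference() : std::logic_error("null pointer dereference") {}
};

// Explicit null check performed before the dereference.
template <typename T>
T& checkedDeref(T* p) {
    if (p == nullptr) throw NullDereference{};
    return *p;
}

// RAII guard standing in for scope(exit): appends a note when the scope ends,
// whether by normal return or by exception. (A production version would have
// to make sure the destructor itself cannot throw.)
struct ExitNote {
    std::vector<std::string>& log;
    std::string note;
    ~ExitNote() { log.push_back(note); }
};

int readThrough(int* p, std::vector<std::string>& log) {
    ExitNote guard{log, "readThrough exited"};
    return checkedDeref(p);  // throws on null; guard still runs during unwinding
}
```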
2 days ago
On 5/3/2025 8:48 PM, Timon Gehr wrote:
> This is not only the case in C. Translate the example to D and compile with LDC with optimizations enabled. You will most likely see zero segfaults.

I replied to that in the other post.


> On a somewhat related note: in D you can currently create invalid non-null pointers in `@safe` code by exploiting the lack of checks.

D relies on the null dereference not being "optimized away". Maybe there's a switch for that on ldc, there should be if there isn't one.


> All I am seeing is that sometimes users' programs close and they are not giving me any further information because they don't really know how to use a computer and it happens infrequently enough that they don't care enough to try to figure out how to help me out. This may be a null dereference, or it could be something else. Maybe it even prints a stack trace on Windows, but the console window closes on them immediately or something. Who knows what actually is happening. I am not seeing this on my machine...

Such problems can be pretty tough to figure out. It could be a bug in your code, it could be a codegen bug, it could be a bug in the library, etc. Maybe try sending them an unoptimized build? Maybe add a signal handler that writes the diagnostic information to a file before exiting?
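
A minimal sketch of such a handler, in C++ to match the g++ example above and assuming a POSIX system (so it would not apply as-is to Windows users). The file name and function names are made up for the example, and the handler only fires if the fault actually produces a SIGSEGV.

```cpp
#include <signal.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical report file name chosen for this sketch.
static const char kCrashFile[] = "crash_report.txt";

// Small wrapper so the result of write() is explicitly ignored.
static void put(int fd, const char* s, size_t n) {
    ssize_t r = write(fd, s, n);
    (void)r;
}

// Only async-signal-safe calls (open, write, close, _exit) are used here.
extern "C" void onFatalSignal(int sig) {
    int fd = open(kCrashFile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0) {
        const char msg[] = "fatal signal: ";
        put(fd, msg, sizeof msg - 1);
        // Format the last two decimal digits by hand; printf is not signal-safe.
        char digits[2] = { char('0' + (sig / 10) % 10), char('0' + sig % 10) };
        put(fd, digits, 2);
        put(fd, "\n", 1);
        close(fd);
    }
    _exit(128 + sig);  // leave immediately, without running atexit handlers
}

// Registers the handler for SIGSEGV; returns true on success.
bool installCrashHandler() {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = onFatalSignal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Calling `installCrashHandler()` early in `main` would be enough to get a small report file next to the executable when the signal is delivered.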


> I can probably try to set up some other way to debug that then gets flagged as malware even harder. However, none of this is necessary in the first place. If my `scope(exit)` ran they would be able to immediately provide all the info I need, as was the case with a couple thrown Exceptions and Errors in the past.

I thought you wrote that scope(exit) was catching null pointer exceptions.


> Well congrats, seems your bugs were reproducible on your own machine and at a frequency of more than once every couple of months.

I've also had to remotely debug buildkite failures. I've done it by adding strategic printfs to generate a log of the path through the compiler.

I've also done things by trying out the problem on another operating system, or different codegen switches. This can give helpful information. Anything you can do to vary the environment to see what might trigger it.

> I don't even know if the crashes are my fault or not. Maybe they even happen in some dependency, which I would know if D's error handling was more useful.

I find the stack trace useful enough.


> AFAIU the default druntime ships with `-release`.

See the -checkaction= switch?


> It's not useless for everything, it's just useless for some things. Your anecdotes are unfortunately not universal and also things changed.

It only has to work on platforms D supports, not every computer ever made. C has made some tragic decisions to support every computer ever made, resulting in some awful "portable" code that doesn't actually work on those nutburger machines. For one sad case, 16 bit Windows was written with portability to 32 bits in mind. When 32 bit machines finally arrived, Microsoft had a big problem that despite writing "portable" code, they'd made all the wrong decisions about what parts need to be adapted.

I've seen another case where char sizes were 32 bits. Of course, no C program would work on it without being recoded, in spite of all the portability misfeatures in C. C character sets aren't portable either, as RADIX50 won't work. Nobody ever used trigraphs and digraphs outside of test suites and Obfuscated C Contest entries.

>> P.P.S. `scope(failure)` is not intended for catching `Error`, although it does. There's a PR to fix it to only catch `Exception`, but that failed in the test suite. 
> 
> This is heart-shattering news. If this ever gets pulled without any option to revert I will no longer be able to justify using the official D releases for anything.

I had no idea anyone was using it for that purpose. I guess I can't change that :-)

2 days ago
On 5/4/25 07:00, Walter Bright wrote:
> On 5/3/2025 8:48 PM, Timon Gehr wrote:
>> This is not only the case in C. Translate the example to D and compile with LDC with optimizations enabled. You will most likely see zero segfaults.
> 
> I replied to that in the other post.
> ...

You replied by moving the goalposts and denying reality.

> 
>> On a somewhat related note: in D you can currently create invalid non-null pointers in `@safe` code by exploiting the lack of checks.
> 
> D relies on the null dereference not being "optimized away". Maybe there's a switch for that on ldc, there should be if there isn't one.
> ...

This is not how modern backends work. There are good reasons why you can't do a segfault in e.g. Rust without exploiting type system bugs.

> 
>> All I am seeing is that sometimes users' programs close and they are not giving me any further information because they don't really know how to use a computer and it happens infrequently enough that they don't care enough to try to figure out how to help me out. This may be a null dereference, or it could be something else. Maybe it even prints a stack trace on Windows, but the console window closes on them immediately or something. Who knows what actually is happening. I am not seeing this on my machine...
> 
> Such problems can be pretty tough to figure out. It could be a bug in your code, it could be a codegen bug, it could be a bug in the library, etc. Maybe try sending them an unoptimized build? Maybe add a signal handler that writes the diagnostic information to a file before exiting?
> ...

They cannot use an unoptimized build, it's too slow. This is not a patient programmer, this is a normie user, and the issue is vastly less important to them than it is to me, because I have higher standards of my software working flawlessly than normie users do.

> 
>> I can probably try to set up some other way to debug that then gets flagged as malware even harder. However, none of this is necessary in the first place. If my `scope(exit)` ran they would be able to immediately provide all the info I need, as was the case with a couple thrown Exceptions and Errors in the past.
> 
> I thought you wrote that scope(exit) was catching null pointer exceptions.
> ...

I don't think it does.

> 
>> Well congrats, seems your bugs were reproducible on your own machine and at a frequency of more than once every couple of months.
> 
> I've also had to remotely debug buildkite failures. I've done it by adding strategic printfs to generate a log of the path through the compiler.
> ...

Seems you ran the program multiple times and got the same failure. Please understand this is exactly what I am trying to achieve by saving the pertinent information in `scope(exit)` so it can be sent to me, in order for me to be able to hopefully reproduce the issue.

> I've also done things by trying out the problem on another operating system, or different codegen switches. This can give helpful information. Anything you can do to vary the environment to see what might trigger it.
> ...

Sure. But this is not a batch program, this is a graphical application that communicates over the network.

>> I don't even know if the crashes are my fault or not. Maybe they even happen in some dependency, which I would know if D's error handling was more useful.
> 
> I find the stack trace useful enough.
> ...

I am not getting any stack trace. The only info I get is "the program closed randomly on me". And anyway, if I have the choice between a cryptic stack trace and a full execution history leading to the error, I will choose the full execution history every time. Why is this so hard to understand? The language taking control away from me is nothing but frustrating, for imagined benefits that are pure dogma and not important to me at all.

I will perhaps be able to find some workaround that will allow me to get some info, but this issue existing in the first place is not something that is necessary. It is pure, frustrating, friction. A huge amount of wasted time that could have been spent doing something productive instead. A competitive disadvantage.

> 
>> AFAIU the default druntime ships with `-release`.
> 
> See the -checkaction= switch?
> ...

I think that won't affect the default druntime, I would need to build a custom one. I can do this but it complicates the build instructions for other people and it really feels like fighting the language.

> 
>> It's not useless for everything, it's just useless for some things. Your anecdotes are unfortunately not universal and also things changed.
> 
> It only has to work on platforms D supports, not every computer ever made.

LLVM bytecode is a supported platform. In fact, it is likely the primary IR that D users' released programs will go through, and it's not like this is defined behavior in GDC. This cannot be dismissed with a "C is crazy". We are using toolchains designed for C.

> ...
> 
>>> P.P.S. `scope(failure)` is not intended for catching `Error`, although it does. There's a PR to fix it to only catch `Exception`, but that failed in the test suite. 
>>
>> This is heart-shattering news. If this ever gets pulled without any option to revert I will no longer be able to justify using the official D releases for anything.
> 
> I had no idea anyone was using it for that purpose. I guess I can't change that :-)
> 

The whole: "We really need to make sure people cannot get any information out of a failing process unless they run the program from the command line and then we need to make sure nothing but a stack trace will escape unless we are running in a debugger" is simply not workable. I don't understand what you are chasing in this instance, but it is not utility. It almost seems like your only experience with remote crashes is those that happen frequently and repeatably, and within a terminal.
2 days ago
On 04/05/2025 5:40 PM, Timon Gehr wrote:
> The whole: "We really need to make sure people cannot get any information out of a failing process unless they run the program from the command line and then we need to make sure nothing but a stack trace will escape unless we are running in a debugger" is simply not workable. I don't understand what you are chasing in this instance, but it is not utility. It almost seems like your only experience with remote crashes is those that happen frequently and repeatably, and within a terminal.

This may be where you need to emphasize that you are talking from a user's perspective, not a language developer's.

Based upon your other comments, it's resulting in some severe problems for you.