April 15
On 15/04/2025 9:28 AM, Richard (Rikki) Andrew Cattermole wrote:
> On 15/04/2025 8:53 AM, Derek Fawcus wrote:
>> In trying to handle and recover from such things, you're up against a variant of Gödel's incompleteness theorem, and anyone offering a language which "solves" that is IMHO selling snake oil.
> 
> In my original post I proposed a three tier solution.
> 
> The read barriers are secondary to the language guarantees via type state analysis.
> 
> You need them for when using a fast DFA engine, that doesn't do full control flow graph analysis and ignores a variable when it cannot analyze it.
> 
> But if you are ok with having a bit of pain in terms of what can be modeled and can accept a bit of slowness, you won't need the read barriers.
> 
> Unfortunately not everyone will accept the slower DFA therefore it can't be on by default. I know this as it copies a couple of the perceived negative traits of DIP1000.
> 
> So yes we can solve this, but MT could still mess it up, hence the last option in the three tier solution: signals killing the process.

I should mention, like the assert handler you would have the ability to configure the read barrier to do whatever you want at runtime.

So if you prefer it to kill the process you can.

As to what the default would be? Idk.

The benefit of having it is that we can likely have a stack trace, where there might not be one otherwise.

April 15

On Monday, 14 April 2025 at 15:42:07 UTC, Derek Fawcus wrote:
> On Monday, 14 April 2025 at 15:24:37 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> It is important to note that a task isn't always a process. But once an event like null dereference occurs that task must die.
>
> It is not the dereference which is the issue, that is the downstream symptom of an earlier problem. If that reference is never supposed to be null, then the program is already in a non-deterministic state even without the crash.

This is the exact problem. The solution proposed here just doesn't understand what the actual problem is. Null dereferences, and index out-of-bounds are programming errors. You need to fix them in the program, not recover and hope for the best.

Trying to recover is the equivalent of a compiler resolving a syntax ambiguity with a random number generator.

Null dereference?

  1. Is it because I trusted a user value? => validate user input, rebuild, redeploy
  2. Is it because I forgot to initialize something? => initialize it, rebuild, redeploy
  3. Is it because I forgot to validate something? => do the validation properly, fix whatever it was sending in invalid data, rebuild, redeploy
  4. Is it something else? => thank you program, for crashing instead of corrupting everything. Now, time to find the memory corruption somewhere.

Similar flow chart for out-of-bounds errors.

-Steve

April 15

On Tuesday, 15 April 2025 at 02:48:42 UTC, Steven Schveighoffer wrote:
> On Monday, 14 April 2025 at 15:42:07 UTC, Derek Fawcus wrote:
>> On Monday, 14 April 2025 at 15:24:37 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> It is important to note that a task isn't always a process. But once an event like null dereference occurs that task must die.
>>
>> It is not the dereference which is the issue, that is the downstream symptom of an earlier problem. If that reference is never supposed to be null, then the program is already in a non-deterministic state even without the crash.
>
> This is the exact problem. The solution proposed here just doesn't understand what the actual problem is. Null dereferences, and index out-of-bounds are programming errors. You need to fix them in the program, not recover and hope for the best.

This simply is not manageable. Sometimes it is better to have a semi-operating system rather than one that doesn't work at all because a minor thing in a tertiary module causes an NPE while a fix is in progress.

April 15

On Tuesday, 15 April 2025 at 07:23:14 UTC, Alexandru Ermicioi wrote:
> On Tuesday, 15 April 2025 at 02:48:42 UTC, Steven Schveighoffer wrote:
>> On Monday, 14 April 2025 at 15:42:07 UTC, Derek Fawcus wrote:
>>> On Monday, 14 April 2025 at 15:24:37 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>> It is important to note that a task isn't always a process. But once an event like null dereference occurs that task must die.
>>>
>>> It is not the dereference which is the issue, that is the downstream symptom of an earlier problem. If that reference is never supposed to be null, then the program is already in a non-deterministic state even without the crash.
>>
>> This is the exact problem. The solution proposed here just doesn't understand what the actual problem is. Null dereferences, and index out-of-bounds are programming errors. You need to fix them in the program, not recover and hope for the best.
>
> This simply is not manageable. Sometimes it is better to have a semi-operating system rather than one that doesn't work at all because a minor thing in a tertiary module causes an NPE while a fix is in progress.

Au contraire!

That's exactly why today we have:

  • kernels (or microkernels!)
  • processes living in userland
  • threads living in processes
  • coroutines (or similar stuff, whatever variant and name they take) living on threads
  • and last but not least VMs, plenty of them.

I don't see a real use case for trying to recover a process from UB at all: go down the list and choose another layer.

/P

April 16
On Monday, 14 April 2025 at 14:22:09 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 15/04/2025 1:51 AM, Atila Neves wrote:
>> On Saturday, 12 April 2025 at 23:11:41 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> I've been looking once again at having an exception being thrown on null pointer dereferencing.
>>> However the following can be extended to other hardware level exceptions.
>>>
>>> [...]
>> 
>> I would like to know why one would want this.
>
> Imagine you have a web server that is handling 50k requests per second.
>
> It makes you $1 million dollars a day.
>
> In it, you accidentally have some bad business logic that results in a null dereference or indexing a slice out of bounds.

Possible mitigations:

* Use `sigaction` to catch `SIGSEGV` and throw an exception in the handler.
* Use a nullable/option type.
* Address sanitizer.
* Fuzzing the server (which one should do anyway).
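
For the `sigaction` bullet, a rough sketch of the shape this could take. It is written in C and POSIX only, so `siglongjmp` stands in for throwing an exception; `run_guarded` and the two demo tasks are made-up names, and `raise(SIGSEGV)` stands in for a real fault. Whether continuing after a genuine segfault is ever safe is a separate question that this sketch does not settle.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not a recipe. */
static sigjmp_buf recover_point;

static void on_segv(int signo)
{
    (void)signo;
    /* Jump back to the guard; 1 becomes the return value of sigsetjmp. */
    siglongjmp(recover_point, 1);
}

/* Runs task(arg). Returns 0 if it finished, -1 if SIGSEGV was caught
   (or the handler could not be installed). */
int run_guarded(void (*task)(void *), void *arg)
{
    struct sigaction sa, old;
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    int result = 0;
    /* Second argument 1: save the signal mask so the jump restores it. */
    if (sigsetjmp(recover_point, 1) == 0)
        task(arg);
    else
        result = -1;

    sigaction(SIGSEGV, &old, NULL);   /* put the previous handler back */
    return result;
}

/* Demo task that completes normally. */
void ok_task(void *arg)
{
    *(int *)arg = 42;
}

/* Demo task that simulates a fault by raising the signal itself. */
void faulting_task(void *arg)
{
    (void)arg;
    raise(SIGSEGV);
}
```

Calling `run_guarded(faulting_task, NULL)` takes the recovery path and returns -1, while `run_guarded(ok_task, &v)` returns 0. Anything the task had half-finished when the signal arrived stays half-finished.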

How is out of bounds access related to null pointers throwing exceptions?

> How likely are you to keep using D, or willing to talk about using D positively afterwards?

People write servers in C and C++ too.

April 16
On 16/04/2025 8:18 PM, Atila Neves wrote:
> On Monday, 14 April 2025 at 14:22:09 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 15/04/2025 1:51 AM, Atila Neves wrote:
>>> On Saturday, 12 April 2025 at 23:11:41 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>>> I've been looking once again at having an exception being thrown on null pointer dereferencing.
>>>> However the following can be extended to other hardware level exceptions.
>>>>
>>>> [...]
>>>
>>> I would like to know why one would want this.
>>
>> Imagine you have a web server that is handling 50k requests per second.
>>
>> It makes you $1 million dollars a day.
>>
>> In it, you accidentally have some bad business logic that results in a null dereference or indexing a slice out of bounds.
> 
> Possible mitigations:
> 
> * Use `sigaction` to catch `SIGSEGV` and throw an exception in the handler.

The only thing you can reliably do on segfault is to kill the process.

From what I've read, segfault handlers get awfully iffy, even with the workarounds.

And that's just for POSIX; Windows is an entirely different kettle of fish and is designed around exception handling instead, which dmd doesn't support!

> * Use a nullable/option type.

While valid to box pointers, we would then need to disallow them in business logic functions.

Very invasive, not my preference.

> * Address sanitizer.

Slow at runtime, which kinda defeats the purpose.

> * Fuzzing the server (which one should do anyway).

Absolutely, but there is too much state to kinda guarantee that it covers everything. And very few people will get it to that level (after all, people need a significant amount of training to do it successfully).

> How is out of bounds access related to null pointers throwing exceptions?

Out of bounds on a slice uses a read barrier to throw an exception.

A read barrier to prevent dereferencing a null pointer is exactly the same concept.

One is 0 or 1.

Second is 0 or N.
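
To spell out the analogy, a small sketch in C (the `int_slice` type and both function names are made up for illustration, and a `false` return stands in for the error path that would call the runtime). The null check is the bounds check with the length restricted to 0 or 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice type, loosely modelled on a length + pointer pair. */
typedef struct {
    size_t length;
    const int *ptr;
} int_slice;

/* Bounds check: the index must fall in [0, N). */
bool slice_read(int_slice s, size_t index, int *out)
{
    if (index >= s.length)
        return false;   /* error path: where the failure would be reported */
    *out = s.ptr[index];
    return true;
}

/* Null check: the pointer is treated as a slice of length 0 or 1. */
bool pointer_read(const int *p, int *out)
{
    int_slice s = { p != NULL ? 1 : 0, p };
    return slice_read(s, 0, out);
}
```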

>> How likely are you to keep using D, or willing to talk about using D positively afterwards?
> 
> People write servers in C and C++ too

Yes they do, just like they do in D.

But they have something we do not have, a ton of static analysis.

Check, select professional developers: https://survey.stackoverflow.co/2024/technology#most-popular-technologies

C#, Java, Python, and JavaScript all outrank C and C++.

I'm not going to say they each have a good solution to the problem, but they each have a solution that isn't just kill the process.

The end of this foray into read barriers may be the conclusion that we cannot use them for this. What worries me is that I don't have the evidence to show that it won't work, and dismissing it without evidence does mean that we'll be forced to recommend full CFG DFA which is slow.

If it can work, using the read barriers to fill in the gap of what a faster DFA can offer would be a much better user experience. At least as a default.

April 16
I should follow on from this to explain why I care so much.

See Developer type: https://survey.stackoverflow.co/2024/developer-profile

Back-end (which is what D could excel at) is second on the list, right next to full-stack which is first.

This is the biggest area of growth possible for D, and we're missing key parts that are not only expected, but needed to target the largest audience possible.

April 16
On Wednesday, 16 April 2025 at 08:49:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> How is out of bounds access related to null pointers throwing exceptions?
>
> Out of bounds on a slice uses a read barrier to throw an exception.
>
> A read barrier to prevent dereferencing a null pointer is exactly the same concept.
>
> One is 0 or 1.
>
> Second is 0 or N.

Exactly what are you referring to by "read barrier"?

To me it has a specific technical meaning, related to memory access and if/when one access may pass another.  It has nothing to do with exceptions, but rather the details of how a CPU architecture approaches superscalar memory accesses (reads and/or writes), and how they are (or may be) re-ordered.

However I would not classify the bounds check performed as part of accessing a slice as a "read barrier", rather it is a manual range check.

So by "read barrier" for nulls do you simply mean having the compiler generate a "compare to zero" instruction, followed by a "jump if zero" to some error path?

If so, then while it may catch errors (and I have no objection to optionally generating such null checks); it is not IMO a means of error recovery - simply another means of forcing a crash.
April 16
On Wednesday, 16 April 2025 at 08:49:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 16/04/2025 8:18 PM, Atila Neves wrote:
>
>> * Use a nullable/option type.
>
> While valid to box pointers, we would then need to disallow them in business logic functions.

I'm not sure what you have in mind, what I have in mind is something like this:

  https://discourse.llvm.org/t/rfc-nullability-qualifiers/35672
  https://clang.llvm.org/docs/analyzer/developer-docs/nullability.html

The checks here are performed in a distinct SA tool, not in the main compiler.  However it catches the main erroneous cases - first two listed checks of second link:

> If a pointer p has a nullable annotation and no explicit null check or assert, we should warn in the following cases:
>
> - p gets implicitly converted into nonnull pointer, for example, we are passing it to a function that takes a nonnull parameter.
>
> - p gets dereferenced

Given how individual variable / fields have to be annotated, it probably does not need complete DFA, but only function local analysis for loads/stores/compares.
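
A rough C sketch of what annotated code looks like. `_Nullable` and `_Nonnull` are the Clang qualifiers from the links above; the `NULLABLE`/`NONNULL` macros and both function names are made up here, so that the snippet still compiles on compilers without the feature.

```c
#include <stddef.h>

/* Clang understands _Nullable/_Nonnull; other compilers get empty macros. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(nullability)
#define NULLABLE _Nullable
#define NONNULL _Nonnull
#else
#define NULLABLE
#define NONNULL
#endif

/* The parameter is declared nonnull, so the body dereferences it unchecked. */
int length_of(const char *NONNULL s)
{
    int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* p may be null, so it is compared against NULL before it is passed on
   to a nonnull parameter. */
int safe_length(const char *NULLABLE p)
{
    if (p == NULL)
        return -1;
    return length_of(p);
}
```

Dropping the `p == NULL` test in `safe_length` is the first of the two cases quoted above: a nullable pointer reaching a nonnull parameter without a check. Spotting it only needs the loads, stores and compares inside that one function.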
April 17
On 16/04/2025 11:38 PM, Derek Fawcus wrote:
> On Wednesday, 16 April 2025 at 08:49:39 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> How is out of bounds access related to null pointers throwing exceptions?
>>
>> Out of bounds on a slice uses a read barrier to throw an exception.
>>
>> A read barrier to prevent dereferencing a null pointer is exactly the same concept.
>>
>> One is 0 or 1.
>>
>> Second is 0 or N.
> 
> Exactly what are you referring to by "read barrier"?
> 
> To me it has a specific technical meaning, related to memory access and if/when one access may pass another.  It has nothing to do with exceptions, but rather the details of how a CPU architecture approaches superscalar memory accesses (reads and/or writes), and how they are (or may be) re-ordered.
> 
> However I would not classify the bounds check performed as part of accessing a slice as a "read barrier", rather it is a manual range check.
> 
> So by "read barrier" for nulls do you simply mean having the compiler generate a "compare to zero" instruction, followed by a "jump if zero" to some error path?

Yes.

It would then call a compiler hook just like array bounds checks do.

> If so, then while it may catch errors (and I have no objection to optionally generating such null checks); it is not IMO a means of error recovery - simply another means of forcing a crash.

This is why I think it's important that we make this configurable via a global function pointer. Like we do for asserts.

It allows people to configure what it does rather than pick for them.
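
As a rough sketch of the shape, in C rather than D and with every name made up for illustration (this is not an existing druntime API): the emitted check is a compare and branch, and the error path goes through a global function pointer that the program can swap out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook type: receives the source location of the failed check. */
typedef void (*null_check_handler)(const char *file, int line);

/* One possible default: kill the process. */
void default_null_handler(const char *file, int line)
{
    (void)file;
    (void)line;
    abort();
}

/* Global function pointer that the program may overwrite at runtime. */
null_check_handler on_null_dereference = default_null_handler;

/* What the compiler would emit around a dereference:
   compare to zero, branch to the hook on failure. */
int checked_load(const int *p, const char *file, int line)
{
    if (p == NULL) {
        on_null_dereference(file, line);
        return 0;   /* only reached if the handler returns */
    }
    return *p;
}

/* Example replacement handler: count the failures instead of aborting. */
int null_hits = 0;

void counting_handler(const char *file, int line)
{
    (void)file;
    (void)line;
    null_hits++;
}
```

After `on_null_dereference = counting_handler;`, a failed check bumps `null_hits` and execution continues; with the default in place the same check aborts. A handler that throws, logs a stack trace, or kills only the current task would plug into the same slot.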