1 day ago

On Saturday, 12 April 2025 at 23:11:41 UTC, Richard (Rikki) Andrew Cattermole wrote:

> I've been looking once again at having an exception being thrown on null pointer dereferencing.
> However the following can be extended to other hardware level exceptions.
>
> [...]

I would like to know why one would want this.

1 day ago
On 15/04/2025 1:51 AM, Atila Neves wrote:
> On Saturday, 12 April 2025 at 23:11:41 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> I've been looking once again at having an exception being thrown on null pointer dereferencing.
>> However the following can be extended to other hardware level exceptions.
>>
>> [...]
> 
> I would like to know why one would want this.

Imagine you have a web server that is handling 50k requests per second.

It makes you $1 million a day.

In it, you accidentally have some bad business logic that results in a null dereference or indexing a slice out of bounds.

It kills the entire server, potentially losing you the full $1 million before you can fix it.

How likely are you to keep using D, or willing to talk about using D positively afterwards?

ASP.NET guarantees that this will kill the task and give the right response code. No process death.
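
As a sketch of what that guarantee could look like in a D framework (all names here are hypothetical; no such API exists today), the framework would run each request as its own task and turn a trapped framework-level event into an error response rather than a process death:

```d
// Hypothetical sketch only: a framework-owned task wrapper.  The proposed
// mechanism would surface a null dereference as a catchable exception, so
// one failed request costs one 500 response, not the whole server.
struct Response { int status; string reason; }

Response runTask(Response delegate() @safe businessLogic) @safe nothrow
{
    try
        return businessLogic();          // the user's (possibly buggy) logic
    catch (Exception e)                  // includes the hypothetical null-deref exception
        return Response(500, "internal server error"); // only this task dies
}
```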

1 day ago
On 14.04.25 16:22, Richard (Rikki) Andrew Cattermole wrote:
> On 15/04/2025 1:51 AM, Atila Neves wrote:
>>
>> I would like to know why one would want this.
> 
> Imagine you have a web server that is handling 50k requests per second.
> 
> It makes you $1 million a day.
> 
> In it, you accidentally have some bad business logic that results in a null dereference or indexing a slice out of bounds.
> 
> It kills the entire server, potentially losing you the full $1 million before you can fix it.
> 
> How likely are you to keep using D, or willing to talk about using D positively afterwards?
> 
> ASP.NET guarantees that this will kill the task and give the right response code. No process death.
> 

I won't get into the merits of the feature itself, but I have to say that this example is poorly chosen, to say the least. In fact, it looks to me like a case of "when you only have a hammer, everything looks like a nail": not everything should be handled by the application itself.

As somebody coming rather from the "ops" side of "devops", let me tell you that there is a wide range of tools that you should be using **on top of your application** if you have an app that makes you 1M$ a day, including but not restricted to:

* A monitoring process to make sure the server is running (and healthy). Among this process's tasks are making sure that in case of a failure the main process is fully stopped, killing any leftover tasks, removing lock files, ensuring data sanity, etc., and then restarting the main server again (see the sketch after this list).
* An HA system routing queries to a pool of several servers that are regularly polled for health status, assuming that the failure happens seldom enough that it's very unlikely to affect several backend servers at the same time.
* Some meatbag on-call 24/7 (or even on-site) who can at the very least restart the affected server (including the hardware) if it comes to that.
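
At its core, the first bullet can be as small as the following sketch (in D since we're on the D forums; ``serverPath`` and the restart policy are placeholders, and a real supervisor such as systemd or runit also does the health polling, cleanup and backoff mentioned above):

```d
// Minimal supervisor sketch: restart the main server whenever it dies.
import std.process : spawnProcess, wait;
import std.stdio : writeln;
import core.thread : Thread;
import core.time : seconds;

void supervise(string serverPath)
{
    while (true)
    {
        auto pid = spawnProcess([serverPath]);
        const status = wait(pid);        // block until the server process exits
        writeln("server exited with status ", status, "; restarting");
        Thread.sleep(2.seconds);         // crude backoff before restarting
    }
}
```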

I mean, a service can fail for a number of reasons, including hardware issues, among which dereferencing a null pointer should be quite low in the scale of probabilities.

Having a 1M$/day operation depend on your application continuing to run after dereferencing a null pointer would seem to me... rather risky and short-sighted.

On top of that, there's the "small" issue that you can't really be sure what state the application has been left in. I certainly wouldn't want to risk any silent data corruption and would rather kill the process ASAP to start it again from a known good state.

Again, I'm not arguing for or against the feature itself, but I just think this example doesn't do it any help.
1 day ago
On 15/04/2025 2:55 AM, Arafel wrote:
> I mean, a service can fail for a number of reasons, including hardware issues, among which dereferencing a null pointer should be quite low in the scale of probabilities.

It isn't a low probability.

The reason all these application-VM languages have been introducing nullability guarantees is that null dereferences have been a plague of problems.

We have nothing currently.

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.”

https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/

> Having a 1M$/day operation depend on your application continuing to run after dereferencing a null pointer would seem to me... rather risky and short-sighted.

You need to read my initial post. I concluded that once an instruction actually executes the dereference, it is in fact dead. We are not in disagreement on this.

This is why the static analysis and read barrier are so important: they catch the null dereference before it ever happens. The program isn't corrupted by the signal at that point in time.
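
To make that concrete, a read barrier is nothing more than a compiler-inserted check in front of the dereference. Written by hand it would look something like this (``NullPointerException`` is an illustrative name, not something druntime defines today):

```d
// Illustrative only: the hand-written equivalent of an injected read barrier.
class NullPointerException : Exception
{
    this(string msg) @safe { super(msg); }
}

int readField(int* p) @safe
{
    // The check runs *before* any hardware-level fault can happen, so there
    // is never a post-signal, possibly-corrupted state to worry about.
    if (p is null)
        throw new NullPointerException("null dereference");
    return *p;
}
```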

Before that, thanks to ``@safe``, we can assume it is in a valid state, and it's just business logic that is wrong. Yes, people will disagree with me on that, but the blame is purely on the non-@safe code, which should be getting thoroughly vetted for this kind of thing (and even then I still want the read barriers and type state analysis to kick in, because this can never be correct).

There is a reason why my DIP makes stackless coroutines default to @safe, rather than letting them default to @system like everything else.

> On top of that, there's the "small" issue that you can't really be sure what state the application has been left in. I certainly wouldn't want to risk any silent data corruption and would rather kill the process ASAP to start it again from a known good state.

By doing this you have killed all other tasks, and killing those tasks loses you money. Throw in, say, a scraper repeatedly triggering the fault and you could still be down all day or more. It is entirely unnecessary downtime when widely used, accepted solutions exist. It would be bad engineering to ignore this.

It is important to note that a task isn't always a process. But once an event like null dereference occurs that task must die.

If anything prevents that task from cleaning up, then yes the process dies.

1 day ago
On 14/04/2025 9:49 PM, a11e99z wrote:
> On Saturday, 12 April 2025 at 23:11:41 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> I've been looking once again at having an exception being thrown on null pointer dereferencing.
> 
> null pointer in x86 is any pointer less than 0x00010000

The null page, right. And how exactly did you get access to a value that isn't 0 or a valid pointer?

Program corruption. Which should only be possible in non-@safe code.

> also what about successful dereferencing 0x23a7b63c41704h827? its depends: reserved this memory by process, committed, mmaped etc

How did you get this value? Program corruption.

> failure for any "wrong" pointer should be same as for 0x0000....000 (pure NULL)

Null (0) is a special value in D; these other values are simply assumed to be valid. Of course, it's not possible to get these other values without calling out of @safe code, where program corruption can occur.
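
A small @safe snippet to illustrate the point (the forged address is just an example value): the only "bad" pointer value @safe code can manufacture on its own is null; conjuring an arbitrary address requires leaving @safe.

```d
@safe void example()
{
    int* p = null;                      // null is the one special value @safe code can produce
    // int* q = cast(int*) 0xDEADBEEF;  // Error: cast from integral to pointer not allowed in @safe code
    int x = *p;                         // today: segfault / access violation, process dies
}

@system int* forgedPointer()
{
    return cast(int*) 0xDEADBEEF;       // only reachable from @safe via (mis)trusted code
}
```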

> imo need to dig into .net source and grab their option

.NET is an application VM, and for all intents and purposes it will be injecting read barriers before each potentially null dereference.

They also have strong guarantees that you cannot have an invalid pointer: a pointer must be null or point to a valid object.

It isn't as simple as copying them.

Plus they have the type state analysis as part of the language to handle nullability!

1 day ago
On 14/04/2025 10:01 PM, a11e99z wrote:
> On Monday, 14 April 2025 at 09:49:27 UTC, a11e99z wrote:
>> On Saturday, 12 April 2025 at 23:11:41 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>> I've been looking once again at having an exception being thrown on null pointer dereferencing.
>>
>> null pointer in x86 is any pointer less than 0x00010000
>>
>> imo need to dig into .net source and grab their option
> 
> almost same problem is misaligned access:
> x86/x64 allows this except few instructions
> ARM - prohibit it (as I know)

The only way for this to occur in @safe code is program corruption.

> it seems there are no other options except to handle kernel signals and WinSEH

They are both geared towards killing the process; from what I've read, it's really not a good idea to throw an exception from them, even if we could rely upon the signal handler being what we think it is.
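
For reference, about all a handler can reasonably do is log and terminate. A minimal POSIX sketch (Linux assumptions, using only what ``core.stdc`` exposes):

```d
// Minimal sketch: a SIGSEGV handler today is geared towards terminating.
// Throwing and unwinding a D exception from inside it is not something
// druntime reliably supports, which is the point being made above.
import core.stdc.signal : signal, SIGSEGV;
import core.stdc.stdlib : _Exit;

extern (C) void onSegv(int) nothrow @nogc @system
{
    // Almost nothing is async-signal-safe here; terminate promptly.
    _Exit(139); // mimic the conventional 128 + SIGSEGV exit status
}

void installHandler() @system
{
    signal(SIGSEGV, &onSegv);
}
```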

We do not support the MSVC exception mechanism for 64-bit Windows in dmd, so even if we wanted to do this, we cannot.

1 day ago
On Monday, 14 April 2025 at 15:24:37 UTC, Richard (Rikki) Andrew Cattermole wrote:
>
> It is important to note that a task isn't always a process. But once an event like null dereference occurs that task must die.

It is not the dereference which is the issue; that is the downstream symptom of an earlier problem. If that reference is never supposed to be null, then the program is already in a non-deterministic state even without the crash.

The crash is what allows that bad state to be fixed. Simply limping along (avoiding the deref) is sweeping the issue under the rug. One doesn't know what else within the complete system is at fault.

The @safe annotation is not yet sufficient to ensure that the rest of the system is in a valid state.

Fail fast, and design the complete architecture so that a redundant HA instance can take the load (or a subset of it) until the system regains its redundant operating condition. That is exactly what we do with routers.
1 day ago
On 15/04/2025 3:42 AM, Derek Fawcus wrote:
> On Monday, 14 April 2025 at 15:24:37 UTC, Richard (Rikki) Andrew Cattermole wrote:
>>
>> It is important to note that a task isn't always a process. But once an event like null dereference occurs that task must die.
> 
> It is not the dereference which is the issue; that is the downstream symptom of an earlier problem. If that reference is never supposed to be null, then the program is already in a non-deterministic state even without the crash.

You are not the first to say this, and it's indicative of not understanding the scenario.

Coroutines are used for business logic.

They are supposed to have guarantees that they can always be cleaned up on framework-level exceptional events. That includes attempts at null dereferencing or out-of-bounds access of slices.

They should never have the ability to corrupt the entire program. They need to be @safe.

If @safe allows program corruption, then it needs fixing.

If you call @trusted code that isn't shipped with the compiler, it isn't our fault it wasn't vetted. But that is what @trusted exists for: so that you can do unsafe things and present a safe API.
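
As a sketch of that division of labour (``allocateBuffer`` is a made-up wrapper; freeing is elided for brevity): the unsafe work lives behind a @trusted boundary, business logic only ever sees a @safe API, and the vetting burden sits entirely on the wrapper's author.

```d
import core.stdc.stdlib : malloc;

@trusted ubyte[] allocateBuffer(size_t size)
{
    // Unsafe inside (raw malloc), but the returned slice is correctly bounded.
    // If this function is wrong, @safe callers suffer; that is exactly why
    // @trusted code must be thoroughly vetted.
    auto p = cast(ubyte*) malloc(size);
    return p is null ? null : p[0 .. size];
}

@safe void businessLogic()
{
    auto buf = allocateBuffer(64); // looks and feels @safe from here
    if (buf.length)
        buf[0] = 42;               // bounds-checked as usual
}
```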

Any other scenario than coroutines will result in a process death.

> The crash is what allows that bad state to be fixed. Simply limping along (avoiding the deref) is sweeping the issue under the rug. One doesn't know what else within the complete system is at fault.

Except it isn't the entire system that could be bad.

It's one single-threaded, business-logic-laden task. It is not library or framework code that is bad.

A single piece of business logic, likely written by someone with graduate-level skills, failed to account for something.

This is not the same thing as the entire program being corrupt.

If some kind of coroutine isn't in use, the program still dies, just like today.

> The @safe annotation is not yet sufficient to ensure that the rest of the system is in a valid state.

Please elaborate.

If @safe has a bug, it needs solving.

> Fail fast, and design the complete architecture so that a redundant HA instance can take the load (or a subset of it) until the system regains its redundant operating condition. That is exactly what we do with routers.

While all of those are within my recommendations generally, they do not cover this particular scenario.

This is far too common an issue, and business logic, which will commonly be found inside a coroutine (including Fiber), should not be bringing down the entire process.

ASP.NET, thanks to .NET, offers this guarantee for a reason.

1 day ago
On Monday, 14 April 2025 at 16:12:35 UTC, Richard (Rikki) Andrew Cattermole wrote:
> You are not the first to say this, and it's indicative of not understanding the scenario.
>
> Coroutines are used for business logic.
>
> They are supposed to have guarantees that they can always be cleaned up on framework-level exceptional events. That includes attempts at null dereferencing or out-of-bounds access of slices.
>
> They should never have the ability to corrupt the entire program. They need to be @safe.
>
> If @safe allows program corruption, then it needs fixing.

I'd have to suggest you're chasing something which cannot be achieved.

It is always possible for the program to get into a partially, or fully, unrecoverable state, where some portion of the system is inoperative, or operating incorrectly.

The only way to recover in that case is to restart the program. The reason the program cannot recover is that there is a bug, where some portion is able to drive into an unanticipated area, and there is not the correct logic to recover because the author did not think of that scenario.


I recently wrote a highly concurrent program in Go. This was in CSP style, taking care to only access slices and maps from one goroutine, and not accidentally capturing free variables in lambdas. So this manually covered the "safety" escapes which the Rust folks like to point to in Go. The "safety" provided was greater than what D currently offers.

Also, nil pointers are present in Go, and will usually crash the complete program. One is able to catch panics if one desires, which is similar to your exception case.

There were many goroutines, and a bunch dynamically started and stopped which ran "business logic".

Despite that, it was possible (due to some bugs of mine) to get the system into a state where things could not recover, or in some cases took 30 minutes to recover. A flaw could be detected (via outside behaviour), and could be reasoned through by analysing the log files.

However, for the error which did not clear after 30 minutes, there was no way for the system to come back into a fully operational state without the program being restarted.

A similar situation would be achievable in a Rust program.

In trying to handle and recover from such things, you're up against a variant of Gödel's incompleteness theorem, and anyone offering a language which "solves" that is IMHO selling snake oil.
1 day ago
On 15/04/2025 8:53 AM, Derek Fawcus wrote:
> In trying to handle and recover from such things, you're up against a variant of Gödel's incompleteness theorem, and anyone offering a language which "solves" that is IMHO selling snake oil.

In my original post I proposed a three-tier solution.

The read barriers are secondary to the language guarantees via type state analysis.

You need them when using a fast DFA engine that doesn't do full control-flow-graph analysis and ignores a variable when it cannot analyze it.

But if you are OK with having a bit of pain in terms of what can be modeled, and can accept a bit of slowness, you won't need the read barriers.
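
As a sketch of the trade-off (the diagnostics are hypothetical; no such pass ships in dmd today): the slower, CFG-aware engine can prove that the null test below is required and reject the code if it is missing, while the fast engine may give up on the variable and fall back to the injected read barrier at run time.

```d
// Hypothetical example of what the type state analysis would enforce.
@safe int firstElement(int[]* p)
{
    // Full CFG analysis: `p` may be null on entry, so this test would be
    // mandatory; omitting it would become a compile-time error.
    if (p is null)
        return 0;

    // Fast DFA engine: if it cannot track `p`, it ignores it and relies on
    // the injected read barrier to throw at run time instead.
    return (*p).length ? (*p)[0] : 0;
}
```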

Unfortunately, not everyone will accept the slower DFA, therefore it can't be on by default. I know this because it copies a couple of the perceived negative traits of DIP1000.

So yes, we can solve this, but MT could still mess it up, hence the last option in the three-tier solution: signals killing the process.