July 25
On 25/07/2024 12:12 PM, Walter Bright wrote:
> I talked with a person who has more in depth knowledge about it.
> 
> The null pointer came from reading a file and treating it as a data structure. The file was unexpectedly full of zeros.
> 
> Any language that allows casting a buffer read from the disk to a pointer would fail. This includes any language with unsafe blocks, or uses a FFI to get around the language protections.

You have almost got it.

It is not any language with that capability, it is any language with that capability that does not force you to check for null before the dereference.

July 25
On 25/07/2024 2:24 PM, Richard (Rikki) Andrew Cattermole wrote:
> On 25/07/2024 12:12 PM, Walter Bright wrote:
>> I talked with a person who has more in depth knowledge about it.
>>
>> The null pointer came from reading a file and treating it as a data structure. The file was unexpectedly full of zeros.
>>
>> Any language that allows casting a buffer read from the disk to a pointer would fail. This includes any language with unsafe blocks, or uses a FFI to get around the language protections.
> 
> You have almost got it.
> 
> It is not any language with that capability, it is any language with that capability that does not force you to check for null before the dereference.

Note: you do not need to leave safety for this exact situation to occur.

```d
void func(int* ptr) @safe {
	int v = *ptr; // BOOM!
}
```

The problem isn't going into unsafe code; it's that you made an assumption that either matches reality, or never does and is guaranteed to error out.
July 24
On 7/24/2024 7:24 PM, Richard (Rikki) Andrew Cattermole wrote:
> You have almost got it.
> 
> It is not any language with that capability, it is any language with that capability that does not force you to check for null before the dereference.

Checking for null won't save you: there are 4 billion addresses for a 32-bit pointer, some will seg fault, and some will overwrite your program's data with garbage.
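
To make that concrete, here is a small illustrative sketch (the values are made up): a pointer built from untrusted bytes can easily be non-null, so the null check passes, yet the write still lands somewhere it should not.

```d
void main()
{
    int[2] data = [10, 20];

    // Pretend this address was reconstructed from file contents:
    // not null, just not what the program expected to point at.
    int* p = &data[0] + 1;

    if (p !is null)     // the null check succeeds...
        *p = 999;       // ...and quietly clobbers data[1]

    assert(data[1] == 999);
}
```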

July 24
On 7/24/2024 7:32 PM, Richard (Rikki) Andrew Cattermole wrote:
> The problem isn't going into unsafe code, its that you made an assumption that either is the reality, or is never correct and is guaranteed to error out.

A null pointer seg fault is not an unsafe thing.

Memory unsafety is about memory corruption. A seg fault is not memory corruption.

Consider:

```
int* p;
*p = 4; // seg fault, program terminates
...
int* q;
assert(q); // q is null, assert fails, program terminates
```

Both of these cases are memory safe. Both default to summarily terminating the program. Both can have handlers installed to "recover" and do something nice.
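
For instance, here is a minimal sketch of installing one such handler (POSIX assumed; the message and exit code are arbitrary):

```d
import core.stdc.signal : signal, SIGSEGV;
import core.stdc.stdlib : _Exit;
import core.sys.posix.unistd : write;

extern(C) void onSegv(int) nothrow @nogc @system
{
    // Only async-signal-safe calls here; the process state is suspect.
    enum msg = "caught SIGSEGV, terminating\n";
    write(2, msg.ptr, msg.length);
    _Exit(1);
}

void main()
{
    signal(SIGSEGV, &onSegv);

    int* p;
    *p = 4;   // faults; the handler runs instead of a bare crash
}
```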

Before there was hardware memory protection, writing through a null pointer under DOS meant scrambling the operating system, leading to all kinds of horrible things like scrambling your hard disk as well. Imagine trying to find what went wrong. It's like your house burned to ashes and now you have to figure out the source of the blaze.

When protected mode became available, it was a miracle. Having the hardware check *every* pointer *every* time for validity was a huge advancement. And the code still runs at full speed!

Even better, you could get a stack trace pointing at the bug in your code.

I used to spend *weeks* trying to find memory corruption bugs. Today it's a few seconds. Seg faults are a great gift!
July 25
I occasionally peruse lists of software bugs that are sources of security vulnerabilities. The #1 issue, by far, is always array out-of-bounds errors. Null pointer seg faults don't even make the list, because they are not security vulnerabilities.

D nails the array overflow prevention.
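
For what it's worth, a minimal sketch of that in action (hypothetical values; the point is only that the index is range-checked):

```d
void main() @safe
{
    int[] a = [1, 2, 3, 4];
    size_t i = 5;   // pretend this came from untrusted input

    int x = a[i];   // fails the bounds check at runtime (RangeError)
                    // instead of silently reading past the array
}
```

And the checks in @safe code survive -release by default (-release implies -boundscheck=safeonly); only an explicit -boundscheck=off removes them.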

July 25
On Thursday, 25 July 2024 at 07:47:11 UTC, Walter Bright wrote:

> D nails the array overflow prevention.

Sure. But D still allows corruption going unnoticed:

mems.d
```
import std.stdio;
auto foo ()
{
   int [4] f = [1, 2, 3, 4];
   int [] q = f;
   return q;
}

void main ()
{
   auto s = foo;
   writeln (s);
}
```

$ dmd mems.d
$ ./mems
[4, 0, -10016, 32767]

It requires the @safe annotation PLUS -preview=dip1000 for dmd to catch this (Error: scope variable `q` may not be returned).
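
For reference, the annotated version is roughly the same code with @safe added, compiled as `dmd -preview=dip1000 mems.d`:

```d
import std.stdio;
auto foo () @safe
{
   int [4] f = [1, 2, 3, 4];
   int [] q = f;   // slice of a stack-allocated static array
   return q;       // Error: scope variable `q` may not be returned
}

void main () @safe
{
   auto s = foo;
   writeln (s);
}
```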

July 25

On Wednesday, 24 July 2024 at 16:47:26 UTC, Abdulhaq wrote:

> On Wednesday, 24 July 2024 at 14:03:31 UTC, Kagamin wrote:
>> I don't think computers can do nothing. When they are turned off they do nothing indeed, but something must turn off the computer for it to become turned off, and that's an involved procedure that doesn't count as "do nothing". So it's not clear what you mean, what CPU instruction should execute to "do nothing"?
>
> https://en.wikipedia.org/wiki/NOP_(code)

So a safe program must execute a nop instruction in an infinite loop? How is it better than crashing?

July 26
On 25/07/2024 6:48 PM, Walter Bright wrote:
> I used to spend /weeks/ trying to find memory corruption bugs. Today it's a few seconds. Seg faults are a great gift!

Right up until they bring down 8.5 million computers worldwide and impact almost everyone on the planet.

We got lucky this time in that there is an "easy" fix to get these machines working again.

It does not matter that there probably won't be a CVE from this outage.

Fact is, some data was sourced that was not validated before access; that could have been caught before a worldwide outage that took out _hospitals_.

July 25
On Fri, Jul 26, 2024 at 06:42:55AM +1200, Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> On 25/07/2024 6:48 PM, Walter Bright wrote:
> > I used to spend /weeks/ trying to find memory corruption bugs. Today it's a few seconds. Seg faults are a great gift!
> 
> Right up until they bring down 8.5 million computers world wide, and impact almost everyone on the planet.
> 
> We got lucky this time, that there is an "easy" fix to get these machines working again.
> 
> It does not matter that there probably won't be a CVE from this outage.
> 
> Fact is, some data was sourced, that was not validated before access that could have been caught before a world wide outage that took out _hospitals_.

Fact is, reading in a file and casting the contents into a pointer without prior verification is a very unwise thing to do.  No amount of language features will save you from the consequences.  Somebody has to write the code to verify the data before acting on it.  If nobody wrote the verification code, whether the program segfaults, continues silently and corrupts data, formats the hard drive, or launches nuclear missiles, is really just secondary.
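
To sketch the kind of verification meant here (the Record layout, magic value, and names below are invented for illustration):

```d
import std.file : read;

struct Record
{
    uint magic;          // expected format marker
    uint payloadLength;  // bytes of payload following the header
}

enum uint recordMagic = 0x1234_5678; // invented value for illustration

// Returns null if the file cannot possibly be a valid Record.
const(Record)* loadRecord(string path, out const(ubyte)[] payload)
{
    auto bytes = cast(ubyte[]) read(path);

    // Verify size and marker *before* interpreting the bytes as a Record.
    if (bytes.length < Record.sizeof)
        return null;

    auto rec = cast(const(Record)*) bytes.ptr;
    if (rec.magic != recordMagic ||
        rec.payloadLength > bytes.length - Record.sizeof)
        return null;

    payload = bytes[Record.sizeof .. Record.sizeof + rec.payloadLength];
    return rec;
}
```

The cast only happens after the cheap size and marker checks, so a file that is unexpectedly full of zeros gets rejected up front instead of being acted on.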

Also, the fact that one tiny flaw like this can bring down half the computers across the whole world is another major lesson that people don't seem to be learning from.  Basically, the OS is a single point of failure; when it fails, you're up the creek without a paddle.  Maybe it's time somebody pulled a Walter to design fault-resistant redundant OS instances, airplane-style.  :-P

At the very least, OS upgrades should be handled much more conservatively than they are right now.  For example, the patched OS should be something separate from the running OS; it should be brought up separately before the old OS retires itself and hands over control. Easier said than done, of course, but given what has happened, people really need to be thinking about this seriously.

Another factor is, push updates are evil. What really ought to have happened is that an update notification should have been sent, and the admins should have approved it before it was actually installed. (After testing the patch in a controlled environment, before pushing it out to live systems.) But I'm probably barking up the wrong tree here... people these days are all gung-ho about fully unattended upgrades and fully automated everything, who needs anybody to check the sanity of an upgrade.  Well, we're staring at the consequences of this attitude right now.


T

-- 
Live a century, learn a century, and you will still die a fool.
July 26
On 26/07/2024 7:00 AM, H. S. Teoh wrote:
> Another factor is, push updates are evil. What really ought to have
> happened is that an update notification should have been sent, and the
> admins should have approved it before it was actually installed. (After
> testing the patch in a controlled environment, before pushing it out to
> live systems.) But I'm probably barking up the wrong tree here... people
> these days are all gung-ho about fully unattended upgrades and fully
> automated everything, who needs anybody to check the sanity of an
> upgrade.  Well, we're staring at the consequences of this attitude right
> now.

You are indeed barking up the wrong tree.

I am already convinced that there were multiple failures going on.

I was able to determine them just from the failures I was seeing on Twitter a few hours in.

But, I can't solve those.

I can, however, solve forcing a D user to check for nullability; and if that is the best that we can do, then that's all we can do.
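
As a rough illustration of the shape that forcing could take, here is a hypothetical library-style sketch (not an existing language feature; the names are made up): the raw pointer is only reachable through a path that has already handled the null case.

```d
struct Checked(T)
{
    private T* ptr;

    // The only way to use the value: supply handlers for both cases.
    auto match(alias onValue, alias onNull)()
    {
        return ptr is null ? onNull() : onValue(*ptr);
    }
}

unittest
{
    int x = 42;
    auto p = Checked!int(&x);
    assert(p.match!(v => v * 2, () => 0)() == 84);

    auto n = Checked!int(null);
    assert(n.match!(v => v * 2, () => 0)() == 0);
}
```

Whether something like this lives in a library or in the language itself, the point is only that the dereference is unreachable without the check.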