[Not really OT] Crowdstrike Analysis: It was a NULL pointer from the memory unsafe C++ language. (page 4)

On Friday, 19 July 2024 at 23:33:44 UTC, H. S. Teoh wrote:
> It's 2024, and a NULL pointer brought down half the world's servers.
>
> Just gives you *so* much confidence in technology. :-D

It's not a NULL pointer that did this, but rather a combination of
* A NULL pointer
* Poor testing that allowed the code to be pushed upstream
* Bad Microsoft policies that lead to poorly tested code to be auto-installed via automatic updates

In cases like this it's never just one small thing that causes Really Bad Things™, it's usually a combination of poor decisions. IMO, the only actually wrong thing in this whole situation is how shitty updates are forced onto everyone without proper testing periods

On Wed, Jul 24, 2024 at 04:27:15PM +0000, GrimMaple via Digitalmars-d wrote:
> On Friday, 19 July 2024 at 23:33:44 UTC, H. S. Teoh wrote:
> > It's 2024, and a NULL pointer brought down half the world's servers.
> >
> > Just gives you *so* much confidence in technology. :-D
>
> It's not a NULL pointer that did this, but rather a combination of
> * A NULL pointer
> * Poor testing that allowed the code to be pushed upstream
> * Bad Microsoft policies that lead to poorly tested code to be auto-installed via automatic updates
>
> In cases like this it's never just one small thing that causes Really Bad Things™, it's usually a combination of poor decisions.

Which means that there must be a *lot* of poor decisions going around, enough for the wrong ones to line up coincidentally to cause a disastrous failure. Which just makes me all warm and fuzzy with confidence about the state of technology. :-P

> IMO, the only actually wrong thing in this whole situation is how shitty updates are forced onto everyone without proper testing periods

Or the fact that updates are pushed at all. Never been a fan of push updates. Or push anything, really. Binary blob pushes are the worst of them all. All it takes is for *somebody* to compromise the binary somewhere between the source and the user, and you're looking at half the world being compromised overnight.

We're lucky that this buggy update was (very!) noticeable. It could have been a lot worse. Like a malicious backdoor that went unnoticed. In fact, it may have already happened, and we just haven't noticed it yet. Remember, you heard it here first. :-P

T

-- 
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet." -- swr

I talked with a person who has more in depth knowledge about it.

The null pointer came from reading a file and treating it as a data structure. The file was unexpectedly full of zeros.

Any language that allows casting a buffer read from the disk to a pointer would fail. This includes any language with unsafe blocks, or uses a FFI to get around the language protections.

On 7/24/2024 9:27 AM, GrimMaple wrote:
> In cases like this it's never just one small thing that causes Really Bad Things™, it's usually a combination of poor decisions. IMO, the only actually wrong thing in this whole situation is how shitty updates are forced onto everyone without proper testing periods

I also heard that Crowdstrike normally rate-limits the pushing of updates, so that if an update is bad the damage is limited. But that limit was turned off for this update.

July 24, 2024

Re: [Not really OT] Crowdstrike Analysis: It was a NULL pointer from the memory unsafe C++ language.

Posted by H. S. Teoh
in reply to Walter Bright


On Wed, Jul 24, 2024 at 05:12:28PM -0700, Walter Bright via Digitalmars-d wrote:
> I talked with a person who has more in depth knowledge about it.
> 
> The null pointer came from reading a file and treating it as a data structure. The file was unexpectedly full of zeros.

Reading a file and treating it as a data structure without verification or at the very least sanity checking is a rather unwise thing to do. Unfortunately this is a rather common practice in enterprise code.  Sad to say, I've been guilty of this myself. :-P
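To make the "sanity checking" point concrete, here is a minimal C++ sketch of checking a buffer before trusting it. Nothing in it comes from the actual Crowdstrike code: the `Header` layout, `kMagic`, `kEntrySize` and `parse_header` are all invented for illustration, and the file is assumed to be in the host's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical on-disk layout, invented for illustration only.
struct Header {
    std::uint32_t magic;        // expected to equal kMagic
    std::uint32_t entry_count;  // number of fixed-size entries after the header
};

constexpr std::uint32_t kMagic = 0x43484E4C;  // arbitrary made-up tag
constexpr std::size_t kEntrySize = 16;        // assumed size of one entry

// Returns the header only if the buffer passes basic sanity checks.
std::optional<Header> parse_header(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < sizeof(Header)) return std::nullopt;  // truncated file

    Header h;
    std::memcpy(&h, buf.data(), sizeof h);  // copy out; no pointer into the buffer

    if (h.magic != kMagic) return std::nullopt;  // rejects an all-zero file

    // The declared entry count must fit in the bytes actually present.
    const std::size_t payload = buf.size() - sizeof(Header);
    if (h.entry_count > payload / kEntrySize) return std::nullopt;

    return h;
}
```

With something like this, a file full of zeros comes back as an empty `std::optional` that the caller has to handle, rather than as a structure whose fields are silently all zero.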

Not catching this during testing, though, is rather disappointing. Though also unsurprising... having worked for a few decades in the industry dealing with very large codebases, I've seen firsthand how testing is often skimped on, especially when there's a looming deadline or some similar pressure to get it done quickly and move on to the next item in a large pile of things that need to get done by last week.

And sometimes the tester also has no idea what he's testing and what corner cases to watch out for; he also has a large pile of other work to get to and so often just randomly does a couple of ordinary runs, observes nothing unusual, and calls it done.  This is especially bad with software that requires lots of repetitive testing of features previously verified. The temptation to skimp on a few cases to speed things up is very strong.  All sorts of things may slip through unnoticed.

This is why D's built-in unittests, in spite of their warts, are a major tool for improving software quality. You *want* tests to be as automated as possible, because the rate of human error is especially high when it comes to repetitive (but very necessary) testing of previously working features.  Having the machine automatically verify previously working features is a big step in eliminating human error (aka laziness) in this area.

> Any language that allows casting a buffer read from the disk to a pointer would fail. This includes any language with unsafe blocks, or uses a FFI to get around the language protections.

Casting a buffer read from disk to a pointer is, in general, a risky and unwise thing to do.  But the temptation to do it anyway is very strong, especially in low level code or where performance is a concern.
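One common alternative to casting, sketched here in C++ under made-up assumptions: have the file store byte offsets instead of pointers, and bounds-check every offset before following it. `NameRef` and `read_name` are hypothetical names, not taken from any real format, and the fields are assumed to be in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record: the file stores a byte offset and a length, not a pointer.
struct NameRef {
    std::uint32_t offset;  // position of the name within the buffer
    std::uint32_t length;  // name length in bytes
};

// Resolve the NameRef at the start of buf into a string, or return nothing.
std::optional<std::string> read_name(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < sizeof(NameRef)) return std::nullopt;

    NameRef ref;
    std::memcpy(&ref, buf.data(), sizeof ref);

    // The offset must point past the NameRef itself; an all-zero file fails here.
    if (ref.offset < sizeof(NameRef) || ref.offset > buf.size()) return std::nullopt;
    // The length must fit in the remaining bytes (written to avoid overflow).
    if (ref.length > buf.size() - ref.offset) return std::nullopt;

    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(ref.offset);
    return std::string(first, first + static_cast<std::ptrdiff_t>(ref.length));
}
```

The cost is a copy and a couple of comparisons per lookup, which is the performance the cast was trying to save. In exchange, a zeroed or truncated file becomes an ordinary error path instead of a dereference of whatever happened to be in the buffer.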

At least, one should be thankful the file was filled with zeroes and not something else, like malicious code or code that coincidentally did something destructive like formatting the disk or overwriting the boot sector.

T

-- 
The past, present, and future walk into a bar.  It was tense.
Then the physicist walks in, and it was tensor.
Finally, the mathematician walks in, and it was ten sets.
