Insurers have begun calculating the financial damage caused by last week’s devastating CrowdStrike software glitch that crashed computers, canceled flights and disrupted hospitals all around the globe — and the picture isn’t pretty.
On Wednesday, CrowdStrike released a report outlining the initial results of its investigation into the incident, which involved a file that helps CrowdStrike’s security platform look for signs of malicious hacking on customer devices.
The company routinely tests its software updates before pushing them out to customers, CrowdStrike said in the report. But on July 19, a bug in CrowdStrike’s cloud-based testing system — specifically, the part that runs validation checks on new updates prior to release — ended up allowing the software to be pushed out “despite containing problematic content data.”
…
When Windows devices using CrowdStrike’s cybersecurity tools tried to access the flawed file, it caused an “out-of-bounds memory read” that “could not be gracefully handled, resulting in a Windows operating system crash,” CrowdStrike said.
Couldn’t it, though? 🤔
And CrowdStrike said it also plans to move to a staggered approach to releasing content updates so that not everyone receives the same update at once, and to give customers more fine-grained control over when the updates are installed.
I thought they were already supposed to be doing this?
IANAD and AFAIU, not in kernel mode. Things like trying to read non existing memory in kernel mode are supposed to crash the system because continuing could be worse.
They could and clearly they should have done that but hindsight is 20/20. Software is complex and there’s a lot of places that invalid data could come in.
The fact that they weren’t already doing staggered releases is mind-boggling. I work for a company with a minuscule fraction of CrowdStrike’s user base / value, and even we do staggered releases.
They do have staggered releases, but it’s a bit more complicated. The client that you run does have versioning and you can choose to lag behind the current build, but this was a bad definition update. Most people want the latest definition to protect themselves from zero days. The whole thing is complicated and a but wonky, but the real issue here is cloudflare’s kernel driver not validating the content of the definition before loading it.
…
Couldn’t it, though? 🤔
I thought they were already supposed to be doing this?
IANAD and AFAIU, not in kernel mode. Things like trying to read non existing memory in kernel mode are supposed to crash the system because continuing could be worse.
I.meant couldn’t they test for a NULL pointer.
They could and clearly they should have done that but hindsight is 20/20. Software is complex and there’s a lot of places that invalid data could come in.
The fact that they weren’t already doing staggered releases is mind-boggling. I work for a company with a minuscule fraction of CrowdStrike’s user base / value, and even we do staggered releases.
They do have staggered releases, but it’s a bit more complicated. The client that you run does have versioning and you can choose to lag behind the current build, but this was a bad definition update. Most people want the latest definition to protect themselves from zero days. The whole thing is complicated and a but wonky, but the real issue here is cloudflare’s kernel driver not validating the content of the definition before loading it.