- cross-posted to:
- sysadmin@lemmy.ml
- sysadmin@lemmy.world
All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.
Apparently caused by a bad CrowdStrike update.
Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…
Stop running production services on M$. There is a better backend OS.
The issue was caused by a third-party vendor, though. A similar issue could have happened on other OSes too. There are relatively intrusive endpoint security systems for macOS and Linux as well.
That’s the annoying thing here. Everyone, particularly on Lemmy where everyone runs Linux and FOSS, thinks this is a Microsoft/Windows issue. It’s not, it’s a CrowdStrike issue.
More than that: it’s an IT security and infrastructure admin issue. How was this 3rd party software update allowed to go out to so many systems to break them all at once with no one testing it?
From what I understand, CrowdStrike doesn’t have built-in functionality for staged rollouts like that.
One admin was saying that they had to figure out which IPs belonged to the update server versus the rest of the functionality servers, block the update server at the company firewall, and then set up special rules to let the traffic through to batches of their machines.
So… yeah. Lot of work, especially if you’re somewhere where the sysadmin and firewall duties are split across teams. Or if you’re somewhere that is understaffed and overworked. Spend time putting out fires, or jerry-rigging a custom way to do staggered updates on a piece of software that runs largely as a black box?
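Roughly, that workaround looks something like the sketch below. Purely illustrative: the update-server range and host list are made up (not real CrowdStrike IPs), and a real corporate firewall almost certainly isn’t driven by raw iptables commands — this just shows the "block everyone, then allow one batch at a time" idea.

```python
#!/usr/bin/env python3
"""Sketch of the 'block the update server, allow batches through' workaround.

Hypothetical values only: you'd have to identify the vendor's actual
update/cloud IPs yourself, and apply the rules with whatever your
firewall actually uses.
"""

# Hypothetical CIDR for the vendor's update servers (NOT a real CrowdStrike range)
UPDATE_SERVER_CIDR = "203.0.113.0/24"

# Internal machines running the sensor, in the order you want them updated
HOSTS = [f"10.0.0.{i}" for i in range(10, 50)]

BATCH_SIZE = 10  # how many machines get the update per wave


def batches(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def main():
    # 1. Default: block every internal host from reaching the update servers
    print(f"iptables -A FORWARD -d {UPDATE_SERVER_CIDR} -j DROP")

    # 2. Per wave: allow just that batch through, wait and watch, then revoke
    for n, batch in enumerate(batches(HOSTS, BATCH_SIZE), start=1):
        print(f"# --- wave {n} ---")
        for host in batch:
            print(f"iptables -I FORWARD -s {host} -d {UPDATE_SERVER_CIDR} -j ACCEPT")
        print("# ...wait, confirm nothing is bluescreening, then:")
        for host in batch:
            print(f"iptables -D FORWARD -s {host} -d {UPDATE_SERVER_CIDR} -j ACCEPT")


if __name__ == "__main__":
    main()
```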
Edit: re-read your comment. My bad, I think you meant it was a failure of that on CrowdStrike’s end. Yeah, absolutely.
Bingo. I work for a small software company, so I expect shit like this to go out to production every so often and cause trouble for our couple tens of thousands of clients… But I can’t fathom how any company with worldwide reach can let it happen…
That’s because CrowdStrike likely has significantly worse leadership compared to your company.
They have a massive business development budget though.
Everyone, particularly Lemmy where everyone runs Linux and FOSS, knows it is a Crowdstrike issue.
It’s a snake-oil issue.
Many news sources said it’s a “Microsoft update”, so it’s understandable that people are confused.
Also, there was an Azure outage yesterday.
It’s an MS process issue. This is a testing failure and a rollout failure.
This had nothing to do with MS, other than their OS being impacted. Not their software that broke, not an update pushed out by their update system. This is an entirely third-party piece of software that installs at the kernel level, deeper than MS could reasonably police, even if it somehow was their responsibility.
This same piece of software was crashing certain Linux distros last month, but it didn’t make headlines due to the limited scope.
My bad, I thought this went out with an MS update.
Microsoft would never push an update on a Friday. They usually push their major patches on Tuesdays, unless there’s something that’s extremely important and can’t wait.
Windows is infamous for its do-it-yourself install process, so CrowdStrike is likely using their own deployment tools. If anything, criticize Microsoft for not helping with the update process at all.
Crowdstrike did the same to Linux servers previously.
There’s a better frontend OS
Doesn’t mean people want to move away from what they know.
There’s a shit ton more reasons than that, but in short: I highly doubt anyone suggesting a company just up and leave the MS ecosystem has spent any considerable amount of time in a sysadmin position.
You’ll find XP in use because they don’t want to pay for a new system.
And Linux/BSD is way more expensive because not as many people are familiar with it