Here’s what we know:
Microsoft is experiencing a global outage that has had a major impact all over the world, most notably with banks, airlines and media. If you have a flight booked today, no matter where you are, CHECK IT NOW!
In the UK many train services are out, along with some supermarkets such as Morrisons. The NHS is experiencing serious issues. Almost all GPs are without their IT systems, which has also impacted pharmacies. Sky News found itself unable to broadcast for several hours, but is now back on the air with limited services. Several UK airports are at a standstill, including Stansted, Luton and Belfast. Ireland is faring better. The Transport for Ireland app is down. Cork and Dublin airports are operating normally, but Ryanair is having major issues due to a third party system failure. The HSE (Ireland’s NHS) seems unaffected so far.
Global airlines unable to function include KLM, Lufthansa, SAS, Eurowings, United and Delta. Many airports are badly hit, with Zurich airport unable to land planes. Alaska says its emergency services are affected. Australia has been particularly hard hit, mostly because Telstra Group, a telecommunications company, has been severely disrupted, which has caused a knock-on effect in businesses across the country.
Both Australia and Switzerland pointed fingers early at the Crowdstrike security software as the cause of the issue. There are no signs that this is due to any cyberattack or hacking. Crowdstrike Cybersecurity are now confirming that an IT update appears to be the source of the problem.
BBC News has ongoing live updates:
https://www.bbc.com/news/live/cnk4jdwp49et
I regret to inform you that Microsoft Teams is still functional.
ITT keeping track of what is impacted by The Great Global IT Outage, as well as suggestions for any emergency workarounds or alternatives for travel or IT.
There's a workaround that involves booting into Safe Mode, opening the Crowdstrike folder, and deleting a problem file.
It's not what he meant but the line about Macs and Linux is very funny. Hey look we only completely fucked the largest slice of our user base it's not that big of a deal.
Crowdstrike is part of some of their under the hood stack in Azure that runs on Windows.
This update should never have deployed without internal staging and validation.
Let's play Mario Kart or something...
Apparently it's a content definition file. So basically the kernel sensor doesn't properly validate its inputs somewhere.
Edit: also this happening doesn't surprise me at all - nowhere I've ever worked has ever given a single fuck about testing security software updates. You're graded on "auto updates on", not "human in the middle deployment and validation".
There is no god.
You'd be surprised at what can slip through internal validation if your assumptions don't hold up
Apparently the entire United Arab Emirates is more or less bricked. Pretty much nothing is working anywhere. Their government issued a statement to their citizens that basically said “Just don’t touch ANYTHING!”
Hope they don't ask for a milkshake and a Crispy bar
When Cloudflare has an outage, everyone suffers.
They have effectively taken over a large chunk of NS, caching and security for the Internet.
As an IT team person: It's been a long day. And it's gonna be a long weekend.
Least it's good for overtime pay.
Old PA forum lookalike style for the new forums | My ko-fi donation thing.
Provided you still have your Bitlocker recovery keys.
I cannot for the life of me figure out why you would do that. It’s way more expensive, more prone to rogue forced updates that break shit, etc.
Internal only systems that rely on AD? Fine. But if I were an airline I would be looking real hard at moving anything mission critical off of this shit.
Today my IT department is telling us not to restart our laptops in case they brick.
Lol
LMAO
Good in theory, but hits two problems: 1. Windows admin staff are far more common and cheaper than Linux or Mac admin staff, and more importantly 2. off-the-shelf products are usually Windows. Edit: I'm not sure on how Linux or Mac enterprise support goes compared to MS and their E5 support either (not that my experiences with enterprise MS support have been pleasant at all).
I'd jump for joy if there were a big enterprise push for more systems to be shunted to Linux, but such is life, and this thread is what we get to live with.
Apple themselves currently use a blend of GCP and AWS Linux services to run iCloud.
I was on calls with people when they started crashing mid-sentence around 0530 UTC, and started googling when more and more coworkers started crashing out, noting that the BSOD was referring to csagent.sys
For reference - the workaround to fix your crashing device posted by Crowdstrike:
- Boot Windows into Safe Mode or the Windows Recovery Environment.
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
- Locate the file matching "C-00000291*.sys" and delete it.
- Boot the host normally.
https://supportportal.crowdstrike.com/s/article/Tech-Alert-Windows-crashes-related-to-Falcon-Sensor-2024-7-19
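For anyone who has to repeat the deletion step on a lot of machines, here is a rough Python sketch of it. It only covers the "locate and delete" part, using the directory and file pattern from the steps quoted above. The function name `remove_problem_files` and the `dry_run` flag are made up for illustration and are not anything Crowdstrike ships, and it assumes you have a Python interpreter available on the box, which is usually not the case in Safe Mode or the Recovery Environment.

```python
from pathlib import Path

# Directory and file pattern are copied from the workaround steps above.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_problem_files(directory=DEFAULT_DIR, pattern=PATTERN, dry_run=True):
    """Return the files matching the pattern; delete them only if dry_run is False.

    Hypothetical helper for illustration, not an official tool.
    """
    directory = Path(directory)
    # glob() yields nothing if the directory does not exist.
    matches = sorted(directory.glob(pattern))
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: only list what would be removed.
    for path in remove_problem_files(dry_run=True):
        print(f"would delete {path}")
```

Run it once as a dry run to see what matches, then call `remove_problem_files(dry_run=False)` to actually delete. You still need to get into Safe Mode (and past Bitlocker) by hand first.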
Which is why their carrying water for bigots and fascists in the name of "free speech" is so harmful. But they have hate offsets!
(That was not a joke (at least an intentional one.))
It depends on the system. My employer uses a lot of MS stuff for our front-end side, but our back end is pretty much RHEL across the board.
Friday afternoon. Office is empty from everyone working from home; all that's to be heard is the occasional tik tak of the scattered remaining people browsing news articles and videos as the workweek winds up. The few left are mentally checked out and hanging out just a few more minutes at the coffee machine while discussing weekend plans.
A trouble ticket comes in. A critical app is down. "God damnit, I manage that one," I think. Sighing, I open the ticket and check on the server, pushing away my empty energy drink can. Then I overhear my manager on the phone: "Hold on, I bluescreened, give me a moment." Hmm.
An eyebrow raised, I check the virtual machines. The app's server is down. Okay, that makes sense. Wait... hold on a minute, here. A second server goes down while I'm looking, then a third. My heart skips. More green ticks switch to red errors as I scan down the list. I check the screens for each, and they're bluescreening, each and every one.
Oh shit.
And that's how an IT admin's day goes to hell.
Custom round here is to only deploy between Tuesday and Thursday, so we have time to roll back without either having a bug active all weekend or kneecapping the Monday morning traffic.
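If you wanted to enforce a Tuesday-to-Thursday rule like that in a deploy script rather than by custom, a minimal sketch could look like the following. The names `DEPLOY_DAYS` and `is_deploy_day` are hypothetical, and a real pipeline would probably also care about time zones and holidays, which this ignores.

```python
from datetime import date

# date.weekday() returns 0 for Monday, so Tuesday..Thursday is 1..3.
DEPLOY_DAYS = {1, 2, 3}


def is_deploy_day(day: date) -> bool:
    """Return True if the given date falls on Tuesday, Wednesday or Thursday."""
    return day.weekday() in DEPLOY_DAYS


if __name__ == "__main__":
    today = date.today()
    if is_deploy_day(today):
        print(f"{today}: inside the deploy window")
    else:
        print(f"{today}: outside the deploy window, hold the release")
```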
https://www.nytimes.com/2024/07/19/business/microsoft-outage-cause-azure-crowdstrike.html?unlocked_article_code=1.8U0.8mtp.fZNbILhC6gTE&smid=url-share
unlocked article
I get why it can't be automated, but it's still one of those things where you're in awe that a single error leads to irreversible cascading failure on a global scale. Easier to destroy than to build, etc.
Any chance of residual effects from this?
Crowdstrike stonks going down