
The Great Global IT Outage

Desktop Hippie Registered User regular
Here’s what we know:

Microsoft is experiencing a global outage that has had a major impact worldwide, most notably on banks, airlines and media. If you have a flight booked today, no matter where you are, CHECK IT NOW!

In the UK, many train services are out, along with some supermarkets such as Morrisons. The NHS is experiencing serious issues. Almost all GPs are without their IT systems, which has also impacted pharmacies. Sky News found itself unable to broadcast for several hours, but is now back on the air with limited services. Several UK airports are at a standstill, including Stansted, Luton and Belfast. Ireland is faring better. The Transport for Ireland app is down. Cork and Dublin airports are operating normally, but Ryanair is having major issues due to a third-party system failure. The HSE (Ireland’s NHS) seems unaffected so far.

Global airlines unable to function include KLM, Lufthansa, SAS, Eurowings, United and Delta. Many airports are badly hit, with Zurich airport unable to land planes. Alaska says its emergency services are affected. Australia has been particularly hard hit, mostly because the telecommunications company Telstra Group has been severely disrupted, with a knock-on effect on businesses across the country.

Both Australia and Switzerland pointed fingers early at the Crowdstrike security software as the cause of the issue. There are no signs that this is due to any cyberattack or hacking. Crowdstrike Cybersecurity are now confirming that an IT update appears to be the source of the problem.

BBC News has ongoing live updates:

https://www.bbc.com/news/live/cnk4jdwp49et

I regret to inform you that Microsoft Teams is still functional.

ITT: keeping track of what is affected by The Great Global IT Outage, plus suggestions for emergency workarounds or alternatives for travel or IT.


Posts

  • Speed Racer Scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch scritch scratch Registered User regular
    I work graveyard shifts at a datacenter; the network folks have been working on this on our end all night

    there's a workaround that involves booting into Safe Mode, opening the CrowdStrike folder, and deleting a problem file

  • Desktop Hippie Registered User regular
    American Airlines say they’re fully back in business. Their systems had been down most of the morning. Not seen updates for United or Delta yet.

  • Bogart Streetwise Hercules Registered User, Moderator Mod Emeritus
    Crowdstrike shares are apparently down by 20% and I would imagine it won't end there.
    CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed. We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website. We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.

    It's not what he meant, but the line about Macs and Linux is very funny. Hey look, we only completely fucked the largest slice of our user base, it's not that big of a deal.

  • Desktop Hippie Registered User regular
    I did very much enjoy Microsoft’s press release sniffily referring to “a third party issue” causing problems with their OS.

  • RMS Oceanic Registered User regular
    I'm reminded of that XKCD comic where the infrastructure of the internet is held aloft by a flimsy Jenga piece called "Code bundle from 2003".


  • RMS Oceanic Registered User regular
    Also I feel terrible for whoever deployed that patch.

  • syndalis Getting Classy On the Wall Registered User, Loves Apple Products, Transition Team regular
    Desktop Hippie wrote: »
    I did very much enjoy Microsoft’s press release sniffily referring to “a third party issue” causing problems with their OS.

    Crowdstrike is part of some of their under-the-hood stack in Azure that runs on Windows.

    This update should never have deployed without internal staging and validation.
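    One common form of staging is a ring (canary) rollout: push the update to a small group of hosts first, and only widen it if those hosts stay healthy. A minimal Python sketch of that idea, assuming you have some per-host `deploy` action and `is_healthy` check; `staged_rollout`, the ring layout and the failure threshold are hypothetical, and this is not a description of how CrowdStrike actually ships its content updates:

```python
from typing import Callable, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> int:
    """Roll an update out ring by ring; return how many rings completed.

    Hypothetical sketch: every host in a ring is updated, then checked.
    If the share of unhealthy hosts in that ring exceeds max_failure_rate,
    the rollout halts and later rings are never touched.
    """
    completed = 0
    for ring in rings:
        for host in ring:
            deploy(host)
        failures = sum(1 for host in ring if not is_healthy(host))
        if ring and failures / len(ring) > max_failure_rate:
            break  # halt: later rings never receive the update
        completed += 1
    return completed
```

    With rings `[["canary-1"], ["web-1", "web-2"], ["db-1"]]` and `web-2` failing its health check, the function returns `1` and `db-1` never receives the update. A zero default threshold is deliberately strict; a real pipeline might also wait a while between rings before checking health.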

    SW-4158-3990-6116
    Let's play Mario Kart or something...

  • Gyral Registered User regular
    This would explain why a certain company whose name starts and ends with X has had absolute bullshit trouble with the infrastructure this week.
    Desktop Hippie wrote: »
    I regret to inform you that Microsoft Teams is still functional.
    There is no god.

  • RMS Oceanic Registered User regular
    syndalis wrote: »
    I did very much enjoy Microsoft’s press release sniffily referring to “a third party issue” causing problems with their OS.

    Crowdstrike is part of some of their under-the-hood stack in Azure that runs on Windows.

    This update should never have deployed without internal staging and validation.

    You'd be surprised what can slip through internal validation if your assumptions don't hold up.

  • Desktop Hippie Registered User regular
    Zavian wrote:
    it's always amazing to me that companies like CrowdStrike and Cloudflare, little known outside of IT circles, are mega-essential infrastructure for the entire globe

    Apparently the entire United Arab Emirates is more or less bricked. Pretty much nothing is working anywhere. Their government issued a statement to their citizens that basically said “Just don’t touch ANYTHING!”

  • RMS Oceanic Registered User regular
    Zavian wrote:
    it's always amazing to me that companies like CrowdStrike and Cloudflare, little known outside of IT circles, are mega-essential infrastructure for the entire globe

    Apparently the entire United Arab Emirates is more or less bricked. Pretty much nothing is working anywhere. Their government issued a statement to their citizens that basically said “Just don’t touch ANYTHING!”

    Hope they don't ask for a milkshake and a Crispy bar

  • syndalis Getting Classy On the Wall Registered User, Loves Apple Products, Transition Team regular
    Zavian wrote: »
    it's always amazing to me that companies like CrowdStrike and Cloudflare, little known outside of IT circles, are mega-essential infrastructure for the entire globe

    When Cloudflare has an outage, everyone suffers.

    They have effectively taken over a large chunk of DNS, caching and security for the Internet.

  • Trace GNU Terry Pratchett; GNU Gus; GNU Carrie Fisher; GNU Adam We Registered User regular
    Australia is basically bricked as well from what I'm hearing.

  • Dark Raven X Laugh hard, run fast, be kind Registered User regular
    I just sent my passport back to the UK for renewal via USPS and I guess the tracking is fucked. That's gonna be a stressful wait. :I

    Oh brilliant
  • RazielMortem Registered User regular
    Thing is, Crowdstrike shit the bed two weeks ago too. It was just less severe (stuff worked, just very slooooowly). Two fuckups in two weeks? That's a whole team getting fired and their competitors salivating.

  • RazielMortem Registered User regular
    The fix is pretty simple BUT manual for busted machines. IT teams are not happy today.

  • Desktop Hippie Registered User regular
    @Trace Australia’s telecommunications giant Telstra Group was hit, which has had a knock-on effect on… preeeetty much everything over there.

  • Suriko Australia Registered User regular
    RazielMortem wrote: »
    The fix is pretty simple BUT manual for busted machines. IT teams are not happy today.

    As an IT team person: It's been a long day. And it's gonna be a long weekend.

    Least it's good for overtime pay.


  • syndalis Getting Classy On the Wall Registered User, Loves Apple Products, Transition Team regular
    I am always surprised to see how many service-level or customer-facing internet things run on Windows machines.

    I cannot for the life of me figure why you would do that. It’s way more expensive, more prone to rogue forced updates that break shit, etc.

    Internal only systems that rely on AD? Fine. But if I were an airline I would be looking real hard at moving anything mission critical off of this shit.

  • Karl Registered User regular
    Yesterday I had an Office 365 update forced upon me.

    Today my IT department is telling us not to restart our laptops in case they brick.

    Lol

    LMAO

  • Desktop Hippie Registered User regular
    Oh, when it comes to media: Paramount channels MTV, VH1, CMT and Pop TV are offline, as are the ESPN cable channels. Sky News was hit in the UK and Australia, as was ABC Australia, but all are back now and broadcasting without graphics, which I’m sure is resulting in some truly superb clips for later.

  • Suriko Australia Registered User regular
    syndalis wrote: »
    I am always surprised to see how many service-level or customer-facing internet things run on Windows machines.

    I cannot for the life of me figure why you would do that. It’s way more expensive, more prone to rogue forced updates that break shit, etc.

    Internal only systems that rely on AD? Fine. But if I were an airline I would be looking real hard at moving anything mission critical off of this shit.

    Good in theory, but it hits two problems: 1. Windows admin staff are far more common and cheaper than Linux or Mac admin staff, and, more importantly, 2. off-the-shelf products are usually Windows-based. Edit: I'm also not sure how Linux or Mac enterprise support compares to MS and their E5 support (not that my experiences with enterprise MS support have been pleasant at all).

    I'd jump for joy if there were a big enterprise push for more systems to be shunted to Linux, but such is life, and this thread is what we get to live with.

  • 101 Registered User regular
    I am so glad I am not in work today I tell you what

  • syndalis Getting Classy On the Wall Registered User, Loves Apple Products, Transition Team regular
    Oh I wouldn’t dare recommend Mac for this either. Great consumer platform, absolutely not what I would use for infrastructure.

    Apple themselves currently use a blend of GCP and AWS Linux services to run iCloud.

  • RMS Oceanic Registered User regular
    My work laptop autorestarted with an update and is...fine? I appear to be a jammy get

  • Archangle Registered User regular
    Speed Racer wrote: »
    I work graveyard shifts at a datacenter; the network folks have been working on this on our end all night

    there's a workaround that involves booting into Safe Mode, opening the CrowdStrike folder, and deleting a problem file
    Yeah, the workaround was posted on the Crowdstrike support portal within 2 hours of it hitting.

    I was on calls with people when they started crashing mid-sentence around 0530 UTC, and I started googling when more and more coworkers started crashing out, noting that the BSOD referenced csagent.sys.

    For reference, here is the workaround CrowdStrike posted for fixing a crashing device:
    1. Boot Windows into Safe Mode or the Windows Recovery Environment.
    2. Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
    3. Locate the file matching "C-00000291*.sys" and delete it.
    4. Boot the host normally.
    https://supportportal.crowdstrike.com/s/article/Tech-Alert-Windows-crashes-related-to-Falcon-Sensor-2024-7-19
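    Step 3 matches the file by a wildcard pattern rather than an exact name. A minimal Python sketch of just that matching-and-deleting step, assuming you can run Python against the drivers folder at all (a stock Safe Mode or recovery environment won't give you that, so treat this as an illustration of step 3, not a recovery tool); `remove_bad_channel_files` is a hypothetical helper, not anything CrowdStrike ships:

```python
from pathlib import Path

# Wildcard pattern from step 3 of CrowdStrike's posted workaround.
BAD_FILE_PATTERN = "C-00000291*.sys"


def remove_bad_channel_files(drivers_dir: str) -> list[str]:
    """Delete files in drivers_dir whose names match BAD_FILE_PATTERN.

    Hypothetical helper for illustration. Only the top level of
    drivers_dir is searched (no recursion). Returns the deleted
    file names, sorted, so the caller can log them.
    """
    removed: list[str] = []
    for path in sorted(Path(drivers_dir).glob(BAD_FILE_PATTERN)):
        if path.is_file():  # skip anything odd, like a directory with that name
            path.unlink()
            removed.append(path.name)
    return removed
```

    Pass it the folder from step 2, e.g. `remove_bad_channel_files(r"C:\Windows\System32\drivers\CrowdStrike")`; it returns the names of the files it deleted. It still has to run on each affected machine, which is exactly the part that doesn't scale.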

  • syndalis Getting Classy On the Wall Registered User, Loves Apple Products, Transition Team regular
    Yup just need direct physical (or remote hardware KVM) access to potentially thousands of instances to do a manual task, no biggie haha.

  • AngelHedgie Registered User regular
    syndalis wrote: »
    Zavian wrote: »
    it's always amazing to me that companies like CrowdStrike and Cloudflare, little known outside of IT circles, are mega-essential infrastructure for the entire globe

    When Cloudflare has an outage, everyone suffers.

    They have effectively taken over a large chunk of DNS, caching and security for the Internet.

    Which is why their carrying water for bigots and fascists in the name of "free speech" is so harmful. But they have hate offsets!

    (That was not a joke (at least an intentional one.))

    XBL: Nox Aeternum / PSN: NoxAeternum / NN:NoxAeternum / Steam: noxaeternum
  • AngelHedgie Registered User regular
    Suriko wrote: »
    syndalis wrote: »
    I am always surprised to see how many service-level or customer-facing internet things run on Windows machines.

    I cannot for the life of me figure why you would do that. It’s way more expensive, more prone to rogue forced updates that break shit, etc.

    Internal only systems that rely on AD? Fine. But if I were an airline I would be looking real hard at moving anything mission critical off of this shit.

    Good in theory, but it hits two problems: 1. Windows admin staff are far more common and cheaper than Linux or Mac admin staff, and, more importantly, 2. off-the-shelf products are usually Windows-based. Edit: I'm also not sure how Linux or Mac enterprise support compares to MS and their E5 support (not that my experiences with enterprise MS support have been pleasant at all).

    I'd jump for joy if there were a big enterprise push for more systems to be shunted to Linux, but such is life, and this thread is what we get to live with.

    It depends on the system. My employer uses a lot of MS stuff for our front-end side, but our back end is pretty much RHEL across the board.

  • Desktop Hippie Registered User regular
    Microsoft are saying they’ve fixed the underlying cause, according to the BBC. Though they expect the residual impact will last a few days yet.

  • Suriko Australia Registered User regular
    God, what a day.

    Friday afternoon. Office is empty from everyone working from home; all that's to be heard is the occasional tik tak of the scattered remaining people browsing news articles and videos as the workweek winds up. The few left are mentally checked out and hanging out just a few more minutes at the coffee machine while discussing weekend plans.

    A trouble ticket comes in. A critical app is down. "God damnit, I manage that one," I think. Sighing, I open the ticket and check on the server, pushing away my empty energy drink can. Then I overhear my manager on the phone: "Hold on, I bluescreened, give me a moment." Hmm.

    An eyebrow raised, I check the virtual machines. The app's server is down. Okay, that makes sense. Wait... hold on a minute, here. A second server goes down while I'm looking, then a third. My heart skips. More green ticks switch to red errors as I scan down the list. I check the screens for each, and they're bluescreening, each and every one.

    Oh shit.

    And that's how an IT admin's day goes to hell.

  • Echo ski-bap ba-dap Moderator, Administrator admin
    "It's Friday, I'ma deploy and go on vacation."

  • RMS Oceanic Registered User regular
    Echo wrote: »
    "It's Friday, I'ma deploy and go on vacation."

    The custom round here is to deploy only between Tuesday and Thursday, so we have time to roll back without either having a bug live all weekend or kneecapping the Monday morning traffic.
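    A deploy window like that is easy to enforce mechanically, for example as a check at the top of a deploy script. A minimal Python sketch, assuming the window is judged by the calendar day of a local `datetime` (time zones and public holidays are ignored); `deploy_allowed` and `ALLOWED_DEPLOY_DAYS` are hypothetical names, not part of any real tool:

```python
from datetime import datetime

# Hypothetical policy matching the post: deploy only Tuesday to Thursday.
# datetime.weekday() numbers the days Monday == 0 through Sunday == 6.
ALLOWED_DEPLOY_DAYS = {1, 2, 3}  # Tuesday, Wednesday, Thursday


def deploy_allowed(when: datetime) -> bool:
    """Return True if a deploy at `when` falls inside the Tue-Thu window."""
    return when.weekday() in ALLOWED_DEPLOY_DAYS
```

    For example, `deploy_allowed(datetime(2024, 7, 19))` returns `False`, since 19 July 2024 was a Friday.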

  • Desktop Hippie Registered User regular
    Apparently the US states experiencing issues with their 911 emergency systems are Alaska, Arizona, Indiana and New Hampshire. So if you’re there be sure to take a look at whatever workarounds they have in place right now. (Just in case. Hopefully you won’t need 911)

  • RazielMortem Registered User regular
    Read-only Fridays people!

  • Eddy Gengar the Bittersweet Registered User regular
    A big part of the problem lies with the current suggested solution, which is to reboot each computer manually into safe mode, delete a specific file, and then restart the computer normally. Security experts said that while it is a relatively simple process, there is no way to automate it at scale.

    https://www.nytimes.com/2024/07/19/business/microsoft-outage-cause-azure-crowdstrike.html?unlocked_article_code=1.8U0.8mtp.fZNbILhC6gTE&smid=url-share

    unlocked article

    I get why it can't be automated, but it's still one of those things where you're in awe that a single error leads to irreversible cascading failure on a global scale. Easier to destroy than to build, etc.

    Any chance of residual effects from this?

    "and the morning stars I have seen
    and the gengars who are guiding me" -- W.S. Merwin
  • Echo ski-bap ba-dap Moderator, Administrator admin
    Eddy wrote: »
    Any chance of residual effects from this?

    Crowdstrike stonks going down
