The CrowdStrike outage should make us rethink the resiliency of our tech infrastructure. It probably won’t

The tech solutions we rely on are often less polished than they appear. We just learned that lesson—once again—the hard way.

BY Chris Stokel-Walker

Very late Thursday night, the world got a firsthand look at how vulnerable our computer infrastructure is. An update to cybersecurity firm CrowdStrike’s Falcon Sensor tool, pushed out by engineers, contained a glitch that caused a catastrophic error, crashing potentially millions of Windows computers worldwide into a blue screen of death.

That has created utter chaos for the masses of people affected by an error that was elementary, yet wide-ranging enough to cause havoc around the world. Hospital appointments and surgeries have been canceled in Austria as a result of the issue. Airports closed in Germany. The UK stock exchange encountered issues. Airlines in Japan grounded flights. Banks and supermarkets went offline in India, Australia, and elsewhere.

CrowdStrike said in a statement Friday it has “identified, isolated and a fix has been deployed.” But the damage has in some cases already been done. We’re left now with a big, terrifying realization: Many of the people in charge of our global digital systems rely on a single software vendor, which makes that vendor a single point of failure (one estimate suggests CrowdStrike accounts for 24% of the security market), and when that vendor itself screws up, we’re all left picking up the pieces.

“What is unique about this incident is the scale at which it has taken place, likely wiping billions from the global economy due to global, widespread downtime,” says Neatsun Ziv, CEO of OX Security, a cybersecurity firm. It also wiped billions from CrowdStrike’s market value, as its stock price tumbled in early trading Friday.

Yet this is not the first example of a single company’s failures affecting massive global networks. We’ve seen scores of massive internet outages as a result of failures on the part of cloud web hosts and other system providers. The world should have learned its lesson from any one of those incidents, whether the September 2020 outage of Microsoft 365 software, or the nationwide cell outage in February 2024, or even the U.S. utilities attack in April this year.

“It’s important that lessons are learned from [the CrowdStrike] incident to reduce the likelihood of it happening again,” says Simon Newman, cofounder of Cyber London and a member of the International Cyber Expo advisory council. “I would encourage all organizations to review their supply chain resilience regularly.”

However, we haven’t, and we probably won’t this time either. The whizz-bang tech solutions we rely on day in, day out are often less polished than they appear. While they seem to be slickly developed and coded, and are faceless by design for marketing purposes, they’re the result of hard-working humans coding every line and checking every element of them. And humans make mistakes.

That this happens—and continues to happen—should help put to rest the idea that these systems are somehow infallible, and that nothing can ever go wrong with them. In a perfect world, we would game-plan around these inevitabilities, and create appropriate fail-safes. But that’s not the world we live in.

 

ABOUT THE AUTHOR

Chris Stokel-Walker is a freelance journalist and Fast Company contributor. He is the author of YouTubers: How YouTube Shook up TV and Created a New Generation of Stars, and TikTok Boom: China’s Dynamite App and the Superpower Race for Social Media. 


Fast Company
