X, Zoom, and Teams went down from one fibre cut. The transit layer doesn’t show up on most redundancy diagrams.
What the June 22 outage actually revealed, and why anycast and multi-region design were never built to catch it.
On the morning of June 22, 2026, a single severed fibre route took down X, Zoom, Microsoft Teams, Reddit, and several other unrelated platforms inside the same ten-minute window. Most headlines called it a Cloudflare outage. The actual cause sat one layer further down: a transit-layer failure that anycast, multi-AZ, and multi-region architecture were never built to catch.
What happened that morning
At around 13:35 UTC, Cloudflare’s status page began reporting elevated error rates and latency across several of its services. Within twenty minutes, Downdetector showed nearly 36,000 reports of frozen feeds and failed refreshes on X, concentrated between 9:45 and 10:00am US Eastern time, according to Newsweek. Zoom logged more than 3,200 reports of login failures and dropped audio. Reddit logged 2,864. Microsoft Teams logged over 1,400. Smaller spikes hit Robinhood, Discord, Fortnite, and Canva inside the same window.
Within about an hour, Cloudflare traced the cause: not its own infrastructure, but a fibre cut on a route operated by Zayo, a network transit provider, somewhere between Cleveland and Buffalo. Cloudflare’s own status updates were specific that the impact wasn’t limited to its customers: any site routed through that stretch of Zayo’s network would have seen the same failures, with or without Cloudflare sitting in front of it.
Most platforms recovered within 45 minutes once Zayo shifted traffic onto alternate paths. Cloudflare also happened to be running scheduled, unrelated maintenance in its Newark data centre that same day, a separate event that overlapped in timing but had no bearing on the fibre cut. The overlap is a small, useful reminder of how easy it is to misattribute a failure if you stop reading at the headline.
Why the blame landed on Cloudflare anyway
Cloudflare’s name was on the status page everyone checked first, so Cloudflare’s name went into the headline. Zayo is a transit provider most consumers have never heard of, and "a BGP route-convergence problem on a regional fibre path" doesn’t fit a push notification the way "Cloudflare is down" does. That’s not a complaint about the reporting. It’s a description of how incident attribution actually travels when several unrelated platforms degrade at once. The brand with the most visible dashboard absorbs the blame, whether or not the fault lives there.
What anycast actually buys you
Cloudflare, like many major CDNs, uses anycast: the same IP address is advertised from dozens of data centres at once, and BGP (the protocol that glues independent networks into one internet) decides which advertisement a given chunk of traffic follows. Traffic is usually described as going to the "nearest" data centre, but in practice "nearest" means whichever route BGP prefers, which is typically the shortest AS path once each network's own routing policy has been applied, not the shortest physical distance or the lowest latency. Cloudflare has written about the mechanics of anycast at length, since hiding planet-scale infrastructure behind one IP address is most of what the technique is for.
The protection anycast buys you is real, and it’s specific: if one data centre goes dark, BGP withdraws the route to it and traffic shifts to the next-nearest advertisement, usually before a human notices. That’s the failure anycast, multi-AZ, and multi-region deployments are all built around: a single location, zone, or region disappearing while the rest of the system keeps running.
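The withdraw-and-shift behaviour described above can be sketched in a few lines. This is a toy model, not how any real router or CDN implements it: it assumes AS-path length is the only selection criterion, and the PoP names and AS numbers (taken from the range reserved for documentation) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    pop: str        # data centre announcing the shared anycast prefix
    as_path: tuple  # AS numbers traffic would cross to reach that data centre


def best_route(adverts):
    """Return the advertisement with the shortest AS path, or None if none remain.

    Real BGP applies local preference and several other tie-breakers;
    using AS-path length alone is a deliberate simplification.
    """
    return min(adverts, key=lambda a: len(a.as_path), default=None)


def withdraw(adverts, pop):
    """Model a data centre going dark: its advertisement disappears."""
    return [a for a in adverts if a.pop != pop]


# Hypothetical PoPs and documentation-range AS numbers, for illustration only.
adverts = [
    Advertisement("pop-a", (64500,)),
    Advertisement("pop-b", (64500, 64501)),
    Advertisement("pop-c", (64500, 64501, 64502)),
]

print(best_route(adverts).pop)                     # pop-a
print(best_route(withdraw(adverts, "pop-a")).pop)  # pop-b
```

Note what the model leaves out: it says nothing about whether the links along `as_path` are physically intact. A route can remain the preferred one while the fibre underneath it is gone.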
None of that is what happened on June 22.
The failure mode that doesn’t show up in your AZ or region diagrams: transit-layer concentration
Anycast and multi-region design assume the paths leading to your redundant locations are independent of each other. They often aren’t. Multiple data centres, regions, even multiple cloud providers, can still depend on the same handful of long-haul fibre routes and the same small set of transit providers underneath — because there are only so many physical routes between Cleveland and Buffalo, and fewer companies still that own the fibre running through them. When one of those routes is severed, every "redundant" path that happened to share it goes down together, and no failover logic above that layer can route around a link that simply isn’t there.
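A minimal sketch of that shared-fate effect, assuming you already had an inventory of the physical segments behind each destination (which most teams do not). Every name below is made up for illustration.

```python
# Hypothetical inventory: the ordered physical segments traffic crosses to
# reach each "redundant" destination. All names are made up for illustration.
PATHS = {
    "region-east": ["metro-1", "longhaul-a", "metro-2"],
    "region-central": ["metro-1", "longhaul-a", "longhaul-b"],
    "second-cloud": ["metro-3", "longhaul-a"],
    "backup-site": ["metro-3", "longhaul-c"],
}


def survivors(paths, cut_segment):
    """Return the destinations whose path does not cross the severed segment."""
    return sorted(dest for dest, segments in paths.items() if cut_segment not in segments)


print(survivors(PATHS, "longhaul-a"))  # ['backup-site']
print(survivors(PATHS, "metro-2"))     # ['backup-site', 'region-central', 'second-cloud']
```

On paper this inventory lists four destinations across two clouds. Cutting the one long-haul segment that three of them share leaves a single survivor, while cutting a segment used by only one path barely registers.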
BGP convergence after a physical break isn’t instant, either. Routers have to detect the failure, withdraw the broken routes, and propagate new ones across however many autonomous systems sit between the cut and your traffic. That usually takes seconds, but it can stretch to minutes when something slows the process down. Zayo’s own preliminary account of the incident described it as a physical fibre cut compounded by a software fault that slowed how quickly traffic moved onto backup paths. That compounding is what turned a routine fibre cut (the kind transit providers deal with most weeks) into a multi-platform event visible on Downdetector.
| Failure layer | Example | What protects against it | What June 22 showed |
|---|---|---|---|
| Single server | A box crashes mid-request | Load balancer with health checks | Not what broke |
| Availability zone | One cloud AZ goes dark | Multi-AZ deployment | Not what broke |
| Region | A whole cloud region fails | Multi-region failover | Not what broke |
| CDN edge location | One Cloudflare PoP goes offline | Anycast reroutes to the next PoP | Worked as designed |
| Transit / physical path | A fibre route is severed | Multi-transit peering on a disjoint path | Mostly absent — this is what broke |
The mitigations that exist, and why most teams don’t have them
Multi-CDN and multi-transit setups exist precisely for this failure mode, and they work — provided the second path is physically disjoint from the first. A second contract with a different brand name doesn’t help if both providers lease capacity on the same underlying fibre bundle, which happens more often than most procurement processes check for. Genuine path diversity means asking, specifically, which physical routes and which underlying carriers serve each region or point of presence you depend on, and getting that answer in writing rather than a sales assurance.
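Once those written answers exist, checking them for overlap is mechanical. The sketch below assumes each provider's answer has been reduced to a set of "carrier:route" labels; the provider names, carriers, and routes are all hypothetical.

```python
from itertools import combinations

# Hypothetical answers collected from providers: the underlying carriers and
# physical routes each contract actually rides on.
UNDERLAY = {
    "cdn-a": {"carrier-x:route-1", "carrier-y:route-4"},
    "cdn-b": {"carrier-x:route-1", "carrier-z:route-7"},
    "transit-c": {"carrier-z:route-9"},
}


def overlapping_pairs(underlay):
    """Return {(provider, provider): shared routes} for every non-disjoint pair."""
    overlaps = {}
    for a, b in combinations(sorted(underlay), 2):
        shared = underlay[a] & underlay[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


print(overlapping_pairs(UNDERLAY))  # {('cdn-a', 'cdn-b'): {'carrier-x:route-1'}}
```

Here two differently branded CDNs turn out to share one route, so only `transit-c` adds real path diversity. The hard part is not the comparison but getting providers to disclose the inputs.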
Health-check-based failover needs the same scrutiny. Most disaster-recovery drills simulate a server dying or a data centre losing power, because that’s the failure everyone already understands. Almost none simulate the transit path itself disappearing while every server behind it stays healthy but unreachable. That’s the scenario that actually played out on June 22, and it’s the one most runbooks have never rehearsed.
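One way to make that scenario visible is to compare a health check run next to the server with one run from outside the network. The sketch below assumes two such probes exist and simply classifies their combined result; the function name and the wording of the verdicts are illustrative, not taken from any monitoring product.

```python
def classify(internal_ok: bool, external_ok: bool) -> str:
    """Combine a probe run beside the server with one run from outside.

    internal_ok: result of a check from inside the same network as the origin
    external_ok: result of a check from a vantage point beyond the transit path
    """
    if internal_ok and external_ok:
        return "healthy"
    if internal_ok:
        return "healthy but unreachable: suspect the path, not the server"
    if external_ok:
        return "inconsistent: check the internal probe itself"
    return "server or site failure"


print(classify(internal_ok=True, external_ok=False))
```

A failover system that only consults the internal probe would report "healthy" throughout a transit failure, which is why a drill for this case needs at least one external vantage point.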
“Anycast tells your traffic where the nearest door is. It says nothing about whether the road to that door is still there.”
None of this is free. Genuine multi-transit diversity costs real money and real engineering attention, and for a lot of products the honest answer is that the residual risk is acceptable as-is. The mistake isn’t carrying that risk — it’s not knowing you’re carrying it, and finding out during an incident review instead of a design review.
What to actually check this week
- Ask your CDN and cloud providers, by name, which transit carriers and physical routes serve each region or PoP you rely on — "redundant" is not an answer.
- Pull up your last two disaster-recovery drills and check whether either one simulated a transit-layer failure, rather than a server or data-centre failure.
- Check whether your "multi-region" or "multi-cloud" setup shares an upstream transit provider you’ve never actually identified.
- Time how long failover takes when the failure sits upstream of your own infrastructure rather than inside it — that number is usually unmeasured, not just untested.
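For the last item, a rough way to put a number on upstream failover is to log external probe results during a drill and measure the failed runs. This is a sketch under stated assumptions: the sample data is invented, and the probes are assumed to come from outside your own network at a fixed interval.

```python
def outage_windows(samples):
    """Return (start, end) pairs for each run of failed probes.

    samples: iterable of (timestamp_seconds, ok) tuples, sorted by time.
    An outage starts at the first failed probe and ends at the next success.
    A run still failing when the data ends is reported with end=None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


# Hypothetical external probe results, one every 10 seconds.
samples = [(0, True), (10, True), (20, False), (30, False), (40, False), (50, True), (60, True)]
print(outage_windows(samples))  # [(20, 50)]
```

The result is only as precise as the probe interval: with a probe every 10 seconds, a reported 30-second window could be anywhere from just over 20 to just under 40 seconds of real unreachability.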
The next fibre cut won’t announce itself as a Cloudflare problem either. It’ll show up as half a dozen unrelated outages inside the same ten-minute window, and most of the coverage will name whichever brand had the most visible status page. The actual fix lives one layer further down, in a part of the stack most architecture diagrams have never had a reason to draw.