Until this happens: a single region goes into a cascading failure, and your SaaS is single-region.
Recall is on AWS.
Everyone using Recall for meeting recordings is down.
In some domains a single SaaS dominates, and if that SaaS sits on AWS, it doesn't matter that AWS has only 35% market share. The SaaS that holds 80% of the domain is on AWS, so the effect is wider than AWS's market share alone would suggest.
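To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes (my assumption, not a measured fact) that whether an organisation uses the dominant SaaS is independent of where that organisation hosts its own stack; the function name is made up for illustration.

```python
def effective_exposure(provider_share: float, saas_share: float) -> float:
    """Fraction of a domain's organisations hit when the cloud provider fails.

    provider_share: fraction hosting directly on the provider (e.g. 0.35)
    saas_share:     fraction using the dominant SaaS that runs on that provider
    Assumes the two are independent, which is a simplification.
    """
    direct = provider_share
    # Organisations hosted elsewhere that still depend on the SaaS.
    indirect = (1.0 - provider_share) * saas_share
    return direct + indirect


exposure = effective_exposure(provider_share=0.35, saas_share=0.80)
print(f"{exposure:.0%}")  # 87%
```

So under those assumptions a 35% provider takes out roughly 87% of the domain, not 35%.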
We're on GCP, but we have various SaaS vendors on AWS, so any of our services that rely on those vendors are gone.
Many chat/meeting services also run on AWS Chime, so even if you're not on AWS yourself, any vendor that uses Chime is down.
And this comes at a time when regulations like DORA and regulators like BaFin are tightening things. Managing these boxes becomes less effort than maintaining compliance across vendors.
> They’re not wrong though. If AWS goes down, EVERYTHING goes down to some degree. Your app, your competitor’s apps, your clients’ chat apps. You’re kinda off the hook.
They made their own bigger problems by all crowding into the same single region.
Imagine a beach with ice cream vendors. You'd think it would be optimal for two vendors to split it, one taking the north half and one the south. However, because each wants to steal some of the other vendor's customers, you end up with two ice cream stands in the center.
So too with outages. Safety / loss of blame in numbers.
At the end of the day most of us aren't working on super critical things. No one is dying because they can't purchase X item online or use Y SaaS. And, more importantly, customers are _not_ willing to pay the extra for you to host your backend in multiple regions/providers.
In my contracts (for my personal company) I call out the single point of failure very clearly, and I've never had anyone balk. If they did, I'd offer them resiliency (for a price), and I have no doubt that they would opt to "roll the dice" instead of paying.
Lastly, it's near-impossible to verify what all your vendors are using, so even if you manage to get everything resilient, it only takes one chink in the armor to bring it all down (see: us-east-1 and the various AWS services that rely on it even if you don't host anything in us-east-1 directly).
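As a rough illustration of why this is hard, here is a minimal sketch that walks a dependency map and lists every chain leading back to one region. Every name in the map is hypothetical; in practice the hard part is that you usually can't fill in this map for your vendors at all.

```python
from collections import deque

# Hypothetical dependency map: all names are made up for illustration.
DEPENDS_ON = {
    "our-app": ["gcp-europe-west1", "meeting-recorder", "chat-vendor"],
    "meeting-recorder": ["aws-us-east-1"],
    "chat-vendor": ["video-sdk"],
    "video-sdk": ["aws-us-east-1"],
    "gcp-europe-west1": [],
    "aws-us-east-1": [],
}


def paths_to(graph, start, target):
    """Return every dependency chain from start to target (breadth-first)."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        for dep in graph.get(node, []):
            if dep not in path:  # skip nodes already on this chain (cycles)
                queue.append(path + [dep])
    return found


for chain in paths_to(DEPENDS_ON, "our-app", "aws-us-east-1"):
    print(" -> ".join(chain))
```

In this made-up map the app hosts nothing on AWS, yet two separate chains still end in us-east-1, one of them two vendors deep.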
I'm not trying to downplay this, pretend it doesn't matter, or anything like that. Just trying to point out that most people don't care because no one seems to care (or want to pay for it). I wish that was different (I wish a lot of things were different) but wishing doesn't pay my bills and so if customers don't want to pay for resiliency then this is what they get and I'm at peace with that.
It's fairly difficult to avoid single points of failure completely, and even if you do, it's likely your suppliers and customers haven't managed to.
It comes down to what level of risk you're willing to accept.
AWS us-east-1 fails constantly, it has terrible uptime, and you should expect it to go. A cyberattack that destroyed AWS's entire infrastructure would be less likely. BGP hijacks across multiple AWS nodes are quite plausible, though that can be mitigated to an extent with direct connects.
Sadly, it seems the people in charge of critical infrastructure don't even bother thinking about these things, because next quarter's numbers are more important.
I can avoid London as a single point of failure, but the loss of Docklands would cause so much damage to the UK's infrastructure I can't confidently predict that my servers in Manchester connected to peering points such as IXman will be able to reach my customer in Norwich. I'm not even sure how much international connectivity I could rely on. In theory Starlink will continue to work, but in practice I'm not confident.
When we had power issues in Washington DC a couple of months ago, three of our four independent ISPs failed, as they all had undeclared dependencies on active equipment in the area. That wasn't even a major outage, just a local substation failure. The one circuit that survived was clearly just fibre from our (UPS/generator-backed) equipment room to a data centre towards Baltimore (not Ashburn).