If you're down for 5 minutes a year because one of your employees broke something, that's your fault, and the blame passes down through the CTO.
If you're down for 5 hours a year, but the outage affected other companies too, it's not your fault.
From AWS to CrowdStrike, system resilience and uptime aren't the goal. Risk mitigation isn't the goal. Affordability isn't the goal.
When the CEO's buddies all suffer at the same time as he does, it's just an "act of god" and nothing can be done; the outcome is so complex that even the amazing boffins at AWS/Google/Microsoft/Cloudflare/etc. can't cope.
If the CEO is down at a different time than his buddies, then it's Dave/Charlie/Bertie/Alice who can't cope, and it's the CTO's fault for not outsourcing it.
As someone who likes to see things working, it pisses me off no end, but it's the way of the world, and likely has been wherever the owner and the CTO are separate people.
After that process comes the BS and PR step, where reality is spun into cotton candy that makes the leader look good no matter what.
Yes.
What is important is having a contractual SLA that is defensible. Acts of God are defensible, and now major cloud infrastructure outages are too.
* Give the computers a rest, they probably need it. Heck, maybe the Internet should just shut down in the evening so everyone can go to bed (ignoring those pesky timezone differences)
* Free chaos engineering at cloud-provider-region scale, except you didn't opt in to this one or know about it in advance, making it extra effective
* Quickly map out which of the things you use depend on a single AWS region with no capability to change or re-route (rough sketch after this list)
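Not a real tool, just a minimal sketch of that mapping step, assuming a hypothetical list of dependency hostnames: it resolves each one and matches the IPs against AWS's published ip-ranges.json to flag single-region front doors. Caveat: this only sees the public endpoint, not whichever region a vendor's backend actually lives in.

```python
import ipaddress
import json
import socket
import urllib.request

# Hypothetical list -- substitute the external services you actually depend on.
DEPENDENCIES = ["api.example-vendor.com", "webhooks.example-saas.io"]

# AWS publishes its IP ranges, tagged by region, at this URL.
with urllib.request.urlopen("https://ip-ranges.amazonaws.com/ip-ranges.json") as resp:
    PREFIXES = json.load(resp)["prefixes"]

def aws_regions_for(host: str) -> set[str]:
    """Resolve a hostname and return the AWS regions its IPv4 addresses fall into."""
    regions = set()
    for info in socket.getaddrinfo(host, 443, socket.AF_INET, socket.SOCK_STREAM):
        ip = ipaddress.ip_address(info[4][0])
        for p in PREFIXES:
            if ip in ipaddress.ip_network(p["ip_prefix"]):
                regions.add(p["region"])
    return regions

for host in DEPENDENCIES:
    regions = aws_regions_for(host)
    if len(regions) == 1:
        print(f"{host}: single-region dependency on {regions.pop()}")
    elif regions:
        print(f"{host}: resolves into {sorted(regions)}")
    else:
        print(f"{host}: not on AWS (or at least not in ip-ranges.json)")
```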
AWS outages: almost never happen, and you should have been more prepared for when they do.
If you say it's Microsoft, then it's just unavoidable.
But there are some people on Reddit who think we're all wrong, yet won't say anything more. So... whatever.
Nothing in the outage history really stands out as "this is the first time we tried this and oops" except for us-east-1.
It's always possible for things to succeed at a smaller scale and fail at full scale, but again none of them really stand out as that to me. Or at least, not any in the last ten years. I'm allowing that anything older than that is on the far side of substantial process changes and isn't representative anymore.
Still, it would make a bit of sense, if you can find a place in your code where crossing a region hurts less, to move some of your services to a different region.
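If you go that route, the failover itself can stay fairly mechanical. A minimal sketch, assuming the data is already replicated into a second region (e.g. via S3 cross-region replication; the bucket names and regions here are hypothetical):

```python
import boto3
from botocore.exceptions import BotoCoreError, ClientError

# Hypothetical setup: the same objects replicated into per-region buckets.
# Primary region first; the fallback only helps if replication already ran.
REPLICAS = [
    ("us-east-1", "myapp-data-use1"),
    ("eu-west-1", "myapp-data-euw1"),
]

def fetch_with_fallback(key: str) -> bytes:
    """Try each region in order, returning the first successful read."""
    last_err = None
    for region, bucket in REPLICAS:
        try:
            s3 = boto3.client("s3", region_name=region)
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except (BotoCoreError, ClientError) as err:
            last_err = err  # region unreachable or request failed; try the next one
    raise RuntimeError(f"all regions failed for {key!r}") from last_err
```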
While your business partners will understand that you’re down while they’re down, will your customers? You called yesterday to say their order was ready, and now they can’t pick it up?
Turns out the default URL was hardcoded to use the us-east interface; just going to WorkSpaces and editing the URL to point at the local region got everyone working again.
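That's a whole class of bug worth grepping for. A hypothetical illustration (the hostnames are placeholders, not the real WorkSpaces URLs):

```python
import os

# Brittle: every deployment worldwide funnels through one region's interface.
HARDCODED_URL = "https://workspaces.us-east-1.example-console.com/login"

# Less brittle: derive the URL from the region the deployment actually runs in.
def console_url(region: str | None = None) -> str:
    region = region or os.environ.get("AWS_REGION", "us-east-1")
    return f"https://workspaces.{region}.example-console.com/login"
```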
Unless you mean nothing is working for you at the moment.