
0 points · mlhpdx · 7mo ago · 0 comments

Cool, building in resilience seems to have worked. Our static site has origins in multiple regions via CloudFront and didn’t seem to be impacted (not sure if it would have been anyway).

My control plane is native multi-region, so while it depends on many impacted services it stayed available. Each region runs in isolation. There is data replication at play but failing to replicate to us-east-1 had no impact on other regions.

The service itself is also native multi-region and has multiple layers where failover happens (DNS, routing, destination selection).

Nothing’s perfect and there are many ways this setup could fail. It’s just cool that it worked this time - great to see.

Nothing I’ve done is rocket science or expensive, but it does require doing things differently. Happy to answer questions about it.


SteveNuts · 7mo ago

> Our static site has origins in multiple regions via CloudFront and didn’t seem to be impacted

This seems like such a low bar for 2025, but here we are.

immibis · 7mo ago

You're also betting that CloudFront isn't one of the several AWS services that only works when us-east-1 is up.

mlhpdx (OP) · 7mo ago

Yeah, it's not clear how resilient CloudFront is, but it seems good. Since content is copied to the points of presence and cached, it's the lightly used stuff that can break (we don't do writes through CloudFront, which IMHO is an anti-pattern). We set up multiple "origins" for the content, so hopefully that provides some resiliency. I'm not sure if it contributed positively in this case, since CF is such a black box. I might set up some metadata for the different origins so we can tell which is in use.

x3n0ph3n3 · 7mo ago

CloudFront isn't just for CDN, but also for DDoS protection. Writes through CloudFront are not an anti-pattern.


melbourne_mat · 7mo ago

Yep, if you wrote Lambda@Edge functions, which are part of CloudFront and can be used for authentication among other things, they can only be deployed to us-east-1

nijave · 7mo ago

I was under the impression it's similar to IAM where the control plane is in us-east-1 and the config gets replicated to other regions. In that case, existing stuff would likely continue to work but updates may fail

nothrabannosir · 7mo ago

afaik CloudFront TLS certs and access log S3 buckets must be stored in us-east-1

mlhpdx (OP) · 7mo ago

True for certs but not the log bucket (but it’s still going to be in a single region, just doesn’t have to be Virginia). I’m guessing those certs are cached where needed, but I can also imagine a perfect storm where I’m unable to rotate them due to an outage.

I prefer the API Gateway model where I can create regional endpoints and sew them together in DNS.

AndrewKemendo · 7mo ago

How did you do resilient auth for keys and certs?

mlhpdx (OP) · 7mo ago

We use AWS for keys and certs, with aliases for keys so they resolve properly to the specific resources in each region. For any given HTTP endpoint there is a cert that is part of the stack in that region (different regions use different certs).

The hardest part is that our customers' resources aren't always available in multiple regions. When they are, we fall back to the next-closest region where they exist (by latency, courtesy of https://www.cloudping.co/).
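
For readers who want something concrete, the fallback described here might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the latency numbers are made up, and `LATENCY_MS` and `pick_fallback_region` are hypothetical names, not part of the service being discussed. In practice the latency table would be filled from measurements such as those published by cloudping.

```python
from typing import Optional

# Hypothetical inter-region latencies in milliseconds.
# These are illustrative numbers, not real measurements.
LATENCY_MS = {
    "us-east-1": {"us-east-2": 15, "us-west-2": 70, "eu-west-1": 80},
    "us-west-2": {"us-east-1": 70, "us-east-2": 55, "eu-west-1": 130},
}


def pick_fallback_region(home, available, unhealthy=frozenset()) -> Optional[str]:
    """Pick the region to serve from.

    home:      the region that would normally handle the request
    available: regions where the customer's resources exist
    unhealthy: regions currently taken out of service
    Returns the home region if it is usable, otherwise the usable region
    with the lowest latency from home, or None if there is no option.
    """
    candidates = [r for r in available if r not in unhealthy]
    if home in candidates:
        return home
    # Only consider regions we have a latency figure for.
    reachable = [r for r in candidates if r in LATENCY_MS.get(home, {})]
    if not reachable:
        return None
    return min(reachable, key=lambda r: LATENCY_MS[home][r])
```

A customer whose resources exist in only one region gets `None` when that region is out, which matches the "hardest part" mentioned above: there is simply nowhere to fail over to.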

AndrewKemendo · 7mo ago

That’s what I’d expect a basic setup to look like - region/space specific

So you’re minimally hydrating everyone’s data everywhere so that you can have some failover. Seems smart and a good middle ground to maximize HA. I’m curious what your retention window for the failover data redundancy is. Days/weeks? Or just a fifo with total data cap?

mlhpdx (OP) · 7mo ago

Just config information, not really much customer data. Customer data stays in their own AWS accounts with our service. All we hold is the ARNs of the resources serving as destinations.

We’ve gone to great lengths to minimize the amount of information we hold. We don’t even collect an email address upon sign-up, just the information passed to us by AWS Marketplace, which is very minimal (the account number is basically all we use).


zild3d · 7mo ago

active/active? curious what the data stack looks like as that tends to be the hard part

mlhpdx (OP) · 7mo ago

The data layer is DynamoDB with Global Tables providing replication between regions, so we can write to any region. It's not easy to get this right, but our use case is narrow enough and the rate of change low enough (intentionally) that it works well. That said, it still isn't clear that replication to us-east-1 would be perfect, so we did "diff" the tables just to be sure (it has been for us).

There is some S3 replication as well in the CI/CD pipeline, but that doesn't impact our customers directly. If we'd seen errors there it would mean manually taking Virginia out of the pipeline so we could deploy everywhere else.
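
As a rough illustration of what "diffing" two regional replicas could involve, here is a minimal Python sketch. It assumes the items from each region have already been read into lists of dicts (for example by scanning each replica), and that every item carries a key attribute named `pk`; both the attribute name and the function `diff_tables` are hypothetical and not taken from the setup described above.

```python
def diff_tables(items_a, items_b, key="pk"):
    """Compare two lists of item dicts, matching items on `key`.

    Returns the keys present on only one side and the keys whose
    items differ between the two sides.
    """
    a = {item[key]: item for item in items_a}
    b = {item[key]: item for item in items_b}
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

An empty result on all three lists means the two snapshots matched at the moment they were taken. With writes still arriving, a small transient difference would not by itself indicate a replication failure.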

rstupek · 7mo ago

So your global tables weren't impacted in us-east-1... I thought I read their status showed issues with global table replication

mlhpdx (OP) · 7mo ago

Our stacks in us-east-1 stopped getting traffic when the errors started and we’ve kept them out of service for now, so those tables aren’t being used. When we manually checked around noon (Pacific) they were fine (data matched) but we may have just gotten lucky.

zild3d · 7mo ago

cool thanks, we've been considering dynamo global tables for the same. We have S3 replication set up for cold storage data. For the primary/hot DB there don't seem to be many other options for doing local writes
