Honestly, I have an app in production that isn't completely hardened against single-zone outages. There was pressure to turn off some redundancy in our caching infra, and not every backend service we call is free of tenant affinity. So a single AZ failure in the wrong region could cost us at least a third of our customers, or cause huge latency problems for all of our tenants because of high cache miss rates.
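To put a rough number on the tenant-affinity risk, here is a minimal sketch, assuming tenants are pinned evenly across three AZs. The tenant names, AZ labels, and the `blast_radius` helper are all hypothetical, made up for illustration and not taken from any real deployment.

```python
def blast_radius(tenant_to_az: dict[str, str], failed_az: str) -> float:
    """Return the fraction of tenants pinned to the AZ that failed."""
    if not tenant_to_az:
        return 0.0
    affected = sum(1 for az in tenant_to_az.values() if az == failed_az)
    return affected / len(tenant_to_az)


# Hypothetical layout: 9 tenants pinned round-robin to three AZs.
azs = ["az-a", "az-b", "az-c"]
tenants = {f"tenant-{i}": azs[i % len(azs)] for i in range(9)}

# Worst case over every possible single-AZ failure.
worst = max(blast_radius(tenants, az) for az in azs)
print(f"worst-case share of tenants lost: {worst:.0%}")
```

With an uneven pinning the worst case is higher than a third, which is where the "at least" comes from.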
Having written this, I'm going to ping our SME on cache replication and remind him that since he last benchmarked it, we've upgraded to a newer generation of EC2 instances with lower latency, and ask him to please run those numbers again.