--------
An architect demoed the failure of a shard and the automatic promotion of its backup shard to main, in production. They actually test their failure models.As I see it, sharding is not very hard. HA is not very hard given a reasonable SLA. But sharding with HA on a large setup that actually works is pretty hard.
Another thing that stuck in my mind was their high throughput-per-provisioned-hardware ratio. With not much hardware they were pulling 80k queries per second with room to spare.
Although I have to say, that's not much compared to GitHub which pulls 1.2 million queries/sec on Vitess [0].
[0] https://github.blog/2021-09-27-partitioning-githubs-relation...