How does DoorDash get to 1.2 million queries per second? 1.2M qps × ~10,000 seconds (3 hours is really 10,800 s) ≈ 12 billion queries to process 5 million orders? That's wild. Is it all analytics? This is highly suspect. 35M users isn't nothing, but it isn't exactly Facebook scale either.
2,400 queries per order? That's not that crazy IMHO. There might be significant database fan-out on each click (depending on how they do geographic lookups, search ranking / synonyms / sponsored results, the repeat-your-last-order feature, whether a ranked search returns full objects or references that then have to be queried individually, etc.). And there might be many clicks per order, because people browse a lot (first to find a restaurant, then to find dishes within it), leave reviews, poll for delivery status updates, etc.
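A quick back-of-envelope in Python, using only the figures from the comments above (1.2M qps, a ~3-hour peak window rounded to 10,000 s, 5M orders), none of which are official DoorDash numbers:

```python
# All inputs are the thread's own figures, not official DoorDash stats.
qps = 1_200_000       # claimed peak queries per second
seconds = 10_000      # the parent rounds 3 hours (10,800 s) down to ~10,000 s
orders = 5_000_000    # assumed orders in that window

total_queries = qps * seconds
queries_per_order = total_queries / orders

print(f"{total_queries:,}")   # 12,000,000,000
print(queries_per_order)      # 2400.0
```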
Isn't that off by at least an order of magnitude though? It forces them to operate a much larger cluster than should be necessary.
That said 1.2 million queries per second is wild. Would be interesting to see the breakdown.
A curious description for a platform which acts as a broker for transactions between users!
> About 2,300 total nodes spread across 300+ clusters.
> About 1.9 petabytes of data on disk.
> Close to 900 changefeeds.
> Largest cluster is currently 280 TB in size (but has peaked above 600 TB), with a single table that is 122 TB.
all of this yet my food still arrives cold af
kidding aside, I wonder if DD has the same problems as Uber or Lyft except with food delivery. Each new "change feed" is a specific region, county/municipality, or city. Federal, state, and local laws all handled delicately.
Ha.
The first thing I noticed, and you almost got to it in your summary: at 1.2M / 2,300 ≈ 520 qps per node, this isn't a wild setup. What I'm still wrapping my head around is how they generate that amount of load in the first place; 520 qps per node seems like an easy task for almost any database to handle.
Even after that, your applications using the DB have to be aware of the sharding; interactions between users housed on different shards, etc., could require a lot of work at the application layer. If your customers can easily be split into tenants that never interact with each other, this isn't so bad, but for a consumer app like DoorDash there are no clear tenant boundaries.
We looked at all this for Kami and realised that it would be much easier for us to move from PostgreSQL to CockroachDB (we had exceeded the write capacity of a single PostgreSQL primary) than to shard Postgres, and it'd make future development much faster. We could have made sharding work if we had to... but it's not 2013 any more and we have distributed SQL databases, why not use them?
Also the majority of entities they're tracking (users, drivers) do not have fixed locations.
Maybe it's not as hard as I'm thinking. I guess you just have to accept that any query can span an arbitrary number of shards and the results need to be union'd.
I'm sure a lot of smart people have tackled this at the DoorDashes and Ubers of the world and maybe there's some optimal way of handling it. I would love to hear about that.
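For what it's worth, the scatter-gather described above can be sketched roughly like this (the shard layout, per-shard query, and rating-based merge are all invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-shard stores: shard id -> list of (restaurant, rating).
SHARDS = {
    0: [("Pizza Place", 4.5)],
    1: [("Pizza Palace", 4.8), ("Taco Spot", 4.2)],
    2: [("Pizza Planet", 3.9)],
}

def query_shard(shard_id, term):
    """Stand-in for a query against one shard's database."""
    return [r for r in SHARDS[shard_id] if term in r[0].lower()]

def scatter_gather(term, shard_ids):
    # Fan the query out to every shard it might touch, then union
    # and re-rank the partial results at the application layer.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: query_shard(s, term), shard_ids)
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: -r[1])  # rank by rating

print(scatter_gather("pizza", [0, 1, 2]))
# [('Pizza Palace', 4.8), ('Pizza Place', 4.5), ('Pizza Planet', 3.9)]
```

The union step is where the application-layer work lives: dedup, global ranking, pagination across shards, and so on all happen here rather than in the database.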
> How do you deal with entities who are near the boundaries and whose current operational data therefore spans more than one shard? (Imagine somebody near the geographic intersection of, like, five shards looking for pizza in a 10-mile radius or w/e)
Hitting 5 shards might not be that bad. I think you could divide the world into sufficiently large hexagonal tiles; you'd hit at most three shards then. Maybe each fixed-size tile is a logically separate database. Some would be much hotter than others, so you don't want to, like, back each one with a fixed-size traditional DBMS or something; that'd be pretty wasteful.
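A toy sketch of the hex-tiling idea, treating coordinates as planar and using axial hex ids as shard keys (the tile size and the flat-plane simplification are assumptions; a real system would use something like Uber's H3 on the sphere):

```python
import math

TILE_SIZE = 1.0  # hex "radius" in the same units as the coordinates

def _axial_round(q, r):
    # Round fractional axial coords via cube coordinates (x + y + z = 0),
    # fixing up whichever component drifted furthest from an integer.
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def point_to_hex(x, y, size=TILE_SIZE):
    """Map a planar point to the axial (q, r) id of its pointy-top hex tile."""
    q = (math.sqrt(3) / 3 * x - 1.0 / 3 * y) / size
    r = (2.0 / 3 * y) / size
    return _axial_round(q, r)

# Two nearby customers land on the same tile (i.e. the same logical shard).
print(point_to_hex(0.1, 0.1))   # (0, 0)
print(point_to_hex(0.2, 0.15))  # (0, 0)
```

The nice property of hexes over squares is the one mentioned above: any point is near at most three tiles, so a boundary query fans out to at most three shards instead of four.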
> Also the majority of entities they're tracking (users, drivers) do not have fixed locations.
Yeah, you at least want a global namespace for users with consistent writes. The same email address belonging to different people in different regions is unacceptable. In theory the global data here could just be a rarely-updated breadcrumb pointing to which database holds the "real" data for that user. [1] So you can make the global database be small in terms of data size, rarely-written, mostly read in eventually consistent fashion, and not needed for geographically targeted queries. That could be worthwhile. YMMV.
[1] iirc Spanner and Megastore have a feature called (something like) "global homing" that is somewhat similar to this. For a given entity, the bulk of the data is stored in some subset of the database's replicas, and breadcrumbs indicate which. If you get a stale breadcrumb, you follow the trail, so looking up breadcrumbs with eventually consistent reads is fine. [edit to add a bit more context:] One major use case for this is Gmail. It has tons of regions in total, but replicating each user's data to more than 2 full replicas + 1 witness would be absurdly wasteful.
[edit:] looks like CockroachDB has the concept of a per-row preferred region, which might also be vaguely similar. <https://www.cockroachlabs.com/docs/v23.1/table-localities#re...> I haven't used CockroachDB and only skimmed this doc section.
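A minimal sketch of the breadcrumb pattern as I understand it (the directory, store names, and API are all made up; this is the shape of the idea, not anyone's actual implementation):

```python
# A tiny, rarely-written global directory maps user -> home region;
# the bulk of the data lives in a per-region store.
GLOBAL_DIRECTORY = {}  # email -> region name (small, consistent writes)
REGIONAL_STORES = {"us-west": {}, "eu-central": {}}

def create_user(email, region, profile):
    if email in GLOBAL_DIRECTORY:
        # Global uniqueness: the same email can't belong to two people
        # in two regions.
        raise ValueError("email already taken globally")
    GLOBAL_DIRECTORY[email] = region           # the breadcrumb
    REGIONAL_STORES[region][email] = profile   # the "real" data

def get_user(email):
    # Breadcrumb reads can be stale/eventually consistent; if the user
    # has moved you just follow the trail to the new region.
    region = GLOBAL_DIRECTORY[email]
    return REGIONAL_STORES[region][email]

create_user("a@example.com", "us-west", {"name": "Ada"})
print(get_user("a@example.com"))  # {'name': 'Ada'}
```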
You could do it by market (eg. SFBA, Los Angeles, San Diego) or by state.
[0] https://www.cockroachlabs.com/docs/stable/topology-follow-th...
They could have just as easily dropped in Oracle. You pay for expensive DB up front, and can hire cheaper junior DBAs and developers going forward.
This is a summary of a recent conference talk:
https://youtu.be/jCjrfpF64Kc?si=Gf-gp_ixX2V6Qz8V
This was my team. We did and lived this. AMA.
It wouldn't be a good idea to take a large working PostgreSQL app and try to switch it over to CRDB wholesale. You'd spend all your time speeding up and grouping a few queries at a time, unwittingly rewriting the entire app.
For the highest-throughput endpoints we did make some changes so they'd run more efficiently on CRDB and we could run a smaller cluster, but it didn't require anything close to a rewrite.
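To illustrate the kind of "grouping a few queries" change being described, here's a generic N+1-vs-batched example (sqlite3 stands in for the database; the schema and data are invented). On a distributed database each round trip can pay cross-node latency, so batching matters more than it does on a single-node Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.5), (2, 20.0), (3, 14.25)])

ids = [1, 2, 3]

# N+1 pattern: one round trip per id.
slow = [conn.execute("SELECT total FROM orders WHERE id = ?", (i,)).fetchone()[0]
        for i in ids]

# Batched: a single round trip with an IN list.
placeholders = ",".join("?" * len(ids))
fast = [row[0] for row in conn.execute(
    f"SELECT total FROM orders WHERE id IN ({placeholders}) ORDER BY id", ids)]

print(slow == fast)  # True
```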