- The write API is synchronous, but it has a hidden async await: when you produce your next output (a response), if the write failed the runtime replaces the response with an HTTP failure. This lets the runtime auto-batch writes and optimistically assume they will succeed, without the user explicitly handling the errors or awaits.
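A rough way to picture this "output gate" behaviour is a toy class where writes return immediately and any failure only surfaces when the next response goes out. This is an illustrative Python sketch of the pattern, not Cloudflare's actual API; all names are mine.

```python
# Toy sketch of an "output gate": writes return synchronously, but any
# failure is surfaced later by replacing the next response with an error.
class OutputGate:
    def __init__(self):
        self._pending_error = None

    def write(self, apply_write):
        # Synchronous from the caller's point of view: optimistically
        # assume success and remember any failure for later.
        try:
            apply_write()
        except Exception as exc:
            self._pending_error = exc

    def respond(self, body):
        # The next outgoing response is gated on earlier writes.
        if self._pending_error is not None:
            return (500, f"storage write failed: {self._pending_error}")
        return (200, body)

store = {}
gate = OutputGate()
gate.write(lambda: store.__setitem__("k", "v"))   # succeeds silently
print(gate.respond("ok"))                          # (200, 'ok')

def failing_write():
    raise IOError("disk full")

gate.write(failing_write)                          # also returns immediately
print(gate.respond("ok"))                          # (500, 'storage write failed: disk full')
```

The point is that the caller never awaits the write; the error handling is pushed to the response boundary.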
- There are no read transactions, which would be useful to get a pointer to a snapshot at a point in time.
- Each runtime instance is limited to 128 MB of RAM.
- Websockets can hibernate and you do not have to pay for the time they are sleeping. This allows your clients to remain connected even when the DO is sleeping.
- They have a kind of auto-RPC ability where you can talk to other DOs or workers as if they were normal JS calls, even though the call may actually go to another data center. The runtime handles the serialisation and parsing.
It reminds me of PostgreSQL's commit_delay, even though it's not exactly the same principle: https://www.postgresql.org/docs/current/runtime-config-wal.h...
Litestream, mentioned in the post, is also suggesting a similar technique.
In SQLite generally, read transactions are useful because you can access the same database from multiple processes at once. Here, only a single process can access the database, so you can get the same effect as read transactions either by doing all reads in one synchronous function, or by implementing your own process-level locking.
SQLite can have many readers and a single writer with WAL, so many read transactions can exist while the writer moves the db state forward.
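This WAL behaviour is easy to demonstrate with plain SQLite (here via Python's sqlite3, on a file database, since WAL needs a file): a reader inside an open transaction keeps seeing its snapshot while the writer commits new rows.

```python
import sqlite3, tempfile, os

# Demonstrate WAL's many-readers/one-writer model: a read transaction
# holds a stable snapshot while a writer commits new rows underneath it.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means autocommit, so we manage transactions explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")                                   # start a read transaction
before = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

writer.execute("INSERT INTO t VALUES (2)")                # writer moves state forward

during = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
reader.execute("COMMIT")                                  # end the read transaction
after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(before, during, after)  # 1 1 2 — the snapshot stayed stable until the txn ended
```

In rollback-journal mode the writer would instead block on the reader; WAL is what makes them independent.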
Which also means it may take 10 seconds before you can (reliably) read the write globally.
I keep failing to see how this can replace regionally placed database clusters which can serve a continent in milliseconds.
Edit: I know it uses streams, but those are only to 5 followers and CF have hundreds of datacenters. There is no physical way to guarantee reads in seconds unless all instances of the SQLite are always connected and even then, packet latency will cause issues.
For another process (e.g. another DO or another worker) to access the data, they need to go through the DO which "contains" the data, so they'd be making a RPC or a HTTP request to the DO, and they'd get the latest information.
+ the hibernation happens after x seconds of inactivity, so it feels like the only time a data write would be unexpectedly unavailable is when the DO or worker crashes right after a write.
On KV they expect up to 30 seconds of latency before a write is visible everywhere; I'd expect similar here.
Each DO is globally unique (there's one DO with a given id running anywhere) and runs sqlite on its own local storage in that datacenter.
Are they located in the region that hosted the API call that caused them to be created in the first place?
If so, is there a mechanism by which a DO can be automatically migrated to another location if it turns out that e.g. they were created in North America but actually all of the subsequent read/write traffic to them comes from Australia?
Note the "Dynamic relocation of existing Durable Objects is planned for the future"
> Dynamic relocation of existing Durable Objects is planned for the future.
https://developers.cloudflare.com/durable-objects/reference/....
IIRC Orleans (https://www.microsoft.com/en-us/research/wp-content/uploads/...) allows actors to be moved between machines, which should map well to DOs being moved between locations.
If it's stateless it could be running in multiple locations.
I worry "Dynamic relocation of DOs" might be getting a bit too granular; this should be something the runtime takes care of.
I have 15+ years experience of building for the web, using Laravel / Postgres / Redis stack and I read posts like this and just think, "not for me".
> For useful background on the first version of Durable Objects take a look at Cloudflare's durable multiplayer moat by Paul Butler, who digs into its popularity for building WebSocket-based realtime collaborative applications.
First apps that come to mind that have RT collaboration:
- Google Docs/Sheets etc
- Notion
- Miro
- Figma
These are all global-scale collaborative apps; I'm not sure a Laravel stack will support those use cases... Google had to build everything in-house and probably spearheaded the usage of CRDTs (this is a guess!), but as the patterns emerge and the building blocks get SaaSified, mass RT collaboration stops being a giant engineering problem and more and more interesting products get unlocked
Fwiw, Google Docs/Sheets etc don't use CRDTs, they use the more server-oriented Operational Transforms (OT). CRDTs were spearheaded by others.
I do worry that DOs are great for building fast, low-overhead, realtime experiences (eg five people editing a document in realtime), but make it very hard to build analyses and overviews (which groups of people have been editing which documents in the last week?). Putting the data inside SQLite might make that even harder - you'd have to somehow query lots and lots of little SQLite instances and then merge the results together. I wonder if there's anything for this with DOs, because this is what keeps bringing me back to Postgres time and time again: it works for core app features and for overviews, BI, etc.
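The fan-out problem can be sketched with plain Python and sqlite3: to answer "edits per user across all documents", you have to query every per-document database and merge the partial aggregates yourself, something a single Postgres would do in one GROUP BY. The scatter/gather below is illustrative; in DOs each query would be an RPC to a separate object.

```python
import sqlite3
from collections import Counter

# Three "documents", each with its own tiny SQLite database,
# mimicking one Durable Object per document.
docs = []
for doc_id, editors in [("a", ["ann", "bob"]), ("b", ["bob"]), ("c", ["ann", "ann"])]:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE edits (user TEXT)")
    db.executemany("INSERT INTO edits VALUES (?)", [(u,) for u in editors])
    docs.append(db)

# Cross-document analytics means scatter/gather: query each database,
# then merge the partial aggregates in application code.
totals = Counter()
for db in docs:
    for user, n in db.execute("SELECT user, COUNT(*) FROM edits GROUP BY user"):
        totals[user] += n

print(sorted(totals.items()))  # [('ann', 3), ('bob', 2)]
```

With hundreds of thousands of DOs, that loop becomes the hard part, which is why people often stream change events out to a central analytics store instead.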
1. You have a really high-load system that you need to figure out some clever ways to scale.
2. You're working on a toy project for fun.
If #2, fine, use whatever you want, it's great.
If this is production, or for Work(TM), you need something proven. If you don't know you need this, you don't need it, go with a boring Postgres database and a VM or something.
If you do know you need this, then you're kind of in a bind: It's not really very mature yet, as it's pretty new, and you're probably going to hit a bunch of weird edge cases, which you probably don't really want to have to debug or live with.
So, who are these systems for, in the end? They're so niche that they can't easily mature and be used by lots of serious players, and they're too complex with too many tradeoffs to be used by 99.9% of companies.
The only people I know for sure are the target market for this sort of thing is the developers who see something shiny, build a company (or, worse, build someone else's company) on it, and then regret it pretty soon and move to something else (hopefully much more boring).
Does anyone have more insight on this? I'd love to know.
This article goes into it more: https://digest.browsertech.com/archive/browsertech-digest-cl...
I think this old article is quite relevant too: http://ithare.com/scaling-stateful-objects/
Anyone who read the Figma multiplayer article and thought "that's kind of what I need" would be well served by Durable Objects, I think. https://www.figma.com/blog/rust-in-production-at-figma/
There are other approaches - I've worked in the past with CRDTs over WebRTC which felt absolutely space-age. But that's a much more complicated foundation compared to a websocket and a single class instance "somewhere" in the cloud.
CRDTs really do sound amazing, though.
The idea of colocating data and behavior is really a quantifiable reduction in complexity. It removes latency and bandwidth concerns, which means both operational concerns and development concerns (famously the impact of the N+1 problem is greatly reduced). You can absolutely argue that networked Postgres is better for other reasons (and you may be right) but SQLite is about as boring and predictable as you can get, with known strong advantages. This is the reason it’s getting popular on the server.
That said, I don’t like the idea of creating many small databases very much - as they suggest with Durable Objects. That gives me NoSQL nightmares - breaking all kinds of important invariants of relational dbs. I think it’s much preferable to use SQLite as a monolithic database, as is done in their D1 product.
IMO Durable Objects map well to use cases where there actually are documents. Think of Figma. There is a ton of data that lives inside the literal Figma document. It would be awful to have a relational table for like "shapes" with one row per rectangle across Figma's entire customer base. That's just not an appropriate use of a relational database.
So let's say I built Figma on MongoDB, where each Figma document is a Mongo document. That corresponds fairly straightforwardly to each Figma document being a Durable Object instance, using either the built-in noSQL storage that Durable Objects already have, or a small Sqlite relational database which does have a "shapes" table, but only containing the shapes in this one document.
This was actually the solution we came up with at a very big global company. Well, not 1 server, but 1 data center. If your write leaders are all in one place it apparently doesn't matter that everything else is global, for certain write requests at least.
There are many services that just don't require performance tuning or deep introspection, things like internal tools. This is where I think serverless frameworks do well, because they avoid a lot of time spent on deployment. It's nice if these are fast, but that's rarely a key requirement. Usually the key requirement is that they are fast to build and low maintenance. It's possible that Cloudflare have got a good story for developer experience here that gets things working quickly, but that's not their pitch, and there are a lot of services competing to make this sort of development fast.
However where I don't think these services work well is when you have high debuggability and introspection requirements. What metrics do I get out of this? What happens if some Durable Objects are just slow, do we have the information to understand why? Can we rectify it if they are? What's the logging story, and how much does it cost?
I think these sorts of services may be a good idea for a startup on day 1 to build some clever distributed system in order to put off thinking about scaling, but I can't help but think that scale-up sized companies would be wanting to move off this onto something they can get into the details more with, and that transition would be a hard one.
As others have said, the use is multiplayer, and that's because you need everyone to see your changes ASAP for the app to feel good. But more broadly, the storage industry has been trying to build something that's consistent, low latency, and multiuser for a long time. That's super hard, just from a physics point of view there's generally a tradeoff between consistency and latency. So I think people are trying different models to get there, and a lot of that experimentation (not all, cf Yugabyte or Cockroach) is happening with SQLite.
When starting out you can get away with using a simple Postgres database. Postgres is fine for low-traffic projects with minimal latency constraints, and you probably want to spend your innovation tokens elsewhere.
But in very high-traffic production cases with tight latency requirements, you will start to see all kinds of weird and wacky traffic patterns that barebones Postgres won't be able to handle. It's usually in these cases where you'd need to start exploring alternatives to Postgres. It's also in these cases where you can afford to hire people to manage your special database needs.
For higher traffic they are asking you to figure out how to shard your data and its compute. That's really hard to do without hitting edge cases.
I feel like part of Cloudflare's business model is to try to convince businesses at scale to solve problems in a non-traditional way using technology they are cooking up, no matter the cost.
In retrospect, what we ended up building at Framer for projects with multiplayer support, where edits are replicated at 60 FPS while being correctly ordered for all clients, is a more applied version of what DOs are doing now. We also ended up with something like a WAL of JSON object edits, so in case a project instance crashed its backup could pick up as if nothing had happened, even if committing the JSON patches into the (huge) project data object hadn't had time to occur (on an every-N-updates/M-seconds basis, just like described here).
In my head, this would be a fun way to build a bookmark service with a DO per user. But as soon as you want to add a new field to an existing table, you meet a pretty tricky problem of getting that change to each individual DO. Perhaps that example is too long lived though, and this is designed for more ephemeral usage.
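One common workaround for that (a general SQLite pattern, not a DO-specific API) is lazy, versioned migrations: each per-user database checks its own schema version on open and applies any pending steps before serving requests, so the new field rolls out per-DO on first touch. A minimal sketch using PRAGMA user_version:

```python
import sqlite3

# Lazy per-database migrations keyed off PRAGMA user_version: each
# per-user database upgrades itself the next time it's opened.
MIGRATIONS = [
    "CREATE TABLE bookmarks (url TEXT NOT NULL)",   # v0 -> v1
    "ALTER TABLE bookmarks ADD COLUMN tags TEXT",   # v1 -> v2 (the new field)
]

def migrate(db):
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for step in MIGRATIONS[version:]:
        db.execute(step)
        version += 1
        db.execute(f"PRAGMA user_version = {version}")
    return version

db = sqlite3.connect(":memory:")
print(migrate(db))  # 2 — a fresh database runs every step
print(migrate(db))  # 2 — already up to date, a no-op
db.execute("INSERT INTO bookmarks (url, tags) VALUES ('https://example.com', 'misc')")
```

Since migrations only ever append to the list, old and new DOs converge on the same schema without any coordinated rollout.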
If anyone has any experience with this, I'd be really interested to know what you're doing.
I have a version of that for SQLite written in Python, but I'm not sure if you could run that in Durable Objects - maybe via WASM and PyOdide? Otherwise you'd have to port it to JavaScript.
I suppose the answer is "it's easier to have 1 central database/DO", but it feels like this approach to data storage really shines when you can have a DO per tenant.
The original chat demo dates back to 2020, using DOs + websockets: https://github.com/cloudflare/workers-chat-demo
I think Simon meant "within", rather than "beyond", here.
As long as the client doesn't exchange websocket messages with the DO, it'll hibernate. From what I can tell, ping/pong frames don't count towards uptime, if you're worried about that.
What is your option if you want to eject to another cloud?
> Since the invasion, providing any services in Russia is understandably fraught. Governments have been united in imposing a stream of new sanctions and there have even been some calls to disconnect Russia from the global Internet. As discussed by ICANN, the Internet Society, the Electronic Frontier Foundation, and Techdirt, among others, the consequences of such a shutdown would be profound.
> [...]
> Beyond this, we have received several calls to terminate all of Cloudflare's services inside Russia. We have carefully considered these requests and discussed them with government and civil society experts. Our conclusion, in consultation with those experts, is that Russia needs more Internet access, not less.
As long as there aren’t any comparable technologies, or abstraction layers on top of DOs, I’m not going to make the leap of faith.
Since this is all running on Cloudflare you could scale reads with a 1 second cache TTL somewhere, which would drop your incoming read queries to around one per second no matter how much read traffic you had.
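A minimal sketch of that pattern (plain Python with a fake clock; on Cloudflare you'd instead set a Cache-Control: max-age=1 header and let the edge cache do the work): any number of readers arriving within the same second get the cached answer, so the DO sees roughly one read query per second.

```python
import time

# Tiny TTL read-through cache: no matter how many readers arrive,
# the backing query runs at most once per `ttl` seconds.
class TTLCache:
    def __init__(self, ttl, fetch, clock=time.monotonic):
        self.ttl, self.fetch, self.clock = ttl, fetch, clock
        self.value, self.expires = None, float("-inf")
        self.misses = 0  # how many times we actually hit the backing store

    def get(self):
        now = self.clock()
        if now >= self.expires:
            self.value = self.fetch()        # e.g. query the Durable Object
            self.expires = now + self.ttl
            self.misses += 1
        return self.value

fake_now = [0.0]
cache = TTLCache(ttl=1.0, fetch=lambda: "rows", clock=lambda: fake_now[0])

for _ in range(100):                         # 100 reads within the same second
    cache.get()
fake_now[0] = 1.5                            # time moves past the TTL
cache.get()
print(cache.misses)  # 2 — one backing query per TTL window
```

The trade-off is the same one the parent describes: reads can be up to a second stale, which is often fine for read-heavy traffic.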
Short version: it's replicated to five data centers on every transaction, and backed up as a stream to object storage as well.