- The write API is synchronous, but it has a hidden async await: when you produce your next output (a response), if the write failed the runtime replaces the response with an HTTP failure. This lets the runtime auto-batch writes and optimistically assume they will succeed, without the user explicitly handling the errors or awaits.
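A rough way to picture this "output gate" behaviour is a toy class where writes return immediately and any failure only surfaces when the next response goes out. This is an illustrative Python sketch of the pattern, not Cloudflare's actual API; all names are mine.

```python
# Toy sketch of an "output gate": writes return synchronously, but any
# failure is surfaced later by replacing the next response with an error.
class OutputGate:
    def __init__(self):
        self._pending_error = None

    def write(self, apply_write):
        # Synchronous from the caller's point of view: optimistically
        # assume success and remember any failure for later.
        try:
            apply_write()
        except Exception as exc:
            self._pending_error = exc

    def respond(self, body):
        # The next outgoing response is gated on earlier writes.
        if self._pending_error is not None:
            return (500, f"storage write failed: {self._pending_error}")
        return (200, body)

store = {}
gate = OutputGate()
gate.write(lambda: store.__setitem__("k", "v"))   # succeeds silently
print(gate.respond("ok"))                          # (200, 'ok')

def failing_write():
    raise IOError("disk full")

gate.write(failing_write)                          # also returns immediately
print(gate.respond("ok"))                          # (500, 'storage write failed: disk full')
```

The point is that the caller never awaits the write; the error handling is pushed to the response boundary.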
- There are no read transactions, which would be useful to get a pointer to a snapshot at a point in time.
- Each runtime instance is limited to 128 MB of RAM.
- Websockets can hibernate and you do not have to pay for the time they are sleeping. This allows your clients to remain connected even when the DO is sleeping.
- They have a kind of auto-RPC ability where you can talk to other DOs or workers as if they were normal JS calls, even though the call may actually go to another data center. The runtime handles the serialisation and parsing.
It reminds me of PostgreSQL's commit_delay, even though it's not exactly the same principle: https://www.postgresql.org/docs/current/runtime-config-wal.h...
Litestream, mentioned in the post, is also suggesting a similar technique.
In SQLite generally, read transactions are useful because you can access the same database from multiple processes at once. Here, only a single process can access the database, so you can get the same effect as read transactions either by doing all reads in one synchronous function, or by implementing your own process-level locking.
SQLite can have many readers and a single writer with WAL, so many read transactions can exist while the writer moves the db state forward.
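This WAL behaviour is easy to demonstrate with plain SQLite (here via Python's sqlite3, on a file database, since WAL needs a file): a reader inside an open transaction keeps seeing its snapshot while the writer commits new rows.

```python
import sqlite3, tempfile, os

# Demonstrate WAL's many-readers/one-writer model: a read transaction
# holds a stable snapshot while a writer commits new rows underneath it.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means autocommit, so we manage transactions explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")                                   # start a read transaction
before = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

writer.execute("INSERT INTO t VALUES (2)")                # writer moves state forward

during = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
reader.execute("COMMIT")                                  # end the read transaction
after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(before, during, after)  # 1 1 2 — the snapshot stayed stable until the txn ended
```

In rollback-journal mode the writer would instead block on the reader; WAL is what makes them independent.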
Which also means it may take 10 seconds before you can (reliably) read the write globally.
I keep failing to see how this can replace regionally placed database clusters which can serve a continent in milliseconds.
Edit: I know it uses streams, but those are only to 5 followers and CF have hundreds of datacenters. There is no physical way to guarantee reads in seconds unless all instances of the SQLite are always connected and even then, packet latency will cause issues.
For another process (e.g. another DO or another worker) to access the data, they need to go through the DO which "contains" the data, so they'd be making a RPC or a HTTP request to the DO, and they'd get the latest information.
+ the hibernation happens after x seconds of inactivity, so it feels like the only time a data write would be unexpectedly unavailable is when the DO or worker crashes right after a write.
On KV they expect up to 30 seconds of latency before a write is visible everywhere; I'd expect similar here.
Each DO is globally unique (there's one DO with a given id running anywhere) and runs sqlite on its own local storage in that datacenter.
Are they located in the region that hosted the API call that caused them to be created in the first place?
If so, is there a mechanism by which a DO can be automatically migrated to another location if it turns out that e.g. they were created in North America but actually all of the subsequent read/write traffic to them comes from Australia?
Note the "Dynamic relocation of existing Durable Objects is planned for the future"
> Dynamic relocation of existing Durable Objects is planned for the future.
https://developers.cloudflare.com/durable-objects/reference/....
IIRC Orleans (https://www.microsoft.com/en-us/research/wp-content/uploads/...) allows actors to be moved between machines, which should map well to DOs being moved between locations.
If it's stateless it could be running in multiple locations.
I worry "Dynamic relocation of DOs" might be getting a bit too granular; this should be something the runtime takes care of.
I have 15+ years experience of building for the web, using Laravel / Postgres / Redis stack and I read posts like this and just think, "not for me".
> For useful background on the first version of Durable Objects take a look at Cloudflare's durable multiplayer moat by Paul Butler, who digs into its popularity for building WebSocket-based realtime collaborative applications.
First apps that come to mind that have RT collaboration:
- Google Docs/Sheets etc
- Notion
- Miro
- Figma
These are all global-scale collaborative apps; I'm not sure a Laravel stack will support those use cases... Google had to build everything in-house and probably spearheaded the usage of CRDTs (this is a guess!), but as the patterns emerge and the building blocks get SaaSified, mass RT collaboration stops being a giant engineering problem and more and more interesting products get unlocked
Fwiw, Google Docs/Sheets etc don't use CRDTs, they use the more server-oriented Operational Transforms (OT). CRDTs were spearheaded by others.
I do worry that DOs are great for building fast, low-overhead, realtime experiences (eg five people editing a document in realtime), but make it very hard to build analyses and overviews (which groups of people have been editing which documents in the last week?). Putting the data inside SQLite might make that even harder - you'd have to somehow query lots and lots of little SQLite instances and then merge the results together. I wonder if there's anything for this with DOs, because this is what keeps bringing me back to Postgres time and time again: it works for core app features and for overviews, BI, etc.
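The fan-out problem can be sketched with plain Python and sqlite3: to answer "edits per user across all documents", you have to query every per-document database and merge the partial aggregates yourself, something a single Postgres would do in one GROUP BY. The scatter/gather below is illustrative; in DOs each query would be an RPC to a separate object.

```python
import sqlite3
from collections import Counter

# Three "documents", each with its own tiny SQLite database,
# mimicking one Durable Object per document.
docs = []
for doc_id, editors in [("a", ["ann", "bob"]), ("b", ["bob"]), ("c", ["ann", "ann"])]:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE edits (user TEXT)")
    db.executemany("INSERT INTO edits VALUES (?)", [(u,) for u in editors])
    docs.append(db)

# Cross-document analytics means scatter/gather: query each database,
# then merge the partial aggregates in application code.
totals = Counter()
for db in docs:
    for user, n in db.execute("SELECT user, COUNT(*) FROM edits GROUP BY user"):
        totals[user] += n

print(sorted(totals.items()))  # [('ann', 3), ('bob', 2)]
```

With hundreds of thousands of DOs, that loop becomes the hard part, which is why people often stream change events out to a central analytics store instead.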
1. You have a really high-load system that you need to figure out some clever ways to scale.
2. You're working on a toy project for fun.
If #2, fine, use whatever you want, it's great.
If this is production, or for Work(TM), you need something proven. If you don't know you need this, you don't need it, go with a boring Postgres database and a VM or something.
If you do know you need this, then you're kind of in a bind: It's not really very mature yet, as it's pretty new, and you're probably going to hit a bunch of weird edge cases, which you probably don't really want to have to debug or live with.
So, who are these systems for, in the end? They're so niche that they can't easily mature and be used by lots of serious players, and they're too complex with too many tradeoffs to be used by 99.9% of companies.
The only people I know for sure are the target market for this sort of thing is the developers who see something shiny, build a company (or, worse, build someone else's company) on it, and then regret it pretty soon and move to something else (hopefully much more boring).
Does anyone have more insight on this? I'd love to know.
This article goes into it more: https://digest.browsertech.com/archive/browsertech-digest-cl...
I think this old article is quite relevant too: http://ithare.com/scaling-stateful-objects/
Anyone who read the Figma multiplayer article and thought "that's kind of what I need" would be well served by Durable Objects, I think. https://www.figma.com/blog/rust-in-production-at-figma/
There are other approaches - I've worked in the past with CRDTs over WebRTC which felt absolutely space-age. But that's a much more complicated foundation compared to a websocket and a single class instance "somewhere" in the cloud.
CRDTs really do sound amazing, though.
The idea of colocating data and behavior is really a quantifiable reduction in complexity. It removes latency and bandwidth concerns, which means both operational concerns and development concerns (famously the impact of the N+1 problem is greatly reduced). You can absolutely argue that networked Postgres is better for other reasons (and you may be right) but SQLite is about as boring and predictable as you can get, with known strong advantages. This is the reason it’s getting popular on the server.
That said, I don’t like the idea of creating many small databases very much - as they suggest with Durable Objects. That gives me NoSQL nightmares - breaking all kinds of important invariants of relational dbs. I think it’s much preferable to use SQLite as a monolithic database, as is done in their D1 product.
IMO Durable Objects map well to use cases where there actually are documents. Think of Figma. There is a ton of data that lives inside the literal Figma document. It would be awful to have a relational table for like "shapes" with one row per rectangle across Figma's entire customer base. That's just not an appropriate use of a relational database.
So let's say I built Figma on MongoDB, where each Figma document is a Mongo document. That corresponds fairly straightforwardly to each Figma document being a Durable Object instance, using either the built-in noSQL storage that Durable Objects already have, or a small Sqlite relational database which does have a "shapes" table, but only containing the shapes in this one document.
This was actually the solution we came up with at a very big global company. Well, not 1 server, but 1 data center. If your write leaders are all in one place it apparently doesn't matter that everything else is global, for certain write requests at least.
There are many services that just don't require performance tuning or deep introspection, things like internal tools. This is where I think serverless frameworks do well, because they avoid a lot of time spent on deployment. It's nice if these are fast, but that's rarely a key requirement. Usually the key requirement is that they are fast to build and low maintenance. It's possible that Cloudflare have got a good story for developer experience here that gets things working quickly, but that's not their pitch, and there are a lot of services competing to make this sort of development fast.
However where I don't think these services work well is when you have high debuggability and introspection requirements. What metrics do I get out of this? What happens if some Durable Objects are just slow, do we have the information to understand why? Can we rectify it if they are? What's the logging story, and how much does it cost?
I think these sorts of services may be a good idea for a startup on day 1 to build some clever distributed system in order to put off thinking about scaling, but I can't help but think that scale-up sized companies would be wanting to move off this onto something they can get into the details more with, and that transition would be a hard one.
As others have said, the use is multiplayer, and that's because you need everyone to see your changes ASAP for the app to feel good. But more broadly, the storage industry has been trying to build something that's consistent, low latency, and multiuser for a long time. That's super hard, just from a physics point of view there's generally a tradeoff between consistency and latency. So I think people are trying different models to get there, and a lot of that experimentation (not all, cf Yugabyte or Cockroach) is happening with SQLite.
When starting out you can get away with using a simple Postgres database. Postgres is fine for low-traffic projects with minimal latency constraints, and you probably want to spend your innovation tokens elsewhere.
But in very high-traffic production cases with tight latency requirements, you will start to see all kinds of weird and wacky traffic patterns that barebones Postgres won't be able to handle. It's usually in these cases where you'd need to start exploring alternatives to Postgres. It's also in these cases where you can afford to hire people to manage your special database needs.
For higher traffic they are asking you to figure out how to shard your data and its compute. That's really hard to do without hitting edge cases.
I feel like part of Cloudflare's business model is to try to convince businesses at scale to solve problems in a non-traditional way using technology they are cooking up, no matter the cost.
In retrospect, what we ended up building at Framer for projects with multiplayer support, where edits are replicated at 60 FPS while being correctly ordered for all clients, is a more applied version of what DOs are doing now. We also ended up with something like a WAL of JSON object edits, so in case a project instance crashed its backup could pick up as if nothing had happened, even if committing the JSON patches into the (huge) project data object hadn't had time to occur (on an every-N-updates/M-seconds basis, just like described here).
In my head, this would be a fun way to build a bookmark service with a DO per user. But as soon as you want to add a new field to an existing table, you meet a pretty tricky problem of getting that change to each individual DO. Perhaps that example is too long lived though, and this is designed for more ephemeral usage.
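One common workaround for that (a general SQLite pattern, not a DO-specific API) is lazy, versioned migrations: each per-user database checks its own schema version on open and applies any pending steps before serving requests, so the new field rolls out per-DO on first touch. A minimal sketch using PRAGMA user_version:

```python
import sqlite3

# Lazy per-database migrations keyed off PRAGMA user_version: each
# per-user database upgrades itself the next time it's opened.
MIGRATIONS = [
    "CREATE TABLE bookmarks (url TEXT NOT NULL)",   # v0 -> v1
    "ALTER TABLE bookmarks ADD COLUMN tags TEXT",   # v1 -> v2 (the new field)
]

def migrate(db):
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for step in MIGRATIONS[version:]:
        db.execute(step)
        version += 1
        db.execute(f"PRAGMA user_version = {version}")
    return version

db = sqlite3.connect(":memory:")
print(migrate(db))  # 2 — a fresh database runs every step
print(migrate(db))  # 2 — already up to date, a no-op
db.execute("INSERT INTO bookmarks (url, tags) VALUES ('https://example.com', 'misc')")
```

Since migrations only ever append to the list, old and new DOs converge on the same schema without any coordinated rollout.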
If anyone has any experience with this, I'd be really interested to know what you're doing.
I have a version of that for SQLite written in Python, but I'm not sure if you could run that in Durable Objects - maybe via WASM and PyOdide? Otherwise you'd have to port it to JavaScript.
I suppose the answer is "it's easier to have 1 central database/DO", but it feels like this approach to data storage really shines when you can have a DO per tenant.
The original chat demo dates back to 2020, using DOs + websockets: https://github.com/cloudflare/workers-chat-demo
I think Simon meant "within", rather than "beyond", here.
As long as the client doesn't exchange websocket messages with the DO, it'll hibernate. From what I can tell, ping/pong frames don't count towards uptime, if you're worried about that.
What is your option if you want to eject to another cloud?
> Since the invasion, providing any services in Russia is understandably fraught. Governments have been united in imposing a stream of new sanctions and there have even been some calls to disconnect Russia from the global Internet. As discussed by ICANN, the Internet Society, the Electronic Frontier Foundation, and Techdirt, among others, the consequences of such a shutdown would be profound.
> [...]
> Beyond this, we have received several calls to terminate all of Cloudflare's services inside Russia. We have carefully considered these requests and discussed them with government and civil society experts. Our conclusion, in consultation with those experts, is that Russia needs more Internet access, not less.
As long as there aren’t any comparable technologies, or abstraction layers on top of DOs, I’m not going to make the leap of faith.
Since this is all running on Cloudflare you could scale reads with a 1 second cache TTL somewhere, which would drop your incoming read queries to around one per second no matter how much read traffic you had.
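A minimal sketch of that pattern (plain Python with a fake clock; on Cloudflare you'd instead set a Cache-Control: max-age=1 header and let the edge cache do the work): any number of readers arriving within the same second get the cached answer, so the DO sees roughly one read query per second.

```python
import time

# Tiny TTL read-through cache: no matter how many readers arrive,
# the backing query runs at most once per `ttl` seconds.
class TTLCache:
    def __init__(self, ttl, fetch, clock=time.monotonic):
        self.ttl, self.fetch, self.clock = ttl, fetch, clock
        self.value, self.expires = None, float("-inf")
        self.misses = 0  # how many times we actually hit the backing store

    def get(self):
        now = self.clock()
        if now >= self.expires:
            self.value = self.fetch()        # e.g. query the Durable Object
            self.expires = now + self.ttl
            self.misses += 1
        return self.value

fake_now = [0.0]
cache = TTLCache(ttl=1.0, fetch=lambda: "rows", clock=lambda: fake_now[0])

for _ in range(100):                         # 100 reads within the same second
    cache.get()
fake_now[0] = 1.5                            # time moves past the TTL
cache.get()
print(cache.misses)  # 2 — one backing query per TTL window
```

The trade-off is the same one the parent describes: reads can be up to a second stale, which is often fine for read-heavy traffic.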
Short version: it's replicated to five data centers on every transaction, and backed up as a stream to object storage as well.