I believe this behavior is changing in the 2024 edition: https://doc.rust-lang.org/edition-guide/rust-2024/temporary-...
Past tense: the 2024 edition stabilized in Rust 1.85 (and has been the default edition for `cargo new` since).
Their writing is so good, always a fun and enlightening read.
This tier approach makes a lot of sense to mitigate the scaling limit per corrosion node. Can you share how much data you wind up tracking in each tier in practice?
How compact is each entry in the application -> [regions] table? Does the constraint of running this on every node mean it creates a global limit on the number of applications? It also seems like the region-level database would impose a regional limit on the number of Fly machines?
So is this a case of wanting to deliver a differentiating feature before the technical maturity was there and validated? That's an acceptable strategy if you're building a lesser product, but if you're selling Public Cloud, maybe a better strategy than waiting for problems to crop up makes more sense. Consul, missing watchdogs, certificate expiry, CRDTs backfilling nullable columns - sure, in the normal case these are not very unexpected or to-be-ashamed-of problems, but for a product that claims to be Public Cloud you want to think of these things and address them before day 1. Cert expiry, for example: you should be giving your users tools so a cert never expires - not fixing it for your own stuff after the fact! (Most CAs offer APIs to automate all of this - no excuse.)
I don't mean to be dismissive or disrespectful - the problem is challenging and the work is great. I'm merely thinking of the loss of customer trust: people are never going to trust a newcomer that has issues like this, and for that reason "move fast, break things, and fix what you find" isn't a good fit for this kind of product.
The "decision that long predates Corrosion" is precisely the point I was trying to make - was it made too soon before understanding the ramifications and/or having a validated technical solution ready? IOW maybe the feature requiring the problem solution could have come later? (I don't know much about fly.io and its features, so apologies if some of this is unclear/wrongly assumes things.)
Huge pet peeve. At least this one has a date somewhere (at the bottom, "last updated Oct 22, 2025").
Is this a typo? Why does it backfill values for a nullable column?
https://github.com/vlcn-io/cr-sqlite/blob/891fe9e0190dd20917...
To ensure every instance arrives at the same “working set” picture, we use cr-sqlite, the CRDT SQLite extension.
Cool to see cr-sqlite used in production!

vlcn-io/cr-sqlite definitely built by someone who doesn't understand the fundamentals of the space
> As of cr-sqlite 0.15, the CRDT for an existing row being updated is this: (1) Biggest col_version wins
col_version is definitely something, but it isn't a logical timestamp!
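To make the objection concrete, here's a toy sketch (illustrative names, not the real cr-sqlite API) of a bare "biggest col_version wins" rule. Two replicas that each make one concurrent edit both land on the same counter value, so the raw integer carries no causal information:

```python
# Hypothetical sketch of per-column "biggest col_version wins" merging.
# Each side of the merge is a (col_version, value) pair.

def merge(local, remote):
    """Bigger col_version wins; otherwise keep the local state."""
    if remote[0] > local[0]:
        return remote
    return local

# Two replicas start from the same row state (col_version = 1).
# Each makes one local edit, bumping its own copy to col_version = 2.
site_a = (2, "started")    # A observed the machine start
site_b = (2, "destroyed")  # B destroyed the machine concurrently

# The counters tie, so the integer alone can't order the concurrent edits;
# some other tiebreak has to decide which value survives.
merged = merge(site_a, site_b)
assert merged[0] == site_a[0] == site_b[0] == 2
```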
--
https://github.com/superfly/corrosion/blob/main/doc/crdts.md
> Crsqlite specifically uses a "lamport timestamp" which, if you squint at from a distance, could be most concisely boiled down to a monotonically increasing counter.
Lamport clocks can be boiled down to monotonically-increasing counters _per physical node in the system_, not per logical row/entity in the data model
so if you want to do conflict resolution based on logical (lamport) clocks you need to evaluate/resolve concurrent modifications according to site-specific logical clocks and their histories -- not just raw integers
which 100% vlcn.io does not do
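For reference, the textbook Lamport clock rules (one counter per node; on receive, take the max of local and received, plus one) can be sketched like this - illustrative Python, not vlcn.io's code:

```python
# Textbook Lamport clock: one counter per node.
# - increment before each local event
# - attach the counter to outgoing messages
# - on receive, set clock to max(local, received) + 1

class Node:
    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock  # this timestamp travels with the message

    def receive(self, ts):
        self.clock = max(self.clock, ts) + 1
        return self.clock

a, b = Node(), Node()
a.local_event()      # a.clock == 1
ts = a.send()        # a.clock == 2, message carries ts == 2
b.receive(ts)        # b.clock == max(0, 2) + 1 == 3
assert b.clock > ts  # causally-later events get strictly larger timestamps
```

The receive rule is the whole point: it's what makes the ordering track causality across nodes, and it's exactly what a per-row integer column bumped in isolation doesn't do.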
> destroyed comes before started and so started is "bigger"
eep. good luck!
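To spell out what the quoted tiebreak means: if the fallback for equal counters is comparing the raw values, plain lexicographic string ordering decides which state survives:

```python
# "destroyed" < "started" lexicographically ('d' < 's'), so when the value
# comparison breaks the tie, a concurrent "started" write beats "destroyed"
# - a machine that was just destroyed can merge back to "started".
assert "destroyed" < "started"
assert max("destroyed", "started") == "started"
```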
But they have to. Physically, no solution will be instantaneous - that's not how the speed of light or relativity works; even two events right next to each other can't find out about each other instantaneously. So the question becomes "how long can I wait for this information?". And that's the part that I feel isn't answered - e.g. if the app dies, the TCP connections die, and in theory that information travels as quickly as anything else you send. It's not reliably detectable, but conceivably you could have an eBPF program monitoring for death and notifying the proxies. That's the part that's really not explained in the article: why you need to maintain an eventually consistent view of the connectivity. I get why that could be useful, but noticing app connectivity death seems wrong, considering I believe you're really tracking machine and cluster health, right? I.e. not noticing that one app instance goes down, but noticing that all app instances on a given machine are gone, and consensus deciding globally, as quickly as possible, where the new app instance will be?
Did you ever consider envoy xDS?
There are a lot of really cool things in envoy like outlier detection, circuit breakers, load shedding, etc…
I was thinking I'll just have to bite the bullet and migrate to PostgreSQL, but perhaps rqlite can work.
This blog is not impressive for an infra company.
Makes you think that's all.
it would be super cool to learn more about how the world's largest gossip systems work :)
We're actually keeping the global Corrosion cluster! We're just stripping most of the data out of it.
Nice.
and I think the intended webfont does load, because the font is clearly weird-ish and non-standard - but the text is invisible for a good 2 seconds at first while it loads :)
In what sense do you think we need specialty routers?
How would you deploy Postgres to address these problems?