A summary for the lazy readers:
* Cloudflare Workers outperforms the rest by a big margin (~80ms avg)
* Fly.io cold starts are not great for the Hono use case (~1.5s avg)
* Koyeb wraps the requests behind Cloudflare to optimize latency (150ms best case)
* Railway keeps the app in one region (80ms best case, ~400ms for the rest)
* Render has some challenges scaling from cold (~600ms avg)
In my opinion, this shows that the platform providers that use Docker containers under the hood (Fly, Koyeb, Railway, Render) only achieve good cold starts by never shutting the app down. The ones that do shut it down can only achieve ~600ms startup times at best. That's... no longer a cold start, right?
>The primary region of our server is Amsterdam, and the fly instances is getting paused after a period of inactivity.
After they configured Fly to run nonstop, it outperformed everyone by 3x. But it seems like they're running the measurement from Fly's infrastructure, which biases the results in Fly's favor.
Also weird that they report p75, p90, p95, p99, but not median.
Looking at Google's SRE book, they use p50, p85, p95, and p99, so it's possible I'm misremembering or that Google uses unusual metrics:
https://sre.google/sre-book/service-level-objectives/#fig_sl...
I'm not aware of P50 ever having been a relevant latency metric. The focus of these latency measurements was always the expected value for most customers, and that means P90-ish.
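For what it's worth, the p50/p90 distinction being argued here is easy to see on a toy distribution: the median ignores the cold-start tail entirely, while p90 is dominated by it. A quick sketch using nearest-rank percentiles (the latency samples below are made up):

```typescript
// Hypothetical latency samples (ms): mostly warm requests, a few cold starts.
const samples = [80, 85, 90, 95, 100, 105, 110, 600, 900, 1500];

// Nearest-rank percentile: the value at position ceil(p/100 * n) in sorted order.
function percentile(xs: number[], p: number): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

console.log(percentile(samples, 50)); // 100  — the typical request
console.log(percentile(samples, 90)); // 900  — what tail users actually see
console.log(percentile(samples, 99)); // 1500 — the worst cold start
```

So both camps have a point: p50 tells you what the platform feels like when warm, p90+ tells you what cold starts cost.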
Is that the case though? AWS is upfront about their Node.js Lambdas being the preferred choice for low-overhead, low-latency, millisecond workloads, and since they also control the runtime, I'd be surprised if they followed the naive path you're implying of just running Node.js instances in a dumb VM.
Hell, didn't AWS just update the way they handle JVM Lambdas to basically not charge for their preemptive starts, making them look as performant as Node.js ones?
But it's not all about latency; a real-world application will be different, for sure!
send us an email at ping@openstatus.dev :)
Something is definitely up with Cloudflare's Johannesburg data centre. On particularly bad days, TTFB routinely reaches 1-3 seconds. Bypassing Cloudflare immediately drops this to sub 100ms.
In the past, I would have emailed support@cloudflare.com, but it seems that this channel is no longer available for free tier users. What is the recommended approach these days for reporting issues such as this?
I don't recall Cloudflare routing requests from their free clients differently than paid ones, but I've read multiple reports of that happening recently. Change in policy, or fallout from something else?
Users in the UK having their traffic sent to Australian Cloudflare DC Workers was quite the round-trip/tromboning-a-go-go...
Nice tip, thanks.
Submitters: If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
On the other hand, if you want the lowest possible latency and the ability to run normal Linux applications (rather than being confined to the Cloudflare Worker limits, e.g. a maximum worker size of 10MB), something like Fly.io is pretty nice: it's even lower latency than Cloudflare, assuming you keep the machines running, and scaling up/down is relatively quick, although not something you'd generally be doing every few seconds.
E.g. if you add Prisma connecting to Postgres, presumably there's extra latency to create the client. For the Fly app, you have a server reusing the client while it's warm; presumably the Cloudflare Worker recreates the client per request, but I'm not 100% sure on that. How would the latency change then, cold vs warm, and on the other platforms?
https://developers.cloudflare.com/hyperdrive/configuration/h...
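The warm-reuse pattern being asked about can be sketched like this. Everything here is hypothetical: `createClient` is a stand-in for something expensive like `new PrismaClient()` plus a Postgres handshake, not a real API:

```typescript
// Stand-in for an expensive database client (e.g. Prisma + Postgres).
type Client = { query: (sql: string) => string };

let connects = 0; // counts how often the expensive handshake runs

// Hypothetical factory; imagine TCP + TLS + auth happening here.
function createClient(): Client {
  connects++;
  return { query: (sql) => `rows for: ${sql}` };
}

// Module scope survives across requests in a long-lived server process
// (the Fly case), so the handshake cost is paid once. A per-request
// environment pays it on every invocation instead.
let cached: Client | null = null;

function getClient(): Client {
  if (!cached) cached = createClient();
  return cached;
}

// Two "requests": only the first pays the connection cost.
getClient().query("select 1");
getClient().query("select 2");
console.log(connects); // 1
```

Services like Hyperdrive (linked above) move that pooling/handshake cost out of the request path for Workers, which is roughly the managed version of this caching trick.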