Chrome and Curl both report it takes about 1100ms to load the linked page's HTML, split about 50/50 between establishing a connection and fetching content. I'm not sure how the implementation works internally but that seems like a long time for a site served from memory and aiming to be "high-performance". The images bring the total time up to around 5.7s.
As a point of comparison, my site (nginx serving static content, on the 0.25 CPU GCP instance) serves the index page in 250ms. Of that, ~140ms is connection setup (DNS, TCP, TLS). The whole page loads in < 1000ms.
https://i.imgur.com/X4LDbWj.png
https://i.imgur.com/Ccwzmgz.png
One thing to remember is that when a server like nginx serves static content, it's often serving it from the page cache (memory). The author of Varnish has written at some length about the benefits of using the OS page cache, for example <https://varnish-cache.org/docs/trunk/phk/notes.html>. Some of the same principles can be applied even for servers that render dynamically (by caching expensive fragments).
You removed the CDN and the site got slower?
How do you know it was your site that was fast, and not just the CDN? I.e., the CDN should have added a lot of extra hops and made things slower.
To me, this implies the Rust code is very poor at opening and closing connections, and the CDN's keep-alive was papering over that issue.
edit: web.dev measure gave this blog post url a performance score of 30/100 which is quite poor.
I would have liked to see the actual results from this comparison: "I compared my site to Nginx, openresty, tengine, Apache, Go's standard library, Warp in Rust, Axum in Rust, and finally a Go standard library HTTP server that had the site data compiled into ram."
If it's amd64, long mode requires a page table. Otherwise, a page table is handy so you can get page faults for null pointer dereferencing. Of course, you could do that only for development, and let production run without a page table.
My hobby OS can almost fill your needs, but the TCP stack isn't really good enough yet (I'm pretty sure I haven't fixed retransmits after I broke them, no selective ack, probably ICMP path MTU discovery is broken, certainly no path MTU blackhole detection, IPv4 only, etc), and I only support one Realtek NIC, because it's what I could put in my test machine. Performance probably isn't great, but it's not far enough along to make a fair test.
I remember working in 2008 on a project for some geothermal devices that were spitting IoT data onto a "hardcoded" HTML page embedded directly in the C code of the program. The device used a Chinese 8051-like CPU, so there was no OS per se.
I don't think the author is claiming it is faster than a static site stored in memory, they're saying it is faster than a traditional static site that loads files from the disk. At least that's how I read it.
It can be a tiny amount more efficient, since an async disk IO implementation might dispatch the file read() call to a thread pool, wait for the result, and then send the data back to the client. That makes 2 extra context switches compared to sending data from memory. Now, if the user is super confident that the data is hot and in the page cache, then a synchronous disk read will fix the problem. Or try a read with RWF_NOWAIT and only fall back to a thread pool if necessary.
On the other hand rendering a template on each request also requires CPU, which might be either more or less expensive than doing a syscall.
All in all, the efficiency differences are likely negligible unless you run a CDN that does thousands of requests per second.
In terms of throughput to the end user it will make zero measurable difference unless the box ran out of CPU.
Keeping everything in user space buffers might just be faster.
On the other hand, you're sending that sucker over the network, and what you save doing this is most likely best counted in microseconds/request. It's piss in the ocean compared to the delay introduced even over a local network.
I wonder if io_uring could be used to issue a single syscall that would read data from disk (actually using page cache) and send it on the network.
Of course, you could use DPDK or similar technologies to do the opposite - read the data from disk once and keep it in user-space buffers, then write it directly to NIC memory without another syscall. That should still theoretically be faster, since there would be 0 syscalls per request, where the other approach would require 1 per request.
200MB of pages and assets, sure. Code? No. If you compile it into the binary then the storage is no worse than having a small binary and all the resources separate.
Taking a statically generated site and returning the raw bytes is 100% faster. The author said so themselves.
If you did it that way, now all your content is basically mmaped into the memory which means (probably) less syscalls.
So it might've shaved half a microsecond, maybe?
The individual site may be constructed individually (maybe), but it can only work if the society of people-who-use-the-internet all agree to follow a series of conventions about how websites work. You can't start using <soul> instead of <body> and expect everything to work as normal; the reason the <body> tag is used to define the body of a page is that we needed a way to make sure people can use a webpage without having to define an entire new language for each one.
But no, a website is not a social construct because you don't have to have a society to have a website. I can have two machines connected and host an html file on one of them and stare at it on the other one all by myself and it will still be a website on a web! No contractual agreement is necessary!
But anyway, it's amazing that you posted on my comment! I am a huge fan!
Everyone thought it was amazing even though it was just a dumb http server returning pages[req.path] :-) Latency was under 10ms which was pretty amazing for a 2012 KVM VPS.
> And when I say fast, I mean that I have tried so hard to find some static file server that could beat what my site does. I tried really hard. I compared my site to Nginx, openresty, tengine, Apache, Go's standard library, Warp in Rust, Axum in Rust, and finally a Go standard library HTTP server that had the site data compiled into ram. None of them were faster, save the precompiled Go binary (which was like 200 MB and not viable for my needs). It was hilarious. I have accidentally created something so efficient that it's hard to really express how fast it is.
In the real world, use Go, Node, etc.
There has to be a point of diminishing returns. And again, I'm not discarding the dev side of things, but it seems like a lot of extra tooling and complexity for not much gain.
I am too much of an OCD perfectionist and don't have the guts to ship this often.
I have CDO too but I work around it by sheer trolling with infrastructure, like my hacked up to hell CDN: https://xeiaso.net/blog/xedn
A lot of our (and in particular, my) best features come from relocating the boundaries between things, to make space for features that weren't considered in the original design. With monolithic systems we see this late in the lifecycle in the form of Conway's Law. If you stick this problem in front of the CI/CD mirror, it's painful to face. CI/CD argues that if something is difficult we should do it all the time so that it's routine (or stop doing it entirely).
However there's a conspicuous lack of tools and techniques to make that practical. The only one I really know of is service retirement (replace 2-3 services with 2 new, refactored services), and we don't have static analysis tools that can tell us deterministically when we can remove an API. We have to do it empirically, which is fundamentally on par with println debugging.
Seeing the initial comments here I think it would be better to go with the original title.
Great blog by the way :)
I'm not a guy, I'd prefer if you used they to refer to me, but she works too.
The PAM one was a really fun talk to write. I need to finish that postmortem on how that talk went wrong.
http://cppcms.com/wikipp/en/page/main
https://github.com/Tatoeba/tatowiki - the wiki of tatoeba.org ( https://en.wiki.tatoeba.org/articles/show/main# ) is written in it