However, I think the author substantially underestimates the sizes of things, even just for storage. For example: quote tweets do not count against the 280-character limit of the tweet field at Twitter. They are likely embedding a reference to the quoted tweet in some manner in place of the quoted text itself, but regardless, a tweet takes up more than 280 Unicode characters.
Also, nowhere in the article are hashtags mentioned. For a system like this to work you need some indexing of hashtags so you aren't doing a full scan of the entire tweet text of every tweet anytime someone decides to search for #YOLO. The system as proposed is missing a highly critical feature of the platform it purports to emulate. I have no insider knowledge but I suspect that index is maybe the second largest thing on disk on the entire platform, apart from the tweets themselves.
Hashtags are a search feature and basically need the same posting lists as full search, but if you only support hashtags the posting lists are smaller. I already have an estimate suggesting that full search probably wouldn't fit. But I think hashtag-only search might fit, mainly because my impression is that people doing hashtag searches are a small fraction of traffic nowadays, so the main cost is disk. I'm not sure, though.
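As a rough illustration of why hashtag-only posting lists are cheap relative to full search, here is a minimal in-memory sketch (all names hypothetical): each hashtag maps to an append-only list of tweet IDs, so a lookup never touches tweet text.

```python
import re
from collections import defaultdict

# Hypothetical in-memory posting lists: hashtag -> list of tweet IDs,
# appended in arrival order so each list is already time-sorted.
posting_lists = defaultdict(list)

HASHTAG_RE = re.compile(r"#(\w+)")

def index_tweet(tweet_id, text):
    # Extract each distinct hashtag once; a tweet with 3 tags costs 3 index writes.
    for tag in set(m.lower() for m in HASHTAG_RE.findall(text)):
        posting_lists[tag].append(tweet_id)

def search_hashtag(tag, limit=20):
    # Newest-first page of results without scanning any tweet text.
    return posting_lists[tag.lower()][-limit:][::-1]

index_tweet(1, "just set up my twttr #first")
index_tweet(2, "living my best life #YOLO")
index_tweet(3, "#YOLO #first combo")
```

A real index would also need time-ordered pagination, deletion handling, and on-disk persistence, but the core structure is just this mapping.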
I did run the post by 5 ex-Twitter engineers and none of them said any of my estimates were super wrong, mainly just brought up additional features and things I didn't discuss (which I edited into the post before publishing). Still possible that they just didn't divulge or didn't know some number they knew that I estimated very wrong.
Another poster dug into some implementation details that I'm not going to go into. I think you could shoehorn it into an extremely large server alongside the rest of your project, but then the processing overhead and capacity management around the indexes themselves start to become a more substantial share of your processing power. Consider that for each tweet you need to extract the hashtags it contains, create records, and update indexes, and many tweets contain several hashtags.
When I last ran analytics on the firehose data (ca. 2015/16), I saw something like 20% of all tweets had 3 or more hashtags. I only remember this fact because I built a demo around doing that kind of analytics. That may obviously have changed over time; however, without that kind of information we don't have a good guesstimate of even what the storage and index management look like there. I'd be curious whether the former Twitter engineers you polled worked on the data storage side of things. Coming at it from the other end, I've met more than a few application engineers who genuinely have no clue how much work a DBA (or equivalent) does to get things stored and indexed well and responsively.
While some of those writes may well be acceptable to lose (letting you write to caches), you effectively need to assume there are more analytics events triggering writes to something than there are tweet views.
What I mean is, Twitter seems to be processing data based on whatever it is in the tweet and doesn't maintain some grand coherent database.
So I changed my Twitter handle and opened a new account with my original Twitter handle and to my surprise, I was receiving notifications of engagement with tweets my old account sent previously.
I also heard that a method for spamming Twitter trending topics is to send tweets and delete them quickly.
My impression is that Twitter is big on real-time processing. They definitely don't search the entire database for #YOLO tweets; instead they seem to be searching the almost-live stuff plus some archived stuff (probably tweets ranked and saved as noteworthy, or something).
While true, and not to take away from the parent comment, I've noticed that the size of things is often partially the result of scaling out horizontally. Most companies I've worked at end up with a lot of duplicate records as each subsystem might want a copy or to cache a copy.
It's often fine to start without a fully decoupled system (net present value of the time and money needed to scale out might be far too high), but you need to know whether or not it's likely to come and what to look for so you can start preparing in time.
When interviewing developers I always ask what the largest public web site they ever worked on was, then probe about the performance issues they encountered and how they resolved them, in order to gauge how far along they are in their skill development.
I would never plan to run a production service on a single server, because coordinating changes in the active dataset among two or more production servers often changes your design significantly, and you want to plan for that. The consumer-grade hardware we all use has a nasty habit of not working after power cycles, which still tend to be the most strain a system goes through, even in a world of SSD storage.
Adding images, videos, other large attachments, rich search, and all the advertising and billing and analytics stuff would blow this out of the water, but... maybe not by as much as people think...? I would not be surprised if a very performance-engineered version of Twitter could run on a few dozen racks full of beefy machines like this with HPC-grade super-fast fabric interconnects.
I have a strong sense that most large scale systems are way less efficient than what's possible. They trade development ease, modularity, and velocity for performance by using a lot of microservices, flabby but easy and flexible protocols (e.g. JSON over HTTP), slow dynamic languages, and layers of abstraction that could be integrated a lot more tightly.
Of course that may be a rational trade-off if velocity, flexibility, and labor costs and logistics matter more than hardware, power, or data center floor space costs.
I didn't get the impression that this would duplicate the entire functionality of Twitter, just what amounts to the MVP functionality. If you are only talking about the MVP it's at least somewhat plausible with a lot of careful engineering and highly efficient data manipulation.
I mostly agree. Where I differ is that I would argue hashtags were THE thing Twitter is most known for, but that could be the perspective of having been on the platform for forever and a day, and I recognize not everyone may make that same association anymore. Yes, this would be big enough to need specific factoring into a real implementation design. But it would not be big enough to invalidate the proposed idea, so I understand leaving it off, at least initially, to simplify the thought experiment.
Similarly, to support a message responding to, or quoting, a single other message, you only need one or two NULLable ID references: 16 bytes per message, which will likely be dwarfed by the message text in the average case. Given that it likely makes sense to use something like SQL Server's compression options for data like this, the average load imposed will be much smaller than 16 bytes/message.
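To make the 16-bytes-per-message point concrete, here is a sketch (format and names are my own invention, not from the post) packing the two optional references as fixed-width 64-bit IDs, with 0 serving as the NULL sentinel:

```python
import struct

# Hypothetical fixed-width reference block: two optional 64-bit tweet IDs.
# 0 means NULL; the block is 16 bytes per message regardless of usage.
REF_FORMAT = "<QQ"  # reply_to_id, quoted_id (little-endian unsigned 64-bit)

def pack_refs(reply_to_id=None, quoted_id=None):
    return struct.pack(REF_FORMAT, reply_to_id or 0, quoted_id or 0)

def unpack_refs(blob):
    reply_to, quoted = struct.unpack(REF_FORMAT, blob)
    return (reply_to or None, quoted or None)

blob = pack_refs(quoted_id=123456789)
assert len(blob) == 16
```

Since most messages would store two zero words, this is exactly the kind of column that row or page compression shrinks well below 16 bytes on average.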
We are fiddling with the constants a & b in O(a+bN) here, measurably but not massively, so the storage problem is still essentially of order O(N) [where N is the total length of the stored messages].
I'd probably go as far as to say that the indexes _generally_ at Twitter could be larger than the tweets.
The data structures for the @BeefWellington timeline of tweets and the one for the #BeefWellington timeline of tweets could look roughly the same.
I really wonder how much of a challenge this is and how much it occupies, not even talking about disk, but continuing the theoretical exercise in the linked URL, you can get 1U size servers with 2TB of RAM these days.
Text search, hashtag index, some structured data for popular tweets, etc...
In order to deliver search results I would not be surprised if tweets are duplicated/denormalized, for quick search/lookup.
I want to add another concept that may considerably impact storage: "threads". I'm not sure what percentage of tweets are threads, but an important factor is that threads do not have a maximum number of characters.
> There’s a bunch of other basic features of Twitter like user timelines, DMs, likes and replies to a tweet, which I’m not investigating because I’m guessing they won’t be the bottlenecks.
Each of these can, in fact, become their own bottlenecks. Likes in particular are tricky because they change the nature of the tweet struct (at least in the manner OP has implemented it) from WORM to write-many, read-many, and once you do that, locking (even with futexes or fast atomics) becomes the constraining performance factor. Even with atomic increment instructions and a multi-threaded process model, many concurrent requests for the same piece of mutable data will begin to resemble serial accesses - and while your threads are waiting for their turn to increment the like counter by 1, traffic is piling up behind them in your network queues, which causes your throughput to plummet and your latency to skyrocket.
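One common mitigation for that hot-counter contention, sketched below purely for illustration (not how Twitter or OP does it), is striping the counter across several slots so concurrent writers rarely touch the same lock; reads sum the stripes and may be slightly stale, which is fine for a like count:

```python
import random
import threading

class StripedCounter:
    def __init__(self, stripes=16):
        self.slots = [0] * stripes
        self.locks = [threading.Lock() for _ in range(stripes)]

    def increment(self):
        # Each writer picks a random stripe, so the chance of two writers
        # contending on the same lock drops roughly by 1/stripes.
        i = random.randrange(len(self.slots))
        with self.locks[i]:
            self.slots[i] += 1

    def value(self):
        # Reads sum all stripes; slightly stale totals are acceptable here.
        return sum(self.slots)

likes = StripedCounter()
workers = [threading.Thread(target=lambda: [likes.increment() for _ in range(1000)])
           for _ in range(8)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert likes.value() == 8000
```

The trade-off is that an exact read now costs N slot reads, which is the classic way to buy write throughput at the expense of read precision.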
OP also overly focuses on throughput in his benchmarks, IMO. I'd be interested to see the p50/p99 latency of the requests graphed against throughput - as you approach the throughput limit of an RPC system, average and tail latency begin to increase sharply. Clients are going to have timeout thresholds, and if you can't serve the vast majority of traffic in under that threshold consistently (while accounting for the traffic patterns of viral tweets I mentioned above) then you're going to create your own thundering herd - except you won't have other machines to offload the traffic to.
"I also didn’t try to investigate configuring an IBM mainframe, which stands a chance of being the one type of “machine” where you might be able to attach enough storage to fit historical images."
It seems theoretically possible it could accommodate the entirety of Twitter in 'one machine'.
There was an HPC cluster at Princeton when I worked there (which, looking at their website, has since been retired) that was assembled by SGI and outfitted with a customized Linux unikernel that presented itself as a single OS image, despite comprising several disparate racks of individual 2-4u servers. You might be able to metaphorically duct-tape enough machines together with a similar technique to be able to run the author's pared-down scope within a single OS image.
With respect to the IBM z-series specifically - if the goal of the exercise is to save money on hardware costs, I'm imagining purchasing an IBM mainframe is in direct opposition to that goal. :) I'm not familiar enough with its capabilities to say one way or the other.
Because OP is a junior developer, he reads a lot of theory and blog posts and does a lot of research, but doesn't have much practical experience. Just look at his resume and what he wrote. As a result, most of what he writes about is based on what he has read about senior developers doing at the companies he has worked for; perhaps he created some supporting software for core services but did not design or implement the core, so he doesn't have firsthand experience. This is evident to anyone who has actually used DPDK (which is a ridiculous proposal for a Twitter-like service in 2023, when you have XDP and io_uring; it's not HFT), or who has designed and implemented high-volume, low-latency web services and knows from experience where the bottleneck is in that kind of service. Theory will not give you that intuition and knowledge.
You add another feature and it requires a little bit more RAM, and another feature that needs a little bit more, and.. eventually it doesn't all fit.
Now you have to go distributed.
And your entire system architecture and all your development approaches are built around the assumptions of locality and cache line optimization and all of a sudden none of that matters any more.
Or you accept that there's a hard ceiling on what your system will ever be able to do.
This is like building a video game that pushes a specific generation of console hardware to its limit - fantastic! You got it to do realtime shadows and 100 simultaneous NPCs on screen! But when the level designer asks if they can have water in one level you have to say 'no', there's no room to add screenspace reflections, the console can't handle that as well. And that's just a compromise you have to make, and ship the game with the best set of features you can cram into that specific hardware.
You certainly could build server applications that way. But it feels like there's something fundamental to how service businesses operate that pushes away from that kind of hyperoptimized model.
Vertical scaling was absolutely the way most big applications were built up until well into the 90s. Companies like Oracle were really built on the fact that getting performance and reliability out of a single highly-contested massive server is hard but important if that's the way you're going. Linux became dominant primarily because horizontal scaling won that argument and it won it pretty much exactly because of:
1)what you said - you hit a hard cap on how big you can make your main server at which point you are really screwed. Scalability pain points become a hard wall.
2) when I say "server" I mean "servers" of course because you'd need an H/A failover, at which point you've eaten the cost of replication, handling failover etc and you may as well distribute
3) cost. Because hardware cost vs capability is exponential, as your requirements become bigger you pretty rapidly hit a point where lots of commodity hardware becomes cheaper for a given performance point than few big servers
So there's a reason that distributed systems on commodity hardware became the dominant architectural paradigm. It's not the only way to do it, but it's a reasonable default for many use cases. For a very high-throughput system like twitter it seems a very obvious choice.
Clearly there are costs to distribution, so if you can get away with a simpler architecture then as always Occam's razor applies. Also if you can easily distribute later then it probably makes sense to leave that option open and explore it when you need it rather than overcomplicate too early.
I’m always reminded of how Stack Overflow essentially runs off a single database server. If they can do it, most web properties can do it.
It's a similar phenomenon to the observation that tech "innovations" tend to recapitulate research that had its roots in the 50-60-70s.
"The industry" doesn't seem to put much stock in generational knowledge transfer.
Indeed, with the hyperoptimized version here, the moment you tip over into two machines each machine will need two copies of every tweet from anyone who has followers sharded to both machines, so the capacity of two machines is going to be far less than twice the capacity of one as a large proportion of tweets will cause writes on both shards. This inefficiency will now always be with you - the average number of writes per user per tweet will go up until your number of shards approaches average follower counts.
This is why it's common to model this with fan-out on write, because the moment you accept that there is a risk you'll tip over into a sharded model you need to account for that. If asked the question of such a design, it's worth pointing out that if you can guarantee it fits on one machine, and this is true for many more problems than people expect, then you can save a lot, but then I'd set out the more complex model and contrast it to the single-machine model.
You don't need to fan-out to every account even in such a distributed system, certainly. You can fan-out to every shard/instance, and keeping that cached in RAM would still allow you to be far more efficient than e.g. Mastodon (which does fan-out to every instance for the actual post data, but relies on a Postgres database)
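The write-amplification claim a couple of comments up can be checked with a toy model (my own assumption, not from the post): followers are placed uniformly at random across S shards, and a tweet must be written to every shard holding at least one follower, so the expected number of shards touched is S * (1 - (1 - 1/S)^F):

```python
# Toy model of fan-out-on-write amplification under random follower placement.
def expected_writes(shards, followers):
    # Probability a given shard holds none of F followers: (1 - 1/S)^F.
    return shards * (1 - (1 - 1 / shards) ** followers)

for s in (1, 2, 4, 16):
    print(s, round(expected_writes(s, followers=100), 2))
```

With 100 followers, even 16 shards each receive nearly every tweet, i.e. writes per tweet track the shard count until the shard count exceeds typical follower counts, matching the comment above.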
That "fundamental" thing is the cultural expectation that SaaS offerings constantly grow in features, rather than in reliability or performance. As your example from the world of video games demonstrates, there is no industry-wide belief that things must be able to do ever-more, forever. It's really mostly SaaS and desktop software that has this weird and unreasonable culture around it. That's why your word processor can now send emails, and your email provider now does translations as well.
Data growth through user growth or just normal day-to-day usage is expected.
Of course they could fit a much larger dataset on one machine today.
(But I will note the article is also assuming a chronological timeline by default, but that of course hasn't been true for years - the ranking Twitter does now is far more complex)
Edit: Unless I missed something, the author never argued that Twitter should be hosted on one machine and therefore criticizing the “fun stunt” like this makes no sense to me
I have not looked into the source code yet, but I suppose that if OP has not implemented this already, there should at least be ideas for an implementation.
In addition, from my POV, implementing scaling for such a service should be trivial: shard the data between instances by some criterion (e.g. regional) or by hash, and configure network routing.
I think it should work
It’s interesting to see how much can be done with a single machine, because most projects will never be this big.
Though there will still be other concerns like redundancy to deal with.
It's not sarcasm, I have twitter account but I never understood hype about twitter.
I see nothing special in Twitter from a technical POV, the closed Twitter protocol looks very strange, they banned Trump, they were profitable in 2021, and Elon Musk bought them for ~$44 billion.
Maybe selling a company with problems for such a price is a success.
Back in the 486 days you wouldn't be keeping, in RAM, data about every single human on earth (let's take "every single human on earth" as the maximum number of humans we'll offer our services to with our hypothetical server). Nowadays, keeping in RAM, say, the GPS coordinates of every single human on earth (if we had a means to fetch the data) is doable. On my desktop. In RAM.
I still don't know what the implications are.
But I can keep the coordinates of every single human on earth in my desktop's RAM.
Let that sink in.
P.S: no need to nitpick if it's actually doable on my desktop today. That's not the point. If it's not doable today, it'll be doable tomorrow.
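The arithmetic behind the claim, under the assumption of roughly 8 billion people and two 32-bit floats per position:

```python
# Rough arithmetic: ~8 billion people, lat/lon as two float32 values each.
people = 8_000_000_000
bytes_per_person = 2 * 4                  # two 4-byte coordinates
total_gib = people * bytes_per_person / 2**30
print(round(total_gib))                   # roughly 60 GiB
```

About 60 GiB, which indeed fits in the RAM of a high-end desktop today, with room to spare for an index.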
From scanning every message of every person, it's going to expand to recognizing every face from every camera, and transcribing and analyzing every spoken word recorded.
It was a little more than ten years ago for me. I realized that a hard disk could store a database of every human alive, including basic information, links (family relations) and maybe a photo.
I still don't know what the implications are.
Maybe we don't want to know, but it's not really that difficult to think about.
Which is already largely true today with the advent of serverless. Most maintenance work can center around application logic rather than scaling physical machines/maintaining versioning.
It's clear that many modern applications would have taken an order of magnitude more people to run even just 20 years ago. That trend will only continue.
That's getting to the point you could store 20 bytes per atom in a terabyte.
(The big bottleneck is that you need picosecond resolution simulation steps and to cover minutes to see a protein fold.)
>It is amusing to consider how much of the world you could serve something like Twitter to from a single beefy server if it really was just shuffling tweet sized buffers to network offload cards. Smart clients instead of web pages could make a very large difference. [1]
Very interesting to see the idea worked out in more detail.
[1] https://twitter.com/id_aa_carmack/status/1350672098029694998
Except that's not what it is doing at all.
It assembles all the Tweets internally, applies an ML model to produce a finalised response to the user.
I strongly doubt that entire datacenters would be needed if Twitter obsessively optimized for hardware usage efficiency over everything else. In reality they don't, and they make some pretty big compromises to actually get stuff built. Hardware is cheap, people are not.
There are storage size issues (like how big is their long tail; quite large I'd imagine), but its a fun thing to think about.
HTTP with Connection: keep-alive can serve 100k req/sec. But that's for one client being served repeatedly over 1 connection. And this is the inflated number that's published in webserver benchmark tests.
For a more practical, down-to-earth test, you need to measure performance without keep-alive. Requests per second will drop to 12k/sec then.
And that's for HTTP without encryption or an SSL handshake. Use HTTPS and watch it fall to only 400 req/sec under load testing [without Connection: keep-alive].
That's what I observe.
I'm talking about a hypothetical HTTPS server that uses optimized kernel-bypass networking. Here's a kernel-bypass HTTP server benchmarked doing 50k new connections per core per second while reusing nginx code: https://github.com/F-Stack/f-stack. But I don't know of anyone who's done something similar with HTTPS support.
One thing which we noticed was that there was a considerable difference in performance characteristics based on how we parallelized the load testing tool (multiple threads, multiple processes, multiple kubernetes pods, pods forced to be distributed across nodes).
I think that when you run non-distributed load tests you benefit from a bunch of cool things that happen with HTTP/2 and Linux (multiplexing, resource sharing, etc.), which might make applications seem much faster than they would be in the real world.
"Quant uses the warpcore zero-copy userspace UDP/IP stack, which in addition to running on top of the standard Socket API has support for the netmap fast packet I/O framework, as well as the Particle and RIOT IoT stacks. Quant hence supports traditional POSIX platforms (Linux, MacOS, FreeBSD, etc.) as well as embedded systems."
I'm running about 2000 requests/s in one of my real-world production systems. All of the requests are without keep-alive and use TLS. They use about one core for TLS and HTTP processing.
Why do we want to apply ML at the cost of a significant fleet cost increase? Because it can make the overall system perform consistently against external changes via generalization, so the system can evolve more cheaply. Why do we want to implement a complex logging layer although it doesn't bring direct gains in system performance? Because you need to inspect the system to understand its behavior and find out where it needs to change. The list can go on, and I can give you hundreds of reasons why all these apparently unnecessary complexities and overheads can be important for a system's longevity.
I don't deny the existence of accidental complexities (probably Twitter could become 2~3x simpler and cheaper given sufficient eng resources and time), but in many cases you probably won't be able to confidently say whether some overhead is accidental or essential, since system engineering is essentially a highly predictive/speculative activity. To make that call, you gotta have a precise understanding of how the system "currently works" to make a good bet, rather than a re-imagination of the system with your own wish list of how the system "should work". There's a certain value in the latter option, but it's usually more constructive to build an alternative rather than complain about the existing system. This post is great since the author actually tried to build something to prove its possibility; this knowledge could turn out to be valuable for other Twitter alternatives later on.
Sure, you need to invest into it but those are things you can reuse for every app and feature you build.
And those are not the reason why those systems are so complex; those are just ways to keep complex systems running and manageable. In most cases they do not stand in the way of making the system better but help with it.
They need to exist because the architecture of the system grew organically from a smaller system over and over again, and a big restructuring was deemed not worth it. It's "just have a bunch more hardware and engineers" vs "we're not delivering features and we might not get the rewrite right".
And every time you throw money at the problem, the problem becomes a bigger problem, and the potential benefits of "getting it right" also get bigger. But nobody wants to be the herald who tells management "we're going to spend 6-12 months" on something whose payoff is a few years out.
You can even run Linux on them now. The specs he cites would actually be fairly small for a mainframe, which can reach up to 40TB of memory.
I'm not saying this is a good idea, but it seems better than what the OP proposes.
I also bet that mainframes have software solutions to a lot of the multi-tenancy and fault tolerance challenges with running systems on one machine that I mention.
You would be surprised. First off, SSDs are denser than hard drives now if you're willing to spend $$$.
Second, "plug in" doesn't necessarily mean "in the chassis". You can expand storage with external disk arrays in all sorts of ways. Everything from external PCI-e cages to SAS disk arrays, fibre channel, NVMe-over-Ethernet, etc...
It's fairly easy to get several petabytes of fast storage directly managed by one box. The only limit is the total usable PCIe bandwidth of the CPUs, which for a current-gen EPYC 9004 series processors in a dual-socket configuration is something crazy like 512 GB/s. This vastly exceeds typical NIC speeds. You'd have to balance available bandwidth between multiple 400 Gbps NICs and disks to be able to saturate the system.
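A rough sanity check on that figure, using assumed numbers (PCIe 5.0 delivers about 3.94 GB/s per lane after encoding overhead, and dual-socket EPYC 9004 boards typically expose on the order of 128-160 usable lanes):

```python
# Back-of-envelope PCIe bandwidth for a dual-socket EPYC 9004 system.
gb_per_lane = 3.94                     # approx. usable GB/s per PCIe 5.0 lane
for lanes in (128, 160):
    print(lanes, "lanes ->", round(lanes * gb_per_lane), "GB/s")
```

That lands in the roughly 500-630 GB/s range, consistent with the "something crazy like 512 GB/s" figure above, and indeed orders of magnitude beyond a 400 Gbps (50 GB/s) NIC.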
People really overestimate the data volume put out by a service like Twitter while simultaneously underestimating the bandwidth capability of a single server.
I think every big internet service uses user-space networking where required, so that part isn't new.
Netapp is at something > 300TB storage per node IIRC, but in any case it would make more sense to use some cloud service. AWS EFS and S3 don't have any (practically reachable) limit in size.
Some commodity machines use external SAS to connect to more disk boxes. IMHO, there's not a real reason to keep images and tweets on the same server if you're going to need an external disk box anyway. Rather than getting a 4u server with a lot of disks and a 4u additional disk box, you may as well get 4u servers with a lot of disks each, use one for tweets and the other for images. Anyway, images are fairly easy to scale horizontally, there's not much simplicity gained by having them all in one host, like there is for tweets.
No, it doesn’t. It’s a fun exercise in approaching Twitter as an academic exercise. It ignores all of the real-world functionality that makes it a business rather than a toy.
A lot of complicated businesses are easy to prototype out if you discard all requirements other than the core feature. In the real world, more engineering work often goes to ancillary features that you never see as an end user.
This is not apples to apples, but WhatsApp is a product that ran entirely on 16 servers at the time of acquisition (1.5 billion users). It really raises the question of why Twitter uses so much compute when there are companies that have operated significantly more efficiently. Twitter was unprofitable at acquisition and spent around half their revenue on compute; maybe some of those features were not really necessary (and were just burning money)?
I see a lot of comments here assuming that this proves something about Twitter being inefficient. Before you jump to conclusions, take a look at the author’s code: https://github.com/trishume/twitterperf
Notably absent are things like serving HTTP, not to even mention HTTPS. This was a fun exercise in algorithms, I/O, and benchmarking. It wasn’t actually imitating anything that resembles actual Twitter or even a usable website.
I'm now somewhat confident I could implement this if I tried, but it would take many years, the prototype and math is to check whether there's anything that would stop me if I tried and be a fun blog post about what systems are capable of.
I've worked on a team building a system to handle millions of messages per second per machine, and spending weeks doing math and building performance prototypes like this is exactly what we did before we built it for real.
Of course. I was commenting here to counter all of the comments declaring that this proves Twitter doesn’t need all of their servers, etc.
It’s a fun article, but the comments here interpreting it as proving something about Twitter engineers being bad are kind of depressing.
There was an article just yesterday about how Jane Street had developed an internal exchange way faster than any actual exchange by building it from the ground up, thinking about how the hardware works and how agents can interact with it.
Modern software like Slack or Twitter are just reinventing what IRC or BBS did in the past, and those were much leaner, more reliable and snappier than their modern counterparts, even if they didn't run at the same scale.
It wouldn't be surprising at all that you could build something equivalent to Twitter on just one beefy machine, maybe two for redundancy.
> Once you understand your computer has 16 cores running at 3GHz and yet doesn't boot up in .2 nanoseconds you understand everything they have taken from you.
With their infinite VC money at their disposal, and with their programmers having 100 GHz machines with thousands of cores, 128 TB of RAM and FTL internet connections, tech companies don't really have any incentive to actually reduce bloat.
Edit: it's still quite sad. I feel like we had languages with a way better future, and more promising programming architectures, back in the 80s.
The blog post kind of gets a very cut-down version of Twitter running on a single machine. Actual Twitter absolutely would not work.
Of course it's a proof-of-concept, it's not a drop-in replacement for Twitter.
https://gist.github.com/jboner/2841832
Essentially, IO is expensive except within a datacenter, but even in a datacenter you can run a lot of iterations of a hot loop in the time it takes to ask a server for something.
There is a whitepaper that talks about the raw throughput and performance of single-core systems outperforming scalable systems. It should be required reading for those developing distributed systems.
http://www.frankmcsherry.org/assets/COST.pdf A summary: http://dsrg.pdos.csail.mit.edu/2016/06/26/scalability-cost/
How much RAM did your advertising network need? Because that is what makes Twitter a business! How are you building your advertiser profiles? Where are you accounting for fast rollout of a Snapchat/Instagram/BeReal/TikTok equivalent? Oh look, your 140 characters just turned into a few hundred megs of video that you're going to transcode 16 different ways for QoS. Ruh roh!
How are your 1,000 engineers going to push their code to production on one machine?
Almost always the answer to "do more work" or "buy more machines" is "buy more machines".
All I'm saying is I'd change it to "Toy twitter on one machine" not Production.
> How are your 1,000 engineers going to push their code to production on one machine?
That might actually be the reason why Twitter barely keeps afloat. 1k engineers for a product that's already built and hasn't fundamentally changed nor evolved in years makes me wonder what business value those engineers actually provide.
If you have a 1 KB piece of data that you need to send to a customer, ideally that should require less than 1 KB of actual NIC traffic thanks to HTTP compression.
If processing that 1 KB takes more than 1 KB of total NIC traffic within and out of your data centre, then you have some level of amplification.
Now, for writes, this is often unavoidable because redundancy is pretty much mandatory for availability. Whenever there's a transaction, an amplification factor of 2-3x is assumed for replication, mirroring, or whatever.
For reads, good indexing and data structures within a few large boxes (like in the article) can reduce the amplification to just 2-3x as well. The request will likely need to go through a load balancer of some sort, which amplifies it, but that's it.
So if you need to process, say, 10 Gbps of egress traffic, you need a total of something like 30 Gbps at least, but 50 Gbps for availability and handling of peaks.
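The arithmetic above, spelled out (the 3x amplification, the peak headroom, and the 10x mesh factor are the assumptions from this comment, not measured values):

```python
egress_gbps = 10          # traffic that actually reaches customers
amplification = 3         # replication + load-balancer traversal (assumed)
peak_headroom = 5 / 3     # slack for availability and traffic peaks (assumed)
mesh_amplification = 10   # extra factor for a heavy microservice mesh (assumed)

minimum_internal = egress_gbps * amplification                  # 30 Gbps floor
provisioned = round(minimum_internal * peak_headroom)           # 50 Gbps
microservices_internal = minimum_internal * mesh_amplification  # 300 Gbps

print(minimum_internal, provisioned, microservices_internal)  # 30 50 300
```

The mesh case already puts you at hundreds of Gbps of internal traffic before any protocol overhead is counted.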
What happens in places like Twitter is that they go crazy with the microservices. Every service, load balancer, proxy, envoy, NAT, firewall, and gateway adds to the multiplication factor. A typical Kubernetes or similar setup will have a minimum NIC data amplification of 10x on top of the 2-3x required for replication.
Now multiply that by the crazy inefficient JSON-based protocols, the GraphQL, and the other insanity layered onto "modern" development practices.
This is how you end up serving 10 Gbps of egress traffic with terabits of internal communications. This is how Twitter apparently "needs" 24 million vCPUs to host text chat.
Oh, sorry... text chat with the occasional postage-stamp-sized, potato quality static JPG image.
It needs to assemble tweets internally, sort them with an ML model, add in relevant ads and present a single response to the user because end-user latency matters.
And each of these systems, e.g. ads, has its own features, complexities, development lifecycle and scaling requirements. And of course they must be deployed continuously without downtime. That is how you end up with disparate services, and a lot of them, for redundancy reasons.
I know you think you’re smarter than everyone at Twitter. But those who really know what they are doing have a lot more respect for the engineers who built this insanity. There are always good intentions.
You ignored one possibility: that Twitter engineers, or the people managing them, might just be incompetent, and all of that might just be an overly complex POS.
There is that weird, disgusting trend of assuming that just because a company got big, its tech choices must have been immaculate, ignoring everything else that goes into a successful company.
You can build a perfectly successful company on a totally mediocre product that hit the right niche at the right time.
Feel free to continue using that (historically-correct) answer in interviews. :P
Edit: Still a nice writeup!
They are knowledgeable to a certain level, but they simply aren't great engineers, who are almost always humble, cautious, thoughtful and respectful of the intentions behind what other engineers build.
Anyone who thinks they can jump in and replace any tech stack without an extensive deep dive of the business requirements, design decisions, cost constraints, resource limitations etc that drove the choices deserves the pain and unemployment that inevitably follows.
Isn't this exactly what modern key value stores like RocksDB, LMDB etc are built for?
I think that this may make sense for some applications, but I also think that if you can utilize software abstractions to improve developer efficiency, it reduces risk in the long run.
[1] https://en.wikipedia.org/wiki/P4_(programming_language)
[2] https://www.intel.com/content/www/us/en/products/network-io/...
I agree with you that specialists are expensive, but even a team of software engineers runs into multi-million-dollar territory.
Why not spend the same amount AND cut down resource use? Hyperscalers have shifted to custom hardware already.
[1] https://github.com/fiberhood/MorphleLogic/blob/main/README_M...
[1] Flexible Content-based Publish/Subscribe over Programmable Data Planes https://pure.rug.nl/ws/files/206093441/Flexible_Content_base...
Like last time, and the time before that, and the time before that, and the time before that.
I understand the frustration with flavor of the week "best practices" and the constant churn of frameworks and ideas, but software engineering as a practice IS moving forward. The difficulty is separating the good ideas (CI/CD, for example) from the trends (TDD all the things all the time) ahead of time.
> I did all my calculations for this project using Calca (which is great although buggy, laggy and unmaintained. I might switch to Soulver) and I’ll be including all calculations as snippets from my calculation notebook.
I've always wanted an {open source, stable, unit-aware} version of something like this which could be run locally or in the browser (with persistence on a server). I have yet to find one. This would be a massive help to anyone who does systems design.
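The core of a unit-aware calculator is just carrying unit exponents through arithmetic. A toy sketch of the idea (the `Qty` class below is hypothetical and nowhere near a real Calca/Soulver replacement):

```python
class Qty:
    """Toy unit-aware quantity: a value plus a dict of unit exponents."""

    def __init__(self, value, units=None):
        self.value = value
        # Drop zero exponents so cancelled units disappear cleanly.
        self.units = {u: e for u, e in (units or {}).items() if e != 0}

    def __mul__(self, other):
        units = dict(self.units)
        for u, e in other.units.items():
            units[u] = units.get(u, 0) + e
        return Qty(self.value * other.value, units)

    def __truediv__(self, other):
        units = dict(self.units)
        for u, e in other.units.items():
            units[u] = units.get(u, 0) - e
        return Qty(self.value / other.value, units)

    def __repr__(self):
        parts = [f"{u}^{e}" if e != 1 else u
                 for u, e in sorted(self.units.items())]
        return f"{self.value:g} {'*'.join(parts) or '(dimensionless)'}"

# 100 m over 9.58 s -> a quantity with units m*s^-1
speed = Qty(100, {"m": 1}) / Qty(9.58, {"s": 1})
print(speed)
```

A real tool adds unit conversion, parsing, and persistence on top, but the dimensional bookkeeping is this small.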
Unsolicited story time:
Prior to my joining the company Hostway had transitioned from handling all email in a dispersed fashion across shared hosting Linux boxes with sendmail et al, to a centralized "cluster" having disparate horizontally-scaled slices of edge-SMTP servers, delivery servers, POP3 servers, IMAP servers, and spam scanners. That seemed to be their scaling plan anyways.
In the middle of this cluster sat a refrigerator sized EMC fileserver for storing the Maildirs. I forget the exact model, but it was quite expensive and exotic for the time, especially for an otherwise run of the mill commodity-PC based hosting company. It was a big shiny expensive black box, and everyone involved seemed to assume it would Just Work and they could keep adding more edge-SMTP/POP/IMAP or delivery servers if those respective services became resource constrained.
At some point a pile of additional customers were migrated into this cluster, through an acquisition if memory serves, and things started getting slow/unstable. So they go add more machines to the cluster, and the situation just gets worse.
Eventually it got to where every Monday was known as Monday Morning Mail Madness, because all weekend nobody would read their mail. Then come Monday, there's this big accumulation of new unread messages that now needs to be downloaded and either archived or deleted.
The more servers they added the more NFS clients they added, and this just increased the ops/sec experienced at the EMC. Instead of improving things they were basically DDoSing their overpriced NFS server by trying to shove more iops down its throat at once.
Furthermore, by executing delivery and POP3+IMAP services on separate machines, they were preventing any sharing of buffer caches across these embarrassingly cache-friendly when colocated services. When the delivery servers wrote emails through to the EMC, the emails were also hanging around locally in RAM, and these machines had several gigabytes of RAM - only to never be read from. Then when customers would check their mail, the POP3/IMAP servers always needed to hit the EMC to access new messages, data that was probably sitting uselessly in a delivery server's RAM somewhere.
None of this was under my team's purview at the time, but when the castle is burning down every Monday, it becomes an all hands on deck situation.
When I ran the rough numbers of what was actually being performed in terms of the amount of real data being delivered and retrieved, it was a trivial amount for a moderately beefy PC to handle at the time.
So it seemed like the obvious thing to do was simply colocate the primary services accessing the EMC so they could actually profit from the buffer cache, and shut off most of the cluster. At the time this was POP3 and delivery (smtpd), luckily IMAP hadn't taken off yet.
The main barrier to doing this all with one machine was the amount of RAM required, because all the services were built upon classical UNIX style multi-process implementations (courier-pop and courier-smtp IIRC). So in essence the main reason most of this cluster existed was just to have enough RAM for running multiprocess POP and SMTP sessions.
What followed was a kamikaze-style developed-in-production conversion of courier-pop and courier-smtp to use pthreads instead of processes by yours truly. After a week or so of sleepless nights we had all the cluster's POP3 and delivery running on a single box with a hot spare. Within a month or so IIRC we had powered down most of the cluster, leaving just spam scanning and edge-SMTP stuff for horizontal scaling, since those didn't touch the EMC. Eventually even the EMC was powered down, in favor of drbd+nfs on more commodity linux boxes w/coraid.
According to my old notes it was a Dell 2850 w/8GB RAM we ended up with for the POP3+delivery server and identical hot spare, replacing racks of comparable machines just having less RAM. >300,000 email accounts.
https://patents.google.com/patent/US20120136905A1/en (licensed under Innovators Patent Agreement, https://github.com/twitter/innovators-patent-agreement)
I could have definitely served all the chronological timeline requests on a normal server with lower latency than the 1.1 home timeline API. There are a bunch of numbers in his calculations that are off, but not by an order of magnitude. The big issue is that since I left, Twitter has added ML ads, an ML timeline and other features that make current Twitter much harder to fit on a machine than 2013 Twitter.
The second is that it's interesting to understand industry-wide social media infra cost per user. If you look at FB, Snap, etc., they are all within an order of magnitude of each other in cost per DAU (cost of revenue / DAU). This can be verified via 10-Ks, which show Twitter at $1.4B vs. Snap at $1.7B Cost of Revenue. The major difference between the platforms is revenue per user, with FB being the notable exception.
Also would you summarize the patent/architecture? The link is a bit opaque/hard to read.
Note: Cost of Revenue does also include TAC and revenue sharing (IIRC) and not just Infra costs but in theory they would also be at similar levels.
eg. SNAPs 10-k https://d18rn0p25nwr6d.cloudfront.net/CIK-0001564408/da8288a...
Sure it's expensive, and you have to deal with IBM, who are either domain experts or mouth breathers. Sure it'll cost you $2m, but!
The opex of running a team of 20 engineers is pretty huge, especially as most of the hard bits of redundant multi-machine scaling are solved for you by the mainframe. Redundancy comes for free (well, not free, because you are paying for it in hardware/software).
Plus, IBM Redbooks are the gold standard of documentation. Just look at this: https://www.redbooks.ibm.com/redbooks/pdfs/sg248254.pdf It's the Redbook for GPFS (a scalable multi-machine filesystem; think ZFS but with a bunch more hooks).
Once you've read that, you'll know enough to look after a cluster of storage.
Through intense digging I found a researcher who left a notebook public including tweet counts from many years of Twitter’s 10% sampled “Decahose” API and discovered the surprising fact that tweet rate today is around the same as or lower than 2013! Tweet rate peaked in 2014 and then declined before reaching new peaks in the pandemic. Elon recently tweeted the same 500M/day number which matches the Decahose notebook and 2013 blog post, so this seems to be true! Twitter’s active users grew the whole time so I think this reflects a shift from a “posting about your life to your friends” platform to an algorithmic content-consumption platform.
So, the number of writes has been the same for a good long while.
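For context on what that flat write rate means for capacity planning, simple arithmetic on the 500M/day figure (the peak factor is a guess):

```python
tweets_per_day = 500_000_000   # figure cited in the post and the Decahose notebook
seconds_per_day = 86_400

avg_writes_per_sec = tweets_per_day / seconds_per_day   # ~5,787/s
peak_factor = 3                # assumed diurnal peak-to-average ratio
peak_writes_per_sec = avg_writes_per_sec * peak_factor  # ~17,361/s

print(f"avg ~{avg_writes_per_sec:,.0f}/s, peak ~{peak_writes_per_sec:,.0f}/s")
```

A few thousand writes per second is well within what a single modern box can ingest; it's the read fan-out and the ML features that make things hard.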
But sure, go ahead and take this as evidence that 10 people could build Twitter as I'm sure that's what will happen to this post. If that's true, why haven't they already done so? It should only take a couple weeks and one beefy machine, right?
SEAN: So if I asked you about art you’d probably give me the skinny on every art book ever written. Michelangelo? You know a lot about him. Life’s work, political aspirations, him and the pope, sexual orientation, the whole works, right? But I bet you can’t tell me what it smells like in the Sistine Chapel. You’ve never actually stood there and looked up at that beautiful ceiling. Seen that.
I think Twitter does (or at some point did) use a combination of the first and second approach. The vast majority of tweets used the first approach, but tweets from accounts with a certain threshold of followers used the second approach.
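That hybrid strategy is easy to sketch: fan out on write for ordinary accounts, but skip fan-out for high-follower accounts and merge their tweets in at read time. A toy in-memory version (the threshold is hypothetical, set tiny so the demo triggers it):

```python
from collections import defaultdict

# Hypothetical cutoff; Twitter's real threshold is not public.
FANOUT_THRESHOLD = 3

followers = defaultdict(set)          # author -> follower ids
following = defaultdict(set)          # user -> authors they follow
timelines = defaultdict(list)         # user -> pre-materialized timeline
celebrity_tweets = defaultdict(list)  # high-follower author -> own tweets

def follow(user, author):
    followers[author].add(user)
    following[user].add(author)

def post(author, tweet):
    if len(followers[author]) >= FANOUT_THRESHOLD:
        # Too many followers: store the tweet once, merge at read time.
        celebrity_tweets[author].append(tweet)
    else:
        # Ordinary account: push the tweet to every follower now.
        for f in followers[author]:
            timelines[f].append(tweet)

def read_timeline(user):
    merged = list(timelines[user])
    for author in following[user]:
        if len(followers[author]) >= FANOUT_THRESHOLD:
            merged.extend(celebrity_tweets[author])
    return merged
```

The write path stays cheap for celebrity accounts at the cost of a slightly more expensive read, which is exactly the trade described above.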
I know it's not the core premise of the article, but this is very interesting.
I believe that 90% of tweets per day are retweets, which supports the author's conclusion that Twitter is largely about reading and amplifying others.
That would leave 50 million "original" tweets per day, which you should probably split into main tweets and replies. Then there are bots and hardcore tweeters tweeting many times per day, and you'll end up with a very sobering number of unique tweeters writing original tweets.
I'd say that number is somewhere in the single-digit millions of people. Most of these tweets get zero engagement. It's easy to verify this yourself. Just open up a bunch of rando profiles in a thread and you'll notice a pattern: a symmetrical number of followers and following, typically in the range of 20-200. Individual tweets get no likes, no retweets, no replies, nothing. Literally tweeting into the void.
If you'd take away the zero engagement tweets, you'll arrive at what Twitter really is. A cultural network. Not a social network. Not a network of participation. A network of cultural influencers consisting of journalists, politicians, celebrities, companies and a few witty ones that got lucky. That's all it is: some tens of thousands of people tweeting and the rest leeching and responding to it.
You could argue that is true for every social network, but I just think it's nowhere this extreme. Twitter is also the only "social" network that failed to (exponentially) grow in a period that you might as well consider the golden age of social networks. A spectacular failure.
Musk bought garbage for top dollar. The interesting dynamic is that many Twitter top dogs have an inflated status that cannot be replicated elsewhere. They're kind of stuck. They achieved their status with hot take dunks on others, but that tactic doesn't really work on any other social network.
Totally out of topic here, but could be he just wants the ability to amplify his own ideas. Also, why measure Twitter value (arbitrarily?) by number of unique tweets, rather than by read tweets?
That was some time ago, though.
9 web servers to serve the entire network. I wish more developers were aware of just how performant modern (non-cloud) hardware is.
The ultimate extension of this "run it all on one machine" meme would be to run the bots on the single machine along with the service.
Not so: serving an ad to a bot gains you no revenue, because ad networks charge for clicks, not impressions. If a significant percentage of your ad clicks come from bots, you're running a defective advertising platform and won't have customers for long regardless.
I learned this the hard way when I was running a medium-sized MapReduce job in grad school that was over 100x faster when run as a local direct computation with some numerical optimizations.
Most then state a scale that would make the service run comfortably on a not-too-powerful machine, and then go on to design a data-center-spanning distributed service.
https://lenovopress.lenovo.com/assets/images/LP1195/Mellanox...
A litany of "gotchas", where someone attempts to best the OP. What about x, y and z? It can't possibly scale. Twitter is so much more than this, etc.
The OP isn't making the assertion that Twitter should replace their current system with a single large machine.
The whole thread paints a picture of HN like it is full of a bunch of half-educated, uncreative negative brats.
To the people that encourage a fun discussion, thank you! Great things are not built by people who only see how something cannot possibly work.
I love that Tristan put out this post and made it so detailed with plenty of assumptions to cover. I also like to hear about possible issues and assumptions which the crowd calls out. Even naysayers can be helpful.
I want both, but I don't want the crowd to go too far and kill the desire to produce this kind of content.
I find it much more appealing to just make the whole thing run on one fast machine. When you suggest this, people tend to say "but scaling!", without understanding how much capacity there is in vertical scaling.
The thing most appealing about single-server configs is the simplicity. The simpler a system is, the more reliable and easier to understand it's likely to be.
The software most people are building these days can easily run, lock, stock and barrel, on one machine.
I wrote a prototype for an in-memory message queue in Rust and ran it on the fastest EC2 instance I could and it was able to process nearly 8 million messages a second.
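For comparison, here is the same shape of experiment in plain Python (far slower than a tuned Rust prototype, but it shows how little code a single-process in-memory queue needs):

```python
import time
from collections import deque

def bench(n_messages=1_000_000):
    """Single-process enqueue-then-drain micro-benchmark."""
    q = deque()
    start = time.perf_counter()
    for i in range(n_messages):
        q.append(i)      # producer side
    while q:
        q.popleft()      # consumer side
    elapsed = time.perf_counter() - start
    return (2 * n_messages) / elapsed   # enqueue + dequeue ops per second

print(f"{bench():,.0f} queue ops/sec on this machine")
```

No network, no serialization, no coordination: most of the throughput gap between this and a distributed broker is exactly those three things.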
You could be forgiven for believing the only way to write software is as a giant kablooie of containers, microservices, cloud functions and Kubernetes, because that's what the cloud vendors want you to do, and because it seems to be the primary approach discussed. Every such layer adds complexity, development, devops, maintenance, support, deployment, testing and (un)reliability. Single-server systems can be dramatically simpler because you can trim them down as close as possible to just the code and the storage.
It's very simple to make a PoC on a very powerful machine; making it production-ready to serve hundreds of millions of users is completely different.
In my experience this ended up being more complicated.
Those systems are typically developed by people who have already left and are undocumented. It becomes extremely difficult to figure out the config (packages, /etc files... where are the service files even located?) and almost impossible to reproduce.
It might be okay to leave it there, but when we need to modify or troubleshoot the system a nightmare begins...
Maybe I was just unlucky, but at least k8s configs are more organized and simpler than dealing with a whole custom configured Linux system.
We have a Python service that consumes gigabytes of RAM for a quite simple task. I'm sure I could rewrite it in Rust to consume tens of megabytes of RAM at most, probably even less.
But I don't have time for that, there are more important things to consider and gigabytes is not that bad. Especially when you have some hardware elasticity with cloud resources.
I think that if you can develop a world-scale Twitter that runs on a single computer, that's a great skill. But it's a rare skill. It's safer to develop a world-scale Twitter that runs on Kubernetes and doesn't require rare developer skills.
Indeed.
Lots of examples out there, one being Let's Encrypt[1] who run off one MySQL server (with a few read replicas but only one write).
[1] https://letsencrypt.org/2021/01/21/next-gen-database-servers...
If anything, Kubernetes allows you to save cost by going with a scalable number of small, inexpensive, fully utilized machines, vs one large, expensive, underused one.
Like kafka.
My impression is that it is the serialisation that comes with each service-to-service communication that is really expensive.
What if your unique machine crashes?
I think you mean vertical, right?
https://k3s.io/ makes it really easy to set up, too.
As soon as you start accounting for redundancy you have to fan out anyway.
Meanwhile the company I just left was spending more than this for dozens of kubernetes clusters on AWS before signing a single customer. Sometimes I wonder what I'm still doing in this industry.
Yup.
Cloud is 21st century Nickel & Diming.
Sure it sounds cheap, everything is priced in small sounding cents per unit.
But then it very quickly becomes a compounding vicious circle ... a dozen different cloud tools, each charged at cents per unit, those units often being measured in increments of hours....next thing you know is your cloud bill has as many zeros on the end of it as the number of cloud services you are using. ;-)
And that's before we start talking about the data egress costs.
With colo you can start off with two 1/4 rack spaces at two different sites for resilience. You can get quite a lot of bang for your buck in a 1/4 rack with today's kit.
You can get up to $100k in credits, and it's a big reason many startups go in that direction.
Also $20k is nothing when you factor in developer time etc.
I suppose there's a chance AI will get to the point where we can feed it a Ruby/Python/JS/whatever code base and it can emit functionally equivalent machine code as a single binary (even from a microservices mess).
There's some big problems with this approach today, namely, it's not always right, and it may sometimes be half right (miss edge cases).
But think of where this AI technology is headed; it stands to reason it will eventually work pretty much perfectly.
And then I think we'll see another very strong trend -- large AI models replacing other forms of software. Why write a compiler when GPT3 can compile C to asm? Why write an interpreter when GPT3 can "compile" python to C?
The AI model is hilariously less-efficient than traditional software, but it will be far far cheaper and faster to create than the traditional equivalent.
What other types of software will be replaced by AI models?
The stuff that actually is CPU-bound often ends up being written in an appropriate language, or uses C extensions (e.g. ML and data science libraries for Python).
This post is perfect world thinking. We don't live in a perfect world.
However, the stateless workload can still operate in a read-only manner if the stateful component fails.
I run an email forwarding service[1], and one of the challenges is ensuring that email forwarding still works even if my primary database fails.
The design I came up with: on boot, the app loads the entire routing data set from my Postgres into an in-memory data structure and persists it to local storage. So if the Postgres database fails, as long as I have an instance of the app running (and I can run as many as I want), the system continues to work for existing customers.
The app uses listen/notify to load new data from Postgres into its memory.
Not exactly the same concept as the article, but the idea is that we try to design the system so it can operate fully on a single machine. Another cool thing is that this is easier to test: instead of loading data from Postgres, it can load from config files, so essentially the core business logic is isolated onto a single machine.
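That boot-time snapshot pattern can be sketched in a few lines (the `RoutingCache` class and its API are hypothetical; the real service presumably hangs this off Postgres LISTEN/NOTIFY rather than a plain callable):

```python
import json
from pathlib import Path

class RoutingCache:
    """Boot-time snapshot of routing rules with a local fallback copy."""

    def __init__(self, load_from_db, snapshot_path):
        self.load_from_db = load_from_db   # callable returning a dict; may raise
        self.snapshot_path = Path(snapshot_path)
        self.routes = {}

    def boot(self):
        try:
            self.routes = self.load_from_db()
            # Persist every successful load, so the fallback copy is
            # never older than the last healthy refresh.
            self.snapshot_path.write_text(json.dumps(self.routes))
        except Exception:
            # Database down: keep serving existing customers from the
            # last snapshot on local storage.
            self.routes = json.loads(self.snapshot_path.read_text())

    def resolve(self, alias):
        return self.routes.get(alias)
```

A real version would subscribe to notifications and re-run something like `boot()` on each one; swapping the loader for a config-file reader gives exactly the testability the comment mentions.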