Is the app stack naturally resource-heavy, or is this setup particularly different from how an instance should be run?
1. Those 30k users are following users on other servers, which pulls in those users' content. I'm on hachyderm, but I would guess only about 20% of the people I follow are. That means my user alone is pulling roughly 250 other users' data into the system. Of course most, if not all, of the people I follow are also followed by someone else, so it's not a pure multiplier. At the same time, it does mean a lot more data moving in.
2. NFS, which is where they had the problems, was being used as a media store. People on mastodon and twitter like sharing images and other media. Even people who run single user nodes but follow a lot of people end up using a ton of storage space. 30k people scrolling through timelines and actively pulling that data out, while queues are pushing data in, can be tough to scale. Switching to an object store really helped fix that.
On top of that, the Mastodon app is very, very Sidekiq-heavy. For those not familiar with Ruby, Sidekiq is basically a queue/worker system (similar to Python's Celery). You scale up by running more queue workers. The problem with NFS is that all of those workers share the filesystem, making the filesystem a point of failure and a scaling pain. Adding more queue workers makes the problem worse by adding filesystem load rather than resolving it. Switching to an object store helps, until the next centralized service (in this case Postgres) reaches its limits.
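To make the pattern concrete without pulling in the Sidekiq gem, here's a toy stdlib-only Ruby sketch of the same queue/worker shape (the URLs and job payloads are made up for illustration). The key point: every worker pulls jobs off a shared queue concurrently, so if each job touches a shared filesystem, adding workers adds load to that one filesystem.

```ruby
# Toy queue/worker sketch (plain Ruby stdlib, NOT actual Sidekiq code).
jobs    = Queue.new
results = Queue.new

# Four "workers" pulling jobs concurrently, like Sidekiq processes would.
workers = 4.times.map do
  Thread.new do
    while (url = jobs.pop)          # nil acts as a stop signal
      # In Mastodon this would be fetching remote media/statuses; if every
      # worker wrote the result to one NFS mount, that mount is the bottleneck.
      results << "fetched #{url}"
    end
  end
end

10.times { |i| jobs << "https://example.com/media/#{i}" }  # enqueue work
workers.size.times { jobs << nil }                         # stop each worker
workers.each(&:join)

puts results.size  # => 10
```

Scaling "up" here means more worker threads/processes, which is exactly why the shared-filesystem bottleneck gets worse, not better, as you add them.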
So basically, the 30k users, each following their own set of users, create a multiplier on how many users the instance is actually working with. The more users on either side of the equation, the more work that needs to be done. If this were a 30k-user forum where every user existed on the instance, the load would be significantly less.
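A back-of-the-envelope version of that multiplier, with every number assumed purely for illustration (the 250 comes from the 20%-local guess above; the overlap figure is invented):

```ruby
local_users      = 30_000
follows_per_user = 250   # assumed: average remote accounts each local user follows
percent_unique   = 10    # assumed: only 10% of those follows aren't shared locally

# Rough count of distinct remote accounts the instance must track and pull from.
remote_accounts = local_users * follows_per_user * percent_unique / 100
puts remote_accounts  # => 750000
```

Even with 90% overlap between users' follow lists, the instance is ingesting data for an order of magnitude more accounts than it actually hosts.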
It is not NFS itself that is a SPOF; it is a single NFS server that is a SPOF. There exist distributed NFS systems (OneFS, Panasas) that can tolerate the loss of up to N servers before the service is disrupted.
The main problem was slow I/O because of faulty disks, which brought everything to a crawl.
Imagine your subscriptions and account living on one server, then when you log in that server gives you the list and your client goes and gets all the data.
We already had this sort of federation figured out: it's the open web. We just have to find a way to get the open web to provide the things that Google, Facebook, and Reddit provided:
Easy way to contribute content.
Discovery for new content.
Searchability.
Kill the things centralized websites provided, let people host websites within that system, and let the clients handle dealing with the fact that there are all sorts of providers out there.
It would be interesting to see Twitter's complete backend, and while Mastodon might not be apples to apples, I'd also be interested in a cost-per-user infrastructure analysis.
I dread to think how much a busy 30k user instance does.
Fwiw this sounds to me like what happens when you use "retail" SSDs (drives marketed for use in user laptops) underneath a high write traffic application such as a relational database. Often such drives will either wear out or will turn out to have pathological performance characteristics (they do something akin to GC eventually), or they just have firmware bugs. Use enterprise rated drives for an application like this.
So to be clear, we did try to "offline" a drive from the ZFS pool just to see if this was a viable path. The ZFS pool was set up a few years ago and has gone through a few iterations of disks. The mirrors were unbalanced. We had pairs of drives of one manufacturer/speed mirrored with pairs of drives from another manufacturer/speed. We know this configuration was wrong, again we didn't intend for our little home lab to turn into a small production service.
I think after spending a few hours trying to "offline" the disk, and then repairing the already brittle ZFS configuration to get the database/media store back to a "really broken and slow but still technically working" state, we just decided to pull the plug and move to Hetzner. Offlining the disk caused even more cascading failures and took about 30 minutes for the software alone. We could technically have shut down production to try it without the database running, but at that point we decided to just get out of the basement.
If it had been as easy as popping a disk in/out of the R630 (like one would imagine), we would certainly have done that.
To be honest I am still very interested in performing more analysis on ZFS on a 6.0.8 Linux kernel. I am not convinced ZFS didn't have more to do with our problems than we think. I will likely do a follow up article on benchmarking the old disks with and without ZFS in the future.
zfs-2.1.4-1 zfs-kmod-2.1.6-1 6.0.8-arch1-1
The different speeds are an issue, but I always recommend mixing pairs so that you don't end up like me, when all the spinning metal in the same RAID-5 array failed within a short period. It wasn't a great day.
Luckily, I had a contingency plan.
Focus your efforts on making robust backups instead. You don't want to be the only guy in the org who knows how to do ZFS things when it breaks.
We're running a few racks of servers; ZFS is delegated to big boxes of spinning rust where its benefits (deduplication/compression) are put to good use, but on a bunch of SSDs it is just overkill.
Also unless there is something horribly wrong about how often data is written, that SSD should run for ages.
We ran (as a test) consumer SSDs in a busy ES cluster and they still lasted about 2 years just fine.
The whole setup was a bit overcomplicated too. RAID10 with 5+1 or 7+1 (yes, Linux can do a 7-drive RAID10) with a hot spare would've been entirely fine, easier, and most likely faster. You need backups anyway, so ZFS doesn't give you much here, just extra CPU usage.
Either way, monitor I/O wait per drive (an easy way is to just plug collectd [1] into your monitoring stack; it is light and can monitor a ton of different metrics).
Remember this isn't a company: it's hobbyists/enthusiasts putting their own resources into something, or running with donations when available. There's no venture capital to absorb operating losses here. Remember the old "storm/norm/conform/perform" analogy. We are still very much pushing into norm territory, and articles like this will help establish a conform phase... but it will take time.
Probably because they wanted to migrate to Hetzner anyway and took the chance to do it now instead of later.
But I do agree that it would probably have been a better idea.
As a mass-market hosting provider, Hetzner is subject to constant fraud, abuse and hacked customer servers, and in consequence, their abuse department is very trigger-happy and will usually shoot first and ask questions later. They can and will kick out customers that cause too much of a headache, regardless of their ToS.
Their outbound DDoS detection systems are very sensitive and prone to false positives, such as when you get attacked yourself and the TCP backscatter is considered a portscan. If the system is sufficiently confident that you are doing something bad, it automatically firewalls off your servers until you explain yourself.
Likewise, inbound abuse reports sometimes lead to automated or manual blocks before you can respond to them.
They also rate-limited or blocked entire port ranges in the past to get rid of Chia miners and similar miscreants, with no regard for collateral damage to other services and without informing their other customers.
Their pricing is good and service is otherwise excellent, and if you do get locked out, you can talk to actual humans to sort it out. But, only after the damage is already done. If you use them, have a backup plan.
The write up is cool. Reminiscent of things we used to do back in that early rails 2-3 era. Just funny we're back where we started.
TLDR: if you want to run Ruby on Rails on bare metal, be ready to run something with 8+ cores, 10k RPM disks minimum, and more bandwidth than you can support out of your basement.
Like, I did it, but I wouldn't recommend it: restarting an NFS server is gnarly, HA support on the OSS side is... not really there last time I checked, and overall it's a PITA.
* [1] https://syncthing.net/
Mastodon should have been based on a DHT, with each "terminal", aka "profile", having much higher autonomy.
Otherwise, it just gives more tools to people who left Twitter to continue doing the same societal damage.
P.S.: it is time to stop writing back-ends in Ruby when every other popular alternative (sans Python-based ones) is more powerful and scalable.
If you end up having a "decentralized system" with 30k users per instance, you basically just have a centralized system that federates with other instances. Sure, it is kind of decentralized, but the admins of that 30k instance are effectively able to read the DMs, impersonate users and delete their content.
I personally think (and I'm trying to formalize my ideas somehow with OpenDolphin) that a centralized instance that is only used to serve __signed__ / encrypted content solves some of the decentralization issues we're seeing here, whilst still giving users some of the features of decentralized platforms.
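The general idea of "the server only relays, it can't tamper or impersonate" can be sketched with Ruby's stdlib OpenSSL. This is not OpenDolphin's actual protocol, just a minimal illustration of signed content: the author signs client-side, and any reader can verify against the author's public key.

```ruby
require "openssl"

# Illustrative only: real systems would use a proper identity/key scheme.
key  = OpenSSL::PKey::RSA.new(2048)   # author's keypair, generated client-side
post = "hello fediverse"

# The author signs the post; the server stores/relays post + signature only.
signature = key.sign(OpenSSL::Digest.new("SHA256"), post)

# A reader holding just the public key can verify the server didn't tamper.
verified = key.public_key.verify(OpenSSL::Digest.new("SHA256"), signature, post)
puts verified  # => true
```

With this shape, a 30k-user instance can still host and serve content, but it can no longer silently edit posts or impersonate users, which addresses part of the trust problem described above.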
If you like / dislike the idea, help us out! We're trying to build a community to build something great together. Every contribution counts (:
And btw, yes, I do agree: that hardware for 30k users doesn't make any sense - it really shows that something isn't optimized :(
There's a working group on end to end encryption already and I do believe they will solve that problem.
Archiving content is trivial and can be automated. It's also pretty easy to migrate between instances: I started on one run by a friend, which I felt was too small, moved to one of the biggest ones, and then ended up on hachyderm. I may end up moving again, as I feel the service is getting rather big; it's one of the largest instances now, and there are benefits to being on smaller instances that tend to push people towards them.
Tip: On https://about.opendolphin.social/about, the word "Retribution" is probably not the word you want here. "Compensation", maybe?
Building on that, the list of address+pubkey pairs can be put somewhere searchable (a DHT on server nodes?) so that if someone moves shop they can still be found. Then the client could either subscribe directly to whoever they want (akin to RSS), or join someone's instance (akin to an RSS aggregator) to participate in their community.
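A toy sketch of that lookup idea in Ruby: a plain hash stands in for the distributed directory (a real DHT would shard this across nodes), and the key point is that the pubkey, not the address, is the stable identity. All names and key strings here are invented for illustration.

```ruby
# Toy address+pubkey directory; a real DHT would distribute this hash.
directory = {
  "alice@old.example" => { pubkey: "ed25519:AAAA", host: "old.example" }
}

# Alice moves instances: her key (identity) stays, only address/host change.
entry = directory.delete("alice@old.example")
directory["alice@new.example"] = entry.merge(host: "new.example")

# Followers re-find her by pubkey even though the address changed.
found = directory.find { |_addr, v| v[:pubkey] == "ed25519:AAAA" }
puts found.first  # => alice@new.example
```

Because subscriptions resolve through the key rather than the hostname, "moving shop" doesn't break anyone's follow list, which is exactly the RSS-with-portable-identity property described above.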
Post: we hit scaling issues caused by our failing disks and running image hosting and databases over NFS
HN: It’s obviously Ruby on Rails fault
I can say with certainty that Ruby specifically was not the bottleneck in our case. I do think that the rails paradigm can often lead to interdependent systems. We see this at GitHub and we also see this in Mastodon. Service A will do reads/writes against the same tables in the database that Service B also does. When service A is moved to an isolation zone, it can still impact Service B's performance.
In other words, I think any stateful framework with the flexibility that Ruby on Rails offers encourages bad behavior that can contribute to a noisy-neighbor problem.
The point I am trying to drive home is that I agree. I can confidently say that Ruby on Rails is not the culprit in our case. To be honest I just ignore anyone who is quick to point fingers and assign blame either technically or personally.
Sorry hacker news got you down. If it helps my family and I are making Sunday morning pancakes with my puppy Björn today and we are all wishing you the best day ever.
The frontend serves 30,000 users. The backend processes their posts alongside the posts of everyone that replies to them and all posts from people that the “home” users follow. So, while the home user base is 30,000, the effective load on the back end is much more, depending on how many followers/following a person has.
You can find posts from people who were hosting their own single user instance that went down because of a popular post that got federated across multiple instances.
Either I'm misunderstanding something about it, or this still points to Mastodon being slow and badly engineered.
Like, the updates are per server, not per subscriber, right? That's at worst a few thousand requests, all of which can essentially be served from RAM. Even if you get 10k responses to your comment, 10k responses within, say, 5 minutes is still only around 30-40 req/sec, which shouldn't be much even for Ruby.
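The arithmetic behind that estimate, spelled out (the 10k-in-5-minutes figures are the hypothetical numbers from the comment, not measurements):

```ruby
responses = 10_000        # hypothetical viral-post replies
window_s  = 5 * 60        # delivered over five minutes

# Average request rate the instance would need to absorb.
rate = responses.fdiv(window_s).round(1)
puts rate  # => 33.3
```

Even doubling that for retries and media fetches stays well within what a modest app server handles, which is what makes the observed failures look like an I/O or architecture problem rather than raw request volume.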
That doesn't inspire confidence in the typical self-hosting Mastodon user who goes 'viral' somehow. They should expect that if one of their posts goes viral across instances, they'll have to become a sysadmin for the day to bring the instance back up and scale it to handle the traffic.
No wonder normal users are not self-hosting their own instances to fully own and self-verify their identity on Mastodon, and instead have to search for an instance to re-centralize on.
That is very disappointing and not a great sell for Mastodon so-called 'verification', but not at all surprising.
Mastodon is still a solution in search of a problem, already reminding us of the long-term failure of federated social networks: eventually withering away and re-centralizing into larger instances.
Hardware having been obtained (via Hetzner, instead of in Kris Nova's home lab), the instance has scaled.