> “WE’RE NOT NETFLIX!” I finally snapped. “Netflix has 500 engineers. We have 4. Netflix has dedicated DevOps teams. We have one guy. Netflix has millions of users. We have 50,000.”
Then
> Lesson 5: The Monolith Isn’t Your Enemy
> A well-structured monolith can:
> Scale to millions of users (Shopify, GitHub, Stack Overflow prove this)
Because Shopify, GitHub, and Stack Overflow have 4 engineers each as well.
It kind of seems real because it reads like it's written by the kind of person who would make high-level architecture decisions without even understanding what the f they are doing.
Based on my experience, microservices do introduce additional fixed costs compared to monoliths (and these costs can be prohibitive for small teams), so everything you've quoted makes complete sense.
https://en.wikipedia.org/wiki/Stack_Overflow
It's only NOW that Stack Overflow may have more than one engineer working on it.
> Microservices didn’t scale our startup. They killed it.
...and then at the end,
> We lost 6 months. We lost some good engineers. We burned through money we didn’t have. But we survived.
...So did microservices kill the startup or not?
If you are small and don't have scaling problems, it is highly unlikely that you'll see a real difference between a monolith and microservices except on the margin.
But lots of things look off in the article: Billing needed to ... Create the order
What? The billing service is the one creating the orders, instead of the other way around?
Monday: A cascading failure took down the entire platform for 4 hours. One service had a memory leak, which caused it to slow down, which caused other services to time out, which caused retries, which brought everything down. In the monolith days, we would’ve just restarted one app. Now we had to debug a distributed system failure.
Hmm, they could have restarted the service that failed, but with a leak in the code, even a monolith would have gone down until the thing was fixed, no matter how often you restarted it. And I can't imagine a monolith that needs constant restarting is a high-quality service... Finally, the article claims the service started to be slow on Monday, and already by Wednesday a customer was threatening to leave because of the slowness. A customer doesn't sound very hooked on, or in need of, your service if they want to leave after only two days of issues.
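The retry amplification described in that Monday incident is a well-known failure mode: naive retries against an already-slow dependency multiply its load until everything collapses. A minimal sketch of the usual mitigation, a circuit breaker that fails fast instead of retrying (all names here are hypothetical, not from the article):

```python
import time


class CircuitBreaker:
    """Stop calling a struggling dependency after repeated failures,
    instead of piling retries onto it and amplifying the outage."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        # While the breaker is open, fail fast until the cool-down expires.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down over: allow one trial call through ("half-open").
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

The point is that callers of the leaking service stop hammering it, so the failure stays contained to one service instead of cascading through timeouts and retries.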
Also, something totally suspicious: even in a small or moderately sized company where individuals push an architecture they prefer, no company with only a few months of cash runway will decide to do a big refactor of the whole architecture if everything was fine in the first place and no problem had been encountered. What happens in practice is that you start to hit a wall, degrading performance with scale or something like that, and then decide you have to do something, a rework. And only then does the debate and decision about monolith, microservices, or whatever else take place...
Was it something in the payment space?
The mistake here was having an architect, full stop. The team is too small; a good tech lead can manage to plan a service with 50k MAU (and way beyond) without an architect. The problem with some companies that get millions in seed funding is that they feel the need to spend the money, and they do so by adding roles that shouldn't exist at that stage.
Another favourite antipattern: making DevOps a bottleneck. Don't over-engineer production, don't buy abstraction you can't afford, and educate your colleagues to raise the bus factor.
Dedicated DevOps engineers who aren't co-founders are notorious for CV optimization: working with cool but time-consuming tech they don't yet master, at the cost of delivery-time risk.
But that starts to fall down too any time too many people are talking about software they aren’t responsible for deploying or fixing.
> ...you should be wary of spending too much time on things that customers don’t see
I don't think this is entirely true, because some things customers don't see will help you ship faster, like good architecture and a system design that is as simple as possible. These are worth investing in, despite their invisibility to the end user, because doing them well can result in a faster pipeline and more stability. It's when the invisible stuff becomes a chore and blocks or slows down releasing value, such as fretting over microservices, that it's a problem.
> And ironically? Now that we’re back on a monolith and shipping fast again, we’ve started growing again. Customers are happier. The team is happier.
So microservices did not kill your startup?
And why did you stop instances of your monolith before the microservices version was mature and ready???
That's because Medium is a bunch of APIs and (micro) services, not a monolith like it should be.
Heck, it could be plain static HTML because it's just text for crying out loud!
Instead, it uses a GraphQL query returning JSON to obtain the text of the article... that it already sent me in HTML.
Total page weight of 17 MB, of which 6.7 MB is some sort of non-media ("text") document or script.
This is user-hostile architecture-astronaut madness, and it is so totally normal on the modern internet that nobody even bats an eye when text takes an appreciable amount of time to render on a 6 GHz multi-core computer with 1 Gbps fibre Internet connectivity.
Your customers hate this. Your architects love it because it keeps them employed.
Those grey loading placeholders for text are called skeleton loaders, BTW. Polyfills are libraries used to support newer browser APIs in older browsers, not something you can exactly see on a website (without checking the devtools).
You know a team has lost the architectural plot when their answer for all performance problems is more caching. And once you add caching it’s hard to sell any other sort of improvements because the caching poisons the perf analysis.
Their solution took forever because the system was less deterministic than we even knew. They were starting to wrap it up when I went on a tear cleaning up low-level code that was nickel-and-diming us. By the time they launched they were looking at achieving half of the response-time improvement they were after, in twice the time they estimated. And they cheated: they were making two requests about 10% of the time, which made the p50 time a lie, because two smaller requests pull down the average but not the cost per page load. But I scooped them and made the slow path faster, undercutting another 25% of their perf improvements.
I ended up doing more to improve the Little's Law situation in three months of working on it half-time than they did in two man-years. And still nothing changed. They are now owned by a competitor, which I believe shut down almost all of their services.
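The p50 trick above is simple arithmetic: split one slow request into two faster ones and the median per-request latency drops, while the work a page actually pays for doesn't. A toy illustration with made-up numbers (these figures are mine, not from the comment):

```python
import statistics

# Honest baseline: each page load is one 400 ms request.
page_loads = 10
single_request_times = [400] * page_loads

# "Optimized" version: each page now issues two 230 ms requests.
# Per-request latency looks great; per-page work got worse.
split_request_times = [230] * (2 * page_loads)

p50_single = statistics.median(single_request_times)  # 400 ms
p50_split = statistics.median(split_request_times)    # 230 ms, a "43% win"

# But a page still pays for all of its requests, so the real
# per-page cost went up, not down.
cost_per_page_single = 400       # one request
cost_per_page_split = 2 * 230    # two requests: 460 ms of request time
```

The per-request dashboard metric improves while the per-page experience does not, which is exactly why a p50 chart alone can't settle an argument about page speed.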
I'm not sure why Medium does the weird blanking thing but my guess is that it's because it's deciding whether to let you read the article or instead put up a paywall. There are a lot of SPA sites out there, many of which aren't particularly economical with frontend resources, and they generally don't do that unless they're trying to enforce some kind of paywall or similar.
Sure, but a significant motivation for using GraphQL is to stitch together a bunch of microservices into a cohesive API for the front end.
My comment about Medium using microservices was just an informed guess, but a good one. They started migrating from a monolith to microservices back in 2018: https://medium.engineering/microservice-architecture-at-medi...
Is it a coincidence that that's around the time frame that I noticed the Medium web site becoming slower than it used to be?
You don't have product-market fit and you're running your dev team like a larger company? What are the odds? Pretty high, actually.
- don't blindly jump into a new architecture because it's cool
- choose wisely the size of your services. It's not binary, and often it makes sense to group responsibilities into larger services.
- microservices have some benefits, moduliths (though not mentioned in the article) and monoliths have theirs. They all also have their set of disadvantages.
- etc
But anyway, the key lesson (which does not seem like a conclusion the author made) is:
Don't put a halt to your product/business development to do purely technical work.
I.e if you can't make a technical change while still shipping customer value, that change may not be worth it.
There are of course exceptions, but in general you can manage technical debt, architectural work, bug fixing, performance improvements, DX improvements, etc., while still shipping new features.
Microservices solve a logistical problem. Rob wants to push code every two days. Steve wants to push every three. Thom deals with the business side, which wants to release at whim and preferably within a few hours. Their commissions and bonuses are not reduced by how much chaos they cause the engineering team. The feedback loop is open.
As you add more employees, they start tripping over each other on the differences between trunk and deployed. That's when splitting into multiple services starts to look attractive. Unfortunately, the services create their own weather, so if you can use process to delay this point, you're gonna be better off.
Everyone eventually merges code they aren’t 100% sure about. Some people do it all the time. However microservices magnify this because it’s difficult to test changes that cross service boundaries. You think you have it right but unless you can fit the entire system onto one machine, you can’t know. And distributed systems usually don’t concern themselves with whether the whole thing will fit onto a dev laptop.
So then you have code in preprod that you are pretty sure will work but aren't completely sure about. Stack enough "pretty sure"s over time, and as team sizes grow, you're gonna have incidents on the regular. Separate deployment reduces the blast radius, but doesn't eliminate it. Feature toggles reduce it by more than an order of magnitude, but that still takes you from problems every week to a couple a year. Which in high-SLA environments still makes people cranky.
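Feature toggles get that reduction because the "pretty sure" code ships dark and can be switched off at runtime instead of requiring a rollback deploy. A minimal sketch of the pattern; the function names and the dict-as-flag-store are hypothetical (in practice the flags would live in a config service):

```python
def create_order_v1(order):
    # Battle-tested path that stays in production.
    return {"id": order["id"], "path": "v1"}


def create_order_v2(order):
    # The "pretty sure" rewrite: merged and deployed, but dark.
    # If it misbehaves, flip the flag off; no deploy, no rollback.
    return {"id": order["id"], "path": "v2"}


def create_order(order, flags):
    # The toggle decides per-request which implementation runs.
    if flags.get("new_billing_path"):
        return create_order_v2(order)
    return create_order_v1(order)
```

The blast radius shrinks because turning the flag off is seconds, not a redeploy, and the flag can be enabled for a small slice of traffic first.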