I figured these toys would be replaced pretty quickly, but turns out they do the job for these small businesses and need very little maintenance. Moving the app to a new server instance is dead simple because there's basically just the script and the data file to copy over, so you can do OS updates and RAM increases that way. Nobody cares about a few minutes of downtime once a year when that happens.
There are good reasons why we have containers and orchestration and stuff, but it's interesting to see how well this dumb single-process style works for apps that are genuinely simple.
I think people are also too quick to add secondary data stores and caches. If you can do everything with a transactional SQL database + app process memory instead, that is generally going to save you tons of trouble on ops, consistency, and versioning issues, and it can perform about as well with the right table design and indexes.
For example: instead of memcache/redis, set aside ~100 MB of memory in your app process for an LRU cache. When an object is requested, hit the DB with an indexed query for just the 'updatedAt' timestamp (should be a sub-10ms query). If it hasn't been modified, return the cached object from memory, otherwise fetch the full object from the DB and update the local cache. For bonus points, send an internal invalidation request to any other app instances you have running when an object gets updated. Now you have a fast, scalable, consistent, distributed cache with minimal ops complexity. It's also quite economical, since the RAM it uses is likely already over-provisioned.
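The freshness-check idea above can be sketched roughly like this (Python with SQLite for illustration; the table name, column names, and class name are all mine, not from the original comment):

```python
import sqlite3
from collections import OrderedDict

class FreshnessCheckedCache:
    """LRU cache that revalidates entries with a cheap indexed
    'updated_at' query before serving them from process memory."""

    def __init__(self, conn, max_entries=1000):
        self.conn = conn
        self.max_entries = max_entries
        self.cache = OrderedDict()  # id -> (updated_at, full_row)

    def get(self, obj_id):
        # Cheap query: only the timestamp, answerable from an index.
        row = self.conn.execute(
            "SELECT updated_at FROM objects WHERE id = ?", (obj_id,)
        ).fetchone()
        if row is None:
            self.cache.pop(obj_id, None)
            return None
        updated_at = row[0]
        cached = self.cache.get(obj_id)
        if cached is not None and cached[0] == updated_at:
            self.cache.move_to_end(obj_id)   # refresh LRU position
            return cached[1]                 # still fresh: skip the full fetch
        # Stale or missing: fetch the full object and cache it.
        full = self.conn.execute(
            "SELECT * FROM objects WHERE id = ?", (obj_id,)
        ).fetchone()
        self.cache[obj_id] = (updated_at, full)
        self.cache.move_to_end(obj_id)
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)   # evict least-recently-used
        return full

    def invalidate(self, obj_id):
        # What the "internal invalidation request" from another
        # app instance would call.
        self.cache.pop(obj_id, None)
```

An in-process dict stands in for the ~100 MB LRU; a real app would bound it by bytes rather than entry count.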
This is exactly the approach that EnvKey v2[1] is using, and it's a huge breath of fresh air compared to our previous architecture. Just MySQL, Node/TypeScript, and eventually consistent replication to S3 for failover. We also moved from EKS (AWS's Kubernetes product) to Fargate, and that's been a lot simpler to manage as well.
I've never built something with this type of mechanism for a DB query, but it's interesting. I've never timed a query like this either, but I suspect it's going to be an "it depends" situation based on what fields you're pulling back, whether you're using a covering index, just how expensive the index seek is, and how frequently the data changes. I've mainly treated it as "avoid round trips to the database" -- zero queries is better than one, and one is better than five.
I also guess it depends on how frequently the data is updated: if the timestamp has changed 100% of the time, you might as well just fetch (no caching). Based on all the other variables above, the inflection point where this makes sense is going to shift.
Interesting idea though, thanks.
> For bonus points, send an internal invalidation request to any other app instances you have running when an object gets updated. Now you have a fast, scalable, consistent, distributed cache with minimal ops complexity.
Now you have to track which other app servers exist, handle failures/timeouts/etc. in the invalidation call, and make sure your app's logic still works properly if the invalidation doesn't happen for any reason (the classic cache invalidation problem). My inclination is that at this point you're on the path to replicating a proper cache service anyway, and using Redis/Memcache/whatever would ultimately be simpler.
>For example: instead of memcache/redis, set aside ~100 MB of memory in your app process for an LRU cache.
Erlang/Elixir for the win, with (almost transparent) multi-core concurrency and ETS ;)
Apps like this tend to perform like an absolute whippet too (or if they don't, getting them to perform well is often a 5-line change). It's really freeing to be able to write scans and filters with simple loops that still return results faster than a network roundtrip to a database.
The problem is always growth, either GC jank from a massive heap, running out of RAM, or those loops eventually catching up with you. Fixing any one of these eventually involves either serialization or IO, at which point the balance is destroyed and a real database wins again.
Absolutely. The challenge is having enough faith that it will take long enough to catch up to you.
Statistically speaking, it won't catch up to you, and if it does, it will have taken so long that you should have seen it coming from miles away and had time to prepare.
In my systems that use an in-memory/append-only technique, I try to keep only the pointers and basic indexes in memory. With modern PCIe flash storage, there is no good justification for keeping big fat blobs around in memory anymore.
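A minimal sketch of the pattern the parent describes -- only pointers and a basic index in memory, blobs left on flash (Python for illustration; the class and method names are mine):

```python
import os

class AppendOnlyStore:
    """Append-only blob store: values live on disk; only a small
    key -> (offset, length) index is held in memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}                 # in-memory: key -> (offset, length)
        open(path, "ab").close()        # create the log file if missing

    def put(self, key, blob):
        with open(self.path, "ab") as f:
            offset = f.tell()           # append at end of the log
            f.write(blob)
        self.index[key] = (offset, len(blob))   # newest write wins

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        with open(self.path, "rb") as f:
            f.seek(offset)              # one seek on flash, no fat heap
            return f.read(length)
```

On PCIe/NVMe storage that single seek-and-read is cheap enough that keeping the blob itself in RAM buys very little.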
But a lot of small businesses are genuinely small. They may not sign up new customers that often. When they do, the impact to the service is often very predictable ("Amy at customer X uses this every other day, she's very happy, it generates 100 requests / week"). If growth picks up, there would be signs well in advance of the toy service becoming an actual problem.
An application I wrote recently for personal use is a double-entry accounting system, built after GnuCash hosed itself and gave me a headache. It's based on Go and SQLite. The entire thing is one file (go:embed rocks) and serves a simple HTTP interface with a few JS functions like it's 2002 again. The back end is a proper relational model stored in one .db file. It is fully transactional with integrity checks. To run it you just start the program and open a browser. To back up you just copy the .db file. You can run reports straight out of SQLite in a terminal if you want.
This whole concept could scale to tens of users fine for LOB applications and consume little memory or resources.
I strongly suspect this approach scales to tens of thousands of users. Maybe 30-40k users would be my guess on a garden variety intel i5 desktop from the past 3 years or so.
I say this because that hardware (assuming NVMe storage) will do north of 100k connect + select per second (connect is super cheap in sqlite, you're just opening a local file), assuming 2-3 selects per page serve gets me to the 30-40k number. The http server side won't be the bottleneck unless there's some seriously intensive logic being run.
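The connect+select claim is easy to sanity-check yourself. A rough benchmark sketch (Python's stdlib sqlite3 for illustration; absolute numbers depend entirely on your hardware and driver, so treat the output as indicative only):

```python
import sqlite3, tempfile, os, time

# Set up a small table keyed by an integer primary key.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(10_000)])
conn.commit()
conn.close()

N = 5_000
start = time.perf_counter()
for i in range(N):
    c = sqlite3.connect(path)            # "connect" just opens a local file
    c.execute("SELECT name FROM users WHERE id = ?", (i % 10_000,)).fetchone()
    c.close()
elapsed = time.perf_counter() - start
print(f"{N / elapsed:,.0f} connect+select per second")
```

Dividing the measured rate by 2-3 selects per page serve gives you a pages-per-second ceiling for your own machine.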
LONG ago I was amused by a Sun box in a closet that nobody knew anything about. I heard about the serial label printer that stopped working eight months ago, which was eight months after I shut off the Sun. I brought it back up again late one Friday, and the old/broken label printer magically worked again.
Now my stuff is that.
Is there logic in your app to potentially throw the last line away (incl. truncating it from the file) if it's invalid due to being the result of a non-atomic write? If so, seems a bit Not-Invented-Here compared to just using a (runtime-embedded) library that does that for you :)
From the SQLite about page. It's one of the most bulletproof and hardened pieces of software out there, I think, and it was made basically for exactly the OP's use case of a file on disk. But who knows what their usage looks like -- maybe writes to the DB are few and far between, so it's a fairly moot point.
These apps are on Digital Ocean, and I don’t remember ever having unplanned downtime with them. They do sometimes migrate instances with advance notice, but that’s a clean shutdown.
I’m sure SQLite is a better choice for almost any app. My reasons for not using it were to try to avoid dependencies out of curiosity, and that I honestly don’t like writing SQL — it just feels boring and error-prone. (Like eating celery: I know objectively it’s good for me.)
It's a real SQL DB with real records and transactions, and it is one of the most trusted and reliable pieces of software ever made. Like, check out the change log to get a sense of how they do stuff.
Well done on building an easy-to-maintain single-node app with few dependencies. You would be the SWE I'd send prayers of thanks to after onboarding (and for not making me crawl through a massive Helm chart/CloudFormation template hell).
I've tried doing something in Python (my initial programming love) over the years, but every time I'm forced to read up on the state of the ecosystem (choosing versions and package managers and whatnot), it drives me insane. I just install Node+Express and can get to work immediately (and finish quickly).
I would sit down at an interview and try to create these "proper" system designs with boxes and arrows and failovers and caches and well tuned databases. But in the back of my mind I kept thinking, "didn't Facebook scale to a billion users with PHP, MySQL, and Memcache?"
It reminds me of "Command-line Tools can be 235x Faster than your Hadoop Cluster" at https://adamdrake.com/command-line-tools-can-be-235x-faster-... , and the occasional post by https://rachelbythebay.com/w/ where she builds a box that's just fast and with very basic tooling (and a lot of know-how).
For small batches you do an interval at a convenient time, such as a time of day where the hardware is undersubscribed for some other task. As the batches grow then you have an online system that continues to work until you get rather far down the list of consequences in queuing theory. Once you get 24 hours and 1 minute of tasks per day you never catch up (it never ceases to amaze me how often I can find someone who will fight me on this point), and you must be aware that substantially before that breaking point, you can experience rather long average queuing delays.
But if your workload is spiky, you can smear 250 minutes of peak traffic out over 6-18 hours with no problems at all. You need a safe place to stash the queue and a little sophistication around recovering from failures/upgrades. Those aren't necessarily Simple tools, but if that's the most complex part of your system you're doing pretty okay.
- Cassandra (based on Dynamo/Bigtable)
- wrote a custom KV store, RocksDB, that is open source and now a company
- wrote a custom photo storage system that replaced an NFS-based design
- wrote another custom binary object store
- wrote a custom geo-distributed graph DB (TAO)
- wrote an in-house distributed FS replacement for HDFS
https://www.cs.cornell.edu/projects/ladis2009/papers/lakshma...
https://www.usenix.org/legacy/event/osdi10/tech/full_papers/...
https://www.usenix.org/system/files/conference/osdi14/osdi14...
https://www.usenix.org/system/files/conference/atc13/atc13-b...
https://www.cs.princeton.edu/~wlloyd/papers/tectonic-fast21....
https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A...
What am I missing?
The key is to add the minimum amount of "stuff" to a simple design to convincingly scale for some new hypothetical need.
Additionally, understanding how tolerant 99% of businesses are to the real-world problems that could hypothetically arise helps one not agonize over insane edge cases. I suspect a non-zero number of us have spent time thinking about how to provide deterministic guarantees of uptime that even unstoppable cosmic radiation or regional nuclear war couldn't interrupt.
I genuinely hope that the recent reliability issues with cloud and SaaS providers have really driven home the point that a little bit of downtime is almost never a fatal issue for a business.
"Failover requires manual intervention" is a feature, not a caveat.
We've lost our way amid the marketing the cloud providers have built up to help us solve problems we will never encounter, unless we are building the next Netflix or Facebook.
https://www.techempower.com/benchmarks
If you just need plaintext services, something like ~7 million requests per second is feasible at the moment.
By being clever with threading primitives, you can preserve that HTTP framework performance down through your business logic and persistence layers too.
... reading blogs and such where some loud mouth is telling them about so called "best practices" and so they bring that back to work with them.
There are not enough loud mouths telling people to keep it simple (until you can't or know better).
As far as complexity... if you get big enough, you can't avoid it. My meta-rule is to only accept additional complexity if solving the issue some other way is impractical.
It is almost always far, far easier to add additional moving parts to your production environment than it is to remove them after they're in use.
1. CEOs/whoever that don't listen to how much additional complexity it is to build a system with extremely high uptime and demand it anyway.
2. Developers with past experience that systems going down means they get called in the middle of the night.
3. Industry expectations. Even if you're a small finance company where all your clients are 9-5 and you could go down for hours without any adverse impacts, regulators will still want to see your triple redundant, automated monitoring, high uptime, geographically distributed, tested fault tolerant systems. Clients will want to see it. Investors will check for it when they do due diligence.
Look at how developers build things for their own personal projects and you'll see that quite often they're just held together with duct tape running on a single DO instance. The difference is, if something goes wrong, nobody is going to be breathing down their neck about it and nobody is getting fired.
If the extra complexity is microservices and containers, you might have an issue, but microservices are kind of a UNIX-philosophy derivative. I'm not sure the complexity is really intentionally added (like when someone uses an SPA framework or something); it just kind of shows up by itself when you pile on thousands of separate simple things without realizing the big picture is a nightmare.
I was scarred by the DDoS of Linode on Christmas Day 2015 (as a Linode customer at the time). I believe that was the only time my Christmas was ever interrupted by work. Of course, one might respond that being the one perpetually on-call sysadmin isn't ideal.
The application itself is a total of 3 pages, encompassing maybe 20 endpoints at the most, with about 100 daily active users. For the backend, some genius decided to build a massive Kubernetes stack with 74 unique services, which has been costing said company over $1K/month in infra costs alone. It took me literally weeks to get comfortable working on the backend, and so much stuff has broken that I have no idea how to fix it.
Not only that, but the company has never had more than 1 engineer working on it at a time (they're very small even though they've been around a bit). If there were such a thing as developer malpractice, I'd sue whoever built it.
In the cases where I've seen this, honestly I think the ones to blame were the stakeholders, for hiring very young people at the cheapest rate they could, and giving them full responsibility and the Senior Architecture Something Something title when those people don't have more than a couple years' experience and are just building whatever they read in a blog two weeks ago.
Just, wat.
Sounds like the architect was doing some resume driven development cause damn.
This has to be a crime.
Our architecture is extremely simple and boring — it would probably be more-or-less recognizable to someone from 2010: a single Rails MVC app, 95+% server-rendered HTML, really only a smattering of JavaScript. (Some past devs did some stuff with Redshift for certain data that was a bad call; we're in the process of ripping that out and going back to good old Postgres.)
Our users seem to like it though, and talk about how easy it is to get set up. Looking at the site, the interactions aren't all that different from what we would build if we were using a SPA. But we're just 2 developers at the moment, and we can move faster than much larger teams just because there's less stuff to contend with.
Frankly out of all the things that make our architecture simple and efficient, I would say server rendered HTML is by far the biggest one.
Some front-end frameworks are closing this gap, but I wouldn't necessarily say they're equally as simple. See https://macwright.com/2020/05/10/spa-fatigue.html
In other words, choose the right tool for the job.
I guess they wanted me to use lots of little components in an SPA, which I did in my day job, but it didn't seem necessary for the task...
I could implement a Twitter in 1 Python or Go file, hosted on 1 machine
granted its concurrent user capacity and traffic load capacity would be insufficient for actual Twitter. but all the basics would work, in the small
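In that spirit, here's a toy sketch of what the core of that 1-file Twitter could look like (Python, in-memory only, all names mine; a real single-file version would swap the dicts for SQLite and put an HTTP handler in front):

```python
from collections import defaultdict
from itertools import count

# In-memory state: fine for a single process, lost on restart.
tweets = {}                        # tweet_id -> (author, text)
timelines = defaultdict(list)      # author -> [tweet_id, ...]
following = defaultdict(set)       # user -> {followed users}
_next_id = count(1)                # monotonically increasing tweet ids

def post(author, text):
    tweet_id = next(_next_id)
    tweets[tweet_id] = (author, text)
    timelines[author].append(tweet_id)
    return tweet_id

def follow(user, target):
    following[user].add(target)

def home_timeline(user, limit=20):
    """Newest-first tweets from everyone the user follows, plus themselves."""
    ids = []
    for author in following[user] | {user}:
        ids.extend(timelines[author])
    return [tweets[i] for i in sorted(ids, reverse=True)[:limit]]
```

Posting, following, and timelines — all the basics work "in the small"; what's missing is exactly the concurrency and fan-out machinery that real Twitter exists to provide.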
We use one of these "aggressively simple" architectures too. At this point, I would quit my job instantaneously if I had to even look at k8s or whatever the cool kids are using these days.
I'm fine with complex architecture and would actually welcome someone choosing something complex but the issue is that we have perverse incentives at work to introduce stuff just to pad our resume.
Kubernetes was designed for companies deploying thousands of small APIs/applications where management is a burden. I've seen companies that deploy 3 APIs running Kubernetes and having issues...
RE the SQLAlchemy concern: you do need to decide where your transactions are going to be managed from and have a strict rule about not allowing functions to commit/rollback themselves. Personally I think SQLA is a great tool; it saves a lot of boilerplate code (and data modelling and migrations are a breeze).
But overall the sentiments in this article resonate with my experience.
From their job pages:
Our stack :
backend: Python 3 (+ mypy)
API layer: GraphQL
android frontend: Kotlin/Jetpack
iOS frontend: Swift/SwiftUI
web frontend: TypeScript/React
database: Postgres
infrastructure: GCP / Terraform
orchestration: Kubernetes
That is not simple by any stretch of the imagination. The engineering message should be: keep your architecture as simple as possible. And here are some ways to find that minimal and complete size-2 outfit foundation in your size-10 hoarder-track-suit eyesore.
Do we really need to be preached at with a warmed-over redo of "X cut it for me as a kid, so I really don't know why all the kids think their newfangled Y is better"? No, we don't.
If you have stateless, share-nothing events, your architecture should be simple. Should or could you have stateless share-nothing even if that's not what you have today? That's where we need to be weighing in.
Summary: less old guy whining/showing-off and more education. Thanks. From the Breakfast club kids.
So far that just describes any good science paper or math textbook; the difference is that he's writing about questions of great interest to HN, like in this case "how to build a web service", "how to do software version tracking" (https://danluu.com/monorepo/), "how to do statistics" (https://danluu.com/linear-hammer/), "why hardware development is hard" (https://danluu.com/why-hardware-development-is-hard/), "why everything is broken" (https://danluu.com/nothing-works/), and "how to hire talented people" (https://danluu.com/talent/). These are topics where there is an enormous amount of hot air out there on the web, but very little that is epistemologically justifiable.
Minimalism is fine. But there comes a point when there's so little, it is nothing. danluu.com is a bucket of sand facing an overbuilt cathedral.
body { font: 16px/1.6em sans-serif; max-width: 50em; margin: auto; }
Can even add it manually in the inspector if you want. I do wish I didn't have to do that though.
In almost all performance areas -- gaming, PCs, autos, etc -- there are usually whole publications dedicated to performing benchmarks and publishing those results.
Are there any publications or sites which implement a few basic applications against various new-this-season "full stacks" or whatnot, and document performance numbers and limit-thresholds on different hardware?
Likewise, there must be stress-test frameworks out there. Are there stress-test and scalability-test third-party services?
Even with his architecture, it sounds like they have an API service, a queue, and some worker processes. And they already have Kubernetes, which means they must be wrapping all of that in Docker. It seems like a no-brainer to me to at least separate the code for the API service from the workers so that they can scale independently. And depending on the kind of work the workers are doing, you might split those into a few separate codebases. Or not — I've had success on multiple projects where all jobs are handled by a set of workers with a massive `switch` statement on a `jobType` field.
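That "one worker pool, switch on jobType" pattern is about as simple as job dispatch gets. A sketch (Python; a dict of handlers plays the role of the big `switch`, and the handler names and payload fields are mine, purely for illustration):

```python
# One handler per job type; registering them in a dict is the
# Python equivalent of the big `switch` on `jobType`.
def send_email(payload):
    return f"emailed {payload['to']}"

def resize_image(payload):
    return f"resized {payload['path']} to {payload['width']}px"

HANDLERS = {
    "send_email": send_email,
    "resize_image": resize_image,
}

def handle_job(job):
    """Dispatch a dequeued job to its handler via its jobType field."""
    handler = HANDLERS.get(job["jobType"])
    if handler is None:
        # Surface unknown types loudly rather than dropping them.
        raise ValueError(f"unknown jobType: {job['jobType']}")
    return handler(job["payload"])
```

Adding a new job type is one function plus one dict entry — no new service, no new deployment unit.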
I think there is some middle ground between micro-services and monoliths where the vast majority of us live. And in our minds we're creating these straw-man arguments against architectures that rarely exist. Like a literal single app running on a single machine vs. a hundred independent micro-services stitched together with ad-hoc protocols. Micro-services vs. monoliths is actually a gradient where we rarely exist at either ludicrous extreme.
It's quite easy these days to deploy an app using AWS Lambda, DynamoDB, SNS, etc., all from a single CloudFormation template. Is that simple? In one sense I've abstracted away a lot of the operational work that comes with self-hosting, but now I've intertwined (Rich Hickey might say complected) myself with Amazon's ecosystem.
Also, is a document store like DynamoDB, MongoDB, etc., simpler than a relational database like Postgres? On the one hand, a document database's interface is very simple compared to the complexity of SQL. On the other, that simplicity is generally considered a necessary sacrifice for scale. If you don't need to scale, why make the sacrifice?
Also, there can be simple things that are better at scaling. Elixir is a very nice scripting language like Ruby or Python, but it also has much better performance scaling (comparable with NodeJS or Go).
What do people use instead of Graphene? Strawberry?
How often do you hit ‘that certain size’ when velocity starts to degrade anyway?
"X works well until it doesn't" is not exactly a compelling argument. That can be said of simple and complex architectures, or of just about anything at all.
So even if you’re a bit worried about scaling it, you can at least feel the problems are far away enough that you shouldn’t care until later.
I’ve been looking for real world performance.
What country could that be? That sounds challenging.
Normally I check the Internet Archive, but https://web.archive.org/web/*/https://danluu.com/simple-arch....
That links to the original on wave.com, dated March 9th this year.
And you're calling that simple?
I've worked on monolithic codebases, and the one thing none of them have ever been is simple. They have complex interdependencies (oh hey, like database transaction scopes); they have that 'one weird way of doing things' that affects every part of the system (like, 'everything has to be available over GraphQL')...