- Design your application's hot path to never use joins. Storage is cheap, denormalize everything and update it all in a transaction. It's truly amazing how much faster everything is when you eliminate joins. For your ad-hoc queries you can replicate to another database for analytical purposes.
On this note, I have mixed feelings about Amazon's DynamoDB, but one thing about it is that to use it properly you need to plan your usage first, and your schema second. I think there's something you can take from this even with an RDBMS.
In fact, I'd go as far as to say that joins are unnecessary for non-analytical purposes these days. Storage is so mind-bogglingly cheap and the major DBs have ACID properties. Just denormalize, for real.
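As a minimal sketch of what "denormalize and update it all in a transaction" looks like in practice (the schema and names here are invented for illustration, using SQLite for brevity):

```python
import sqlite3

# Hypothetical schema: the author's name is copied onto each post
# (denormalized) so reads never need a join. All copies are updated
# in one transaction so they stay consistent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,
                        author_name TEXT,  -- denormalized copy of users.name
                        body TEXT);
""")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
conn.execute("INSERT INTO posts VALUES (10, 1, 'Ada', 'hello')")

# Renaming the user must touch every copy, inside one transaction.
with conn:
    conn.execute("UPDATE users SET name = ? WHERE id = ?", ("Ada L.", 1))
    conn.execute("UPDATE posts SET author_name = ? WHERE user_id = ?",
                 ("Ada L.", 1))

# Reads on the hot path are now join-free:
row = conn.execute("SELECT author_name, body FROM posts WHERE id = 10").fetchone()
print(row)  # ('Ada L.', 'hello')
```

The trade-off the later replies point out lives in that `with conn:` block: every new denormalized copy adds one more statement that every write path must remember.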
- Use something more akin to UUIDs to prevent hot partitions. They're not a silver bullet and have their own downsides, but you'll already be used to the consistently "OK" performance that can be horizontally scaled rather than the great performance of say integers that will fall apart eventually.
/hottakes
my sun-level take would also be to just index all columns. but that'll have to wait for another day.
In terms of tech debt it would have been way more expensive to make everything perform well from the start, we would have moved much slower and probably failed during a few crunch points.
Instead we paid probably a few $k/mo more than we really needed to on machines, and in return saved man-months of effort at a time when we couldn’t hire enough engineers and the opportunity cost for feature work was huge. (Keep in mind that making everything perform well would have required us to do 10-20x as much work, because we could not know ahead of time where the hot spots would be. Some were surprising.)
Joins may be evil at scale, but most startups don’t have scale problems, at least not at first.
Denormalizing can be a good optimization but you pay a velocity cost in keeping all the copies in sync across changes. Someone will write the bug that misses a denormalized non-canonical field and serves up stale data to a user. It’s usually cheaper (in total cost, ie CapEx+OpEx) to write the join and optimize later with a read-aside cache or whatever, rather than contorting your schema.
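A read-aside (cache-aside) approach, as suggested above, keeps the normalized join and puts a cache in front of the hot read path instead of copying columns across tables. A minimal sketch, with an invented schema and a plain dict standing in for a real cache (which would be Redis/memcached with TTLs and invalidation):

```python
import sqlite3

cache = {}

def order_summary(conn, order_id):
    """Serve from cache if possible; otherwise run the join and cache it."""
    if order_id in cache:
        return cache[order_id]
    row = conn.execute("""
        SELECT o.id, c.name, o.subtotal_cents
        FROM orders o JOIN customers c ON c.id = o.customer_id
        WHERE o.id = ?
    """, (order_id,)).fetchone()
    cache[order_id] = row  # populate on miss
    return row

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         subtotal_cents INTEGER);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (7, 1, 1250)")

print(order_summary(conn, 7))  # first call runs the join
print(order_summary(conn, 7))  # same result, served from the cache
```

There is exactly one canonical copy of `subtotal_cents`; the cache can serve stale data, but it is explicitly a cache, not a second source of truth scattered across the schema.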
This screams of an "if I don't see it, the problem doesn't exist" view of the world.
How do you know it's not a problem? Perhaps customers would have signed up if it was faster?
The problem is also treating it in terms of business value and/or cost.
A lot of things are “free” and yet it’s ignored.
For most people, simple things like turning on HTTP/3 or Brotli, or switching to newer instances, are quick wins that I see ignored 90% of the time.
A good design and a few good practices are often performance wins in themselves and don't cost more.
But joins should never impact performance in a large way if they're on the same server and properly indexed. "It's truly amazing how much faster everything is when you eliminate joins" is just not true if you're using joins correctly. Sadly, many developers simply never bother to learn.
On the other hand, having to write a piece of data to 20 different spots instead of 1 is going to be dramatically slower performance-wise, not to mention make your queries tremendously more complex and bug-prone, when you remember to update a value in 18 spots but forget about 2 of them.
You mention cheap storage as an advantage for denormalizing, but storage is the least problem here. It's a vastly larger surface area for bugs, and terrible write performance (that can easily chew up read performance).
Memory is the new disk, and disk is the new tape.
You want everything to remain resident in memory, and spool backups and transaction logs to disk.
If you’re joining from disk, you’ve probably done something wrong.
E.g.: compression is often a net win because while it uses more CPU, it allows more data to fit into memory. And if it doesn’t fit, it reduces the disk I/O required.
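A rough illustration of that point: a verbose, repetitive JSON document (the kind document stores tend to produce) compresses heavily, so far more of it fits in the same memory or I/O budget. The data here is invented; zlib stands in for whatever codec a real database would use.

```python
import json
import zlib

# 1000 documents with identical keys and mostly identical values:
# exactly the kind of repetition denormalized JSON produces.
docs = [{"user_id": i, "user_name": "example", "status": "active"}
        for i in range(1000)]
raw = json.dumps(docs).encode()
packed = zlib.compress(raw)

# The compressed form is a small fraction of the raw size, so the same
# RAM or disk-I/O budget holds many times more records.
print(len(raw), len(packed), round(len(raw) / len(packed), 1))
```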
This is why I look upon JSON-based document databases in horror. They’re bloating the data out many times over by denormalizing and then expand that into a verbose and repetitive text format.
This is why we have insanity like placeholders for text on web apps now — they’re struggling to retrieve a mere kilobyte of data!
SELECT …
FROM User
JOIN ContactMethod on ContactMethod.userId = User.id
WHERE ContactMethod.priority = 'primary' AND ContactMethod.type = 'phoneNumber'
ORDER BY User.createdAt DESC
LIMIT 10
If there are a very large number of users, and a very large number of phone number primary contacts, you cannot make this query fast/efficient (on most RDBMSes). You CAN make this query fast/efficient by denormalizing, ensuring the user creation date and primary contact method are on the same table, and then creating a compound index. But if they're in separate tables, and you have to join, you can't make it efficient, because you can't create cross-table compound indexes.
This pattern of join, filter by something in table A, sort by something in table B, and query out one page of data, is something that comes up a lot. It's why people think joins are generally expensive, but it's more that they're expensive in specific cases.
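To make the denormalized fix concrete (column names are illustrative, and SQLite stands in for the RDBMS): once the contact type lives on the User row, one compound index serves the filter, the sort, and the limit in a single index scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized: primaryContactType is copied from ContactMethod onto User,
# so a compound index can cover WHERE + ORDER BY + LIMIT together.
conn.executescript("""
    CREATE TABLE User (
        id INTEGER PRIMARY KEY,
        createdAt TEXT,
        primaryContactType TEXT   -- denormalized from ContactMethod
    );
    CREATE INDEX idx_type_created
        ON User (primaryContactType, createdAt DESC);
""")
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id FROM User
    WHERE primaryContactType = 'phoneNumber'
    ORDER BY createdAt DESC
    LIMIT 10
""").fetchall()
# Expect a SEARCH using idx_type_created, with no temp B-tree sort step.
print(" ".join(r[3] for r in plan))
```

With the normalized two-table version there is no single index the planner can walk in `createdAt` order while also applying the `ContactMethod` filter, which is exactly the problem described above.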
I also don’t think it’s worth the trouble of “never using joins” for an existing project. Denormalize as necessary. But for a greenfield one, I honestly think that since your access patterns can be understood as you go, you can completely get rid of joins.
Again, assuming your new project can’t fit on a single machine. If it can, you’re best off just following the “traditional” advice, or better yet keeping everything in memory.
Facebook really only has people, posts, and ads.
Netflix really only has accounts and shows.
Amazon (the product) really only has sellers, buyers, and products, with maybe a couple more behind the scene for logistics.
The reason for this is because tall applications are easy. Much, much easier than wide applications, which are often called "enterprise". Enterprise software is bad because it's hard. This is where the most unexplored territory is. This is where untold riches lie. The existing players in this space are abysmally bad at it (Oracle, etc.). You will be too, if you enter it with a tall mindset.
Advice like "never use joins" and "design around a single table" makes a lot of sense for tall applications. It's awful, terrible, very bad, no-good advice for wide applications. You see this occasionally when these very tall companies attempt to do literally anything other than their core competency: they fail miserably, because they're staffed with people who hold sacrosanct this kind of advice that does not translate to the vast space of "wide" applications. Just realize that: your advice is for companies doing easy things who are already successful and have run out of low-hanging fruit. Even tall applications that aren't yet victims of their own success do not need to think about butchering their data model in service of performance. Only those who are already vastly successful and are trying to squeeze out the last juices of performance. But those are the people who least need advice. This kind of tall-centered advice, justified with "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people who set off to do something more interesting than serve ads to billions of people.
- Sellers (Amazon, third party, retail store)
- Inventory (forecasting, recommendation)
- Customers (comments, ratings, returns, preferences)
- Warehouses (5+ distinct types, filled with custom machines)
- Transit Options (long haul, air, vans, cars, bikes, walking, boats)
- Delivery Partners (DSP, Flex, Fedex, UPS - forecasting capacity here)
- Routing (between warehouses, within warehouses, to specific homes)
- skipping AWS
- skipping billing
- skipping advertising on amazon.com (bidding, attribution, etc)
There are optimizations and metrics collected, and packages transitioning, between all these layers. There are hundreds of "neat projects" running to special-case different things; all of them useful but adding complexity.
For example ordering prescriptions off Amazon pharmacy needs effectively its own website and permissions and integrations. Probably distinct sorting machines with supporting databases for them. Do you need to log repairs on those machines? Probably another table schema.
You want to normalize international addresses? And fine-tune based on delivery status and logs and customer complaints and map data? Believe it or not, that's like 20 more tables. Oh, this country has no clear addresses? Need to send experienced drivers to areas they already know. Need to track that in more tables.
> Amazon (the product) really only has sellers, buyers, and products, with maybe a couple more behind the scene for logistics.
Is a comically bad hot take that is so entirely divorced from reality. A full decade ago the item catalog (eg ASINs or items to purchase) alone had closer to 1,000 different subsystems/components/RPCs etc for a single query. I think you'd have to go back to circa 2000 before it could be optimistically described as a couple of databases for the item catalog.
DylanDmitri's sibling comment is a hell of a lot closer to the truth, and I'd hazard it's still orders of magnitude underestimating what it takes to go from viewing an item detail page to completing checkout, let alone picking or delivery. There's a reason the service map diagram, again circa 2010, was called "the deathstar."
> "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people
This part I completely agree with. And many individual components in those giant systems are dead simple. I dare say the best ones are simplistic even.
The world runs on success stories, not on technology. I wish “wide” thinking was default, for both un-delusion and better development in this area. But everyone is amazed with facebook (not the site, just money), so they have to imitate it, like those tribes who build jets out of wood.
I also agree with your take that tall applications are generally easier to build engineering-wise.
Where I disagree is that I think in general wide applications are failures in product design, even if profitable for a period of time. I've worked on a ton of wide applications, and each of them eventually became loathed by users and really hard to design features for. I think my advice would be to strive to build a tall application for as long as you can muster, because it means you understand your customers' problems better than anyone else.
What is the market for "wide" applications though? It seems like any particular business can only really support one or two of them, for some that will be SAP and for others it might be Salesforce (if they don't need much ERP), or (as you mentioned) some giant semi homebrewed Oracle thing.
Usually there is a legacy system which is failing but still runs the business, and a "next gen" system which is not ready yet (and might never be, because it only supports a small number of use cases from the old software and even with an army of BAs it's difficult to spec out all the things the old software is actually doing with any accuracy).
Or am I not quite getting the idea?
Some of these store things for user-facing product entities and associations between them -- you missed the vast majority of product functionality in your "people, posts, and ads" claim. Others are for internal purposes. Some workloads use joins, others do not.
Nothing about Facebook's database design is "tall", nor is it "easy". There are a lot of huge misconceptions out there about what Facebook's database architecture actually looks like!
Advice like "never use joins" and "design around a single table" is usually just bad advice for most applications. It has nothing to do with Facebook, and ditto for Amazon based on the sibling replies from Amazon folks.
If they didn’t, then I’d change my advice to be simply multi-tenant per customer, and replicate into a column store for cross-customer analytics.
What advice would you give for a “wide” application?
You can mash those big joins into a materialized view or ETL them into a column store or whatever you need to fix performance later on, but once someone has copied the `subtotal_cents` column onto the Order, Invoice, Payment, NotificationEmail, and UserProfileRecentOrders models, and they're referenced and/or updated in 296 different places...it's a long road back to sanity.
I'd amend that to "don't let your scan coverage get too big". Understanding how much data must be loaded into memory and compared is essential to writing performant database applications. And yes, those characteristics change over time as the data grows, so there may not be a one-size-fits-all solution. But "table too large" can pretty much always be solved by adding better indexes or by partitioning the table, and making sure common queries hit only one partition.
As a simple example: a lot of queries can be optimized to include "WHERE fiscal_year = $current". But you need to design your database and application up front to make use of such filtered indexes.
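A sketch of that filtered-index idea (SQLite calls these partial indexes; the schema and year are made up for illustration): only rows matching the index's WHERE clause are indexed, so the index for the current year stays tiny, and queries that repeat the same predicate can use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        fiscal_year INTEGER,
        customer_id INTEGER,
        total_cents INTEGER
    );
    -- Filtered (partial) index: only the current fiscal year's rows
    -- are indexed, keeping the index small and hot in memory.
    CREATE INDEX idx_current_year
        ON invoices (customer_id)
        WHERE fiscal_year = 2024;
""")
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT total_cents FROM invoices
    WHERE fiscal_year = 2024 AND customer_id = 7
""").fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)  # the planner picks idx_current_year
```

The catch, as noted above, is that the application has to be designed to always include the matching predicate; a query without `fiscal_year = 2024` cannot use this index at all.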
Instead, it'll throw hot-key throttling at you if you start querying one partition too much
Grab (uber of asia) did this religiously and it created a ton of friction within the company due to the way the teams were laid out. It always required one team to add some sort of API that another team could take advantage of. Since the first team was so busy always implementing their own features, it created roadblocks with other teams and everyone started pointing fingers at each other to the point that nothing ever got done on time.
Law of unintended consequences
The application doesn't need to know how the data is physically stored in the database. They specify the logical view they need of the data. The DBAs create the materialized view/stored procedure that's needed to implement that logical view.
Since the application is never directly accessing the underlying physical data, it can be changed to make the retrieval more efficient without affecting any of the database's users. You're also getting the experts to create the required data access for you in the fastest, most efficient way possible.
We've been doing this for years now and it works great. It's alleviated so many headaches we used to have.
> The application doesn't need to know how the data is physically stored in the database.
In all the applications that I've designed, the application and the database design are in sync. That's not say that you wouldn't use materialized views to deal with certain broad queries but I just don't see how this level of abstraction would make a big difference.
2011, "Materialized Views" by Rada Chirkova and Jun Yang, https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf
> We cover three fundamental problems: (1) maintaining materialized views efficiently when the base tables change, (2) using materialized views effectively to improve performance and availability, and (3) selecting which views to materialize. We also point out their connections to a few other areas in database research, illustrate the benefit of cross-pollination of ideas with these areas, and identify several directions for research on materialized views.
In a show hosting/ticket booking app for example, I never want in any case user facing search/by-id endpoints to serve a show from 2 months ago. So I create a view `select * from shows where time > now`. I can now use this as a 'table' and apply more filters and joins to this if I wish.
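A minimal version of that pattern (invented schema, SQLite for brevity): the view pre-applies the time filter, and callers compose further filters and joins on top of it as if it were a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, time TEXT);
    -- The view encodes the invariant: user-facing queries never see
    -- shows that have already happened.
    CREATE VIEW upcoming_shows AS
        SELECT * FROM shows WHERE time > datetime('now');
""")
conn.execute("INSERT INTO shows VALUES (1, 'old', '2000-01-01 20:00:00')")
conn.execute("INSERT INTO shows VALUES (2, 'new', '9999-01-01 20:00:00')")

# Additional filters stack on the view as if it were a table.
rows = conn.execute(
    "SELECT title FROM upcoming_shows WHERE title LIKE 'n%'"
).fetchall()
print(rows)  # [('new',)]
```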
This seems close to the territory of "why do I need a database? I just keep a bunch of text files with really long names that described exactly what I did to compute the file. They're all in various directories, so if you need to find one just do some greps and finds on the whole system"
I recognize there's a big gap, but boy howdy does what you're suggesting sound messy.
I had a Chief Architect who decreed this.
So engineers wound up doing joins in application code, with far worse performance, filtering, memory caching, etc.
if you ask Amazon, they might suggest that you design around a single table (https://aws.amazon.com/blogs/compute/creating-a-single-table...).
in my opinion it's easier to use join tables, which are what's sometimes temporarily created when you do a join anyway. in this case, you permanently create table1, table2, and table1_join_table2, and keep all three in sync transactionally. when you need a join you just select from table1_join_table2. you might think this is a waste of space, but I'd argue storage is too cheap for you to be thinking about that.
that being said, you really have to design around your access patterns, don't design your application around your schema. most people do the latter because it seems more natural. what this might mean in practice is that you do mockups of all of the expected pages and what data is necessary on each one. then you design a schema that results in you never having to do joins on the majority, if not all, of them.
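A sketch of the permanent-join-table idea (all names invented, SQLite for brevity): the "join" is materialized up front, and every write path maintains it in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, placed_at TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           price_cents INTEGER);
    -- The join is computed once at write time, not on every read.
    CREATE TABLE orders_join_products (
        order_id INTEGER, product_id INTEGER,
        product_name TEXT, price_cents INTEGER
    );
""")
with conn:  # every write updates all three tables atomically
    conn.execute("INSERT INTO products VALUES (1, 'widget', 500)")
    conn.execute("INSERT INTO orders VALUES (100, '2024-01-01')")
    conn.execute("INSERT INTO orders_join_products VALUES (100, 1, 'widget', 500)")

# Reads that would have joined now hit the prejoined table directly.
row = conn.execute("""
    SELECT product_name, price_cents
    FROM orders_join_products WHERE order_id = 100
""").fetchone()
print(row)  # ('widget', 500)
```

This is essentially a hand-maintained materialized view; the cost, as other replies note, is that every write path must remember to keep the third table in sync.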
As an added data point I don't really like programming books but bought this since the data out there on Single Table Design was sparse or not well organized, it was worth every penny for me.
I won't say that it's trivial to update all of your business logic to do this, but I think it's definitely worth it for a new project at least.
In our application we have one important join that actually makes things a lot faster than the denormalized alternative. The main table has about 8 references to an organization table. To figure out what rows should be selected for a particular organization, you could query on those 8 columns, making a very big where/or clause. As it turns out, PostgreSQL will usually end up doing a full table scan despite any index you would create.
Instead, there is an auxiliary table with two columns, one for organization and one reference to the main table. Joining on this table simplifies the query and also turns out to be much faster.
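A sketch of that auxiliary-table trick (the schema is invented, with two org columns standing in for the eight, and SQLite standing in for PostgreSQL): a narrow (org_id, main_id) table replaces the wide OR clause with a single indexed join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main (
        id INTEGER PRIMARY KEY,
        owner_org INTEGER, billing_org INTEGER  -- ...imagine six more
    );
    -- One row per (organization, main row) relationship, whatever column
    -- that relationship came from.
    CREATE TABLE main_orgs (org_id INTEGER, main_id INTEGER);
    CREATE INDEX idx_main_orgs ON main_orgs (org_id, main_id);
""")
conn.execute("INSERT INTO main VALUES (1, 10, 20)")
conn.executemany("INSERT INTO main_orgs VALUES (?, ?)", [(10, 1), (20, 1)])

# One indexed join replaces WHERE owner_org=? OR billing_org=? OR ... (x8),
# which the planner typically answers with a full table scan.
rows = conn.execute("""
    SELECT DISTINCT m.id FROM main m
    JOIN main_orgs mo ON mo.main_id = m.id
    WHERE mo.org_id = 10
""").fetchall()
print(rows)  # [(1,)]
```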
This captures the experience I've had with DynamoDB and document databases in general. They appear more flexible at first, but in truth they are much less flexible. You must get them right up front or you're going to be paying thousands of dollars every month in AWS bills just for DynamoDB. The need to get things right up front is the opposite of flexibility.
Hard disagree. You just re-implemented a database engine in your application code. Poorly.
Also, doing these things (de-joining, UUIDs, and indexing all columns; really unsure about that last one, why?) might be better later on, but at the start it will be a lot heavier.
Modern hardware and databases can take an incredible amount of traffic if you use them in the right and natural way without artificial tricks.
Does anybody have documentation about this, with examples?
check out MariaDB "ColumnStore". it recently got merged into the upstream binary, and i started reading about it. ngl i was salivating a bit.
I spent two or so months optimizing the crap out of a majestic monolith. We went from under 2K RPS, when the PM thought (and the team repeatedly reported) that everything had been squeezed as much as it could be, to less than 3200 RPS after changing the hardware, then to 4K RPS after just a few days of tinkering, to 10K RPS with a bit more effort, to 40K RPS a week or so later. "Oh, that's enough, we don't need to go further." I then changed "quite a bit of stuff", which jumped us to 2M+ RPS, and then a month later to a consistent 40M+ RPS with low latency on a single box, and there is still some juice left in the box should we want to go a little harder.
Right now we're not even touching 5% of the capacity of what we can pull from, it was that much of a change, simply by changing how we think about the problems. Moving from the old server to the new server let us jump from around 1800 RPS to a hair over 3000 RPS. Adding more hardware didn't fix our underlying problems. Adding more complexity was just punting the problem down the road. But changing how to think about the problem? _That_ changed the problem. And changed our answer to the problem.
Just to note: you don't have to split out all the possible microservices at this junction. You can ask, "what split would have the most impact?"
In my case, we split out some timeseries data from Mongo into Cassandra. Cass's table structure was a much better fit — that dataset had a well defined schema, so Cass could pack the data much more efficiently; for that subset, we didn't need the flexibility of JSON docs. And it was the bulk of our data, and so Mongo was quite happy after that. Only a single split was required. (And technically, we were a monolith before and after: the same service just ended up writing to two databases.)
Ironically, later, an armchair architect wanted to merge all the data into a JSON document store, which resulted in numerous "we've been down that road, and we know where it goes" type discussions.
Is not.
"Scale-up" MUST be the "obvious" solution. What is missed by many, and what this article touches on (despite saying that microservices are a "solid" choice), is that "scale-up" is "scale-out" without breaking the consistency of the DB.
There is a lot you can do to squeeze, and it is rare that you need to drop joins, data validations, and resort to the other anti-patterns that are casually thrown around when performance problems happen.
There are sometimes diminishing returns to simple scaling; e.g., in my current job, each new disk we add adds 1/n disks' worth of capacity. Each scaling step happens quicker and quicker (assuming growth of the underlying system). Eventually, you hit the wall in the OP, in that you need design level changes, not just quick fixes.
The situation I mention in my comment was one of those: we'd about reached the limits of what was possible with the setup we had. We were hitting things such as bringing in new nodes was difficult: the time for the replica to replicate was getting too long, and Mongo, at the time, had some bug that caused like a ~30% chance that the replica would SIGSEGV and need to restart the replication from scratch. Operationally, it was a headache, and the split moved a lot of data out that made these cuts not so bad. (Cassandra did bring its own challenges, but the sum of the new state was that it was better than where we were.)
Consistency is something you must pay attention to. In our case, the old foreign key between the two systems was the user ID, and we had specific checks to ensure consistency of it.
Depending on your read load and application structure you can get a lot more scale with caching.
Decent article.
The specific data that went into Cassandra in our case was basically immutable. (And somehow, IIRC, we still had issues around tombstones. I am not a fan of them.) Cassandra's tooling left much to be desired around inspecting the exact state of tombstones within the cluster.
Log queries, filter out the ones that are very frequent or take loads of time to execute, cache the frequent ones, optimize the fat ones. Do this systematically and your system will be healthier.
Things that help massively, from my experience:
- APM
- slow query log
- DB read/write replicas
- partitioning and sharding
Tools like https://explainmysql.com that make it clearer what you actually need to optimise are an easier system for Devs with enough database knowledge to set stuff up, but not enough to understand how it's used.
I assume someone's already working on an AI system that takes schema and logs and returns the SQL needed to magically improve things. Not sure I'd trust that, but I'd bet many companies would rather use that than get a full DBA.
This should unclog the lowest-hanging fruit. Then there are of course more advanced scenarios, especially with joins.
That’s not to say that the UX for explaining (hah) this doesn’t have a lot of room for improvement.
Use the Index, Luke (also recommended by cocoflunchy) was one of my go-to resources once.
Also, Tobias Petry does a really good job covering many advanced topics on Twitter and in his books: https://twitter.com/tobias_petry
You don't fix bad coding with a new architecture. That just puts the problem off by some time.
You go to war with the army you have, not the army you might want or wish to have at a later time.
You may want to ignore that this comes from Donald Rumsfeld (he has some great ones though: “unknown unknowns …”, etc.)
I think about this a lot when working on teams. Everyone is not perfectly agreeable or has the same understanding or collective goals. Some may be suboptimal or prone to doing things you don’t prefer. But having a team is better than no team, so find the best way to accomplish goals with the one you have.
It applies to systems well too.
> Q: Could I follow up, Mr. Secretary, on what you just said, please? In regard to Iraq weapons of mass destruction and terrorists, is there any evidence to indicate that Iraq has attempted to or is willing to supply terrorists with weapons of mass destruction? Because there are reports that there is no evidence of a direct link between Baghdad and some of these terrorist organizations.
> Rumsfeld: Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.
> And so people who have the omniscience that they can say with high certainty that something has not happened or is not being tried, have capabilities that are ...
https://archive.ph/20180320091111/http://archive.defense.gov...
But whatever position he had, Iraq turning into a clusterfuck wasn't a sign of bad leadership by his part. It was a sign of bad ethics, but not leadership. His options were all of getting out of his position, disobeying the people above him, or leading the US into a clusterfuck.
If by “explaining how” you mean “deflecting (often preemptively) responsibility for”, yes.
If someone is 83.7% likely to provide good leadership, how would you evaluate the choice to hire that person as a leader in the hindsight that the person failed to provide good leadership -- was it a bad choice, or was it a good choice that was unlucky?
(Likelihood was selected arbitrarily.)
Once they’re in, the predilections that led to power often rear their dark long tails. But they’re all (even the ones I disagree with) talented.
He’s also one of the best candidates for that type of conspiracy theory. His career history is flabbergasting.
Check out https://en.wikipedia.org/wiki/Donald_Rumsfeld#Corporate_conn...
In addition to all the Bohemian Club, RAND corp, defense and government posts, in the 70s the guy was a CEO in the pharmaceuticals and electronics industries, was a director in aerospace, media and tech.
Definitely the type of resume that lets the imagination run wild with, “… wait, was he a lizard person …?”
Moltke's thesis was that military strategy had to be understood as a system of options, since it was possible to plan only the beginning of a military operation. As a result, he considered the main task of military leaders to consist in the extensive preparation of all possible outcomes. His thesis can be summed up by two statements, one famous and one less so, translated into English as "No plan of operations extends with certainty beyond the first encounter with the enemy's main strength" (or "no plan survives contact with the enemy") and "Strategy is a system of expedients". Right before the Austro-Prussian War, Moltke was promoted to General of the Infantry.
“The athlete knows the day will never come when he wakes up pain-free. He has to play hurt.”
This applies to ourselves more than our systems though.
I try to remind myself of this fact when I'm frustrated with other people. A bit of humility and gratitude go a long way.
It's eye-opening how many people are outright lazy with thought and don't care about the joy of doing something well (apart from whatever extrinsic rewards are attached to the work). Many team members can actually produce negative value.
It seems that people who are really capable of (or care about) conscientious, original thought in problem solving and driving projects forward are few. Count yourself lucky if you get to manage one of these people, they can produce incredible value when well directed.
Depends on what you are working on. Btw, good communication can also make someone a 'star' and elevate the whole team.
> I try to remind myself of this fact when I'm frustrated with other people. A bit of humility and gratitude go a long way.
That's good advice for most situations.
Meanwhile our side of the org has a much more collaborative relationship with our product team. We have our issues for sure, but our relationships are sound. The feedback loop is tight and product pushes back on things as much as the dev team does. Product works with the dev team to figure out what we can do and stays with us to the end. There's much less tossing things over the fence and everybody seems happier.
The unknown unknowns quote brings the concept that however confident you are in a plan, you absolutely need margin. The other quote though... what do you do differently when understanding that your team is not perfect?
On one side, outside of VC-backed startups I don't see companies trying to reinvent Linux with a team of 4 new graduates. On the other side, companies with really big goals will hire a bunch until they feel comfortable with their talent before "going to war". You'll see recruiting posts seeking specialists in a field before a company bets the farm on that specific field (imagine Facebook renaming itself to Meta before owning Oculus... nobody does that[0])
Edit: sorry, I forgot some guy actually just did that 2 weeks ago with a major social platform. And I kinda wanted to forget about it I think.
Rumsfeld was complicated, but there's no doubt he was very effective at leading the Department. I think most people fail to realize how sophisticated the Office of the Secretary of Defense is. Their resources make the mind reel, most of all the human capital: many with PhDs, many very savvy political operators with stunning operational experience. As a small example, as I recall, Google's hallowed SRE system was developed by an engineer who had come up through the ranks of Navy nuclear power. That's but one small component reporting into OSD.
Not a Rumsfeld apologist, by any means. Errol Morris did a good job showing the man for who he is, and it's not pretty (1). But reading HN comments opining about the leadership qualities of a Navy fighter pilot who was both the youngest and oldest SECDEF makes me realize how the Internet lets people indulge in a Dunning-Kruger situation the likes of which humanity has never seen.
Profile real world queries being run in production that use the most resources. Take a look at them. Get a sense of the shape of the tables that they're running against. Sometimes the ORM will be using a join where you actually want a subquery. Sometimes the opposite. Sometimes you'll want to aggregate some results beforehand, or adjust the WHERE conditions in a complex join. I've seen situations where a semi-frequent ORM-generated query was murdering the DB, taking 20+ seconds to run, and with a few minor tweaks it would run in less than a second.
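A hedged sketch of that workflow using SQLite's plan output (a real setup would use your database's slow-query log and `EXPLAIN ANALYZE`; the schema here is invented): look at the plan for the real query, make one tweak, and compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, ts TEXT)")

def plan(sql):
    """Return the query plan as a single string for easy inspection."""
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT ts FROM events WHERE kind = 'click'"
before = plan(q)   # full table scan: "SCAN events"
conn.execute("CREATE INDEX idx_kind ON events (kind)")
after = plan(q)    # now an indexed search via idx_kind
print(before)
print(after)
```

The same compare-the-plan loop applies to the ORM cases mentioned above: capture the generated SQL, inspect its plan, and check whether your tweak (subquery, pre-aggregation, adjusted WHERE) actually changes what the planner does.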
I like ORMs but this is just frustratingly complicated on so many levels. I also understand that SQLAlchemy is an enormous library and not everything will be easy. But I think this case exemplifies the trade-offs involved with using an ORM.
(Yes I am aware that using insert() itself in Core does what I want, I'm talking about .add()-ing an ORM object to an AsyncSession).
A friend used to say Zookeeper was where the crazy lived in any application that used it - sqlalchemy is where the slow lives in any application that uses it.
Most business logic would be better expressed in the language of relational algebra (plus some extensions) than via OOP.
...or just mental load. I'm tired of working on micro-service systems that still have downtime, but no one knows how it all works. Most are actually just distributed monoliths so changes often touch multiple services and have to be rolled out in order. Data has to be duplicated, tasks have to be synchronized, state has to be shared, etc...
It reminds me of Richard L. Sites's book _Understanding Software Dynamics_ where he basically teaches how to measure and fix latency issues, and how at large scales, reducing latency can have tremendous savings.
Measuring and reasoning about those issues are hard, but the solutions are often simple. For example, on page 9 he mentions that "[a] simple change paid for 10 years of my salary."
I hope to someday make such an impactful optimization!
I did that at Google more than once. They use a tremendous amount of machine resources and have excellent performance tools [1], so it's fertile ground.
There are a lot of other smart people around though, so if you find a big opportunity there's probably a reason no one else has jumped on it. Maybe technical, maybe organizational. As an example of the latter, Google doesn't usually reward this kind of thing except when there's a resource crunch. Like, maybe I got a peer bonus (~$100) for one of them. I certainly didn't get a 10% commission or a promotion or the ability to keep getting a paycheck without showing up for the next 10 years or whatever. As a general rule, they'd prefer engineers work on growing revenue than on reducing cost. Whether this is the right policy or not is kind of above my pay grade...
They couldn't upgrade their config with a few clicks in the admin console anymore (I'm guessing what's involved here) so now they had to use actual grey matter to fix their capacity problem. Maybe if they had spent more time optimizing specific parts of their code, they wouldn't even need such a large config instance.
Administering and tuning RDBMS is dark magic. Doing basic query optimization should be viewed the same as "maybe don't write an O(n^3) algorithm."
I admit that in the face of finding product/market fit, you do the expedient thing, but damned if I'm not often at the receiving end of these sorts of decisions.
Caching is mostly a lie. Redis will not “make” your application faster. It will just let you pretend the problem doesn’t exist for a while, and then when the caching eventually falls over, you will be even more stuck, because you’re now past your scaling limits, with no more defenses left and software that has been (continually) built without the understanding of its performance requirements.
1. Efficiency, which I define as minimising losses, i.e. not writing things inefficiently, avoiding artificial complexity and bloat, and keeping code simple. Performance hits here can also be a death-by-1000-cuts problem when depending on many 3rd-party pieces while having people chanting "premature optimisation" at you.
2. Optimization, which I define as employing specialist algorithms (which sometimes come in the form of entire tech stacks these days) with the cost of added complexity (and potentially performance trade) to get performance beyond the basic or naive yet efficiently implemented methods. The cost benefit ratio to these is not always worth it, especially in the beginning.
Hopefully the point I'm trying to make is obvious: attempting #2 before #1 is a bad idea, and in less explicit words I suspect this is kind of what the author is getting at... Yet it's not all that uncommon to see someone trying to fit a turbocharger to a cheese skateboard with 64 triangular wheels.
But the post makes it seem that there was no real query-level monitoring for the Postgres instance in place, other than perhaps the basic CPU/memory metrics provided by the cloud provider. Using an ORM without this kind of monitoring is a sure way to shoot yourself in the foot with n+1 queries, queries not using indexes, missing indexes, etc.
The other amazing thing is that everyone immediately reached for redesigning the system without analyzing the cause of the issues. A single Postgres instance can do a lot!
With OpenTelemetry, tracing, and Grafana support on k8s, you can basically get it running in a day.
But with performance it's always the same issue: people apparently do not think about it, and the necessary optimizations are often simple: enable query statistics, find the query in the code, and either fix/add an index, slightly rewrite your code, or add some kind of cache (query cache, etc.).
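The "add an index" fix really is often one line. A minimal sketch with stdlib sqlite3 (the `events` table is hypothetical), using `EXPLAIN QUERY PLAN` the way you'd use `EXPLAIN` in Postgres to confirm the plan actually changed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events (user_id, payload) VALUES (?, ?)",
                 [(i % 100, "x") for i in range(1000)])

query = "SELECT COUNT(*) FROM events WHERE user_id = ?"

# Before: the planner has no choice but a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

# The fix is this one line.
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")

# After: the same query becomes an index search.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

assert any("SCAN" in row[-1] for row in plan_before)
assert any("SEARCH" in row[-1] for row in plan_after)
```

In Postgres the workflow is the same shape: `pg_stat_statements` to find the query, `EXPLAIN (ANALYZE)` before and after the index.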
The last time I analyzed a slow query apparently no one before me spotted the huge memory footprint of that PostgreSQL query and focused on why it runs slower in one region vs the other.
You know the '10x developer' myth?
Yeah, if you still look like a sheep after working in it so long, and thinking about architecture and performance is still not second nature for you...
I'm slightly cynical because I love performance and optimizing it, but 99% of those issues aren't real issues, just people not knowing enough about their tools.
Caching adds a lot of complexity. It denormalizes the data, and now you "need to know" when to update the cache. Because "the single source of truth" is no longer maintained, it's easy to accidentally add regressions.
If it's a matter of adding a read replica, that's a much better solution, long-term, because you don't have the effort of "does this query also need to update the cache?"
(I'd think by now there would be a way to expose events in a DB when certain tables are updated; and then (semi) automatically invalidate the cache.)
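Postgres can do roughly this with triggers plus LISTEN/NOTIFY. Here's a minimal sketch of the idea using a version-counter table in stdlib sqlite3 (schema and names are hypothetical): any write bumps a version, and the cache key includes the version, so stale entries are simply never hit again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE cache_version (tbl TEXT PRIMARY KEY, version INTEGER);
INSERT INTO cache_version VALUES ('products', 0);
-- Any update to products bumps the version readers key their cache on.
CREATE TRIGGER products_touch AFTER UPDATE ON products
BEGIN
    UPDATE cache_version SET version = version + 1 WHERE tbl = 'products';
END;
INSERT INTO products VALUES (1, 9.99);
""")

_cache = {}  # (table, version) -> cached rows

def cached_prices():
    version = conn.execute(
        "SELECT version FROM cache_version WHERE tbl='products'").fetchone()[0]
    key = ("products", version)
    if key not in _cache:
        _cache[key] = conn.execute("SELECT id, price FROM products").fetchall()
    return _cache[key]

first = cached_prices()
conn.execute("UPDATE products SET price = 19.99 WHERE id = 1")
second = cached_prices()  # version bumped by the trigger, so no stale read
```

The trade-off the parent describes is still there, just relocated: the application code no longer needs to know which writes invalidate which cache entries, but every table you cache needs its trigger (this sketch only covers UPDATE).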
Reminds me of the mantra that I’ve read here: go quickly for reversible things, and be very careful when going for irreversible things.
> “We use the terms one-way door and two-way door to describe the risk of a decision at Amazon. Two-way door decisions have a low cost of failure and are easy to undo, while one-way door decisions have a high cost of failure and are hard to undo. We make two-way door decisions quickly, knowing that speed of execution is key, but we make one-way door decisions slowly and far more deliberately.”
Look, I get it, the devx sucks. And it feels like a proprietary, icky, COBOL-like experience. It means you have to dwell in the database. What are you, a db admin?!
But I'm telling you, the payoff is worth it. (and also, if you ship it you own it, so yes, you're a db admin). My company ran for many years on 3 machines, despite its extremely heavy page weight, because the original author wrote it in stored procs from the beginning. (He also liberally threw away data, which was great, but that's another post.) Part of my job was to migrate from .NET to Java and JavaScript - and another engineer wrote an ingenious tool that would generate Java bindings to SQL Server stored procs that made it really nice to work with them. And the performance really was outrageous - 100x better than any system I've worked with before or since. Those 3 boxes handled 300k very data intensive monthly actives, and that was like 10 years ago.
Don't worry - even if you lean into SPs there is still plenty of engineering to do! It's just that your data layer will simplify, and your troubleshooting actually gets easier, not harder. I liked the custom bindings - a bit like ActiveRecord, and no ORM. But really, truly: if you want to squeeze, move some queries into SPs and prepare to be amazed.
Side-note: the GP comment was actually doing pretty well (like +7) now it's (+2) so a cadre of anti-stored proc folks did a drive by down-voting. A bit sad, IMO.
- Detect overload/congestion build-up at the database
- Apply queueing at the gateway service and schedule requests based on their priority
- Shed excess requests after a timeout
[0]: https://docs.fluxninja.com/blog/protecting-postgresql-with-a...
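A rough sketch of those steps (queue by priority at the gateway, shed anything that has waited past its timeout) in plain Python; this is not FluxNinja's implementation, just the shape of the pattern, with all names invented:

```python
import heapq
import time

class AdmissionQueue:
    """Toy gateway queue: serve lowest-priority-number first,
    shed requests that have waited longer than max_wait_s."""

    def __init__(self, max_wait_s=0.5):
        self.max_wait_s = max_wait_s
        self._heap = []   # (priority, enqueue_time, seq, request)
        self._seq = 0     # tie-breaker so requests are never compared directly

    def enqueue(self, priority, request):
        heapq.heappush(self._heap,
                       (priority, time.monotonic(), self._seq, request))
        self._seq += 1

    def next_request(self):
        # Pop in priority order, silently shedding timed-out entries:
        # by now the client has likely given up, so doing the work is waste.
        while self._heap:
            _, enqueued, _, request = heapq.heappop(self._heap)
            if time.monotonic() - enqueued > self.max_wait_s:
                continue  # shed
            return request
        return None

q = AdmissionQueue(max_wait_s=0.5)
q.enqueue(2, "analytics-export")  # lower number = higher priority
q.enqueue(1, "checkout")
assert q.next_request() == "checkout"  # checkout jumps the queue
```

The missing piece, detecting overload at the database in the first place, is the hard part in practice; the queue only helps once you have a signal to gate on.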
I don’t know if the author has worked with microservices. MS solve a communication issue. If implemented semi-properly, teams stop blocking each other and the overall result is _faster_ and _safer_ feature delivery to production, because each team (or tribe, etc.) will be working on a smaller, isolated codebase. The challenge _usually_ is that now developers have to take the environment into consideration, introducing new patterns (retries, structured logs, timeouts, circuit breakers, possibly SLIs for other teams, distributed tracing, metrics, etc.). Given a large enough org, someone will either adopt or write a micro-framework to handle all or most of them.
To re-iterate if introducing MS stalled feature delivery, then it is a premature decision. YMMV, of course as there are other reasons to isolate part of the code base (e.g. compliance).
I have yet to see a good application of microservices. I’m not saying there are none, but the companies that can truly benefit from them are few and far between.
From my experience, smaller companies usually benefit a lot from a simple monolithic architecture. Large companies tend to split the problem into multiple products. But each product is still kind of a monolith.
I have no experience with huge SaaS companies like Netflix. I can easily see why there the situation is quite different.
My horror story from recent days is that I had a 100-ish LOC patch. I had to push an update to 8 repos. That means 8 merge requests, 8 code reviews, deploy changes in correct order such that it does not break anything. The whole thing took 3 days. Coding was done in two hours.
- You have so many deploy stacks that it's literally boggling. I had to deploy code that used everything from make to GitHub CI to Jenkins to serverless just to push a single change.
- You have infinite implementations of your business logic. This is for two major reasons. First it's just hard to keep everything sync'd up; even if you've got "libbiz" you're gonna have services on various versions. But second, a core tenet of microservices is to just fork a new service for this and let other services migrate, but now you've evolved from an ecosystem of library differences to an ecosystem of service differences, which is way more complicated and expensive to maintain. It would be infinitely better if all parts of your app used the same versions of your business logic, but it will never ever happen.
- Your stacks are probably really heterogeneous ("right tool for the job"), but what that means is your devs now probably all have to know some mix of Java, Ruby, JavaScript, Python, Go, and maybe something more esoteric like Clojure or Elixir.
- Maintaining a microservice infra is way more complicated. Good luck monitoring multiple app and deploy stacks. Good luck keeping all of them up to date with security fixes. Good luck with Kubernetes and helm (or whatever). Good luck with multiple persistence systems (Postgres, Maria, Mongo, Redis, Cockroach, BigQuery, etc)
- You have to have an event bus. Boo.
- Your logging is hyper complicated now
- Because of the ecosystem around microservices, you're probably doing a lot of weird enterprisy things in your code (DDD, CQRS) that mostly only add layers of indirection or pull in more complicated dependencies, or inspire you to (against all good advice) build your own framework.
I'm not saying a monolith doesn't have problems, but I think the cons of microservices get very little play.
Sharding is a really simple and comprehensible way to distribute some load and I favor it for situations that are generally like this.
However, if you want to take a baby step, you can shard a database within the same machine by sharding the storage subsystem.
That is, instead of splitting up your database between X machines, you split the database between X SSD arrays within the existing machine.
Now each table (or whatever) that you've made a shard has a unique storage throughput and bus path and you aren't competing for iops on one array/disk/whatever.
Some workloads can gain a lot from that and it might involve simply plugging in a handful of additional SSDs.
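The routing itself can be as simple as a stable hash over the key; in Postgres terms this is roughly hash partitions placed on per-array tablespaces. A sketch in plain Python (the mount points are hypothetical):

```python
import hashlib

# Hypothetical mount points, one per SSD array on the same machine.
SHARD_PATHS = ["/mnt/ssd0/db", "/mnt/ssd1/db", "/mnt/ssd2/db", "/mnt/ssd3/db"]

def shard_for(key: str) -> str:
    """Stable hash routing: the same key always lands on the same array.
    Uses sha256 rather than hash() so the mapping survives process restarts."""
    digest = hashlib.sha256(key.encode()).digest()
    return SHARD_PATHS[int.from_bytes(digest[:8], "big") % len(SHARD_PATHS)]

assert shard_for("user:42") == shard_for("user:42")  # deterministic
```

The usual caveat applies: changing the number of arrays remaps most keys, which is why consistent hashing exists; but for a fixed set of local SSD arrays the modulo version is fine.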
If you have a dataset for which cache invalidation is easy (e.g., data that is written and never updated), yeah, absolutely go for this.
In our case, and most cases I've seen, it wasn't so simple, and "split this off to a DB better suited to it" was less complex (maybe still a lot of work, but conceptually simple) than figuring out cache invalidation.
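For the easy case the parent mentions (data written once and never updated) invalidation genuinely disappears: a cache entry can never go stale. A sketch in plain Python, with the dict standing in for an expensive DB read:

```python
from functools import lru_cache

# Stand-in for the database; in the easy case, records are append-only.
_immutable_store = {}

def write_once(record_id, payload):
    """Records are never updated, only created."""
    if record_id in _immutable_store:
        raise ValueError("records are immutable")
    _immutable_store[record_id] = payload

@lru_cache(maxsize=1024)
def read(record_id):
    # Safe to cache forever: the underlying row can never change,
    # so there is no invalidation logic to get wrong.
    return _immutable_store[record_id]

write_once("evt-1", {"amount": 10})
assert read("evt-1") == {"amount": 10}
```

The moment any code path can mutate a record, this stops being safe and you're back to the invalidation problem the parent is describing.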
Cost: a fraction of the developer cost.
I see so many things done on the cloud that 10X their complexity because of it. Modern hardware is incredibly powerful.
They likely already had 24.xlarge or something lol
Great to see this cultural side-effect called out.
It’s easy to fall in love with complexity, especially since you see a lot of complexity in existing systems. But those systems became complex as they evolved to meet user needs, or for other reasons, over time. Complex systems are impressive, but you need to make sure that your team has people who recognize the heavy costs of complexity, and who can throw their engineering efforts directly against the most important problems your team faces.
A lot of teams that think this way end up with really high oncall burdens and then never have the time to even iterate on their infrastructure.
Now... Who knows why our AWS bill is so high?
With real hardware in a DC, you'd have to justify large capital expenditures to do something that stupid.
Yet some interesting patterns will emerge if teams accept some basic constraints:
1. A low-CPU-power client process is identical to a resource-taxed server process.
2. A system's client-server pattern will inevitably become functionally equivalent to inter-server traffic. Thus, the assumption that all high-performance systems degenerate into a hosted peer-to-peer model will counterintuitively generalize. Accordingly, if you accept this fact early, then you may avoid rewriting a code base 3 times and trying to reconcile a bodged API.
3. Forwarding meaningful information does not mean collecting verbose telemetry and then trying to use data science to fix your business model later. Assume you will eventually either have high-latency queuing or start pooling users into siloed contexts. In either case, the faulty idea of a single-database shared state will need to be seriously reconsidered at around 40k users, and later abandoned after around 13m users.
4. Sharding only buys time, at the cost of reliability. You may disagree, but one will need to restart a partitioned cluster under heavy load to understand why.
5. All complex systems fail in improbable ways. Eventually consistent is usually better than sometimes broken. Thus, solutions like Erlang/Elixir have been around for awhile... perhaps the OTP offers a unique set of tradeoffs.
6. Everyone thinks these constraints don't apply at first, and thus will repeat the same tantalizing... yet terrible... design choices others have repeated for 40+ years.
Good luck, =) J
There's definitely a subset of the industry which seems to really love complexity, and I suspect a large part of that is caused by the incentives involved and the need for "growth" and justification for one's continued employment.
Reads can scale to infinity.
YAGNI? But you could say that about a lot of things, and this also gives you other options, like releasing to a subset of customers (a rolling release system like LTS, etc.)
I work somewhere that has done this but for other driving reasons but it is a scalability dream. It is the web equivalent of desktop software running in Citrix! I haven’t seen anyone tune SQL in 4 years there whereas it has been a regular pastime everywhere else!
Yes you can’t do this for social networks but you can do it for most “customer with isolated clients” type shops which is most companies.
And I get that upgrades can be scary, but often they are relatively low cost.
Leaving everyone on the old system unhappy… means they will eventually push to re-platform, or rebuild, instead of just doing suggested maintenance along the way to keep the system they have in good shape.
My advice… do the maintenance. Do all the maintenance! Don’t just drive it into the ground and get mad when it breaks; change the oil and tires and spring for a car wash and some new wiper blades every now and then and you’ll be happier in the long run.
TLDR: when facing problems like these, it’s too easy to look at grandiose solutions and, because they look like cool engineering problems, we end up justifying that it’s worthwhile and reasonable to take on such year-long projects. But most often, the boring incremental solutions are easier and cheaper to achieve, while delivering benefit along the way.
This article shows examples of both, and I’m happy to see that the “boring” solution won.
The advice is obvious when you're thinking at that level of abstraction. Which suggests that, in practice, people who are architecting such systems rarely think at that level of abstraction. Which is why it is nice to have posts like this, that periodically remind us to get our heads out of the daily minutiae and consider the bigger picture (of complexity tradeoffs, realistic projections, staffing and availability, etc.)
- Voltaire
glad to see there was a better ending to the story though.
Given the options to optimize SQL, move read operations to replicas, shard data or go towards micro services, optimizing SQL is the easy choice.
However, sometimes everyone needs to chill and sharpen the tool they have at hand. It might prove much more capable than first anticipated. Or you may be holding the tool wrong to a degree.
Weird thing about computers, even after a fresh install of your favorite OS, the whole thing is sitting on a mountain of complexity, and that's before you start installing programs, browse the web, etc
Only the die-hard use things like MINIX[0] to do their computing. Correction: MINIX is in the Intel Management Engine so you have /two/ computers.