I disagree a bit with "Don’t index columns if the table has little data", mostly because it doesn't matter in those cases. If the table is tiny the index is also very cheap (unless it's something really weird like a tiny table that is written at a very high frequency). And "little data" is just not specific enough for people to make decisions unless they already have a very good intuition on when the query planner would use such an index.
A rather important part that isn't mentioned about multi-column indexes is which kinds of queries can use them. That is probably not obvious if you never read about them in detail, but it's really important to know when defining them.
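To illustrate the point above: Postgres can use a multi-column btree index efficiently when the query constrains a leading prefix of the indexed columns. A minimal sketch, with a hypothetical table and column names made up for illustration:

```sql
-- Hypothetical table and multi-column index.
CREATE TABLE events (tenant_id int, created_at timestamptz, payload text);
CREATE INDEX events_tenant_created_idx ON events (tenant_id, created_at);

-- Can use the index: the filter covers a leading prefix of the columns.
SELECT * FROM events WHERE tenant_id = 42;
SELECT * FROM events
WHERE tenant_id = 42 AND created_at > now() - interval '1 day';

-- Generally cannot use it efficiently: skips the leading column,
-- so the planner usually falls back to a different plan.
SELECT * FROM events WHERE created_at > now() - interval '1 day';
```

This leftmost-prefix behavior is why column order inside the index matters so much.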
BRIN is generally useful for datasets that have high local correlation. Ordered datasets have that, but it is not unique per se to ordered datasets. The summary type (operator class) you specify when creating the index is what defines which kind of correlation you need:
Minmax (the default for most indexable types in BRIN) needs values within a page range to be closer to each other than to values in other ranges (a sorted table has this, but the property would still hold if you moved the ranges around). In the future, this may even be used to support top-k sorts.
Minmax-multi has similar needs to minmax, but has the ability to absorb some outliers in the ranges without immediately losing precision.
Bloom works well for equality checks and benefits most when each page range contains only a few of the possible values.
For instance, using the bloom operator class, you can use BRIN to exclude large ranges of the table at a fraction of the cost of scanning those ranges manually: it can quickly find out whether the tuples in a page range might contain results with both date Y and user X, while storing only one small summary per page range (a tiny fraction of the table's size) instead of the O(num tuples) usually required for indexes.
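A minimal sketch of that setup, assuming a hypothetical logs table (the bloom operator classes such as `int4_bloom_ops` are available in PostgreSQL 14 and later):

```sql
-- Hypothetical schema; table and column names are made up for illustration.
CREATE TABLE logs (event_date date, user_id int, msg text);

-- Default minmax summary on the date, bloom summary on user_id.
CREATE INDEX logs_brin_idx ON logs USING brin (
  event_date,
  user_id int4_bloom_ops
) WITH (pages_per_range = 64);

-- A query on both columns can skip every page range whose
-- summaries rule it out.
SELECT * FROM logs
WHERE event_date = '2024-01-01' AND user_id = 123;
```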
I really wish pg had a way to do partial indexes with limits so I could create a partial index that stores, for example, only the most recent version of something (I find this comes up a lot).
Select the row with the max sync_token for the given ID. Easiest for Postgres to optimize, assuming you have a primary key of (id, sync_token).
SELECT *
FROM external_api
WHERE id = :my_id
AND sync_token = (SELECT max(sync_token) FROM external_api WHERE id = :my_id)
Define a view using DISTINCT ON. Convenient for ad-hoc querying. Postgres usually figures out it can use the primary key to avoid a full-table scan.

SELECT DISTINCT ON (id) *
FROM external_api
WHERE id = :my_id
ORDER BY id, sync_token DESC
For tricky predicates, I use a trigger to track the most recent resource in a separate table. This is a hacky version of incremental view maintenance. [1]

[1]: https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...
For a similar sounding problem, I used a view with row_number() over a partition clause sorted descending, so that row number 1 was always the most recent row within the partition columns and older rows were 2, 3, etc. I could then query for row number = 1 to get the most recent row (or 2 for the 2nd, etc.). For the most recent only, I had a view with that row number = 1 condition, and I used that most frequently to access data.
To get an index, the view could be materialized, but then it needs refreshing, and in my experience that had more overhead than just using the regular view.
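A sketch of the approach described above, reusing the external_api table from the earlier examples (the view names are made up):

```sql
-- Rank versions within each id, newest first.
CREATE VIEW external_api_ranked AS
SELECT *,
       row_number() OVER (PARTITION BY id ORDER BY sync_token DESC) AS rn
FROM external_api;

-- Most queries only want the latest version, so wrap the rn = 1 filter.
CREATE VIEW external_api_latest AS
SELECT * FROM external_api_ranked WHERE rn = 1;
```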
https://www.postgresql.org/docs/current/ddl-partitioning.htm...
Inherently cost prohibitive. Maintaining the index after a delete is an O(n) operation.
If you would like to take on some of that burden yourself, you can add a `latest` bool flag and create a partial index on it.
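A minimal sketch of that flag-plus-partial-index approach, assuming you maintain `is_latest` yourself (from application code or a trigger) whenever a new version is inserted:

```sql
-- Hypothetical column name; the application (or a trigger) must flip the
-- flag on the old row when a newer version arrives.
ALTER TABLE external_api
  ADD COLUMN is_latest boolean NOT NULL DEFAULT true;

-- The partial index only covers the latest rows, so it stays small and
-- also enforces "at most one latest version per id".
CREATE UNIQUE INDEX external_api_latest_idx
  ON external_api (id)
  WHERE is_latest;
```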
Due to this we sometimes have to add multiple indexes with different permutations of the same columns.
Two really nice nuggets: how to detect unused and bloated indices.
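For the unused-index half, one common starting point is the standard statistics view pg_stat_user_indexes: indexes with zero recorded scans since the last stats reset are candidates for removal. A sketch (column names are from the stock view):

```sql
-- Candidate unused indexes, largest first. Interpret with care: stats
-- reset recently, or an index used only for rare reports, can mislead.
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```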