This is part of a broader choice: write amplification. You'd want to, of course, have the most precise index possible - but no matter how you cut it, you are incurring extra I/O for writes - one for the tuple, one per index. How you index things is heavily influenced by the mix of reads and writes, and this is why we have data warehouses/read replicas in the first place: it allows us to avoid write amplification in the write path, while having fast filtered reads (that are slightly delayed).
If you're dealing with <ridiculous number of users>, there is a good chance that you don't want to be putting BI/OLAP indices on your OLTP database. You probably don't have enough users to worry about this - but - if you ever find that your writes are becoming an issue this is something to consider.
However, that is usually the least of your concerns with write amplification. If you don't batch your writes, you can easily get 100x write amplification. For any primary key or any other index not strongly correlated with your INSERTs, you can get perhaps another 100x write amplification even if you batch you writes.
But it would add complexity to detect out-of-sync indexes and tables.
> I'm not a big fan of using the constraint names in SQL, so to overcome both limitations I'd use MERGE instead:
``` db=# MERGE INTO urls t USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url) ON t.url = s.url WHEN MATCHED THEN UPDATE SET id = s.id WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url); MERGE 1 ```
I use `insert ... on conflict do update ...` all the time to handle upserts, but it seems like merge may be more powerful and able to work in more scenarios. I hadn't heard of it before.
https://pganalyze.com/blog/5mins-postgres-15-merge-vs-insert...
This is somewhat a personal preference, but I would just use `INSERT ... ON CONFLICT` and design my data model around it as much as I can. If I absolutely need the more general features of `MERGE` and _can't_ design an alternative using `INSERT ... ON CONLFICT` then I would take a bit of extra time to ensure I handle `MERGE` edge cases (failures) gracefully.
> If you want the generality of MERGE, you have to accept the fact that you might get unique constraint violations, when there are concurrent inserts, versus with INSERT ON CONFLICT, the way it's designed with its speculative insertions, guarantees that you either get an INSERT or an UPDATE and that is true even if there are concurrent inserts. You might want to choose INSERT ON CONFLICT if you need the guarantee.
Basically, `MERGE` is susceptible to a concurrent process also writing `INSERT` where that `INSERT` and `MERGE` are unaware of one another, causing a duplicate value to be used.
Besides portability, there is IMHO nothing against INSERT ... ON CONFLICT if it does what you need.
Since I know conceptually how RDBMSes work, I can ask vey specifically what I want. Also asking for feedback on schemas/queries really helped me. I use a lot more of PGs features now!
How well do they work for UUIDv7? You’d probably have to tune (increase?) pages_per_range, but even though each index entry is 16 bytes you have to consider the btree index on the same is also similarly affected (or worse).
It's interesting how both virtual columns and hash indexes work, but feel like they're bolted on, vs being made part of the whole ecosystem so that they work seamlessly.
Hash indices have long been crippled; they shipped almost unusable but every few years get a good QoL update. I think automatic unique constraints are the big thing left there.
> Starting at version 14, PostgreSQL supports generated columns - these are columns that are automatically populated with an expression when we insert the row. Sounds exactly like what we need but there is a caveat - the result of the expression is materialized - this means additional storage, which is what we were trying to save in the first place!
Is it also possible to create index (maybe partial index) on expressions?
Postgres makes DELETEs single threaded, this includes the selection part of the DELETE. By running a completely separate SELECT first Postgres would multithread the SELECT and populate the cache fast. Then the single thread DELETE can operate on in-memory data and not endlessly block loading data from disk.
PG's lack of plan caching strikes again, this sort of thing is not a concern in other DB's that reuse query plans.
It sometimes really stinks on some queries since the generic plan can't "see" the parameter values anymore. E.g. if you have an index on (customer_id, item_id) and run a query where `customer_id = $1 AND item_id = ANY($2)` ($2 is an array parameter), the generic query plan doesn't know how many elements are in the array and can decide to do an elaborate plan like a bitmap index scan instead of a nested loop join. I've seen the generic plan flip-flop in a situation like this and have a >100x load difference.
The plan cache is also per-connection, so you still have to plan a query multiple times. This is another reason why consolidating connections in PG is important.
0: https://www.postgresql.org/docs/current/runtime-config-query...
MSSQL server also does parameter sniffing now days and can have multiple plans based on the parameters values it also has a hint to guide or disable sniffing because many times a generic plan is actually better, again something else PG doesn't have, HINTS [2].
PG being process based per connection instead of thread based makes it much more difficult to share plans between connections and it also has no plan serialization ability. Where MSSQL can save plans to xml and they can be loaded on other servers and "frozen" to use that plan if desired, they can also be loaded into plan inspection tools that way as well [3].
1. https://learn.microsoft.com/en-us/sql/relational-databases/n...
2. https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...
3. https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-tr...
The article suggests using a check constraint to get around that - are you saying that does actually check the underlying value when USING HASH is set? If so, I think the docs need updating.
Of course the hash index also outperforms a unique (btree) index on top of separately calculating the hash, in addition to the storage overhead, row bloat, lack of guarantees regarding the hash unless you expose it to Postgres as a user-defined function AND add a check constraint.
Is the syntax highlighting built into pgsql now or is that some other wrapper that provides that? (it looks really nice).
My only gripe with it is its insistence on adding a space after a line break when the query is too long, making copy/paste a pain for long queries.