Postgres is eating the database world (opens in new tab)

(medium.com)

281 pointsVonng2y ago142 comments

142 comments

Postgres is far from perfect:

- The codebase is old and huge, accruing some heavy technical debt, making it a less than ideal foundation for iterating quickly on a new paradigm like AI and vector databases.

- Some ancient design decisions have aged poorly, such as its one connection per process model, which is not as efficient as distributing async tasks over thread pools. If not mitigated through an external connection pooler you can easily have real production issues.

- Certain common use cases suffer from poor performance; for example, write amplification is a known issue. Many junior developers mistakenly believe they can simply update a timestamp or increment a field on a main table with numerous columns.

So, yes, PG is one of the best compromises available on the database market today. It's robust, offers good enough performance, and is feature-rich. However, I don't believe it can become the ONE database for all purposes.

Using a dedicated tool best suited for a specific use case still has its place; SQLite and DuckDB, for instance, are very different solutions with interesting trade-offs.

LunaSea2y ago

I believe that there are contributors currently working on a one thread per connection version of PostgreSQL. It's a huge amount of work so I wouldn't expect it to be released tomorrow.

Regarding wide updates, I believe that HOT updates already partially solve this problem.

hans_castorp2y ago

> Some ancient design decisions have aged poorly, such as its one connection per process model

Oracle uses the same model by default on Linux.

Since 19 (or maybe earlier) it is configurable though, but the default is still one process per connection if I'm not mistaken.

The_Colonel2y ago

It's actually quite impressive they were willing & able to make such a drastic change in such an old and conservative codebase.

hans_castorp2y ago

Not sure how drastic it was to be honest. Under Windows, multi-threading was always the only option. So in theory, they already had code for that.

avinassh2y ago

Can you tell me more about the write amplification issue?

CodesInChaos2y ago

Postgres handles updates as insert+delete, and its secondary indexes reference the physical location of the row, instead of the primary key. This means that whenever an update results in an insert to a different page, the index needs to be updated as well, even if the indexed column hasn't been modified.

At least it has an optimization that if the insert ends up in the same page, it won't need to update the index https://www.postgresql.org/docs/current/storage-hot.html

Replication has a similar amplification issue. Historically postgres has favored physical replication over per-row logical replication, that means that replication needs to transfer every modified page, including modified indexes, instead of just the new value of the modified row. (I think logical replication support has improved over the last couple of years).

There is the OrioleDB project, which attempts to improve on the design flaws in postgres's storage engine, but it's definitely not production ready yet.

sgarland2y ago

It’s worse than that, unfortunately – since the entire row has to be updated due to MVCC, if a single index is updated, then _all_ indexes are updated. If you have a wide table with a lot of indexes, each UPDATE is N*index writes. This becomes even worse with multiple updates if indexed columns aren’t k-sortable (like UUIDv4), since now the DB will probably have to jump around multiple pages despite the rows being logically sequential.

riku_iki2y ago

> new paradigm like AI and vector databases.

they have several ways to write extensions: extensions and fdw, so you can build your cool AI stuff without digging into PgSQL sources much.

jillesvangurp2y ago

> not to mention its ElasticSearch grade full-text search capabilities.

I played with postgresql a while ago to implement search. It's not horrible. But it's nowhere near Elasticsearch in terms of its capabilities. It's adequate for implementing very narrow use cases where search ranking really doesn't matter much (i.e. your revenue is not really impacted by poor precision and recall metrics). If your revenue does depend on that (e.g. because people buy stuff that they find on your website), you should be a bit more careful about monitoring your search performance and using the right tools to improve performance.

But for everything else you only have a handful of tools to work with to tune things. And what little there is is hard to use and kind of clunky. Great if that really is all you need and you know what you are doing but if you've used Elasticsearch and know how to use it properly you'll find your self missing quite a few things. Maybe some of those things will get added over time but for now it simply does not give you a lot to work with.

That being said, if you go down that path the trigram support in postgres is actually quite useful for implementing simple search. I went for that after trying the very clunky tsvector support and finding it very underwhelming for even the simplest of use cases. Trigrams are easier to deal with in postgres and you can implement some half decent ranking with it. Great for searching across product ids, names, and other short strings.

VonngOP2y ago

That's true for the kernel, How about extensions such as ParadeDB BM25 https://www.paradedb.com/ + PGroonga https://pgroonga.github.io/ + PG Bigm https://github.com/pgbigm/pg_bigm ?

kiwicopple2y ago

also with pg_trgm[0] (mentioned by OP) and pgvector for semantic search you have a pretty powerful search toolkit. for example, combining them for Hybrid Search [1]

[0] https://www.postgresql.org/docs/current/pgtrgm.html

[1] Reciprocal Ranked Fusion: https://supabase.com/docs/guides/ai/hybrid-search

mistrial92y ago

the difference is cached/pre-calc results in a (big on disk, expensive to compute) inverted index.. you cannot beat that at runtime

jillesvangurp2y ago

Still very limited and frankly all a bit low level primitives. Unless you are a search expert, you won't be able to do much productive with this stuff. If you are, it might fit a few use cases. But then, why limit yourself to just this stuff?

The point of that of course being that the target audience for this stuff is actually people that for whatever reason are a bit shy using the right tools for the right job here and are probably lacking a lot of expertise. The intersection of people with the expertise that would be happy with this narrow subset of functionality is just not a lot of people.

hans_castorp2y ago

> I played with postgresql a while ago to implement search. It's not horrible. But it's nowhere near Elasticsearch in terms of its capabilities.

The article is referring to the ParadeDB extension, not the built-in full text search

zachmu2y ago

We're writing a postgres-compatible database that doesn't use any postgres code:

https://github.com/dolthub/doltgresql/

We're doing this because our main product (Dolt) is MySQL-compatible, but a lot of people prefer postgres. Like, they really strongly prefer postgres. When figuring out how to support them, we basically had three options:

1) Foreign data wrapper. This doesn't work well because you can't use non-native stored procedure calls, which are used heavily throughout our product (e.g. CALL DOLT_COMMIT('-m', 'changes'), CALL DOLT_BRANCH('newBranch')). We would have had to invent a new UX surface area for the product just to support Postgres.

2) Fork postgres, write our own storage layer and parser extensions, etc. Definitely doable, but it would mean porting our existing Go codebase to C, and not being able to share code with Dolt as development continues. Or else rewriting Dolt in C, throwing out the last 5 years of work. Or doing something very complicated and difficult to use a golang library from C code.

3) Emulation. Keep Dolt's Go codebase and query engine and build a Postgres layer on top of it to support the syntax, wire protocol, types, functions, etc.

Ultimately we went with the emulation approach as the least bad option, but it's an uphill climb to get to enough postgres support to be worth using. Our main effort right now is getting all of postgres's types working.

atonse2y ago

I can totally understand this. Your feature set isn’t compelling enough for me to switch to MySQL when, as the original article states, Postgres basically does everything under the sun, well enough that I can keep my stack simple.

But if it allows me to use my existing Postgres tools, drivers, code, and knowledge then I’d consider it.

zachmu2y ago

Right. We're betting there are a bunch of people like you, hence the investment.

monero-xmr2y ago

Postgres is simply the best. One thing I would like however is the ability to have control over the query planner for specific tasks. There is a dark art to influencing the query planner, but essentially it is unpredictable, and postgres can get it consistently wrong in certain scenarios. If you could just enable a special query mode that gives you absolute control over the QP for that query, it would solve a major pain point.

I'm not a database developer, and last time I researched this (a few years ago) I found many good reasons for not enabling this from postgres contributors. But it would still be very useful.

VonngOP2y ago

pg_hint_plan —— Give PostgreSQL ability to manually force some decisions in execution plans. https://github.com/ossc-db/pg_hint_plan

justinclift2y ago

Cool, that looks super useful. Hadn't come across it before. :)

---

Just did a HN submission for it (https://news.ycombinator.com/item?id=39712211) now too.

oldandtired2y ago

The problem is not the query planner per se. There is a much more subtle problem and it is related to how you have created the query in the join structure.

For many queries, the order in which you specify the joins doesn't really matter. But there are a number of classes where the join order dramatically affects how fast the query can actually run and nothing the query planner does will change this.

I came across this problem around 30 years ago. By accident, I discovered what the problem cause was - the order of the joins. The original query was built and took 30 - 40 minutes to run. I deleted particular joins to see what intermediate results were. In reestablishing the joins, the query time went to down to a couple of seconds.

I was able to establish that the order of joins in this particular case was generating a Cartesian product of the original base records. By judicious reordering of the joins, this Cartesian product was avoided.

If you are aware of this kind of problem, you can solve it faster than any query planner ever could.

sgarland2y ago

Usually that kind of problem is a result of exceeding {from,join}_collapse_limit, which defaults to 8. If you have more tables than that in a query, Postgres doesn’t exhaustively try all ordering to determine the best, and instead uses its genetic algorithm, which can be worse.

You can raise the limit at the risk of causing query planning times going up exponentially, or refactor your schema, or, as you did, rewrite it to be more restrictive out of the gate. That way, those join paths will be found first and so will be the best found when the planner gives up.

tacone2y ago

Would love to see a practical example of this.

Another thing to consider is table fragmentation. Fragmentation > bad row count estimation > bad query plan.

afandian2y ago

We hit some weird query plans. I don't have forensic evidence but here's an example: https://crossref.gitlab.io/engineering/decision-records/dr-0...

Combination of two joins, filtering on all tables, and sorting by the LEFT one. Performance was fine, until we hit the scale when it suddenly became unpredictable.

In hindsight, given the variability of the queries and table structure, I don't think any query planner could have done a good job. The natural answer was to denormalize. But the journey to get there was a little unpredictable.

teaearlgraycold2y ago

I’m currently learning the basics of this. Currently struggling with multiple similar scenarios where switching from a left to an inner join, or any equivalent, kills performance. But these are aggregation queries so there are only 5 records returned. I could just filter in my app code no problem. But why the hell does adding “where foo.id is not null” in SQL make it O(N*M)??? CTEs are not helping.

RedShift12y ago

Have you checked with EXPLAIN ANALYZE VERBOSE?

teaearlgraycold2y ago

Yup, used a nice EXPLAIN GUI tool as well to try and help.

1 more reply

sgarland2y ago

It shouldn’t. At worst, if id doesn’t have a usable index, a sequential scan of foo, so O(N).

jaktet2y ago

I haven’t used Postgres but is this the issue you’re talking about? https://github.com/launchbadge/sqlx/pull/1539

monero-xmr2y ago

No this is a fundamental concept in postgres. If you do EXPLAIN ANALYZE on a query, you get the query plan, which is influenced by the query, indexes, table structure, etc. But the QP may decide to do a silly thing like a sequential scan where a better path exists, and adding an index to avoid the scan would be cost prohibitive. So if you could just override the QP and say "Use this index and do this type of sort and then this type of scan, damn the consequences!" it would give the query writer full control. I don't understand why you can't just force the system to follow a path - you as the developer pay for it regardless.

tacone2y ago

Nobody does actually. Postgres not having deterministic query plans is a big pain and a good reason not to use it. The same query may use different query plans depending on the estimated number of affected rows, very frustrating.

2 more replies

jaktet2y ago

Oh that’s unfortunate, thanks for explaining it. Postgres has been in my list to check out but haven’t done any personal projects that I’d need it for… yet

1 more reply

teaearlgraycold2y ago

I wonder if for non-trivial use cases we should just go back to imperative programming.

1 more reply

sgarland2y ago

IME, many query flips occur due to inadequate analysis, inadequate vacuuming, or table / index bloat.

Examining statistics for your tables / indices can be quite helpful in determining the issue.

hipadev232y ago

MySQL offers this via use/force index (…). Similar problem where the QP will inexplicably decide one day to make a query really slow and you gotta override it.

donatj2y ago

We have a couple queries where using the correct index they'll take milliseconds, using the wrong index it'll take minutes - and mysql occasionally enough to be a noticeable load on the database chooses the wrong one and we've specified "USE INDEX" even though I really hate having to do so.

Minor49er2y ago

I'm curious about the query and schema that you're using

fulafel2y ago

Someone who picked their tools with good tech judgement 25 years ago can be using the same today (eg PG, Python, Linux) without corporate control of them, it's pretty great.

vsnf2y ago

That feels a bit like hindsight talking. Linux perhaps, but were Python and Postgres really the obvious good judgement choices 25 years ago? Every other choice was poor judgement?

slyall2y ago

Well 25 years ago was pretty much (December 1998) when "LAMP"[1] was defined and that was originally Linux, Apache, MySQL and PHP.

So Postgres and Python were not the obvious choices back then.

[1] https://en.wikipedia.org/wiki/LAMP_(software_bundle)

Shish2k2y ago

In my personal experience of being around back then, postgres and python were still considered "technically better", but such a massive pain in the ass to install (especially on cheap shared hosting where it was often actually impossible to install) that only the most masochistic people would even try. I myself wrote a fastcgi implementation in PHP which would allow a web server which only supported php to call python under the hood and forward the inputs and outputs :P

It is kind of depressing that 25 years later, no other language has even attempted to compete with PHP in the “easy to get started on bargain-basement-tier shared web hosts” space D:

1 more reply

spacebanana72y ago

It's amazing how that's not a terrible choice of tech stack even today after 26 years.

Not fashionable by any means - but still feasible to maintain unlike Windows IIS / C++ or Oracle / Perl / Flash.

pg_12342y ago

Still, by 15 years ago Linux, Nginx, Postgres, Python had become a clearly better choice than the default LAMP stack for more complex applications.

1 more reply

taffer2y ago

Postgres really took off when Heroku became popular in the late 00s.

djupblue2y ago

Around 20 years ago when I was trying to decide on what database to use my requirement were that it should store data reliably. I learned that MySQL in contrast to PostgreSQL: 1. wasn't ACID 2. didn't have foreign key constraints 3. could loose/corrupt committed data (no WAL)

Despite me not knowing much about databases it seemed like an obvious choice.

Tostino2y ago

Hell, 10 years ago (in a multi database project) I got bit by MySQL not supporting check constraints, but returning successfully when I ran the create check constraint statements.

Postgres has been great all around.

1 more reply

fulafel2y ago

MySQL didn't have transactions (for many years after that), so PG over MySQL would have been the case for "good technical judgement" though maybe not the majority choice. For Python, maybe I'm biased - but you could have went for PHP in the argument if you swing that way and it still works.

layer82y ago

Neither of those three would have been particularly good judgement 25 years ago, without a crystal ball.

lmm2y ago

Linux is absolutely corporate controlled, sadly. Just look at how decisions like systemd and wayland get made.

remram2y ago

25 years ago... PostgreSQL 6, Python 1.5, Linux 2.2? I don't know if picking those tools was good judgement then...

fulafel2y ago

Why not?

TOMDM2y ago

Postgres is such a great tool.

The feature I'd love to see added that has been kicking around the mailing list for ages now would be incremental view maintenance.

Being able to keep moderately complex analysis workloads fresh in realtime would be such a boon.

Izkata2y ago

Materialized views with pg_cron to refresh them? (Or even just cron and your usual interface, if you don't want to install something extra)

TOMDM2y ago

This solves the same problem, just not as well, the idea of an incrementaly maintained view is that only the update needs to be computed, so a count will increment or decrement as rows are inserted or deleted.

It means complex views that could take minutes or hours to calculate from scratch can be kept fresh in realtime.

physicles2y ago

I’ve built a few of these in snowflake for time series data (with dbt, which is fine but I don’t love it). My feeling so far is that the incremental bits are kinda fiddly and often domain-specific. Is it possible to define a single “shape” that would solve >90% of these incremental refresh scenarios?

salojoo2y ago

https://github.com/sraoss/pg_ivm

serpix2y ago

https://github.com/sraoss/pg_ivm does what you need

pgwhalen2y ago

pg_ivm has a ton of restrictions though (as do other PG incremental view refresh mechanisms, like timescale hypertables).

That's why I assume it's not in vanilla postgres: adding the future with so many caveats would not make for a great experience for the full breadth of postgres users.

iTokio2y ago

Is this feature even possible? How can the database unravel a complex queries with derived data to minimal updates from its normalized parts?

The easy way is to rebuild everything if any “from table” as been modified.

The manual way is to create triggers that perform the minimal updates.

benjaminwootton2y ago

There are lots of projects who have managed to achieve this.

Streaming frameworks like Kafka Streams and Flink have incrementally updating tables in memory.

Materialize is built around the concept with a Postgres compatible API.

ClickHouse materialized views act like insert triggers which update when the base table is updated.

pgwhalen2y ago

I'm not familiar with ClickHouse materialized views, but the other tech you list (as I roughly understand them) seem more concerned with streaming SQL, which is a related but different end user experience from incrementally refreshed materialized views.

shpongled2y ago

I believe differential data flow techniques (Materialize, Noria) enable this kind of stuff

antifa2y ago

MERGE was interested in 15, if that's helpful: https://www.postgresql.org/docs/current/sql-merge.html

zozbot2342y ago

An "incremental view" is just an index over some custom query. So you're pretty much asking for improvements in Postgres' index support.

kingraoul2y ago

He means a materialized view, I believe.

dbacar2y ago

Did you used this in production, how do you rate it?

paulmd2y ago

In the same vein as “is your product a business, or is it just a feature”, Postgres has really raised the bar to “is your product a database or an index in postgres”. There’s a few databases that make compelling cases for their existence, like Cassandra or Elastic/Solr, but surprisingly many databases really don’t offer anything that can’t be replicated with a GIN or GIST on Postgres. It is the amorphous blob swallowing up your product and turning it into a feature. JSON handling or json column types, are no longer a distinctive feature anymore, for example.

And a surprising amount of other stuff (similar to lisp inner platforms) converges on half-hearted, poorly-implemented replications of Postgres features… in this world you either evolve to Cassandra/elastic or return to postgres.

(not saying one or the other is better, mind you… ;)

lmm2y ago

Postgres is still single-node-first, and while Citus exists I'm skeptical that it can ever become as easy to administer as a true HA-first datastore. For me the reason to use something like Cassandra or Kafka was never "big data" per se, it was having true master-master fault tolerance out of the box in a way that worked with everything.

AtlasBarfed2y ago

If you are going to multi-node Postgres, you need to start planning for Cassandra/Dynamo.

That is a BIG lift. Joins don't really practically scale at the Cassandra/Dynamo scale, because basically every row in the result set is subject to CAP uncertainty. "Big Data SQL" like Hive/Impala/Snowflake/Presto etc are more like approximations at true scale.

Relational DBMS is sort of storage-focused in the design and evolution: you figure out the tables you need to store the data in a sensible way. They you add views and indexes to optimize for view/retrieval.

Dyanmo/Cassandra is different, you start from the views/retrieval. That's why it is bad to start with these models for an application because you have not fully explored all your specific data structuring and access patterns/loads yet.

By the time Postgres hits the single node limits, you should know what your highest volume reads/writes are and how to structure a cassandra/dynamo table to specifically handle those read/writes.

sgarland2y ago

> Postgres… Kafka… Cassandra

These are all wildly different products that should not be considered for the same purposes.

user_of_the_wek2y ago

But that's the gist of the article here, right? That Postgres is taking over all db-like use cases. It doesn't claim that it can replace Kafka but https://www.amazingcto.com/postgres-for-everything/ certainly does. Of course it's not a full replacement, but it might be good enough.

lmm2y ago

If your premise is that Postgres is eating the datastore world, then you're talking about using it as a replacement for Kafka and Cassandra.

Frankly if you zoom out far enough they're all systems suitable for use as your primary online datastore that you build your application on (each with their own caveats of course). There are places where they compete.

VonngOP2y ago

Patroni has native HA support for citus horizontal cluster since v3. Which means your can create a HA citus cluster as simple as: https://pigsty.io/docs/pgsql/config/#citus-cluster

dbacar2y ago

That looks like apples vs oranges.

riku_iki2y ago

You should look at CocroachDB: it tries to be PG compatible (not there yet), but true HA-first.

derekperkins2y ago

At often 10x the latency and significant query compatibility issues. That's a hard pill to swallow

riku_iki2y ago

latency increase is likely payment for distributed consistency regardless of specific DB: doing consensus between nodes is much slower than dumping block of data on local nvme.

ksec2y ago

>In 2024, a single x86 machine can have 512 cores

I am not aware of such machine in a single Node unless it is talking about vCPU / Thread. Intel Sierra Forest 288 Core doesn't do dual socket option. So I have no idea where the 512 x86 core came from.

zanoab2y ago

>This year’s new EPYC 9754 goes even further, offering a single CPU with 128 cores and 256 threads. This means a standard dual-socket server could have an astonishing 512 cores!

The author pulled it from an article linked in the previous sentence. The numbers don't even add up unless it was a mistake or I'm missing something.

Cesura2y ago

I think the terminology is a bit muddied here because AMD has historically referred to physical cores as "modules" and logical cores as "cores" (although their current spec sheet [1] seems to use "cores" and "threads" in the way that most understand them).

So in a dual-socket setup, 2 x EPYC 9754 would indeed yield 512 threads (logical cores), which are backed by 256 physical cores.

[1] https://www.amd.com/en/products/cpu/amd-epyc-9754

thisgoodlife2y ago

Did a quick search and found this 480-core server. Not exactly 512, but not far off. https://lenovopress.lenovo.com/lp1729-thinksystem-sr950-v3-s...

issung2y ago

Alright I'm convinced, I'm using postgres for my next project. Anyone have any experiences with it and Entity Framework Core?

blencdr2y ago

I'm using EF Core with SQL Server and PostgreSQL for two different production projects; it works like a charm, and the performance is great.

All recent projects in my company are PostgreSQL based (> 2000 production applications), and we have far fewer troubles with PostgreSQL than with Oracle, not to mention the licensing.

caleblloyd2y ago

I have used it in multiple projects and it works great. The Npgsql.EntityFrameworkCore.PostgreSQL provider is always quickly updated when new versions of EF Core are shipped, and its primary maintainer @roji is also extremely talented and responsive. All around has been a pleasure to work with!

danielovichdk2y ago

Coming from the MS ecosystem i can definitely vouch for EF core, but I am more of a Dapper person myself.

But I don't use PG because my needs are not so heavy nor have I the data sizes or advanced features that makes PG a more reliable choice over sqllite.

taspeotis2y ago

Npgsql has been good for me, although each major release there are a few teething issues.

riperoni2y ago

I am using it in a hobby code first project. Had no issues with it, but my data model is admittedly not complex and the the data is small in size.

thrixton2y ago

Well supported in EF core, great experience. Having said that, I’ve not used EF core with any other DB so have nothing to compare to.

JackMorgan2y ago

I'm managing two medium sized projects on that stack, using both EF core and Dapper. I'm about to switch a third over from Oracle because I'm so tired of all the issues with Oracle.

ksec2y ago

>As DuckDB’s manifesto “Big Data is Dead” suggests, the era of big data is over.

I have been stating this since at least 2020 if not earlier.

We are expecting DDR6 and PCI-E 7.0 Spec to be finalised by 2025. You could expect them to be on market by no later than 2027. Although I believe we have reach the SSD IOPS limits without some special SSD with Z-NAND. I assume ( I could be wrong ) this makes SSD bandwidth on Server less important. In terms of TSMC Roadmap that is about 1.4nm or 14A. Although in server sector they will likely be on 2nm. Hopefully we should have 800Gbps Ethernet by then with ConnectX Card support. ( I want to see the Netflix FreeBSD serving 1.6Tbps update )

We then have software and DB that is faster and simpler to scale. What used to be a huge cluster of computer that is mentally hard to comprehend, is now just a single computer or a few larger server doing its job.

There is 802.3dj 1.6Tbps Ethernet looking at competition on 2026. Although product coming through to market tends to take much longer compared to Memory and PCI-Express.

AMD Zen6C in ~2025 / 2026 with 256 Core per Socket, on Dual Socket System that is 512 Core or 1024 vCPU / Thread.

The future is exciting.

riku_iki2y ago

> As DuckDB’s manifesto “Big Data is Dead” suggests, the era of big data is over.

yet, their db can't handle many cases where data doesn't fit into memory, and PgSQL always does large writes in single thread..

CyberDildonics2y ago

I'm not sure what your point is here, it seems like you are just listing off announced hardware.

AtlasBarfed2y ago

The threshold for Cassandra / Dynamo scaling is increasing is probably the only point. "Big data is dead" is pretty stupid to say, typical clickbait marketing by a database that will probably be chucked away by something else trendy in another year.

But at a certain point, a 10,000 core 5 petabyte single megamachine starts to practically encounter CAP from the internal scale alone. It already ... kind of ... does.

And no matter how big your node scales, if you need to globally replicate data ... you have to globally replicate it over a network, and you need Cassandra (DynamoDB global replication is shady last I looked at it, I have no idea how row-level timestamps can merge-resolve conflicting rows updated in separate global regions)

ksec2y ago

The point is Big Data or Hard to Scale aren't as much of a thing with Hardware technology moving forward.

patrickdavey2y ago

I have a handful of sites I run on a VPS with a basic setup, including MySQL.

One thing I've always liked about MySQL is that it pretty much looks after itself, whereas with Postgres I've had issues before doing upgrades (this was with brew though) and I'm not clear on whether it looks after itself for vacuuming etc.

Should I just give it a go the next time I'm upgrading? It does seem like a tool I need to get familiar with.

elp2y ago

Its got a lot better over the years.

25+ years ago MySQL was fast and easy to admin but didn't have rollback and a bunch of other features. At the same time Postgres had the features but was horrible for performance and usability. Those days are LONG gone. Mysql obviously has all the features and PG is great to admin and the auto-vacuum works well out of the box.

I run a bunch of clusters of pg servers around the world and they need almost no maintenance. In place upgrades without needing to go the dump/restore route work well, 5 minutes on a TB sized database, just make very VERY sure you do a reindex afterwards or you will be in a world of pain.

jesterson2y ago

> PG is great to admin

What do you use for it? Is there anything like phpmyadmin for postgres with similar simplicity?

monospaced2y ago

Coming myself from MySQL to Postgres I found PgAdmin (https://www.pgadmin.org/screenshots/#7) easy to use

2 more replies

elp2y ago

Pretty much the same as mysql. https://wiki.postgresql.org/wiki/PostgreSQL_Clients#Web_Clie...

mixmastamyk2y ago

The price is right and auto-vacuuming is a thing.

Macha2y ago

Postgres updates are definitely a pain. MySQL is usually just a matter of upgrading the package and restarting the server for the projects I run, but postgres is a full dump and import process.

scientist43972y ago

I manage many PostgreSQL databases, the only time it was a pain, it was due to PostGIS upgrade but not the postgresql cluster itself…

You don’t need to dump and re import the database since a long long time…

swcode2y ago

Great article! We also tried lots of databases in the past at SWCode, but then ended up using Postgres for almost all our usecases. There really must be a good argument for using something else which can‘t be done with some Postgres extension.

artyom2y ago

Yes, the title is click-baity. Yes, Postgres isn't perfect and not the "best" choice for every possible use case in the universe.

But Postgres is a work of art, and compared to all the other relational database options, if it's ultimately crowned the king of them all, it'd be well deserved.

I'd also say that the PG protocol and the extensions ecosystem are as important as the database engine.

jpalomaki2y ago

I would like to keep my data in Postgres (OLTP purposes), but run analytical queries against the same datafiles in Snowflake/DuckDB fashion.

The analytics part should scale independently. Often this is only needed occasionally, so scale-to-zero (like Snowflake) would be great.

derekperkins2y ago

You have been able to do that with DuckDB for 18 months. https://duckdb.org/2022/09/30/postgres-scanner.html

Clickhouse supports it too https://clickhouse.com/docs/en/sql-reference/table-functions...

thinkerswell2y ago

Is there an extension that can make it compete with TigerBeetle for transaction processing speed?

LAC-Tech2y ago

That would surely be impossible. TB is designed from the ground up to do one thing and do that one thing well. Postgres is a swiss army knife, and TB is a knife.

thinkerswell2y ago

Yes I think this is the one area where specialized DBs will win over PG

thinkerswell2y ago

Why downvoted? Genuine question.

NorwegianDude2y ago

This post was very wrong and misleading on multiple points.

I have seen a lot of people praising Postgres over e.g. MariaDB. But more often than not it seems to be people how lack knowledge.

Take this linked post, where the author points out "The untuned PostgreSQL performs poorly (x1050)" later followed by "This performance can’t be considered bad, especially compared to pure OLTP databases like MySQL and MariaDB (x3065, x19700)".

Frist of all, those are not pure OLTP databases. And if the author took a better look at the benchmark he would see that MariaDB using ColumnStore is at x98. That's 10x the performance of Postgres out of the box, and 200x faster than the author stated.

VonngOP2y ago

How about: PostgreSQL tuned (x47). PostgreSQL + Hydra Extension (x42) PostgreSQL + ParadeDB Extension (x10.7)

givemeethekeys2y ago

Having used Postgres for many projects, yet never having used any of the other tools in the ecosystem, I'm surprised by how many tools there are!

How does one go about finding paying customers when developing a new database tool? How does one figure out the size of the market, and pricing structure?

elliotwagner2y ago

I truly like postgres

throwawaaarrgh2y ago

It's not a best practice, it's a fad. 99% of people who recommend or use Postgres barely know how to use it. Another trendy database will come along and you'll stop seeing all these posts about it. Happens every decade. I'll link back to this post in a few years with "I told you so".

dagw2y ago

Another trendy database will come along and you'll stop seeing all these posts about it. Happens every decade.

And then after a couple of years people will realise that Postgres can do everything the trendy database can do and come back to Postgres. Happens every decade. This is at least 'hype cycle' 3 for Postgres since I started my career.

renegade-otter2y ago

It’s been 40 years for Postgres. A database is not some trendy ReactJS library.

docc2y ago

Correct, but a lot of pieple treat it that way.

sgarland2y ago

> 99% of people who recommend or use Postgres barely know how to use it.

You're not wrong here, although you could just as easily say "99% of people who recommend $DB barely know how to use it."

Databases remain a mysterious black box to entirely too many people, despite the three largest (SQLite, Postgres, MySQL) being open source, and having extensive documentation.

I've come to the conclusion that most devs don't care about infra in the slightest, and view a DB as a place to stick data. When it stops working like they want, they shrug and upsize the instance. This is infuriating to me, because it's the equivalent of me pushing a PR to implement bogosort, and when told that it's suboptimal, dismissing the criticism and arguing that infra just needs to allocate more cores.

renegade-otter2y ago

I feel this on a soul level. I wrote about it: https://renegadeotter.com/2023/11/12/your-database-skills-ar...

sgarland2y ago

My god, are you me? I also thoroughly enjoyed this diatribe from [0]:

> First, a whole army of developers writing JavaScript for the browser started self-identifying as “full-stack”, diving into server development and asynchronous code.

> ...early JavaScript was a deeply problematic choice for server development. Pointing this out to still green server-side developers usually resulted in a lot of huffing and puffing.

[0]: https://renegadeotter.com/2023/09/10/death-by-a-thousand-mic...

dzonga2y ago

that was a profound article. thanks

coldtea2y ago

You're off by 30 years already...

j / k navigate · click thread line to collapse

142 comments

egnehots2y ago

Postgres is far from perfect:

- The codebase is old and huge, accruing some heavy technical debt, making it a less than ideal foundation for iterating quickly on a new paradigm like AI and vector databases.

Using a dedicated tool best suited for a specific use case still has its place; SQLite and DuckDB, for instance, are very different solutions with interesting trade-offs.

LunaSea2y ago

I believe that there are contributors currently working on a one thread per connection version of PostgreSQL. It's a huge amount of work so I wouldn't expect it to be released tomorrow.

Regarding wide updates, I believe that HOT updates already partially solve this problem.

hans_castorp2y ago

> Some ancient design decisions have aged poorly, such as its one connection per process model

Oracle uses the same model by default on Linux.

Since 19 (or maybe earlier) it is configurable though, but the default is still one process per connection if I'm not mistaken.

The_Colonel2y ago

It's actually quite impressive they were willing & able to make such a drastic change in such an old and conservative codebase.

hans_castorp2y ago

Not sure how drastic it was to be honest. Under Windows, multi-threading was always the only option. So in theory, they already had code for that.

avinassh2y ago

Can you tell me more about the write amplification issue?

CodesInChaos2y ago

At least it has an optimization that if the insert ends up in the same page, it won't need to update the index https://www.postgresql.org/docs/current/storage-hot.html

There is the OrioleDB project, which attempts to improve on the design flaws in postgres's storage engine, but it's definitely not production ready yet.

sgarland2y ago

riku_iki2y ago

> new paradigm like AI and vector databases.

they have several ways to write extensions: extensions and fdw, so you can build your cool AI stuff without digging into PgSQL sources much.

jillesvangurp2y ago

> not to mention its ElasticSearch grade full-text search capabilities.

VonngOP2y ago

That's true for the kernel, How about extensions such as ParadeDB BM25 https://www.paradedb.com/ + PGroonga https://pgroonga.github.io/ + PG Bigm https://github.com/pgbigm/pg_bigm ?

kiwicopple2y ago

also with pg_trgm[0] (mentioned by OP) and pgvector for semantic search you have a pretty powerful search toolkit. for example, combining them for Hybrid Search [1]

[0] https://www.postgresql.org/docs/current/pgtrgm.html

[1] Reciprocal Ranked Fusion: https://supabase.com/docs/guides/ai/hybrid-search

mistrial92y ago

the difference is cached/pre-calc results in a (big on disk, expensive to compute) inverted index.. you cannot beat that at runtime

jillesvangurp2y ago

hans_castorp2y ago

> I played with postgresql a while ago to implement search. It's not horrible. But it's nowhere near Elasticsearch in terms of its capabilities.

The article is referring to the ParadeDB extension, not the built-in full text search

zachmu2y ago

We're writing a postgres-compatible database that doesn't use any postgres code:

https://github.com/dolthub/doltgresql/

3) Emulation. Keep Dolt's Go codebase and query engine and build a Postgres layer on top of it to support the syntax, wire protocol, types, functions, etc.

atonse2y ago

But if it allows me to use my existing Postgres tools, drivers, code, and knowledge then I’d consider it.

zachmu2y ago

Right. We're betting there are a bunch of people like you, hence the investment.

monero-xmr2y ago

I'm not a database developer, and last time I researched this (a few years ago) I found many good reasons for not enabling this from postgres contributors. But it would still be very useful.

VonngOP2y ago

pg_hint_plan —— Give PostgreSQL ability to manually force some decisions in execution plans. https://github.com/ossc-db/pg_hint_plan

justinclift2y ago

Cool, that looks super useful. Hadn't come across it before. :)

---

Just did a HN submission for it (https://news.ycombinator.com/item?id=39712211) now too.

oldandtired2y ago

The problem is not the query planner per se. There is a much more subtle problem and it is related to how you have created the query in the join structure.

If you are aware of this kind of problem, you can solve it faster than any query planner ever could.

sgarland2y ago

tacone2y ago

Would love to see a practical example of this.

Another thing to consider is table fragmentation. Fragmentation > bad row count estimation > bad query plan.

afandian2y ago

We hit some weird query plans. I don't have forensic evidence but here's an example: https://crossref.gitlab.io/engineering/decision-records/dr-0...

Combination of two joins, filtering on all tables, and sorting by the LEFT one. Performance was fine, until we hit the scale when it suddenly became unpredictable.

teaearlgraycold2y ago

RedShift12y ago

Have you checked with EXPLAIN ANALYZE VERBOSE?

teaearlgraycold2y ago

Yup, used a nice EXPLAIN GUI tool as well to try and help.

1 more reply

sgarland2y ago

It shouldn’t. At worst, if id doesn’t have a usable index, a sequential scan of foo, so O(N).

jaktet2y ago

I haven’t used Postgres but is this the issue you’re talking about? https://github.com/launchbadge/sqlx/pull/1539

monero-xmr2y ago

tacone2y ago

2 more replies

jaktet2y ago

Oh that’s unfortunate, thanks for explaining it. Postgres has been in my list to check out but haven’t done any personal projects that I’d need it for… yet

1 more reply

teaearlgraycold2y ago

I wonder if for non-trivial use cases we should just go back to imperative programming.

1 more reply

sgarland2y ago

IME, many query flips occur due to inadequate analysis, inadequate vacuuming, or table / index bloat.

Examining statistics for your tables / indices can be quite helpful in determining the issue.

hipadev232y ago

MySQL offers this via use/force index (…). Similar problem where the QP will inexplicably decide one day to make a query really slow and you gotta override it.

donatj2y ago

Minor49er2y ago

I'm curious about the query and schema that you're using

fulafel2y ago

Someone who picked their tools with good tech judgement 25 years ago can be using the same today (eg PG, Python, Linux) without corporate control of them, it's pretty great.

vsnf2y ago

That feels a bit like hindsight talking. Linux perhaps, but were Python and Postgres really the obvious good judgement choices 25 years ago? Every other choice was poor judgement?

slyall2y ago

Well 25 years ago was pretty much (December 1998) when "LAMP"[1] was defined and that was originally Linux, Apache, MySQL and PHP.

So Postgres and Python were not the obvious choices back then.

[1] https://en.wikipedia.org/wiki/LAMP_(software_bundle)

Shish2k2y ago

It is kind of depressing that 25 years later, no other language has even attempted to compete with PHP in the “easy to get started on bargain-basement-tier shared web hosts” space D:

1 more reply

spacebanana72y ago

It's amazing how that's not a terrible choice of tech stack even today after 26 years.

Not fashionable by any means - but still feasible to maintain unlike Windows IIS / C++ or Oracle / Perl / Flash.

pg_12342y ago

Still, by 15 years ago Linux, Nginx, Postgres, Python had become a clearly better choice than the default LAMP stack for more complex applications.

1 more reply

taffer2y ago

Postgres really took off when Heroku became popular in the late 00s.

djupblue2y ago

Despite me not knowing much about databases it seemed like an obvious choice.

Tostino2y ago

Hell, 10 years ago (in a multi database project) I got bit by MySQL not supporting check constraints, but returning successfully when I ran the create check constraint statements.

Postgres has been great all around.

1 more reply

fulafel2y ago

layer82y ago

Neither of those three would have been particularly good judgement 25 years ago, without a crystal ball.

lmm2y ago

Linux is absolutely corporate controlled, sadly. Just look at how decisions like systemd and wayland get made.

remram2y ago

25 years ago... PostgreSQL 6, Python 1.5, Linux 2.2? I don't know if picking those tools was good judgement then...

fulafel2y ago

Why not?

TOMDM2y ago

Postgres is such a great tool.

The feature I'd love to see added that has been kicking around the mailing list for ages now would be incremental view maintenance.

Being able to keep moderately complex analysis workloads fresh in realtime would be such a boon.

Izkata2y ago

Materialized views with pg_cron to refresh them? (Or even just cron and your usual interface, if you don't want to install something extra)

TOMDM2y ago

It means complex views that could take minutes or hours to calculate from scratch can be kept fresh in realtime.

physicles2y ago

salojoo2y ago

https://github.com/sraoss/pg_ivm

serpix2y ago

https://github.com/sraoss/pg_ivm does what you need

pgwhalen2y ago

pg_ivm has a ton of restrictions though (as do other PG incremental view refresh mechanisms, like timescale hypertables).

That's why I assume it's not in vanilla postgres: adding the future with so many caveats would not make for a great experience for the full breadth of postgres users.

iTokio2y ago

Is this feature even possible? How can the database unravel a complex queries with derived data to minimal updates from its normalized parts?

The easy way is to rebuild everything if any “from table” as been modified.

The manual way is to create triggers that perform the minimal updates.

benjaminwootton2y ago

There are lots of projects who have managed to achieve this.

Streaming frameworks like Kafka Streams and Flink have incrementally updating tables in memory.

Materialize is built around the concept with a Postgres compatible API.

ClickHouse materialized views act like insert triggers which update when the base table is updated.

pgwhalen2y ago

shpongled2y ago

I believe differential data flow techniques (Materialize, Noria) enable this kind of stuff

antifa2y ago

MERGE was interested in 15, if that's helpful: https://www.postgresql.org/docs/current/sql-merge.html

zozbot2342y ago

An "incremental view" is just an index over some custom query. So you're pretty much asking for improvements in Postgres' index support.

kingraoul2y ago

He means a materialized view, I believe.

dbacar2y ago

Did you used this in production, how do you rate it?

paulmd2y ago

(not saying one or the other is better, mind you… ;)

lmm2y ago

AtlasBarfed2y ago

If you are going to multi-node Postgres, you need to start planning for Cassandra/Dynamo.

By the time Postgres hits the single node limits, you should know what your highest volume reads/writes are and how to structure a cassandra/dynamo table to specifically handle those read/writes.

sgarland2y ago

> Postgres… Kafka… Cassandra

These are all wildly different products that should not be considered for the same purposes.

user_of_the_wek2y ago

lmm2y ago

If your premise is that Postgres is eating the datastore world, then you're talking about using it as a replacement for Kafka and Cassandra.

VonngOP2y ago

Patroni has native HA support for citus horizontal cluster since v3. Which means your can create a HA citus cluster as simple as: https://pigsty.io/docs/pgsql/config/#citus-cluster

dbacar2y ago

That looks like apples vs oranges.

riku_iki2y ago

You should look at CocroachDB: it tries to be PG compatible (not there yet), but true HA-first.

derekperkins2y ago

At often 10x the latency and significant query compatibility issues. That's a hard pill to swallow

riku_iki2y ago

latency increase is likely payment for distributed consistency regardless of specific DB: doing consensus between nodes is much slower than dumping block of data on local nvme.

ksec2y ago

>In 2024, a single x86 machine can have 512 cores

zanoab2y ago

>This year’s new EPYC 9754 goes even further, offering a single CPU with 128 cores and 256 threads. This means a standard dual-socket server could have an astonishing 512 cores!

The author pulled it from an article linked in the previous sentence. The numbers don't even add up unless it was a mistake or I'm missing something.

Cesura2y ago

So in a dual-socket setup, 2 x EPYC 9754 would indeed yield 512 threads (logical cores), which are backed by 256 physical cores.

[1] https://www.amd.com/en/products/cpu/amd-epyc-9754

thisgoodlife2y ago

Did a quick search and found this 480-core server. Not exactly 512, but not far off. https://lenovopress.lenovo.com/lp1729-thinksystem-sr950-v3-s...

issung2y ago

Alright I'm convinced, I'm using postgres for my next project. Anyone have any experiences with it and Entity Framework Core?

blencdr2y ago

I'm using EF Core with SQL Server and PostgreSQL for two different production projects; it works like a charm, and the performance is great.

All recent projects in my company are PostgreSQL based (> 2000 production applications), and we have far fewer troubles with PostgreSQL than with Oracle, not to mention the licensing.

caleblloyd2y ago

danielovichdk2y ago

Coming from the MS ecosystem i can definitely vouch for EF core, but I am more of a Dapper person myself.

But I don't use PG because my needs are not so heavy nor have I the data sizes or advanced features that makes PG a more reliable choice over sqllite.

taspeotis2y ago

Npgsql has been good for me, although each major release there are a few teething issues.

riperoni2y ago

I am using it in a hobby code first project. Had no issues with it, but my data model is admittedly not complex and the the data is small in size.

thrixton2y ago

Well supported in EF core, great experience. Having said that, I’ve not used EF core with any other DB so have nothing to compare to.

JackMorgan2y ago

I'm managing two medium sized projects on that stack, using both EF core and Dapper. I'm about to switch a third over from Oracle because I'm so tired of all the issues with Oracle.

ksec2y ago

>As DuckDB’s manifesto “Big Data is Dead” suggests, the era of big data is over.

I have been stating this since at least 2020 if not earlier.

There is 802.3dj 1.6Tbps Ethernet looking at competition on 2026. Although product coming through to market tends to take much longer compared to Memory and PCI-Express.

AMD Zen6C in ~2025 / 2026 with 256 Core per Socket, on Dual Socket System that is 512 Core or 1024 vCPU / Thread.

The future is exciting.

riku_iki2y ago

> As DuckDB’s manifesto “Big Data is Dead” suggests, the era of big data is over.

yet, their db can't handle many cases where data doesn't fit into memory, and PgSQL always does large writes in single thread..

CyberDildonics2y ago

I'm not sure what your point is here, it seems like you are just listing off announced hardware.

AtlasBarfed2y ago

But at a certain point, a 10,000 core 5 petabyte single megamachine starts to practically encounter CAP from the internal scale alone. It already ... kind of ... does.

ksec2y ago

The point is Big Data or Hard to Scale aren't as much of a thing with Hardware technology moving forward.

patrickdavey2y ago

I have a handful of sites I run on a VPS with a basic setup, including MySQL.

Should I just give it a go the next time I'm upgrading? It does seem like a tool I need to get familiar with.

elp2y ago

Its got a lot better over the years.

jesterson2y ago

> PG is great to admin

What do you use for it? Is there anything like phpmyadmin for postgres with similar simplicity?

monospaced2y ago

Coming myself from MySQL to Postgres I found PgAdmin (https://www.pgadmin.org/screenshots/#7) easy to use

2 more replies

elp2y ago

Pretty much the same as mysql. https://wiki.postgresql.org/wiki/PostgreSQL_Clients#Web_Clie...

mixmastamyk2y ago

The price is right and auto-vacuuming is a thing.

Macha2y ago

Postgres updates are definitely a pain. MySQL is usually just a matter of upgrading the package and restarting the server for the projects I run, but postgres is a full dump and import process.

scientist43972y ago

I manage many PostgreSQL databases, the only time it was a pain, it was due to PostGIS upgrade but not the postgresql cluster itself…

You don’t need to dump and re import the database since a long long time…

swcode2y ago

artyom2y ago

Yes, the title is click-baity. Yes, Postgres isn't perfect and not the "best" choice for every possible use case in the universe.

But Postgres is a work of art, and compared to all the other relational database options, if it's ultimately crowned the king of them all, it'd be well deserved.

I'd also say that the PG protocol and the extensions ecosystem are as important as the database engine.

jpalomaki2y ago

I would like to keep my data in Postgres (OLTP purposes), but run analytical queries against the same datafiles in Snowflake/DuckDB fashion.

The analytics part should scale independently. Often this is only needed occasionally, so scale-to-zero (like Snowflake) would be great.

derekperkins2y ago

You have been able to do that with DuckDB for 18 months. https://duckdb.org/2022/09/30/postgres-scanner.html

Clickhouse supports it too https://clickhouse.com/docs/en/sql-reference/table-functions...

thinkerswell2y ago

Is there an extension that can make it compete with TigerBeetle for transaction processing speed?

LAC-Tech2y ago

That would surely be impossible. TB is designed from the ground up to do one thing and do that one thing well. Postgres is a swiss army knife, and TB is a knife.

thinkerswell2y ago

Yes I think this is the one area where specialized DBs will win over PG

thinkerswell2y ago

Why downvoted? Genuine question.

NorwegianDude2y ago

This post was very wrong and misleading on multiple points.

I have seen a lot of people praising Postgres over e.g. MariaDB. But more often than not it seems to be people how lack knowledge.

VonngOP2y ago

How about: PostgreSQL tuned (x47). PostgreSQL + Hydra Extension (x42) PostgreSQL + ParadeDB Extension (x10.7)

givemeethekeys2y ago

Having used Postgres for many projects, yet never having used any of the other tools in the ecosystem, I'm surprised by how many tools there are!

How does one go about finding paying customers when developing a new database tool? How does one figure out the size of the market, and pricing structure?

elliotwagner2y ago

I truly like postgres

throwawaaarrgh2y ago

dagw2y ago

Another trendy database will come along and you'll stop seeing all these posts about it. Happens every decade.

renegade-otter2y ago

It’s been 40 years for Postgres. A database is not some trendy ReactJS library.

docc2y ago

Correct, but a lot of pieple treat it that way.

sgarland2y ago

> 99% of people who recommend or use Postgres barely know how to use it.

You're not wrong here, although you could just as easily say "99% of people who recommend $DB barely know how to use it."

Databases remain a mysterious black box to entirely too many people, despite the three largest (SQLite, Postgres, MySQL) being open source, and having extensive documentation.

renegade-otter2y ago

I feel this on a soul level. I wrote about it: https://renegadeotter.com/2023/11/12/your-database-skills-ar...

sgarland2y ago

My god, are you me? I also thoroughly enjoyed this diatribe from [0]:

> First, a whole army of developers writing JavaScript for the browser started self-identifying as “full-stack”, diving into server development and asynchronous code.

> ...early JavaScript was a deeply problematic choice for server development. Pointing this out to still green server-side developers usually resulted in a lot of huffing and puffing.

[0]: https://renegadeotter.com/2023/09/10/death-by-a-thousand-mic...

dzonga2y ago

that was a profound article. thanks

coldtea2y ago

You're off by 30 years already...

j / k navigate · click thread line to collapse