[1] https://github.com/cybertec-postgresql/zheap [2] https://techcommunity.microsoft.com/t5/azure-database-for-po...
I agree with your point. Postgres is starting to stick out compared to alternatives:
- MS SQL supports uni-temporal tables using system time.
- Snowflake has time travel which acts like temporal tables but with a limited retention window. Seems more like a restore mechanism.
- MariaDB has system-versioned tables (doesn't look like it's in MySQL).
- CockroachDB has uni-temporal support via system time, but only within the garbage-collection period. The docs indicate you don't want a long garbage-collection period, since all versions of a row are stored in a single range.
- Oracle seems to have the best temporal support with their flashback tech. But it's hard to read between the lines to figure out what it actually does.
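For readers who haven't touched any of these: the core mechanic behind system-time (uni-temporal) tables is the same everywhere, even though each vendor's SQL differs. Here is a minimal application-level sketch in Python/SQLite, with hypothetical table and column names; real implementations do this transparently in the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_history (
        id         INTEGER,
        balance    INTEGER,
        valid_from INTEGER,              -- logical system time, inclusive
        valid_to   INTEGER DEFAULT NULL  -- NULL = current version
    )
""")

def update_balance(acct_id, new_balance, now):
    # Close out the current version, then insert the new one.
    conn.execute(
        "UPDATE account_history SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (now, acct_id))
    conn.execute(
        "INSERT INTO account_history (id, balance, valid_from) VALUES (?, ?, ?)",
        (acct_id, new_balance, now))

def balance_as_of(acct_id, t):
    # The temporal "AS OF" query: whichever version was live at time t.
    row = conn.execute(
        """SELECT balance FROM account_history
           WHERE id = ? AND valid_from <= ?
             AND (valid_to IS NULL OR valid_to > ?)""",
        (acct_id, t, t)).fetchone()
    return row[0] if row else None

update_balance(1, 100, now=10)
update_balance(1, 250, now=20)
print(balance_as_of(1, 15))  # 100 (the historical version)
print(balance_as_of(1, 25))  # 250 (the current version)
```

The retention-window caveats above (Snowflake, CockroachDB) amount to how long those closed-out rows are kept before being garbage-collected.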
https://www.cybertec-postgresql.com/en/zheap-undo-logs-disca...
https://github.com/cybertec-postgresql/postgres/tree/zheap_u...
Not completely related, but his lectures on databases on YouTube are really good. Much better than the DB class I had at college.
A great way to learn more about the inner workings of databases, and entertaining too.
Another choice quote (from one of his lectures):
“There’s only two things I care about in life:
1. My wife
2. Databases
I don’t give a f#ck about anything else”
TimescaleDB looks very exciting: it's "just" a PG extension, and their compression work looks great. [0]
I'm also really loving ClickHouse, but I haven't deployed it to production yet (haven't had the need; I almost did for an Apache Arrow reading thing, but didn't end up using Arrow). They do some amazing things there, and the work they do is crazy impressive and fast. Reading their changelog, they power through things.
[0] https://docs.timescale.com/timescaledb/latest/how-to-guides/...
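Much of the compression win in time-series stores comes from delta-of-delta encoding of timestamps (the Gorilla-style trick that columnar time-series compression builds on): regularly spaced samples collapse to runs of zeros, which then compress to almost nothing. A toy sketch of the idea, not TimescaleDB's actual implementation (it assumes at least two samples):

```python
def delta_of_delta(timestamps):
    # First-order deltas, then deltas of those deltas.
    # Regularly spaced timestamps yield mostly zeros.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [timestamps[0], deltas[0]] + [b - a for a, b in zip(deltas, deltas[1:])]

def decode(encoded):
    # Lossless inverse: rebuild deltas, then rebuild timestamps.
    head, first_delta, rest = encoded[0], encoded[1], encoded[2:]
    deltas = [first_delta]
    for dd in rest:
        deltas.append(deltas[-1] + dd)
    out = [head]
    for d in deltas:
        out.append(out[-1] + d)
    return out

ts = [1000, 1010, 1020, 1030, 1041]   # samples ~10s apart
print(delta_of_delta(ts))             # [1000, 10, 0, 0, 1]
print(decode(delta_of_delta(ts)) == ts)  # True
```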
The reality is that nothing is dominating. In 2021 there were more databases than ever, each addressing a different use case. Companies don't have just one EDW; they have dozens, even hundreds, of siloed data stores. Startups will start with one database for everything, then split out auth, user analytics, telemetry, etc.
There is no evidence of any consolidation in the market. And definitely not some mass trend towards PostgreSQL.
1. OtterTune doesn't sell PostgreSQL services; they sell a database optimization service that happens to support PostgreSQL (and other databases like MySQL)
2. PostgreSQL is definitely gaining market share, and fast; see the DB-Engines graph [1]. You can compare it to the Oracle trend [2] if you're not convinced.
[1] https://db-engines.com/en/ranking_trend/system/PostgreSQL
An ML program that automatically tunes your production database in real time. What could possibly go wrong?
Either way, nothing here suggests that PostgreSQL is in any way dominating.
No matter how much you tune or denormalize Postgres, you'll never get the full-text search performance Elasticsearch offers. Our best efforts on a 5-million-row table yielded 600ms query times vs. 30-60ms.
Similarly with Snowflake: you'd never expect Postgres to perform analytical queries at that scale.
I know graph databases and time-series DBs have similar performance tradeoffs.
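The full-text gap largely comes down to index structure: dedicated engines answer a query with posting-list intersections instead of scanning and matching rows. A toy inverted index showing the shape of the lookup (an illustration of the idea only, not how Elasticsearch or Postgres's GIN indexes are actually implemented):

```python
from collections import defaultdict

docs = {
    1: "postgres full text search",
    2: "elasticsearch inverted index",
    3: "full text search with an inverted index",
}

# Build the inverted index: term -> set of document ids (a posting list).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(*terms):
    # AND query: intersect posting lists, starting from the smallest.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    return set.intersection(*postings)

print(search("full", "text"))        # {1, 3}
print(search("inverted", "index"))   # {2, 3}
```

Real engines add tokenization, ranking, and compressed on-disk posting lists on top of this, which is where the engineering effort (and the latency gap) lives.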
I think the most interesting and challenging area is how to architect a system that uses many of these databases and keeps them eventually consistent within some bound.
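One common answer to keeping a primary store and a secondary (search index, analytics DB, etc.) eventually consistent is the transactional-outbox pattern: write the change and an outbox record together, and let a relay drain the outbox into the secondary, so staleness is bounded by relay lag. A minimal sketch with in-memory dicts standing in for the stores (all names hypothetical):

```python
primary, secondary, outbox = {}, {}, []

def write(key, value):
    # In a real system these two writes share one transaction,
    # so the outbox can never miss a committed change.
    primary[key] = value
    outbox.append((key, value))

def drain_outbox():
    # The relay: apply pending changes to the secondary store in order.
    while outbox:
        key, value = outbox.pop(0)
        secondary[key] = value

write("user:1", {"name": "Ada"})
write("user:1", {"name": "Ada Lovelace"})
print(primary == secondary)   # False: secondary lags until the relay runs
drain_outbox()
print(primary == secondary)   # True: converged
```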
ZomboDB is a Postgres extension that enables efficient full-text searching via the use of indexes backed by Elasticsearch. https://github.com/zombodb/zombodb#readme
But for small-to-medium datasets his advice to just stick with PostgreSQL is good: start with an easy solution that gives you everything you need (often by simply installing an extension). If you need more specialized software, THEN use it; don't start with an overcomplicated stack just because Elasticsearch or ClickHouse may be the state-of-the-art open-source solution to a specific problem.
1. Async replication tolerating data loss from slightly stale backup after a failover?
2. Sync replication tolerating downtime during manual failover?
3. Distributed consensus protocol for automated failover, high availability and no data loss, e.g. Viewstamped Replication, Paxos or Raft?
It seems like most managed-service databases such as Aurora, Timescale, etc. are doing option 3, while the open-source alternatives are otherwise still on options 1 and 2?
I was assuming that in both cases of manual failover the operator would need some way of physically shutting down the old primary, then starting it again only as a backup that doesn't reply to clients. Alternatively, the cluster would need to remain unavailable whenever any node is partitioned.
But none of this is really very practical when compared to a consensus protocol (or R/W quorums) and distributed database. I'm genuinely curious how people solve this with something like Postgres. Or is it perhaps something that isn't much worried about?
Well, unless each node has a complete copy of the data?
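For option 3, the property that makes R/W quorums (and quorum-based consensus) work is easy to check: with N replicas, any write quorum of size W and any read quorum of size R must intersect whenever R + W > N, so a read always touches at least one node that saw the latest acknowledged write. A brute-force check of both the safe and the unsafe configuration:

```python
import itertools

N = 5
nodes = set(range(N))

def quorums_always_overlap(w_size, r_size):
    # Exhaustively verify that every write quorum meets every read quorum.
    return all(
        set(w) & set(r)
        for w in itertools.combinations(nodes, w_size)
        for r in itertools.combinations(nodes, r_size)
    )

overlaps = quorums_always_overlap(3, 3)   # R + W = 6 > N = 5
overlaps2 = quorums_always_overlap(2, 2)  # R + W = 4 <= N: quorums can miss
print(overlaps)   # True
print(overlaps2)  # False
```

This is why a node with only a partial copy of the data can still participate: correctness comes from quorum intersection, not from every node holding everything.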
The critical importance of extensibility as a primary concern of successful DB products needs to be highlighted. Realities of the domain dictate that product X matures a few years after inception, at which point the application patterns may have shifted. (Remember map-reduce?) If you pay attention, for example, you'll note that the du jour darlings are scrambling to claim fitness for ML (a subset of big-data), and the new comers are claiming to be "designed for ML".
Smart VC money should be on extensible players ..
Dgraph uses GraphQL as its native query language.
Does anyone here have experience with it to share? Asking since it isn't mentioned in the article.
I can't say I can relate, but I do appreciate being this passionate about things!
Professional lives should be separate from personal ones, but please, indulge us with a story!
> Rockset joined in, saying its performance was better for real-time analytics than the other two.
So I went and read the linked Rockset comparison blog post, and while I get that it’s a marketing piece, it’s also so transparently desperate for any advantage over Druid and ClickHouse that their criteria are bizarre at best, and bordering on wildly incorrect at worst.
I’ve been burnt by commercial databases before, and I have a hard time justifying ever using one, especially considering the advent of open-source databases that have feature and performance parity (if not outright superiority) and can be self-hosted on K8s or easily bought as managed hosting.
I think we’ll get even more hosting options now that ClickHouse has its own backing company.
But I guess not much else happened to it other than PlanetScale.
--------
An architect demoed the failure of a shard and the automatic promotion of its backup shard to main, in production. They actually test their failure models. As I see it, sharding is not very hard. HA is not very hard given a reasonable SLA. But sharding with HA on a large setup that actually works is pretty hard.
Another thing that stuck in my mind was their high throughput-per-provisioned-hardware ratio. With not much hardware they were pulling 80k queries per second with room to spare.
Although I have to say, that's not much compared to GitHub, which pulls 1.2 million queries/sec on Vitess [0].
[0] https://github.blog/2021-09-27-partitioning-githubs-relation...
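For a sense of what the routing layer in such a setup does, here is a toy sketch of hash-based sharding with replica promotion on primary failure. All node names are hypothetical, and real systems like Vitess layer consensus, resharding, and health checking on top of this:

```python
import hashlib

# Each shard has a primary and a replica standing by for promotion.
SHARDS = {
    0: {"primary": "db0-a", "replica": "db0-b"},
    1: {"primary": "db1-a", "replica": "db1-b"},
}

def shard_for(key):
    # Stable hash of the key picks the shard.
    digest = hashlib.sha1(key.encode()).digest()
    return digest[0] % len(SHARDS)

def route(key, down=frozenset()):
    shard = SHARDS[shard_for(key)]
    if shard["primary"] in down:
        return shard["replica"]   # failover: promote the backup
    return shard["primary"]

key = "user:42"
print(route(key))                         # the shard's primary
print(route(key, down={route(key)}))      # its replica, after failover
```

The hard part the talk was really about is making that `down` set accurate and agreed-upon across the cluster, which is where failure testing earns its keep.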
Looking forward instead of backward, it would be great for databases to have some kind of live-patch/live-update feature, so that no downtime is needed at all if certain rules are obeyed (with an automatic check that they are). The same goes for operating systems, where we have parts of the technology and even some limited deployment, but none of it is the default as far as I know. This situation makes it quite a bit harder to develop and maintain systems without introducing extreme complexity. It does not look like we will have fewer bugs or fewer patches any time soon, so we should make updating as easy as possible, drastically reducing the need for maintenance windows without resorting to building clusters for everything.
It's great to see the current team is on the move again, as the original ParAccel architecture did not scale very well. There was an excellent talk on Redshift in Andy Pavlo's Vaccination Database Tech Talks, 2nd Dose. [0] It's by Ippokratis Pandis and worth a view. It covers a lot of the recent improvements, which are likely to disappoint the many critics who have counted Redshift out. (Prematurely in my opinion.)
Does anyone else feel like a caveman when modeling a many to many relationship in a normalized schema, and then querying via SQL?
I’m surprised graph DBs aren’t more popular for this reason alone. Maybe it’s a far-fetched dream, but perhaps a graph frontend could be slapped onto the Postgres backend.
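For context, the "caveman" pattern in question: a junction table plus two joins. A self-contained example in SQLite (the schema and names are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE groups (id INTEGER PRIMARY KEY, name TEXT);
    -- The junction table that models the many-to-many relation.
    CREATE TABLE memberships (user_id INTEGER, group_id INTEGER);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'alan');
    INSERT INTO groups VALUES (10, 'dba'), (20, 'ml');
    INSERT INTO memberships VALUES (1, 10), (1, 20), (2, 20);
""")

rows = conn.execute("""
    SELECT u.name, g.name
    FROM users u
    JOIN memberships m ON m.user_id = u.id
    JOIN groups g      ON g.id = m.group_id
    WHERE u.name = 'ada'
    ORDER BY g.name
""").fetchall()
print(rows)  # [('ada', 'dba'), ('ada', 'ml')]
```

A graph query language would phrase the same traversal as following an edge type, which is exactly the ergonomic gap being lamented.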
Graph databases will not overtake relational databases in 2030 by marketshare.
Bookmark this comment. Reach out to me in 2030. If I'm wrong, I will replace my official CMU photo with one of me wearing a shirt that says "Graph Databases Are #1". I will use that photo until I retire, get fired, or a former student stabs me.
For example: one of the knocks against RDBMSs is that their structure is supposedly "rigid." That's simply not the case in many RDBMSs, such as those using column storage. Adding columns in databases like ClickHouse is a trivial metadata operation. This means that many problems Neo4j solves can be addressed in a more general-purpose RDBMS, because you can add columns easily to track relationships. It's pretty easy to envision other improvements to access methods to make searches more efficient.
I don't mean to undercut in any way the innovation of graph databases. It's just that the relational model is (a) extremely general and (b) can be extended.
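As a concrete instance of that generality: a plain edges table plus a recursive CTE (standard SQL) already expresses transitive graph reachability, with no graph engine required. A sketch in SQLite with an illustrative edge set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT);
    INSERT INTO edges VALUES ('a','b'), ('b','c'), ('c','d'), ('x','y');
""")

# Everything reachable from 'a' by following edges transitively.
# UNION (not UNION ALL) dedups visited nodes, so the recursion terminates.
reachable = conn.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach ORDER BY node
""").fetchall()
print([n for (n,) in reachable])  # ['a', 'b', 'c', 'd']
```

What dedicated graph databases add is not expressiveness so much as graph-aware storage and traversal-optimized access paths, which is where the performance debate below actually lives.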
As for the first question - I've tried using Neo4j and ArangoDB for relatively large-scale graph querying (1-2TB of data) and both couldn't hold a candle to Postgres or MySQL in terms of query performance for cost. Neo requires you to store most of your data in memory and Arango isn't great for cross-shard querying.
Unless there's some major new graph DB that comes out in the next few years I would still bet on relational being dominant in 2030.
They say that they scale well. I have not tried any graph DB for prod work yet.
Either way, that’s not happening.
My god do we need an atlas of database related Apache projects.
It's almost as bad as java web frameworks about ten years ago.
Everyone can do everything and it's hard to know what is better for what.
I get that projects can be donated to Apache from disparate sources, but my god it’s still a disaster.
I think more scalable systems will continue to gain market share. It will be interesting to see if PlanetScale, CockroachDB, or some other actually becomes a big player.