It seems like a compelling option:
* Much closer to Postgres compatibility than CockroachDB.
* A more permissive license.
* Built-in connection manager [1], which should simplify deployment.
* Supports both high availability and geo-distribution, which is useful if scaling globally becomes necessary later.
That said, I don't see it mentioned around here often. I wonder if anyone here has tried it and can comment on it.
--
1: https://docs.yugabyte.com/preview/explore/going-beyond-sql/c...
EDIT: in response to your question, I did run a PoC of it, but I ran into issues: I wasn't able to create very large indexes without the statement timing out on me, and basic hand-benchmarking of complex joins on very large tables was very slow, if the queries finished at all. I suppose systems like this and CockroachDB really want short, simple statements and high client concurrency rather than large, complex queries.
That’s normal for building indices on large tables, regardless of the RDBMS. Increase the timeout, and build them with the CONCURRENTLY option.
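In Postgres-flavored SQL that would look roughly like this (index and table names are made up; also worth checking YugabyteDB's docs for how it treats CONCURRENTLY, since YB does its index backfill online through its own mechanism):

```sql
-- Allow this session to run long statements (0 disables the timeout).
SET statement_timeout = 0;

-- Build the index without holding a long write lock on the table.
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
```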
> Query speed
Without knowing your schema and query I can’t say with any certainty, but it shouldn’t be dramatically slower than single-node Postgres, assuming your table statistics are accurate (have you run ANALYZE <table>?), necessary indices are in place, and there aren’t some horrendously wrong parameters set.
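For completeness, the first things I'd check look like this (table and query are hypothetical; EXPLAIN (ANALYZE, BUFFERS) is standard Postgres, though the buffer numbers mean less on a distributed storage layer):

```sql
-- Refresh planner statistics (table name hypothetical).
ANALYZE orders;

-- See what the planner actually does with your query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '7 days';
```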
## Free Trial
Use to evaluate whether the software suits a particular
application for less than 32 consecutive calendar days, on
behalf of you or your company, is use for a permitted purpose.
https://github.com/yugabyte/yugabyte-db/blob/master/licenses...
It's not really clear what this means (what is a permitted purpose?), but it seems the intent is that after 32 days, you are expected to pay up. Or at least prepare for a future when the infrastructure to charge customers is in place (if it isn't there yet).
I found two threads discussing it from the past year:
https://news.ycombinator.com/item?id=39430411
https://news.ycombinator.com/item?id=38914764
Yugabyte (as with CockroachDB and TiDB) is based on mapping relations to an LSM-tree-based KV store, where ranges of keys get mapped to different nodes managed through a Raft group. That kind of structure has very different performance characteristics compared to Postgres' page-based MVCC. In particular, LSM trees are not a free lunch.
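A toy model of that layout, with an invented key encoding (nothing to do with YB's or CRDB's actual formats): each row becomes a KV pair keyed by table plus primary key, and sorted key ranges map to nodes.

```python
import bisect

# Hypothetical illustration: rows become KV pairs keyed by (table, pk),
# and contiguous sorted key ranges are owned by different nodes. In a
# system like YugabyteDB each range would be its own Raft group.

# Range split points: keys below SPLIT_POINTS[0] go to node 0, etc.
SPLIT_POINTS = ["users/1000", "users/2000"]  # 3 ranges -> 3 nodes

def encode_key(table: str, pk: int) -> str:
    # Zero-pad the pk so lexicographic order matches numeric order.
    return f"{table}/{pk:04d}"

def node_for_key(key: str) -> int:
    # bisect finds which range (and thus which node) owns the key.
    return bisect.bisect_right(SPLIT_POINTS, key)

row_key = encode_key("users", 1500)
print(row_key, "-> node", node_for_key(row_key))  # users/1500 -> node 1
```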
Query execution is also very different when a table's data is spread over multiple nodes. For example, joins are done on the query executor side by executing remote scans against each participating storage node and then merging the results. That's always going to be slower than a system that already has all the data locally.
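In miniature, and with everything invented for illustration, the scatter-gather pattern looks like this: the query layer pays one network fetch per storage node before it can even start joining.

```python
# Toy model of a scatter-gather join: table data is spread over several
# "nodes" (plain dicts here), so the executor must fetch from all of
# them and then join locally. All names/data are made up.

NODES = [
    {"orders": [(1, "alice"), (2, "bob")]},
    {"orders": [(3, "carol")]},
]
CUSTOMERS = {"alice": "US", "bob": "DE", "carol": "JP"}  # local side of the join

def remote_scan(node, table):
    # Stand-in for an RPC round trip to a storage node.
    return node.get(table, [])

def distributed_join():
    rows = []
    for node in NODES:  # one "round trip" per participating node
        rows.extend(remote_scan(node, "orders"))
    # Only now can the executor join the gathered rows.
    return [(oid, cust, CUSTOMERS[cust]) for oid, cust in rows]

print(distributed_join())
```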
YB also lacks some index optimizations. There is work in progress to make bitmap index scans work in YB, which will give a huge performance boost to many queries, but it's incomplete. YB does have some optimizations (like loose index scans) that Postgres does not have. So it's probably fair to say that YB is a lot slower than PG for some things and a little faster at others.
I think it's fundamentally not a bad architecture, just different from Postgres. So even though they took the higher layers from Postgres, there's a whole bunch of rearchitecting needed in order to make the higher layers work with the lower ones. You do get some Postgres stuff for free, but I wonder if the amount of work here is worth it in the end. So much in Postgres makes the assumption of a local page heap.
What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YDB) is that it becomes a huge effort to keep up with new Postgres versions. YDB is on Postgres 12, which came out in 2019, and is slowly upgrading to 15, which came out in 2022. By the time they've upgraded to 15, it will probably be 2-3 versions behind, and the work continues.
Worth noting: Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign.
Please see this blog post for the latest updates, as well as information on additional in-house frameworks for resiliency and consistency testing: https://www.yugabyte.com/blog/chaos-testing-yugabytedb/
The first upgrade is the hardest, but after that we will have the framework in place to perform subsequent upgrades much sooner. When the pg11-to-pg15 upgrade becomes available, it will be in-place and online, without affecting DMLs; no other pg fork offers this capability today.
It's also only compatible insofar as you can use a subset of Postgres features; they support only the most basic things, like SELECT, views, etc.
Triggers, NOTIFY, etc. were out of scope the last time I checked (which has admittedly been a while).
> We use vanilla Postgres as-is for the query layer and replace Postgres storage with YugabyteDB’s own distributed storage engine.
https://www.yugabyte.com/blog/yugabytedb-enhanced-postgres-c...
But they do call it yugabyteDB, YugabyteDB, YugaByte DB, yugabyte-db, and Yugabyte.
pgEdge: https://github.com/pgedge/pgedge
Demo: https://youtu.be/Gpty7yNlwH4?t=1873
Not affiliated with them.
I recall that aspirationally pgEdge aims to be compatible with the latest pg version or one behind.
Keeps the number of reported bugs nice and low. The discussion of critical bugs that lose your data is left to HN and Twitter threads instead.
This sounds like the two generals problem, which has no solution. But I may be misunderstanding.
That still leaves actual crashes, which would need to use the shared memory to store the list of pending replications before the recovery of transactions is finished.
Couldn't one simply define kubernetes network policies to limit egress from CockroachDB pods?
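A sketch of what that could look like (namespace and labels are placeholders; a real policy also has to allow whatever else the pods legitimately need, which is why the inter-node and DNS rules are included):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cockroachdb-egress
  namespace: databases          # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: cockroachdb          # placeholder label
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: cockroachdb  # allow node-to-node cluster traffic
    - ports:
        - protocol: UDP
          port: 53              # allow DNS lookups
```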
There are also solutions like MySQL Cluster and Galera which provide a more cluster-like solution with synchronous replication. If you've got a suitable use case (low writes, high reads and no gigantic transactions) this can work extremely well. You bootstrap a cluster on node 1, and new members automatically take a copy of the cluster data when they join. You can have 3- or 5-node clusters, and reads are distributed across all nodes since replication is synchronous. Beware though: operating one of these things requires care. I've seen people suffer read downtime or full cluster outages by doing operations without understanding how they work under the hood. And if your cluster is hard-down, you need to pick the node with the most recent transaction to re-bootstrap the cluster, or you can lose transactions.
There's regular single primary/n replica replication built in. There's no built-in automatic failover.
There's also Group replication built in. This can be either single primary/n replica with automatic election of a new primary during a failure, or it can be multi-primary.
Then there's Galera, which is similar to the multi-primary mode of Group replication.
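For a sense of scale, wiring a node into a Galera cluster is mostly a handful of wsrep settings (the provider path and node list below are placeholders for your environment; the first node is bootstrapped with an empty gcomm:// list, or `galera_new_cluster` on systemd distros):

```ini
# Sketch of a Galera node config; adjust paths/addresses for your setup.
[mysqld]
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_on                 = ON
wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name       = "example_cluster"
wsrep_cluster_address    = "gcomm://node1,node2,node3"
```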
My use case is a fairly small database of email address mappings. I have a mail server that sends out emails on behalf of our users, using their outside e-mail address (gmail, yahoo, whathaveyou). In order to allow replies, but prevent being an open relay, I have a milter that creates a lookalike address, with a token in it (user@example.com -> user_at_example_com_x1uif8dm@mailserver.example.net).
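The rewrite step amounts to something like this (a simplified sketch; the real milter's token scheme is whatever its author implemented, and `lookalike_address` is a made-up name):

```python
import secrets
from typing import Optional

def lookalike_address(orig: str, relay_domain: str,
                      token: Optional[str] = None) -> str:
    # Turn user@example.com into user_at_example_com_<token>@relay_domain,
    # so replies route back through the relay without it being an open
    # relay. The token format here is invented for illustration.
    if token is None:
        token = secrets.token_hex(4)  # random 8-char token
    local = orig.replace("@", "_at_").replace(".", "_")
    return f"{local}_{token}@{relay_domain}"

print(lookalike_address("user@example.com", "mailserver.example.net",
                        token="x1uif8dm"))
# -> user_at_example_com_x1uif8dm@mailserver.example.net
```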
I store those mappings in the mysql database, less than 10K of those. So a trivial database, very high read to write ratio.
More recently, maybe 3-4 years ago, I added another couple of tables that store all the logs. I have a job that reads the postfix logfiles and writes out all the log information so we can show users the status of every one of their outgoing e-mails. That upped the amount of traffic to the database by quite a lot, but it's still pretty simple: basically no transactions, just simple single INSERT statements, and IIRC one overview table that gets some updates, then a cleanup job to delete rows after 30-90 days.
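The cleanup part is about as simple as retention gets; in MySQL-flavored SQL, something like this (table and column names hypothetical):

```sql
-- Retention job: drop delivery-log rows older than 90 days.
DELETE FROM mail_log
WHERE logged_at < NOW() - INTERVAL 90 DAY;
```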
Galera has been a real workhorse.
For a while I was going to go with CockroachDB, and I set up a PoC. Setup and clustering were pretty robust; I never ran into having to re-bootstrap the cluster. But, at the time, postfix couldn't write directly to CockroachDB, because cockroach could only do UTF-8 and the postfix pgsql code would only do latin-1 or something. This has changed since, and I've thought about switching back to cockroach, but I hear there are some licensing issues I may have to be aware of.
The Jepsen author gave a great talk on all the performance engineering work that has gone into it; Jepsen is near enough an entire DBMS in its own right: https://www.youtube.com/watch?v=EUdhyAdYfpA
I just really couldn’t justify Clojure in my project and personally if I want Lisp, I know where to find it.