Instant database clones with PostgreSQL 18 (opens in new tab)

(boringsql.com)

435 pointsradimm3mo ago162 comments

162 comments

For those who can't wait for PG18 or need full instance isolation: I built Velo, which does instant branching using ZFS snapshots instead of reflinks.

Works with any PG version today. Each branch is a fully isolated PostgreSQL container with its own port. ~2-5 seconds for a 100GB database.

https://github.com/elitan/velo

Main difference from PG18's approach: you get complete server isolation (useful for testing migrations, different PG configs, etc.) rather than databases sharing one instance.

72deluxe3mo ago

Despite all of the complaints in other comments about the use of Claude Code, it looks interesting and I appreciated the video demo you put on the GitHub page.

buu7003mo ago

Agentic coding detractors: "If AI is so great, where all the thriving new open source projects to prove it?"

Also agentic coding detractors: "How dare you use AI to help build a new open source project."

I'm joking and haven't read the comments you're referring to, but whether or not AI was involved is irrelevant per se. If anyone finds themselves having a gut reaction to "AI", just mentally replace it with "an intern" or "a guy from Fiverr". Either way, the buck stops with whomever is taking ownership of the project.

If the code/architecture is buggy or unsafe, call that out. If there's a specific reason to believe no one with sufficient expertise reviewed and signed off on the implementation, call that out. Otherwise, why complain that someone donated their time and expertise to give you something useful for free?

2 more replies

newusertoday3mo ago

thanks for sharing its interesting approach. I am not sure why people are complaining most of the software is written with the help of agents these days.

theturtletalks3mo ago

It’s rampant. Launch anything these days and it’s bombarded with “vibe-coded” comments.

The issue of quality makes sense since it’s so easy to build these days, but when the product is open-source, these vibe coded comments make no sense. Users can literally go read the code or my favorite? Repomix it, pop it into AI Studio, and ask Gemini what this person has built, what value it brings, and does it solve the problem I have?

For vibe coded proprietary apps, you can’t do that so the comments are sort of justified.

1 more reply

whalesalad3mo ago

Hell yeah. I’ve been meaning to prototype this exact thing but with btrfs.

elitan3mo ago

interesting, you have a link?

anonzzzies3mo ago

Does it work with other dbs theoretically?

Rovanion3mo ago

You, is an interesting word to use given that you plagiarized it.

anonymars3mo ago

Do you have a link to the original?

wahnfrieden3mo ago

Please share the instant Postgres clones tool this copied! I'd love to try it

elitan3mo ago

Plagiarized from what? Happy to address if you can point to what you're referring to.

1 more reply

teiferer3mo ago

You mean you told Claude a bunch of details and it built it for you?

Mind you, I'm not saying it's bad per se. But shouldn't we be open and honest about this?

I wonder if this is the new normal. Somebody says "I built Xyz" but then you realize it's vibe coded.

pritambarhate3mo ago

Let's say there is an architect and he also owns a construction company. This architect, then designs a building and gets it built from of his employees and contractors.

In such cases the person says, I have built this building. People who found companies, say they have built companies. It's commonly accepted in our society.

So even if Claude built for it for GP, as long as GP designed it, paid for tools (Claude) to build it, also tested it to make sure that it works, I personally think, he has right to say he has built it.

If you don't like it, you are not required to use it.

9 more replies

elAhmo3mo ago

It is the new normal, whether you are against it or not.

If someone used AI, it is a good discussion to see whether they should explicitly disclose it, but people have been using assisted tools, from auto-complete, text expanders, IDE refactoring tools, for a while - and you wouldn't make a comment that they didn't build it. The lines are becoming more blurry over time, but it is ridiculous to claim that someone didn't build something if they used AI tools.

cstrahan3mo ago

Do you take issue with companies stating that they (the company) built something, instead of stating that their employees built something? Should the architects and senior developers disclaim any credit, because the majority of tickets were completed by junior and mid-level developers?

Do you take issue with a CNC machinist stating that they made something, rather than stating that they did the CAD and CAM work but that it was the CNC machine that made the part?

Non-zero delegation doesn’t mean that the person(s) doing the delegating have put zero effort into making something, so I don’t think that delegation makes it dishonest to say that you made something. But perhaps you disagree. Or, maybe you think the use of AI means that the person using AI isn’t putting any constructive effort into what was made — but then I’d say that you’re likely way overestimating the ability of LLMs.

1 more reply

earthnail3mo ago

Not sure why this is downvoted. For a critical tool like DB cloning, I‘d very much appreciate if it was hand written. Simply because it means it’s also hand reviewed at least once (by definition).

We wouldn’t have called it reviewed in the old world, but in the AI coding world we’re now in it makes me realise that yes, it is a form of reviewing.

I use Claude a lot btw. But I wouldn’t trust it on mission critical stuff.

4 more replies

6r173mo ago

There was a recent wave of such comment on the rust subreddit - exactly in this shape "Oh you mean you built this with AI". This is highly toxic, lead to no discussion, and is literally drove by some dark thought from the commentator. I really hope HN will not jump on this bandwagon and will focus instead on creating cool stuff.

Everybody in the industry is vibecoding right now - the things that stick are due to sufficient quality being pushed on it. Having a pessimistic / judgmental surface reaction to everything as being "ai slop" is not something that I'm going to look forward in my behavior.

2 more replies

dpedu3mo ago

Huh? It says so right in the README.

https://github.com/elitan/velo/blame/12712e26b18d0935bfb6c6e...

And are we really doing this? Do we need to admit how every line of code was produced? Why? Are you expecting to see "built with the influence of Stackoverflow answers" or "google searches" on every single piece of software ever? It's an exercise of pointlessness.

2 more replies

taude3mo ago

why does it matter?

sheepscreek3mo ago

I set this up for my employer many years ago when they migrated to RDS. We kept bumping into issues on production migrations that would wreck things. I decided to do something about it.

The steps were basically:

1. Clone the AWS RDS db - or spin up a new instance from a fresh backup.

2. Get the arn and from that the cname or public IP.

3. Plug that into the DB connection in your app

4. Run the migration on pseudo prod.

This helped up catch many bugs that were specific to production db or data quirks and would never haven been caught locally or even in CI.

Then I created a simple ruby script to automate the above and threw it into our integrity checks before any deployment. Last I heard they were still using that script I wrote in 2016!

Tostino3mo ago

I love those "migration only fails in prod because of data quirks" bugs. They are the freaking worst. Have called off releases in the past because of it.

QuercusMax3mo ago

You should almost never test in prod, but sometimes testing on [a copy of] prod is useful

1 more reply

peterldowns3mo ago

Really interesting article, I didn't know that the template cloning strategy was configurable. Huge fan of template cloning in general; I've used Neon to do it for "live" integration environments, and I have a golang project https://github.com/peterldowns/pgtestdb that uses templates to give you ~unit-test-speed integration tests that each get their own fully-schema-migrated Postgres database.

Back in the day (2013?) I worked at a startup where the resident Linux guru had set up "instant" staging environment databases with btrfs. Really cool to see the same idea show up over and over with slightly different implementations. Speed and ease of cloning/testing is a real advantage for Postgres and Sqlite, I wish it were possible to do similar things with Clickhouse, Mysql, etc.

BenjaminFaal3mo ago

For anyone looking for a simple GUI for local testing/development of Postgres based applications. I built a tool a few years ago that simplifies the process: https://github.com/BenjaminFaal/pgtt

peterldowns3mo ago

Is this basically using templates as "snapshots", and making it easy to go back and forth between them? Little hard to tell from the README but something like that would be useful to me and my team: right now it's a pain to iterate on sql migrations, and I think this would help.

BenjaminFaal3mo ago

Thats exactly what it is, just try it with the provided docker-compose file you will get it then.

okigan3mo ago

Would love to see a snapshot of the GUI as part of the README.md.

Also docker link seems to be broken.

BenjaminFaal3mo ago

Fixed the package link. Github somehow made it private. I will add a snapshot right now.

majodev3mo ago

Uff, I had no idea that Postgres v15 introduced WAL_LOG and changed the defaults from FILE_COPY. For (parallel CI) test envs, it make so much sense to switch back to the FILE_COPY strategy ... and I previously actually relied on that behavior.

Raised an issue in my previous pet project for doing concurrent integration tests with real PostgreSQL DBs (https://github.com/allaboutapps/integresql) as well.

radarroark3mo ago

In theory, a database that uses immutable data structures (the hash array mapped trie popularized by Clojure) could allow instant clones on any filesystem, not just ZFS/XFS, and allow instant clones of any subset of the data, not just the entire db. I say "in theory" but I actually built this already so it's not just a theory. I never understood why there aren't more HAMT based databases.

zX41ZdbW3mo ago

This is typical for analytical databases, e.g., ClickHouse (which I'm the author of) uses immutable data parts, allowing table cloning: https://clickhouse.com/docs/sql-reference/statements/create/...

ozgrakkurt3mo ago

`ClickHouse (which I'm the author of)` just casually dropped that in the middle

2 more replies

chamomeal3mo ago

Does datomic have built in cloning functionality? I’ve been wanting to try datomic out but haven’t felt like putting in the work to make a real app lol

radarroark3mo ago

Surprisingly, no it does not. Datomic has a more limited feature that lets you make an in-memory clone of the latest copy of the db for speculative writes, which might be useful for tests, but you can't take an arbitrary version of the db with as-of and use it as the basis for a new version on disk. See: https://blog.danieljanus.pl/2025/04/22/datomic-forking-the-p...

There's nothing technically that should prevent this if they are using HAMTs underneath, so I'm guessing they just didn't care about the feature. With HAMT, cloning any part of the data structure, no matter how nested, is just a pointer copy. This is more useful than you'd think but hardly any database makes it possible.

riskable3mo ago

PostgreSQL seems to have become the be-all, end-all SQL database that does everything and does it all well. And it's free!

I'm wondering why anyone would want to use anything else at this point (for SQL).

nine_k3mo ago

Postgres is wonderful, and has great many useful extensions. But:

* MySQL has a much easier story of master-master replication.

* Mongo has a much easier story of geographic distribution and sharding. (I know that Citus exists, and has used it.)

* No matter how you tune Postgres, columnar databases like Clickhouse are still faster for analytics / time series.

* Write-heavy applications still may benefit from something like Cassandra, or more modern solutions in this space.

(I bet Oracle has something to offer in the department of cluster performance, too, but I did not check it out for a long time.)

pstuart3mo ago

And SQLite is capable enough in many cases too...

vl3mo ago

“does it all well” is a stretch.

Any non-trivial amount of data and you’ll run into non-trivial problems.

For example, some of our pg databases got into such state, that we had to write custom migration tool because we couldn’t copy data to new instance using standard tools. We had to re-write schema to using custom partitions because perf on built-in partitioning degrades as number of partitions gets high, and so on.

aftbit3mo ago

Once upon a time, MySQL/InnoDB was a better performance choice for UPDATE-heavy workloads. There was a somewhat famous blog post about this from Uber[1]. I'm not sure to what extent this persists today. The other big competitor is sqlite3, which fills a totally different niche for running databases on the edge and in-product.

Personally, I wouldn't use any SQL DB other that PostgreSQL for the typical "database in the cloud" use case, but I have years of experience both developing for and administering production PostgreSQL DBs, going back to 9.5 days at least. It has its warts, but I've grown to trust and understand it.

1: https://www.uber.com/blog/postgres-to-mysql-migration/

turtles33mo ago

To be fair, postgres still suffers from a poor choice of MVCC implementation (copy on write rather than an undo log). This one small choice has a huge number of negative knock on effects once your load becomes non-trivial

hu33mo ago

PostgreSQL has no mature Vitess alternative. Hence, the largest oss OLTP database deployments tend to be MySQL. Like YouTube and Uber for example.

bddicken3mo ago

This is changing soon with Neki.

https://www.neki.dev

2 more replies

dangoodmanUT3mo ago

YouTube has migrated mostly to Spanner from what I hear

efxhoy3mo ago

It’s the clear OLTP winner but for OLAP it’s still not amazing out of the box.

scottyah3mo ago

It's heavy, I'd say sqlite3 close to the client and postgres back at the server farm is the combo to use.

wahnfrieden3mo ago

Can’t really run it on iOS. And its WASM story is weak

christophilus3mo ago

As an aside, I just jumped around and read a few articles. This entire blog looks excellent. I’m going to have to spend some time reading it. I didn’t know about Postgres’s range types.

pak9rabid3mo ago

Range types are a godsend when you need to calculate things like overlapping or intersecting time/date ranges.

zachrip3mo ago

Can you give a real world example?

2 more replies

oulipo23mo ago

Assuming I'd like to replicate my production database for either staging, or to test migrations, etc,

and that most of my data is either:

- business entities (users, projects, etc)

- and "event data" (sent by devices, etc)

where most of the database size is in the latter category, and that I'm fine with "subsetting" those (eg getting only the last month's "event data")

what would be the best strategy to create a kind of "staging clone"? ideally I'd like to tell the database (logically, without locking it expressly): do as though my next operations only apply to items created/updated BEFORE "currentTimestamp", and then:

- copy all my business tables (any update to those after currentTimestamp would be ignored magically even if they happen during the copy) - copy a subset of my event data (same constraint)

what's the best way to do this?

gavinray3mo ago

You can use "psql" to dump subsets of data from tables and then later import them.

Something like:

  psql <db_url> -c "\copy (SELECT * FROM event_data ORDER BY created_at DESC LIMIT 100) TO 'event-data-sample.csv' WITH CSV HEADER"

https://www.postgresql.org/docs/current/sql-copy.html

It'd be really nice if pg_dump had a "data sample"/"data subset" option but unfortunately nothing like that is built in that I know of.

peterldowns3mo ago

pg_dump has a few annoyances when it comes to doing stuff like this — tricky to select exactly the data/columns you want, and also the dumped format is not always stable. My migration tool pgmigrate has an experimental `pgmigrate dump` subcommand for doing things like this, might be useful to you or OP maybe even just as a reference. The docs are incomplete since this feature is still experimental, file an issue if you have any questions or trouble

https://github.com/peterldowns/pgmigrate

oulipo23mo ago

Indeed, but is there a way to do it as a "point in time", eg do a "virtual checkpoint" at a timestamp, and do all the copy operations from that timestamp, so they are coherent?

1f973mo ago

aws supports this as well: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

horse6663mo ago

Aurora clones are copy-on-write at the storage layer, which solves part of the problem, but RDS still provisions you a new cluster with its own endpoints, etc, which is slow ~10 mins, so not really practical for the integration testing use case.

nateroling3mo ago

This is on the cluster level, while the article is talking about the database level, I believe.

francislavoie3mo ago

Is anyone aware of something like this for MariaDB?

Something we've been trying to solve for a long time is having instant DB resets between acceptance tests (in CI or locally) back to our known fixture state, but right now it takes decently long (like half a second to a couple seconds, I haven't benchmarked it in a while) and that's by far the slowest thing in our tests.

I just want fast snapshotted resets/rewinds to a known DB state, but I need to be using MariaDB since it's what we use in production, we can't switch DB tech at this stage of the project, even though Postgres' grass looks greener.

proaralyst3mo ago

You could use LVM or btrfs snapshots (at the filesystem level) if you're ok restarting your database between runs

francislavoie3mo ago

Restarting the DB is unfortunately way too slow. We run the DB in a docker container with a tmpfs (in-memory) volume which helps a lot with speed, but the problem is still the raw compute needed to wipe the tables and re-fill them with the fixtures every time.

2 more replies

briffle3mo ago

LVM snapshots work well. Used it for years with other database tools.. But make sure you allocate enough write space for the COW.. when the write space fills up, LVM just 'drops' the snapshot.

pak9rabid3mo ago

I was able to accomplish this by doing each test within its own transaction session that gets rolled-back after each test. This way I'm allowed to modify the database to suit my needs for each test, then it gets magically reset back to its known state for the next test. Transaction rollbacks are very quick.

hu33mo ago

As a consultant, I saw many teams doing that and it works well.

The only detail is that autoincrements (SEQUENCEs for PotgreSQL folks) gets bumped even if the transaction rollsback.

So tables tend to get large ids quickly. But it's just dev database so no problem.

francislavoie3mo ago

Unfortunately a lot of our tests use transactions themselves because we lock the user row when we do anything to ensure consistency, and I'm pretty sure nested transactions are still not a thing.

1 more reply

fanf23mo ago

This doesn’t work for testing migrations because MySQL/MariaDB doesn’t support DDL inside transactions, unlike PostgreSQL.

1 more reply

tudorg3mo ago

This is really cool and I love to see the interest in fast clones / branching here.

We've built Xata with this idea of using copy-on-write database branching for staging and testing setups, where you need to use testing data that's close to the real data. On top of just branching, we also do things like anonymization and scale-to-zero, so the dev branches are often really cheap. Check it out at https://xata.io/

> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one. For production use, this usually means you create a dedicated template database rather than cloning your live database directly.

This is a key limitation to be aware of. A way to workaround it could be to use pgstream (https://github.com/xataio/pgstream) to copy from the production database to a production replica. Pgstream can also do anonymization on the way, this is what we use at Xata.

mvcosta913mo ago

It looks very interesting for integration tests

radimmOP3mo ago

OP here - yes, this is my use case too: integration and regression testing, as well as providing learning environments. It makes working with larger datasets a breeze.

febed3mo ago

If possible could you share a repo/gist with a working docker example? I’m curious how the instant clone world work there.

presentation3mo ago

We do this, preview deploys, and migration dry runs using Neon Postgres’s branching functionality - seems one benefit of that vs this is that it works even with active connections which is good for doing these things on live databases.

drakyoko3mo ago

would this work inside test containers?

radimmOP3mo ago

OP here - still have to try (generally operate on VM/bare metal level); but my understanding is that ioctl call would get passed to the underlying volume; i.e. you would have to mount volume

odie55333mo ago

I use CREATE DATABASE dbname TEMPLATE template1; inside test containers. Have not tried this new method yet.

TimH3mo ago

Looks like it would probably be quite useful when setting up git worktrees, to get multiple claude code instances spun up a bit more easily.

horse6663mo ago

This is really cool, looking forward to trying it out.

Obligatory mention of Neon (https://neon.com/) and Xata (https://xata.io/) which both support “instant” Postgres DB branching on Postgres versions prior to 18.

eatsyourtacos3mo ago

I still cannot reliably restore any Postgres DB with the TimescaleDB extensions on it.. have tried a million things but still fails every time.

1a527dd53mo ago

Many thanks, this solves integration tests for us!

wayeq3mo ago

we just build the database, commit it to a container (without volumes attached), and programmatically stop and restart the container per test class (testcontainers.org). the overhead is < 5 seconds and our application recovers to the reset database state seamlessly. it's been awesome.

hmokiguess3mo ago

I’ve been a fan of Neon and it’s branching strategy, really handy thing for stuff like this.

tehlike3mo ago

Now i need to find a way to migrate from hydra columnar to pg_lake variants so i can upgrade to PG18.

j / k navigate · click thread line to collapse

162 comments

elitan3mo ago

For those who can't wait for PG18 or need full instance isolation: I built Velo, which does instant branching using ZFS snapshots instead of reflinks.

Works with any PG version today. Each branch is a fully isolated PostgreSQL container with its own port. ~2-5 seconds for a 100GB database.

https://github.com/elitan/velo

Main difference from PG18's approach: you get complete server isolation (useful for testing migrations, different PG configs, etc.) rather than databases sharing one instance.

72deluxe3mo ago

Despite all of the complaints in other comments about the use of Claude Code, it looks interesting and I appreciated the video demo you put on the GitHub page.

buu7003mo ago

Agentic coding detractors: "If AI is so great, where all the thriving new open source projects to prove it?"

Also agentic coding detractors: "How dare you use AI to help build a new open source project."

2 more replies

newusertoday3mo ago

thanks for sharing its interesting approach. I am not sure why people are complaining most of the software is written with the help of agents these days.

theturtletalks3mo ago

It’s rampant. Launch anything these days and it’s bombarded with “vibe-coded” comments.

For vibe coded proprietary apps, you can’t do that so the comments are sort of justified.

1 more reply

whalesalad3mo ago

Hell yeah. I’ve been meaning to prototype this exact thing but with btrfs.

elitan3mo ago

interesting, you have a link?

anonzzzies3mo ago

Does it work with other dbs theoretically?

Rovanion3mo ago

You, is an interesting word to use given that you plagiarized it.

anonymars3mo ago

Do you have a link to the original?

wahnfrieden3mo ago

Please share the instant Postgres clones tool this copied! I'd love to try it

elitan3mo ago

Plagiarized from what? Happy to address if you can point to what you're referring to.

1 more reply

teiferer3mo ago

You mean you told Claude a bunch of details and it built it for you?

Mind you, I'm not saying it's bad per se. But shouldn't we be open and honest about this?

I wonder if this is the new normal. Somebody says "I built Xyz" but then you realize it's vibe coded.

pritambarhate3mo ago

Let's say there is an architect and he also owns a construction company. This architect, then designs a building and gets it built from of his employees and contractors.

In such cases the person says, I have built this building. People who found companies, say they have built companies. It's commonly accepted in our society.

If you don't like it, you are not required to use it.

9 more replies

elAhmo3mo ago

It is the new normal, whether you are against it or not.

cstrahan3mo ago

Do you take issue with a CNC machinist stating that they made something, rather than stating that they did the CAD and CAM work but that it was the CNC machine that made the part?

1 more reply

earthnail3mo ago

We wouldn’t have called it reviewed in the old world, but in the AI coding world we’re now in it makes me realise that yes, it is a form of reviewing.

I use Claude a lot btw. But I wouldn’t trust it on mission critical stuff.

4 more replies

6r173mo ago

2 more replies

dpedu3mo ago

Huh? It says so right in the README.

https://github.com/elitan/velo/blame/12712e26b18d0935bfb6c6e...

2 more replies

taude3mo ago

why does it matter?

sheepscreek3mo ago

I set this up for my employer many years ago when they migrated to RDS. We kept bumping into issues on production migrations that would wreck things. I decided to do something about it.

The steps were basically:

1. Clone the AWS RDS db - or spin up a new instance from a fresh backup.

2. Get the arn and from that the cname or public IP.

3. Plug that into the DB connection in your app

4. Run the migration on pseudo prod.

This helped up catch many bugs that were specific to production db or data quirks and would never haven been caught locally or even in CI.

Then I created a simple ruby script to automate the above and threw it into our integrity checks before any deployment. Last I heard they were still using that script I wrote in 2016!

Tostino3mo ago

I love those "migration only fails in prod because of data quirks" bugs. They are the freaking worst. Have called off releases in the past because of it.

QuercusMax3mo ago

You should almost never test in prod, but sometimes testing on [a copy of] prod is useful

1 more reply

peterldowns3mo ago

BenjaminFaal3mo ago

For anyone looking for a simple GUI for local testing/development of Postgres based applications. I built a tool a few years ago that simplifies the process: https://github.com/BenjaminFaal/pgtt

peterldowns3mo ago

BenjaminFaal3mo ago

Thats exactly what it is, just try it with the provided docker-compose file you will get it then.

okigan3mo ago

Would love to see a snapshot of the GUI as part of the README.md.

Also docker link seems to be broken.

BenjaminFaal3mo ago

Fixed the package link. Github somehow made it private. I will add a snapshot right now.

majodev3mo ago

Raised an issue in my previous pet project for doing concurrent integration tests with real PostgreSQL DBs (https://github.com/allaboutapps/integresql) as well.

radarroark3mo ago

zX41ZdbW3mo ago

This is typical for analytical databases, e.g., ClickHouse (which I'm the author of) uses immutable data parts, allowing table cloning: https://clickhouse.com/docs/sql-reference/statements/create/...

ozgrakkurt3mo ago

`ClickHouse (which I'm the author of)` just casually dropped that in the middle

2 more replies

chamomeal3mo ago

Does datomic have built in cloning functionality? I’ve been wanting to try datomic out but haven’t felt like putting in the work to make a real app lol

radarroark3mo ago

riskable3mo ago

PostgreSQL seems to have become the be-all, end-all SQL database that does everything and does it all well. And it's free!

I'm wondering why anyone would want to use anything else at this point (for SQL).

nine_k3mo ago

Postgres is wonderful, and has great many useful extensions. But:

* MySQL has a much easier story of master-master replication.

* Mongo has a much easier story of geographic distribution and sharding. (I know that Citus exists, and has used it.)

* No matter how you tune Postgres, columnar databases like Clickhouse are still faster for analytics / time series.

* Write-heavy applications still may benefit from something like Cassandra, or more modern solutions in this space.

(I bet Oracle has something to offer in the department of cluster performance, too, but I did not check it out for a long time.)

pstuart3mo ago

And SQLite is capable enough in many cases too...

vl3mo ago

“does it all well” is a stretch.

Any non-trivial amount of data and you’ll run into non-trivial problems.

aftbit3mo ago

1: https://www.uber.com/blog/postgres-to-mysql-migration/

turtles33mo ago

hu33mo ago

PostgreSQL has no mature Vitess alternative. Hence, the largest oss OLTP database deployments tend to be MySQL. Like YouTube and Uber for example.

bddicken3mo ago

This is changing soon with Neki.

https://www.neki.dev

2 more replies

dangoodmanUT3mo ago

YouTube has migrated mostly to Spanner from what I hear

efxhoy3mo ago

It’s the clear OLTP winner but for OLAP it’s still not amazing out of the box.

scottyah3mo ago

It's heavy, I'd say sqlite3 close to the client and postgres back at the server farm is the combo to use.

wahnfrieden3mo ago

Can’t really run it on iOS. And its WASM story is weak

christophilus3mo ago

As an aside, I just jumped around and read a few articles. This entire blog looks excellent. I’m going to have to spend some time reading it. I didn’t know about Postgres’s range types.

pak9rabid3mo ago

Range types are a godsend when you need to calculate things like overlapping or intersecting time/date ranges.

zachrip3mo ago

Can you give a real world example?

2 more replies

oulipo23mo ago

Assuming I'd like to replicate my production database for either staging, or to test migrations, etc,

and that most of my data is either:

- business entities (users, projects, etc)

- and "event data" (sent by devices, etc)

where most of the database size is in the latter category, and that I'm fine with "subsetting" those (eg getting only the last month's "event data")

- copy all my business tables (any update to those after currentTimestamp would be ignored magically even if they happen during the copy) - copy a subset of my event data (same constraint)

what's the best way to do this?

gavinray3mo ago

You can use "psql" to dump subsets of data from tables and then later import them.

Something like:

  psql <db_url> -c "\copy (SELECT * FROM event_data ORDER BY created_at DESC LIMIT 100) TO 'event-data-sample.csv' WITH CSV HEADER"

https://www.postgresql.org/docs/current/sql-copy.html

It'd be really nice if pg_dump had a "data sample"/"data subset" option but unfortunately nothing like that is built in that I know of.

peterldowns3mo ago

https://github.com/peterldowns/pgmigrate

oulipo23mo ago

Indeed, but is there a way to do it as a "point in time", eg do a "virtual checkpoint" at a timestamp, and do all the copy operations from that timestamp, so they are coherent?

1f973mo ago

aws supports this as well: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

horse6663mo ago

nateroling3mo ago

This is on the cluster level, while the article is talking about the database level, I believe.

francislavoie3mo ago

Is anyone aware of something like this for MariaDB?

proaralyst3mo ago

You could use LVM or btrfs snapshots (at the filesystem level) if you're ok restarting your database between runs

francislavoie3mo ago

2 more replies

briffle3mo ago

LVM snapshots work well. Used it for years with other database tools.. But make sure you allocate enough write space for the COW.. when the write space fills up, LVM just 'drops' the snapshot.

pak9rabid3mo ago

hu33mo ago

As a consultant, I saw many teams doing that and it works well.

The only detail is that autoincrements (SEQUENCEs for PotgreSQL folks) gets bumped even if the transaction rollsback.

So tables tend to get large ids quickly. But it's just dev database so no problem.

francislavoie3mo ago

Unfortunately a lot of our tests use transactions themselves because we lock the user row when we do anything to ensure consistency, and I'm pretty sure nested transactions are still not a thing.

1 more reply

fanf23mo ago

This doesn’t work for testing migrations because MySQL/MariaDB doesn’t support DDL inside transactions, unlike PostgreSQL.

1 more reply

tudorg3mo ago

This is really cool and I love to see the interest in fast clones / branching here.

mvcosta913mo ago

It looks very interesting for integration tests

radimmOP3mo ago

OP here - yes, this is my use case too: integration and regression testing, as well as providing learning environments. It makes working with larger datasets a breeze.

febed3mo ago

If possible could you share a repo/gist with a working docker example? I’m curious how the instant clone world work there.

presentation3mo ago

drakyoko3mo ago

would this work inside test containers?

radimmOP3mo ago

OP here - still have to try (generally operate on VM/bare metal level); but my understanding is that ioctl call would get passed to the underlying volume; i.e. you would have to mount volume

odie55333mo ago

I use CREATE DATABASE dbname TEMPLATE template1; inside test containers. Have not tried this new method yet.

TimH3mo ago

Looks like it would probably be quite useful when setting up git worktrees, to get multiple claude code instances spun up a bit more easily.

horse6663mo ago

This is really cool, looking forward to trying it out.

Obligatory mention of Neon (https://neon.com/) and Xata (https://xata.io/) which both support “instant” Postgres DB branching on Postgres versions prior to 18.

eatsyourtacos3mo ago

I still cannot reliably restore any Postgres DB with the TimescaleDB extensions on it.. have tried a million things but still fails every time.

1a527dd53mo ago

Many thanks, this solves integration tests for us!

wayeq3mo ago

hmokiguess3mo ago

I’ve been a fan of Neon and it’s branching strategy, really handy thing for stuff like this.

tehlike3mo ago

Now i need to find a way to migrate from hydra columnar to pg_lake variants so i can upgrade to PG18.

j / k navigate · click thread line to collapse