Differential storage: A key building block for a DuckDB-based data warehouse (opens in new tab)

(motherduck.com)

98 pointsvgt2y ago11 comments

11 comments

I’m curious how their snapshotting and object storage implementation stacks up against SQLite’s offering:

https://sqlite.org/cloudsqlite/doc/trunk/www/index.wiki

SQLite’s object-per-page approach means you can fetch portions of the database as needed from a bucket, whereas the DuckDB snapshotting approach seems to imply that you’d need to fetch the whole database.

richieartoul2y ago

This is pretty cool. I'm a bit of a noob when it comes to stuff like FUSE. There is a bunch of commentary in the blog post about how when DuckDB does X, the Differential Storage implementation does Y. If the differential storage implementation just exposes itself as a filesystem though, how does it know what DuckDB is doing at the application layer? For example, how does it know from the filesystem layer that DuckDB is performing a snapshot?

vegetable262y ago

edit: author here

Our current FUSE implementation is specific to DuckDB and does make (as well as validate) some assumptions about DuckDB's access patterns to the underlying files. In this case - we know that currently DuckDB always truncate the WAL on successful checkpoint - which triggers a Differential Storage snapshot under the hood.

We are working on future-proofing this setup by removing our reliance on some of these assumptions and having our FUSE implementation act much more like a generic FS moving forward.

Though in general the truncation of the WAL is still usually a good time to snapshot the database, as the truncation means that the only "state" needed to reconstruct the current database is completely captured in the database file.

chrisjc2y ago

I'm not too familiar with FUSE, but I would imagine that you are doing something similar to registering a custom filesystem on duckdb, then intercepting certain filesystem activities to trigger all the magic described in the blog?

Also, do you think any of this is going to make its way back into the duckdb core, or perhaps even influencing the duckdb developers to make some of this native or easier (avoiding assumptions about what duckdb is doing)? Perhaps some kind of trigger on checkpoint/similar activities?

And btw, very interesting to read this announcement after reading through the S3 discussion yesterday.

https://news.ycombinator.com/item?id=39656657

richieartoul2y ago

Makes sense, very cool. Thanks!

TheGuyWhoCodes2y ago

I did something similar when I worked on an internal timeseries DB with optimization for long term and short term storage.

The EFS thing is a bit tricky. Once your credits are done it's super slow, so I bet they do provisioned but that's expensive.

The other thing was that multiple instances trying to reach the same EFS endpoint at the same time ,for the first time, will fail so you need to stagger it or retry and the docker mount plugin was a bit iffy.

That said cool concept.

cmollis2y ago

sounds a bit like what Iceberg does with writes on parquet.

vegetable262y ago

edit: author here

Yah that's a good point. We use a similar copy-on-write implementation as many other storage systems (such as Iceberg) that offer time-travel, branching, etc... where you have snapshots represented by pointers to immutable snapshot-layer or delta files (or however else you want to call them).

In our case, we wanted to provide customers with a very similar/seamless experience of using DuckDB on MotherDuck as they would with local DuckDB. To provide this parity we needed the ability to extend DuckDB's native storage to support these capabilities (time-travel, branching, etc...). This led us to implementing Differential Storage.

skadamat2y ago

Curious why FUSE over NFS?

klysm2y ago

I absolutely hate working with NFS - the impedance mismatch of what it does with the API it provides is terrible

zilti2y ago

Performance

j / k navigate · click thread line to collapse

11 comments

conradev2y ago

I’m curious how their snapshotting and object storage implementation stacks up against SQLite’s offering:

https://sqlite.org/cloudsqlite/doc/trunk/www/index.wiki

richieartoul2y ago

vegetable262y ago

edit: author here

We are working on future-proofing this setup by removing our reliance on some of these assumptions and having our FUSE implementation act much more like a generic FS moving forward.

chrisjc2y ago

And btw, very interesting to read this announcement after reading through the S3 discussion yesterday.

https://news.ycombinator.com/item?id=39656657

richieartoul2y ago

Makes sense, very cool. Thanks!

TheGuyWhoCodes2y ago

I did something similar when I worked on an internal timeseries DB with optimization for long term and short term storage.

The EFS thing is a bit tricky. Once your credits are done it's super slow, so I bet they do provisioned but that's expensive.

That said cool concept.

cmollis2y ago

sounds a bit like what Iceberg does with writes on parquet.

vegetable262y ago

edit: author here

skadamat2y ago

Curious why FUSE over NFS?

klysm2y ago

I absolutely hate working with NFS - the impedance mismatch of what it does with the API it provides is terrible

zilti2y ago

Performance

j / k navigate · click thread line to collapse