DoltLab v0.2.0

(dolthub.com)

41 points · mjangle1985 · 4y ago · 15 comments


Inspired by... but admittedly off-topic... I had recently been wondering how people keep a full auditable history of their data. I've used Hibernate Envers in the past, or the now seemingly defunct Temporal Tables extension for PostgreSQL. What are people using these days? Is DoltLab it, or are there more common solutions?

cgio · 4y ago

Can’t say what people are using, but keen to hear from others. I am looking into this atm (no implementation yet), and some of the things I am reading about are the following (I’ll add Dolt to this list):

XTDB: https://xtdb.com/

Terminus: https://github.com/terminusdb/terminusdb

Nessie: https://projectnessie.org/

DVC: https://dvc.org

Liquibase: https://liquibase.org/

Datasette: https://datasette.io/

Still working out in my mind how schema evolution, x-temporal modelling, SCDs (slowly changing dimensions) in data modelling, version control, etc. tie together into one approach.

sverhagen · 4y ago

Not what I was looking for, but a nice list. Just want to call out something I also noticed in the article a sibling comment pointed to: if you're gonna mention Liquibase, why not mention Flyway too? https://flywaydb.org/

Good luck with your search.


richardbarosky · 4y ago

They wrote a blog post about database versioning solutions a little while back, which distinguishes schema versioning tools from data versioning tools. I can't say for sure how comprehensive it is, but presumably it covers the better-known solutions.

https://www.dolthub.com/blog/2021-09-17-database-version-con...

parentheses · 4y ago

I'm surprised you didn't include things like `dat`[0]

[0] https://github.com/dat-ecosystem-archive/dat

vorpalhex · 4y ago

This is actually a really cool idea, and while I would have avoided it due to its SaaS nature, now I'm actually pretty willing to try it.

caffeine · 4y ago

I can’t figure out whether this is a real product or a joke site. The name is confusing.

If it’s a real product it’s cool, I’ve wanted something like this for a while (currently I just use git repos full of JSON files but this would be better I think).

zachmu · 4y ago

Yup, it's a real product :)

If you want to experiment quickly and aren't squeamish about putting your data on the internet, DoltHub is easier to get started with. DoltLab is just a (limited) self-hosted version of DoltHub.

richardbarosky · 4y ago

Nice. While I haven't used Dolt, I've definitely enjoyed reading the blog, especially the MySQL compatibility posts and some of the other fun ones (like the alcohol dispenser project). Good luck guys!

smoyer · 4y ago

Maybe it's a regional slur here in the U.S., but I'm wondering where the name originated.

reltuk · 4y ago

It's a play on git, which is itself regional slang.

parentheses · 4y ago

So many questions about the git-like semantics.

- how are SHAs created?

- assuming you hash the entire diff, can columns be ignored (e.g. timestamps or other "unimportant" data)?

- do any two insertions into a table "conflict"?

reltuk · 4y ago

To quickly answer these...

- it's content addressing / Merkle DAGs all the way down. A commit's hash is computed over something like its metadata (author, description, timestamp) + its parent hashes + the root value hash. The root value is composed of the schema and pointers to the table and index maps. Tables and indexes are Merkle DAGs of the table data, organized in a structure a bit like a B-tree, but with cut points chosen by a rolling hash over the values so that the tree probabilistically re-synchronizes after incremental changes. Some details: https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-tabl... , https://www.dolthub.com/blog/2020-05-13-dolt-commit-graph-an...

- currently, table data is stored row-major (full rows of the table are stored together), so diffing cannot efficiently ignore individual columns.

- direct conflicts are computed row by row, using the row's primary key. Constraints and foreign key references are then maintained and validated across merges and edits.
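To make the first point concrete, here is a minimal Python sketch of content-defined chunking plus a commit hash built from metadata, parent hashes and a root hash. Everything in it is an assumption for illustration: the window size, the cut pattern, the polynomial rolling hash over per-row SHA-256 digests, the single flat level of chunks, and the JSON commit encoding are not Dolt's actual format or parameters (the linked posts describe the real storage). What it does show is the re-synchronization property: because each cut decision looks only at the last `WINDOW` rows, editing one row can only move cut points within `WINDOW` rows of the edit, so most chunk hashes stay the same.

```python
import hashlib
import json

# Hypothetical parameters for illustration; Dolt's real chunker is different.
WINDOW = 3          # number of trailing rows the rolling hash covers
PATTERN_BITS = 2    # cut when the low 2 bits of the hash are zero (~1 row in 4)
BASE = 257
MOD = (1 << 61) - 1


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def row_digest(row: str) -> int:
    """Map one serialized row to an integer for the rolling hash."""
    return int.from_bytes(hashlib.sha256(row.encode()).digest()[:8], "big") % MOD


def boundaries(rows: list[str]) -> list[int]:
    """Return indices i such that a chunk ends after rows[i].

    The decision at i depends only on rows[i - WINDOW + 1 .. i], so editing
    one row can only move cut points within WINDOW rows of the edit.
    """
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the oldest row in the window
    window: list[int] = []
    h = 0
    cuts = []
    for i, row in enumerate(rows):
        if len(window) == WINDOW:
            h = (h - window.pop(0) * top) % MOD   # slide out the oldest row
        d = row_digest(row)
        window.append(d)
        h = (h * BASE + d) % MOD                  # slide in the newest row
        if len(window) == WINDOW and (h & ((1 << PATTERN_BITS) - 1)) == 0:
            cuts.append(i)
    return cuts


def chunk(rows: list[str]) -> list[list[str]]:
    """Split rows into consecutive chunks at the content-defined cut points."""
    chunks, start = [], 0
    for i in boundaries(rows):
        chunks.append(rows[start:i + 1])
        start = i + 1
    if start < len(rows):
        chunks.append(rows[start:])
    return chunks


def root_hash(rows: list[str]) -> tuple[str, list[str]]:
    """Content-address each chunk, then hash the list of chunk hashes (one level only)."""
    chunk_hashes = [sha("\n".join(c).encode()) for c in chunk(rows)]
    return sha("".join(chunk_hashes).encode()), chunk_hashes


def commit_hash(meta: dict, parents: list[str], root: str) -> str:
    """Hash over metadata + parent commit hashes + root value hash."""
    payload = json.dumps({"meta": meta, "parents": parents, "root": root}, sort_keys=True)
    return sha(payload.encode())


rows = [f"{pk},value-{pk}" for pk in range(200)]
edited = list(rows)
edited[100] = "100,changed"

root_a, hashes_a = root_hash(rows)
root_b, hashes_b = root_hash(edited)
print(len(hashes_a), len(hashes_b), len(set(hashes_a) & set(hashes_b)))

c1 = commit_hash({"author": "alice", "message": "load rows"}, [], root_a)
c2 = commit_hash({"author": "alice", "message": "edit row 100"}, [c1], root_b)
```

Running it, `hashes_a` and `hashes_b` should share every chunk hash except the few chunks around row 100, while `root_b` and therefore `c2` differ; since `c2` includes `c1` as a parent, rewriting any earlier commit would change every descendant hash. A real store builds several levels of such chunks (a tree rather than one flat list) so that an edit rewrites only a path from leaf to root.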

HTH, happy to answer any further questions :).
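The primary-key rule in the last bullet also answers the question about insertions: two inserts only collide if they use the same key. Below is a minimal Python sketch of a row-by-row three-way merge keyed on primary key, assuming a table is a plain dict from primary key to a tuple of column values (a missing key means the row is absent). The `merge` function and this table representation are hypothetical, not Dolt's merge code, and the sketch skips the constraint and foreign-key validation mentioned above. It compares whole rows, so, in line with the row-major note, a change to any column (a timestamp included) counts as a change to the row.

```python
from typing import Optional

Row = tuple[str, ...]     # hypothetical: non-key column values
Table = dict[int, Row]    # hypothetical: primary key -> row


def merge(base: Table, ours: Table, theirs: Table) -> tuple[Table, list[int]]:
    """Three-way merge of two branches, decided independently for each primary key."""
    merged: Table = {}
    conflicts: list[int] = []
    for pk in sorted(base.keys() | ours.keys() | theirs.keys()):
        b: Optional[Row] = base.get(pk)
        o: Optional[Row] = ours.get(pk)
        t: Optional[Row] = theirs.get(pk)
        if o == t:        # same result on both sides (including both deleting it)
            result = o
        elif o == b:      # only "theirs" touched this row
            result = t
        elif t == b:      # only "ours" touched this row
            result = o
        else:             # both sides changed or inserted this key differently
            conflicts.append(pk)
            result = o    # keep ours for now; the conflict is reported to the caller
        if result is not None:
            merged[pk] = result
    return merged, conflicts


# Two branches each insert a different key: no conflict.
base = {1: ("alice",), 2: ("bob",)}
merged, conflicts = merge(base, {**base, 3: ("carol",)}, {**base, 4: ("dave",)})
print(sorted(merged), conflicts)  # → [1, 2, 3, 4] []

# Both branches insert key 5 with different values: conflict on that row only.
merged, conflicts = merge(base, {**base, 5: ("x",)}, {**base, 5: ("y",)})
print(conflicts)  # → [5]
```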

hestefisk · 4y ago

Looks interesting. It’s like Git, ZFS, and MySQL all in one?
