undefined | Better HN

0 pointscoleifer8y ago0 comments

I'm jealous you got to hear Dr. Hipp, that sounds cool. Would love to hear more about the circumstances :)

Regarding the LSM engine, you can find all the relevant implementation details here: https://sqlite.org/src4/doc/trunk/www/lsm.wiki#summary

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system. Under normal circumstances, the in-memory tree is not allowed to grow very large.

0 comments

8ig88y ago

I stumbled on this video presentation by Dr Hipp recently (may have been from another HN comment). I really enjoyed it. Probably because he seems so enthusiastic and passionate.

SQLite: The Database at the Edge of the Network with Dr. Richard Hipp

https://m.youtube.com/watch?v=Jib2AmRb_rk

makmanalp8y ago

I got lucky! It was a class event, and he's known to speak at databases / data systems classes. He's a very fun (and opinionated!) speaker.

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system.

Hmm, ok, so this contradicts my assumption. Actually, now that I think about it, other LSMs like rocksdb / leveldb work like this too (library-like model with some in-memory component when you "open" the database).

Anyway, without diving into the details of the code, there's other technical decisions that would affect this stuff.

One thing is how big your in-memory structure is (in relation to available memory and insertion workload) and how often you flush to disk is a key thing. Another thing is what your LSM tree looks like - aside from the data structure, how many tiers/levels you have is a big thing. I assume some of these are configurable parameters. E.g. rocksdb has an enormous set of parameters that handles this stuff. It's also annoying to tune.

I found this benchmark here that is illustrative: https://sqlite.org/src4/doc/trunk/www/lsmperf.wiki

The first graph is underwhelming, but when you adjust the buffers (look at the last graph) ~250k writes / second constant regardless of database size (this is why you want an LSM tree) is darned good! And this is on a spinny drive, not an SSD. Their "large" buffer sizes aren't even that large IMHO.

So maybe his mention that the LSM storage was underwhelming was overblown :-) I don't know.

Another difference is with other LSM-based systems that aren't just key-value, it's usually in the context of column stores: you keep a separate LSM for each column family (could be 1-n columns). But I can't think off the top of my head how this would cause a difference. Perhaps in how reads happen - the query engines work quite differently.

Anyway, my talk is cheap, I'm just guessing here, actually doing the analysis is the hard work :-) Also, I'm something of an amateur currently, so take my words with a grain of salt. Anyone else have any ideas re: this?

rurban8y ago

This is the other perf graph: https://sqlite.org/src4/wiki?name=LsmPerformance

rattray8y ago

What would be performance for SQLite3 in comparable scenarios? I don't see anything comparative on that benchmarks page.

makmanalp8y ago

I've gotten it to do around 40-50k inserts / sec but that's a different scenario - nfs drive, different table and indexes, different queries, different configuration, etc etc. Also I didn't know if he meant that the inserts were disappointing or the overall results were (e.g. a suite of tests including reads / writes of all sorts).

X86BSD8y ago

I just want to hijack this thread to say hipps other creation Fossil SCM is a great SCM. Better than git imo. Everyone should check it out.

petre8y ago

We've been using it for the last 3/4 years. It's great and really user friendly, with integrated help and a web interface - everything in a single binary. The trouble with it is that it gets slower once you have a lot of history and many files. I don't know if you can use it for huge things like the Linux kernel or the FreeBSD ports tree. I once tried to import the ports tree into fossil and gave up after 2G and an hour. It will import anything that can do git fast-export. Now it also imports svn dumps as well. Fossil is a very good replacement for SVN. You can set up a central repo where everyone syncs on commit and update.

1 more reply

interfixus8y ago

Why does everybody persist in calling the great man Dr.?

He refers to himself as D. Richard Hipp.

ptrott20178y ago

D is an initial but Dr. D. Richard Hipp has a PhD from Duke - graduated in 1992 (his thesis can be found here - its worth a read: http://bit.ly/2ygiDWx ) hence the Dr.

interfixus8y ago

Thank you. Always thought it was a weird comprehension error from the slightly unusual character-combo.

The weird error may have been mine.

makmanalp8y ago

Ooops!

j / k navigate · click thread line to collapse

0 comments

8ig88y ago

I stumbled on this video presentation by Dr Hipp recently (may have been from another HN comment). I really enjoyed it. Probably because he seems so enthusiastic and passionate.

SQLite: The Database at the Edge of the Network with Dr. Richard Hipp

https://m.youtube.com/watch?v=Jib2AmRb_rk

makmanalp8y ago

I got lucky! It was a class event, and he's known to speak at databases / data systems classes. He's a very fun (and opinionated!) speaker.

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system.

Anyway, without diving into the details of the code, there's other technical decisions that would affect this stuff.

I found this benchmark here that is illustrative: https://sqlite.org/src4/doc/trunk/www/lsmperf.wiki

So maybe his mention that the LSM storage was underwhelming was overblown :-) I don't know.

rurban8y ago

This is the other perf graph: https://sqlite.org/src4/wiki?name=LsmPerformance

rattray8y ago

What would be performance for SQLite3 in comparable scenarios? I don't see anything comparative on that benchmarks page.

makmanalp8y ago

X86BSD8y ago

I just want to hijack this thread to say hipps other creation Fossil SCM is a great SCM. Better than git imo. Everyone should check it out.

petre8y ago

1 more reply

interfixus8y ago

Why does everybody persist in calling the great man Dr.?

He refers to himself as D. Richard Hipp.

ptrott20178y ago

D is an initial but Dr. D. Richard Hipp has a PhD from Duke - graduated in 1992 (his thesis can be found here - its worth a read: http://bit.ly/2ygiDWx ) hence the Dr.

interfixus8y ago

Thank you. Always thought it was a weird comprehension error from the slightly unusual character-combo.

The weird error may have been mine.

makmanalp8y ago

Ooops!

j / k navigate · click thread line to collapse