Hitchhiker trees: functional, persistent, off-heap sorted maps (opens in new tab)

(github.com)

150 pointshadronzoo9y ago32 comments

32 comments

Those are very cool, and functional too!

> As it turns out, after I implemented the fractal tree, I spoke with a former employee of Tokutek, a company that commercialized fractal tree indices. That person told me that we’d actually implemented fractal reads identically!

That's not necessarily a good thing. Last I heard Tokutek patented that. Not a laywer, from what I understand patents specifically pertain to "implementation".

Not sure what happened after they were acquired by Percona, but, as I am in US I would personally stay away from it. Even if current owner doesn't want to enforce the patent, the next one might.

dgrnbrg9y ago

Creator here. I would absolutely agree with this, but let me shed a bit more light to clarify: the underlying idea of B+ trees with inline caches is not patented (or maybe it was, but a long time ago?). Tokutek's patents are all focused on the methods they use to achieve concurrent operations on a single tree--they have a very interesting & novel locking protocol.

The Hitchhiker tree completely avoids their patents, since it makes very different decisions in order to become purely functional.

rdtsc9y ago

Thanks for explaining. That makes sense.

Great work by the way! Really like the description of the code structure on the README page.

If patent is indeed not an issue, I might play with translating that to Erlang. You already did the hard work and also made it functional. I'd have to learn to read Clojure.

Speaking of functional, I was surprised to find the other day, Erlang's queue module implements an Okasaki API, complete with funny function names like deah, liat, and snoc.

http://erlang.org/doc/man/queue.html

xytop9y ago

> Not a laywer, from what I understand patents specifically pertain to "implementation".

There should be identical blocks of code which I doubt they have. I think by "identically" they mean the same thought/method.

rdtsc9y ago

Identical blocks of code, I think make sense for copyright cases -- "they copied our code, look here it is lines 12 through 25". Instead I believe patents are about a particular implementation of an idea.

2 more replies

rix0r9y ago

Funny: I implemented the exact same thing, with a focus on AWS and pagination (albeit less well-documented):

https://github.com/rix0rrr/libbruce/blob/master/README.md

dgrnbrg9y ago

That is very cool! When you end up doing reads, do you compute all the pending writes along the path from root to leaf, and then run queries on the projection?

rix0r9y ago

For index-seeks, yes.

For key-seeks, it's a normal tree walk to the leaf (unless an INSERT is found at a higher level, iirc)

In practice, the blocks are so huge that any tree is going to have at most depth 3, and the root page is likely in cache, so 1 or 2 s3 fetches (= roughly 200ms, let's say)

I've been toying with the idea of putting the root page (or maybe the top 2 levels of pages) in DynamoDB, which would make them fast even if they weren't in cache.

anorwell9y ago

I'm not sure I understand this. The claim is that "we dramatically improve the performance of insertions without hurting the IO cost of queries" by writing to the root's event log. However, it seems to me that this is only delaying the eventual write to the leaf node, not avoiding it. As more items are written, eventually the log will overflow and write to its children, and so on, until it is written to the leaf. This is what would happen in a regular B+ tree anyway, so what has been gained?

What am I missing?

dgrnbrg9y ago

It's not just the root's event log--it's all the event logs. The deferrals allow us to batch writes in an optimized way, and do so with a fully incremental algorithm. In the IO cost model, each "block" read or costs 1 IOP to perform, so we can reduce the IOP cost of writes by a factor of 100-1000x, but reads will still have the same # of nodes to fetch. The event logs augmenting the tree are an effective IO optimization on a B+ tree.

anorwell9y ago

Thanks, makes sense.

dgreensp9y ago

How many bytes is a node, in practice? If you stuff hundreds or thousands of pointers in a node, plus a bunch of log, isn't that a lot of data to clone when you do a path-copy? It seems like there's a major trade-off there, unless you always write in large batches. I suppose you could have fast "cons-ing" of events onto the root log without any copying. It would be interesting to know what choices lead to good performance in a sample use case.

dgrnbrg9y ago

The code actually includes a benchmarking tool that's meant to help in figuring out this decision. Once you've selected your key size (or distribution), value size (or distribution), and backend, you can play with these factors. Intuitively, targeting "block" size should improve perf. So, 4k-1MB if you're in memory or local, 1.5k-9k if you're over the network (and depending on configuration).

I did some work to split nodes based on size rather than number of children, but that requires accurate & fast size estimation of the decompressed objects, which isn't possible in general.

falcolas9y ago

> the best-performing data structure for looking up sorted keys cannot do those queries faster than O(log(n))

Feels like a digression off the actual tree structure, but isn't this incorrect, since a hash map can do key lookups in O(1)? With caveats, of course.

It's a great idea over all, but I'd also be curious how well overlaying an event log on a cuckoo hash would work in comparison.

dmbarbour9y ago

A hashmap isn't sorted keys. But 'O(log(n))' is still incorrect. The fastest you can look up information physically represented in a 3D universe is 'O(cube-root(N))'. That applies to radom access memory and hashmaps, too. Cf. Myth of RAM [1].

[1] http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-p...

dgrnbrg9y ago

Creator here. As other commenters noted, this data structure only requires keys to be sortable, not hashable, and the best sorted performance we can achieve is O(log(n)).

That said, when a Cuckoo hash gets very full and bounces entries around a lot, there might be an advantage to buffering operations and choosing insertion patterns that reduce the batch's insertion time. Then again, Cuckoo hashes already perform so well for situations they're designed for, so it's hard to improve them with an event log overlay.

continuational9y ago

It's a common misconception that hashmaps have constant time lookups. In order for the lookup to be constant time, the hash function must output enough bits to avoid causing too many collisions. For example, 16 bits is too little for a hash map with a million keys. In fact, the hash function output size should be O(log(n)) where n is the number of entries. Processing log(n) bits takes at least O(log(n)) time. QED

jasonwatkinspdx9y ago

> Processing log(n) bits takes at least O(log(n)) time.

No, because log(n) might be smaller than the word size of the machine, which is the common case for hash table implementations.

1 more reply

cristiandonosoc9y ago

keyword here is sorted. hash map is unordered

Hugie9y ago

Does someone know of an existing visualization of this algorithm, for better understanding?

dgrnbrg9y ago

Creator here. There's no visualizations of this yet, but I'll be speaking about it this year at Strange Loop. For that talk, I'll be creating visualizations, so stay tuned!

adamretter9y ago

Is it possible to use this from Java directly, or does the Clojure API make this difficult?

dgrnbrg9y ago

Creator here. You could use this from Java using the Clojure API for Java; however, many of the extension hooks use Clojure protocols. I would recommend writing a shim to Java for your particular application, so that your data types are represented in the way you want.

sievebrain9y ago

That'd required the person doing the using to know Clojure, right?

1 more reply

tuddman9y ago

very cool!

any idea how this would compare to clojure-wrapping

https://github.com/OpenHFT/Chronicle-Map ?

Seems like there are some similar design goals between the two.

j / k navigate · click thread line to collapse

32 comments

rdtsc9y ago

Those are very cool, and functional too!

That's not necessarily a good thing. Last I heard Tokutek patented that. Not a laywer, from what I understand patents specifically pertain to "implementation".

Not sure what happened after they were acquired by Percona, but, as I am in US I would personally stay away from it. Even if current owner doesn't want to enforce the patent, the next one might.

dgrnbrg9y ago

The Hitchhiker tree completely avoids their patents, since it makes very different decisions in order to become purely functional.

rdtsc9y ago

Thanks for explaining. That makes sense.

Great work by the way! Really like the description of the code structure on the README page.

If patent is indeed not an issue, I might play with translating that to Erlang. You already did the hard work and also made it functional. I'd have to learn to read Clojure.

Speaking of functional, I was surprised to find the other day, Erlang's queue module implements an Okasaki API, complete with funny function names like deah, liat, and snoc.

http://erlang.org/doc/man/queue.html

xytop9y ago

> Not a laywer, from what I understand patents specifically pertain to "implementation".

There should be identical blocks of code which I doubt they have. I think by "identically" they mean the same thought/method.

rdtsc9y ago

2 more replies

rix0r9y ago

Funny: I implemented the exact same thing, with a focus on AWS and pagination (albeit less well-documented):

https://github.com/rix0rrr/libbruce/blob/master/README.md

dgrnbrg9y ago

That is very cool! When you end up doing reads, do you compute all the pending writes along the path from root to leaf, and then run queries on the projection?

rix0r9y ago

For index-seeks, yes.

For key-seeks, it's a normal tree walk to the leaf (unless an INSERT is found at a higher level, iirc)

In practice, the blocks are so huge that any tree is going to have at most depth 3, and the root page is likely in cache, so 1 or 2 s3 fetches (= roughly 200ms, let's say)

I've been toying with the idea of putting the root page (or maybe the top 2 levels of pages) in DynamoDB, which would make them fast even if they weren't in cache.

anorwell9y ago

What am I missing?

dgrnbrg9y ago

anorwell9y ago

Thanks, makes sense.

dgreensp9y ago

dgrnbrg9y ago

I did some work to split nodes based on size rather than number of children, but that requires accurate & fast size estimation of the decompressed objects, which isn't possible in general.

falcolas9y ago

> the best-performing data structure for looking up sorted keys cannot do those queries faster than O(log(n))

Feels like a digression off the actual tree structure, but isn't this incorrect, since a hash map can do key lookups in O(1)? With caveats, of course.

It's a great idea over all, but I'd also be curious how well overlaying an event log on a cuckoo hash would work in comparison.

dmbarbour9y ago

[1] http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-p...

dgrnbrg9y ago

Creator here. As other commenters noted, this data structure only requires keys to be sortable, not hashable, and the best sorted performance we can achieve is O(log(n)).

continuational9y ago

jasonwatkinspdx9y ago

> Processing log(n) bits takes at least O(log(n)) time.

No, because log(n) might be smaller than the word size of the machine, which is the common case for hash table implementations.

1 more reply

cristiandonosoc9y ago

keyword here is sorted. hash map is unordered

Hugie9y ago

Does someone know of an existing visualization of this algorithm, for better understanding?

dgrnbrg9y ago

Creator here. There's no visualizations of this yet, but I'll be speaking about it this year at Strange Loop. For that talk, I'll be creating visualizations, so stay tuned!

adamretter9y ago

Is it possible to use this from Java directly, or does the Clojure API make this difficult?

dgrnbrg9y ago

sievebrain9y ago

That'd required the person doing the using to know Clojure, right?

1 more reply

tuddman9y ago

very cool!

any idea how this would compare to clojure-wrapping

https://github.com/OpenHFT/Chronicle-Map ?

Seems like there are some similar design goals between the two.

j / k navigate · click thread line to collapse