> As it turns out, after I implemented the fractal tree, I spoke with a former employee of Tokutek, a company that commercialized fractal tree indices. That person told me that we’d actually implemented fractal reads identically!
That's not necessarily a good thing. Last I heard Tokutek patented that. Not a laywer, from what I understand patents specifically pertain to "implementation".
Not sure what happened after they were acquired by Percona, but, as I am in US I would personally stay away from it. Even if current owner doesn't want to enforce the patent, the next one might.
The Hitchhiker tree completely avoids their patents, since it makes very different decisions in order to become purely functional.
Great work by the way! Really like the description of the code structure on the README page.
If patent is indeed not an issue, I might play with translating that to Erlang. You already did the hard work and also made it functional. I'd have to learn to read Clojure.
Speaking of functional, I was surprised to find the other day, Erlang's queue module implements an Okasaki API, complete with funny function names like deah, liat, and snoc.
There should be identical blocks of code which I doubt they have. I think by "identically" they mean the same thought/method.
For key-seeks, it's a normal tree walk to the leaf (unless an INSERT is found at a higher level, iirc)
In practice, the blocks are so huge that any tree is going to have at most depth 3, and the root page is likely in cache, so 1 or 2 s3 fetches (= roughly 200ms, let's say)
I've been toying with the idea of putting the root page (or maybe the top 2 levels of pages) in DynamoDB, which would make them fast even if they weren't in cache.
What am I missing?
I did some work to split nodes based on size rather than number of children, but that requires accurate & fast size estimation of the decompressed objects, which isn't possible in general.
Feels like a digression off the actual tree structure, but isn't this incorrect, since a hash map can do key lookups in O(1)? With caveats, of course.
It's a great idea over all, but I'd also be curious how well overlaying an event log on a cuckoo hash would work in comparison.
[1] http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-p...
That said, when a Cuckoo hash gets very full and bounces entries around a lot, there might be an advantage to buffering operations and choosing insertion patterns that reduce the batch's insertion time. Then again, Cuckoo hashes already perform so well for situations they're designed for, so it's hard to improve them with an event log overlay.
No, because log(n) might be smaller than the word size of the machine, which is the common case for hash table implementations.
any idea how this would compare to clojure-wrapping
https://github.com/OpenHFT/Chronicle-Map ?
Seems like there are some similar design goals between the two.