The relative memory wastefulness of many indexing algorithms also means that you can’t load the indexes into memory on machines with high storage density; they literally won’t fit. In practice this leads to a page-fault cascade that degrades performance very rapidly beyond a fairly ordinary scale.
Modern indexing algorithms (not the ones in the article) give up a small amount of selectivity and strict tree balance in exchange for reducing the memory footprint of indexes by multiple orders of magnitude and greatly increasing write scalability. Given the storage bandwidth available in modern systems, that is a big win in terms of throughput. Balanced trees are generally an anachronism in current design; it doesn’t matter how unbalanced an index tree is if it fits in CPU cache while searching countless terabytes of data.
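To make the selectivity-for-memory trade concrete, here’s a toy sparse (fence-key) index sketch — the names and block size are illustrative, not from any particular system. The index keeps only one key per block of sorted records, so it is roughly BLOCK times smaller than a dense per-key index; the price is that a lookup must scan one block instead of landing on the exact record:

```python
import bisect

BLOCK = 1024  # tuning knob: larger blocks -> smaller index, less selectivity

def build_sparse_index(sorted_keys):
    # Keep only one "fence" key per block; the in-memory index is
    # ~BLOCK times smaller than indexing every key.
    return sorted_keys[::BLOCK]

def lookup(sorted_keys, index, key):
    # Use the tiny in-memory index to find the candidate block...
    b = bisect.bisect_right(index, key) - 1
    if b < 0:
        return False
    lo = b * BLOCK
    # ...then pay for the lost selectivity with a bounded scan of one block.
    return key in sorted_keys[lo:lo + BLOCK]

keys = list(range(0, 1_000_000, 2))  # 500,000 sorted keys
idx = build_sparse_index(keys)       # only 489 fence keys held in memory
print(lookup(keys, idx, 123456))     # True  (even number, present)
print(lookup(keys, idx, 123457))     # False (odd number, absent)
```

The same shape shows up in log-structured and block-oriented designs: a per-block index small enough to stay cache- or memory-resident, with the remaining work pushed onto cheap sequential reads.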
20 years ago, I was using all of the indexing structures in the article for high-performance systems. They started going out of style about 10 years ago and I honestly can’t remember the last time I worked on a new system that implemented them. That transition happened slowly and then all at once.
Edit: I found some exploration: https://www.usenix.org/conference/osdi14/technical-sessions/...
Xilinx recently released a product that enables this, which is super awesome and even affordable.
https://www.xilinx.com/applications/data-center/computationa...
> How many engineers does it take to make subscripting work?
My question: How many engineers does it take to add a horizontal scrollbar on overflow?
I also noticed that 'width tech' is the only term used without a specific description.
In short:
I think we should use 'width tech' to mean the ability to change pace, change quality, and change the amount of code written.
Why: Because issues like these are making or breaking tech companies.
What do you think?
Oh yes!
Probably my favorite data structure. In the past I've used them instead of arrays for performance (latency) reasons when constructing ordered sets from randomly distributed elements that came in multiple batches. Insertion into a splay tree meant that no sorting step at the end was necessary and the overhead was lower than first collecting everything into an array and only sorting after all elements were inserted.
Their amortized performance characteristics in general are close to ideal and the self-tuning character is intriguing. The only downside is the possibility of getting into a degenerate state, but that can be avoided by adding some randomness. Some paper I read even suggested that splaying only occasionally actually improves performance.
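To illustrate the batched-insert pattern described above, here’s a toy top-down splay tree sketch (the structure and names are my own, not the commenter’s code): elements arrive in several unsorted batches, each insert splays the accessed path to the root, and an in-order walk at the end yields the keys sorted with no separate sort step.

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class SplayTree:
    def __init__(self):
        self.root = None

    def _splay(self, root, key):
        # Top-down splay: moves the node nearest to `key` to the root.
        if root is None:
            return None
        header = Node(None)          # scratch node for the left/right lists
        left = right = header
        t = root
        while True:
            if key < t.key:
                if t.left is None:
                    break
                if key < t.left.key:  # zig-zig: rotate right
                    y = t.left
                    t.left = y.right
                    y.right = t
                    t = y
                    if t.left is None:
                        break
                right.left = t        # link into the right tree
                right = t
                t = t.left
            elif key > t.key:
                if t.right is None:
                    break
                if key > t.right.key:  # zig-zig: rotate left
                    y = t.right
                    t.right = y.left
                    y.left = t
                    t = y
                    if t.right is None:
                        break
                left.right = t         # link into the left tree
                left = t
                t = t.right
            else:
                break
        left.right = t.left            # reassemble
        right.left = t.right
        t.left = header.right
        t.right = header.left
        return t

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        self.root = self._splay(self.root, key)
        if key == self.root.key:
            return                     # ignore duplicates
        node = Node(key)
        if key < self.root.key:
            node.left = self.root.left
            node.right = self.root
            self.root.left = None
        else:
            node.right = self.root.right
            node.left = self.root
            self.root.right = None
        self.root = node

    def in_order(self):
        # In-order traversal of a BST yields keys in sorted order.
        out, stack, t = [], [], self.root
        while stack or t:
            while t:
                stack.append(t)
                t = t.left
            t = stack.pop()
            out.append(t.key)
            t = t.right
        return out

tree = SplayTree()
for batch in ([5, 1, 9], [3, 7, 2], [8, 4, 6]):  # unsorted batches
    for x in batch:
        tree.insert(x)
print(tree.in_order())  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The recently inserted keys sit near the root, which is why batches with some locality insert cheaply, and the final ordered read-out is just a linear-time traversal.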
If you use GUID keys, this problem magically goes away with zero effort.