I haven't wrapped my head around how this helps speed up queries while data is being ingested.
Adtech is an example of a sector that benefits from this: they slice and dice datasets a lot to target ad campaigns and such, and being able to do that quickly is useful.
I guess it's easy for me to visualize both row-based and column-based storage. I'm struggling with the bitmaps concept.
000 - the record has no animal associations yet
001 - the record is associated with having a "mouse" included
111 - the record is associated with having a "dog", a "cat", and a "mouse" included
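A minimal sketch of that idea (names and data are made up for illustration; real engines use compressed bitmap formats, but plain Python ints work as bitmaps here):

```python
# One bitmap per animal; bit i is set if record i has that animal.
# Python ints serve as arbitrary-length bitmaps.
bitmaps = {
    "dog":   0b0110,  # records 1 and 2 have a dog
    "cat":   0b0010,  # record 1 has a cat
    "mouse": 0b1011,  # records 0, 1, and 3 have a mouse
}

# "Which records have both a dog AND a mouse?" is a single bitwise AND.
both = bitmaps["dog"] & bitmaps["mouse"]

# Decode the set bits back into record ids.
matches = [i for i in range(both.bit_length()) if (both >> i) & 1]
print(matches)  # [1] -- record 1 is the only one with both
```

The point is that the query never touches the records themselves, only the per-value bitmaps.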
In the past, high-cardinality data sets weren't a good fit for bitmap indexes, but nowadays there are ways around this (compressed bitmap formats, for example). So that list of animals could be quite large.
The primary reason it's so much faster is that a single bitwise instruction operates on a whole machine word at once: one 64-bit AND effectively checks 64 records, and SIMD widens that further. That makes these queries extremely fast.
Hopefully this makes it easy to find, for example, all the bills that your specific congressperson was involved in.
FeatureBase could be the "feature store" in the middle of the batch prediction section's diagram, or simply be a drop-in replacement for the model registry.
But really it's useful anytime you need low latency analytics on fresh data.