Document retrieval for RAG is fundamentally an information retrieval (IR)
problem, and IR already offers simpler solutions for it. Vector embeddings are
still useful, but they belong in a later stage of the IR pipeline rather than
in first-stage retrieval, where simpler and more performant methods exist.
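
To make the two-stage idea concrete, here is a minimal sketch. It is not the code from the linked notebook. It assumes BM25 as the "simpler" first-stage retriever, which is my choice for illustration, and it uses a hypothetical `toy_embed` function (character trigram counts) as a stand-in for a real embedding model. All function names and parameters below are made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would handle punctuation, stemming, etc.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def toy_embed(text):
    """Hypothetical placeholder for an embedding model: character trigram counts."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, first_stage_k=3, final_k=1):
    """Stage 1: cheap lexical retrieval. Stage 2: rerank the shortlist by vector similarity."""
    scores = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:first_stage_k]
    q_vec = toy_embed(query)
    reranked = sorted(shortlist, key=lambda i: cosine(q_vec, toy_embed(docs[i])), reverse=True)
    return [docs[i] for i in reranked[:final_k]]


docs = [
    "BM25 is a classic lexical ranking function",
    "Vector embeddings capture semantic similarity",
    "Bananas are rich in potassium",
]
top = retrieve("lexical ranking", docs, first_stage_k=2, final_k=1)
```

The point of this shape is that the embedding step only ever sees the `first_stage_k` candidates, so its cost does not grow with the size of the corpus. In practice `toy_embed` would be replaced by a real embedding model, and the first stage by a proper search index.
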
GitHub link to the notebook and blog post:
https://github.com/xetdata/RagIRBench/