But something keeps bugging me: Most setups still feel like glorified notebooks stitched together with hope and vector search.
Yeah, it "works" — until you actually need it to. Suddenly: irrelevant chunks, hallucinations, shallow query rewriting, no memory loop, and a retrieval stack that breaks if you breathe on it wrong.
We’ve got:
• pipelines that don’t align with what users actually want to ask,
• retrieval that acts more like a search engine than a reasoning aid,
• brittle evals (because "correct context" ≠ "correct answer"),
• no clear line between where grounding ends and illusion begins.
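To make the "correct context" ≠ "correct answer" point concrete, here's a rough sketch of scoring the two separately. Everything in it is made up for illustration: the case format, the chunk ids, the `context_hit` / `answer_hit` helpers, and the substring match, which is just a stand-in for whatever grader you actually trust.

```python
# Minimal sketch: score retrieval and answer quality as separate signals.
# All names and data here are hypothetical, not from any particular framework.

def context_hit(retrieved_ids, gold_ids):
    """Retrieval-level check: did at least one gold chunk reach the context?"""
    return any(chunk_id in gold_ids for chunk_id in retrieved_ids)

def answer_hit(answer, gold_answer):
    """Answer-level check: crude normalized substring match (placeholder grader)."""
    def norm(s):
        return " ".join(s.lower().split())
    return norm(gold_answer) in norm(answer)

def score(cases):
    """Bucket each case by which of the two checks passed."""
    buckets = {"both": 0, "context_only": 0, "answer_only": 0, "neither": 0}
    for case in cases:
        ctx = context_hit(case["retrieved_ids"], case["gold_ids"])
        ans = answer_hit(case["answer"], case["gold_answer"])
        if ctx and ans:
            buckets["both"] += 1
        elif ctx:
            buckets["context_only"] += 1   # right chunk, wrong answer
        elif ans:
            buckets["answer_only"] += 1    # right answer, not grounded in the gold chunk
        else:
            buckets["neither"] += 1
    return buckets

# Hand-written toy cases, purely illustrative.
cases = [
    {"retrieved_ids": ["d1", "d7"], "gold_ids": {"d1"},
     "answer": "Refunds are available within 30 days.", "gold_answer": "30 days"},
    {"retrieved_ids": ["d2"], "gold_ids": {"d2"},
     "answer": "Refunds take 14 days.", "gold_answer": "30 days"},
    {"retrieved_ids": ["d9"], "gold_ids": {"d3"},
     "answer": "You get 30 days to return it.", "gold_answer": "30 days"},
]

result = score(cases)
print(result)
```

The `context_only` and `answer_only` buckets are the interesting ones: the first is a generation problem hiding behind good retrieval, the second is a "right answer" that may not be grounded at all. A single pass/fail number hides both.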
Sure, you can make it work — if you’re okay duct-taping every component and babysitting the system 24/7.
So I gotta ask: Is RAG just stuck in prototype land pretending to be production? Or has someone here actually built a setup that survives user chaos and edge cases?
Would love to hear what’s worked, what hasn’t, and what you had to throw away.
Not pushing anything, just been knee-deep in this and looking to sanity check with folks who’ve actually shipped stuff.