>The retrieval part is way more important.
I don't agree with this - at Intercom we've put a lot of work into our Fin chatbot, which uses a RAG architecture, and we're still using GPT-4 for the generation part.
GPT-4 is a powerful (and expensive) model, but we find we need that power to 1) keep hallucinations at an acceptable level, and 2) maintain the quality of the inferences drawn from the retrieved text.
Now, our bot answers customer support questions unsupervised - maybe it'd be different for a human-in-the-loop system - but at least in our case, having benchmarked this thoroughly, we feel we need a very powerful generation model to keep errors down.
We've also done work on the retrieval end of things, including a customised model, but found the generation side is where we need the most capable models.
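For anyone unfamiliar with the split being debated here, a toy sketch of the two RAG stages - everything below is illustrative (keyword-overlap retriever, hypothetical function names), not Fin's actual implementation:

```python
# Two stages of a RAG pipeline: (1) retrieve supporting passages,
# (2) build a grounded prompt for the generation model (e.g. GPT-4).
# The generation model's job is to answer faithfully from the context,
# which is why its capability matters for hallucination rates.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        docs,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Generation-side prompt: the LLM sees only these passages, so its
    ability to stick to them determines answer quality."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the context below. "
        "If the answer isn't there, say you don't know.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

docs = [
    "Refunds are processed within 5 business days.",
    "Our support hours are 9am to 5pm GMT.",
    "Fin is available on the Pro and Premium plans.",
]
query = "How long do refunds take?"
prompt = build_prompt(query, retrieve(query, docs))
```

Even with perfect retrieval, the final answer is only as good as the model filling in after `Answer:` - that's the generation side we keep finding needs the most capable models.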