>The retrieval part is way more important.
I don't agree with this - at Intercom we've put a lot of work into our Fin chatbot, which uses a RAG architecture, and we're still using GPT-4 for the generation part.
GPT-4 is a powerful (and expensive) model, but we find we need that power to 1) keep hallucinations at an acceptable level, and 2) maintain the quality of the inferences drawn from the retrieved text.
Now, our bot answers customer support questions unsupervised - maybe it'd be different for a human-in-the-loop system - but at least in our case, having benchmarked this thoroughly, we feel we need a very powerful generation model to keep errors down.
We've also done work on the retrieval end of things, including a customised model, but found the generation side is where we need the most capable models.
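For anyone unfamiliar with the split being debated here, a toy sketch of the two RAG stages - everything below is illustrative (keyword-overlap retriever, hypothetical function names), not Fin's actual implementation:

```python
# Two stages of a RAG pipeline: (1) retrieve supporting passages,
# (2) build a grounded prompt for the generation model (e.g. GPT-4).
# The generation model's job is to answer faithfully from the context,
# which is why its capability matters for hallucination rates.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        docs,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Generation-side prompt: the LLM sees only these passages, so its
    ability to stick to them determines answer quality."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the context below. "
        "If the answer isn't there, say you don't know.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

docs = [
    "Refunds are processed within 5 business days.",
    "Our support hours are 9am to 5pm GMT.",
    "Fin is available on the Pro and Premium plans.",
]
query = "How long do refunds take?"
prompt = build_prompt(query, retrieve(query, docs))
```

Even with perfect retrieval, the final answer is only as good as the model filling in after `Answer:` - that's the generation side we keep finding needs the most capable models.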