The Vicuna headline was definitely an overreach, although the main text admits readily that their evaluation method (asking ChatGPT to judge response quality) is not rigorous [1]. I'm sure that headline set a lot of people pontificating about AGI with tiny models, but I can't imagine anyone who has worked directly with fine-tuning coming away with that impression.
The comments I see here are not about that, though. They are about small models succeeding at specific tasks, which this paper affirms. Most applications of LLMs are not general-purpose chatbots, so this is not bad news for most of the distill/fine-tune community.
[1] https://lmsys.org/blog/2023-03-30-vicuna/