Except Llama 3 8B is a significant improvement over Llama 2, which was weak enough that a whole community sprang up building fine-tunes that beat what the multi-billion-dollar company could produce, on a far smaller budget. With Llama 3 8B things have shifted: far fewer community fine-tunes actually beat the base model. The fact that Mistral AI can still build models that beat it means the company isn't falling too far behind a significantly better-equipped competitor.
What's more irritating is that they decided to do quantization-aware training for fp8. Plain int8 quantization already results in a loss of quality so small it's difficult to pick up in benchmarks, so QAT at 8 bits buys little. They should have gone for something more aggressive like 4-bit, where naive quantization causes a significant quality loss and quantization-aware training could actually make a difference.
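The intuition is easy to check with a toy round-trip: at 8 bits the integer grid is fine enough that the reconstruction error is tiny, while at 4 bits it grows by roughly an order of magnitude. A minimal sketch of symmetric per-tensor fake quantization (not either company's actual scheme, just the textbook version):

```python
import numpy as np

def fake_quantize(x, bits):
    # Symmetric per-tensor quantization: snap values to an integer
    # grid of the given bit width, then map back to floats.
    qmax = 2 ** (bits - 1) - 1           # 127 for int8, 7 for int4
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                     # dequantized ("fake quant") values

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)  # stand-in weight tensor

for bits in (8, 4):
    err = np.abs(w - fake_quantize(w, bits)).mean()
    print(f"{bits}-bit: mean abs error {err:.5f}")
```

The 4-bit error is much larger because the grid has only 16 levels instead of 256, which is exactly where QAT (letting the model adapt its weights to the grid during training) has room to help.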