
oersted 2y ago (0 points)

Right, thanks for the reminder. I added it.

moralestapia 2y ago

Thanks, I don't see them being "well above GPT-4", merely 1 point? Also, no idea why one would want to exclude GPT-4-Turbo, the flagship "GPT-4" model, but w/e.

I also don't think they "beat Llama 3 8B"; their own abstract says "rivals that of models such as Mixtral 8x7B and GPT-3.5", "rivals" not even "beats".

Great model, but let's not overplay it.

oersted (OP) 2y ago

In the English category: GPT-4-0314 (ELO 1166), Llama 3 8B Instruct (ELO 1161), Mistral-Large-2402 (ELO 1151), GPT-4-0613 (ELO 1148).

You are right. I toned down the language; I got a bit overexcited and missed the difference between the GPT-4 versions. Also, LMSYS is a subjective benchmark of what users prefer, which I'm sure has weird inherent biases.

It's just that any signal of a 3.8B model being anywhere in the vicinity of GPT-4 is huge.
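
For a sense of scale on the ratings quoted above, here is a rough sketch that turns rating gaps into expected head-to-head win rates. It assumes the textbook Elo expected-score formula with a 400-point scale; the leaderboard's actual fitting procedure may differ, and the helper names (`expected_score`, `win_rates_against`) are made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Textbook Elo expected score of A against B (assumed formula, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# English-category ratings as quoted in the comment above.
ratings = {
    "GPT-4-0314": 1166,
    "Llama 3 8B Instruct": 1161,
    "Mistral-Large-2402": 1151,
    "GPT-4-0613": 1148,
}


def win_rates_against(reference: str, table: dict) -> dict:
    """Expected score of `reference` against every other entry in `table`."""
    ref_rating = table[reference]
    return {
        name: expected_score(ref_rating, rating)
        for name, rating in table.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, p in win_rates_against("GPT-4-0314", ratings).items():
        print(f"GPT-4-0314 vs {name}: {p:.1%} expected win rate")
```

Under that assumption, gaps of 5 to 18 points translate to expected win rates only slightly above 50%, so the models listed are close to indistinguishable in head-to-head preference.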

moralestapia 2y ago

Yeah, GPT-3.5, on a phone, at ~1,000 tokens/sec ... nice!

