
0 points · glenstein · 2y ago · 0 comments

Pro falls roughly midway between GPT-3.5 and GPT-4 on four measures (MMLU, BIG-Bench-Hard, Natural2Code, DROP), is closer to 3.5 on two (MATH, HellaSwag), and is closer to 4 on the remaining two (GSM8K, HumanEval). That makes two one way, two the other way, and four in the middle.

So it's a split almost right down the middle, if anything closer to 4, at least if you assume the benchmarks to be of equal significance.
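To make the "equal significance" caveat concrete, here is a minimal sketch of the tally. It uses only the bucket labels from the comment above, not any benchmark scores. The `tally` helper and the coding-heavy weights are hypothetical, chosen purely to illustrate how the picture shifts once benchmarks are not weighted equally.

```python
from collections import Counter

# Where Pro lands on each benchmark, as described in the comment
# (labels only; no scores are assumed here).
placement = {
    "MMLU": "middle",
    "BIG-Bench-Hard": "middle",
    "Natural2Code": "middle",
    "DROP": "middle",
    "MATH": "closer to 3.5",
    "HellaSwag": "closer to 3.5",
    "GSM8K": "closer to 4",
    "HumanEval": "closer to 4",
}


def tally(placement, weights=None):
    """Sum the weight of the benchmarks in each bucket (default weight 1.0)."""
    weights = weights or {}
    totals = Counter()
    for bench, bucket in placement.items():
        totals[bucket] += weights.get(bench, 1.0)
    return dict(totals)


# Equal significance: every benchmark counts once.
equal = tally(placement)

# Hypothetical weighting for someone who mostly cares about coding tasks.
coding_heavy = tally(placement, {"HumanEval": 3.0, "Natural2Code": 3.0})

print(equal)
print(coding_heavy)
```

With equal weights the two outer buckets tie at two benchmarks each, so any lean toward 3.5 or 4 has to come from how much each benchmark matters for a given use case.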


daveguy · 2y ago

> at least if you assume the benchmarks to be of equal significance.

That is an excellent point. Performance of Pro will definitely depend on the use case, given how much its standing between 3.5 and 4 varies by benchmark. It will be interesting to see user reviews on different tasks. But the two-quarter lead time for Ultra means it may as well not have been announced. A lot can happen in 3-6 months.
