The article says the ensemble of Kaggle solutions (aggregated in some unexplained way) achieves 81%. This is better than their average Mechanical Turk worker, but worse than their average STEM grad. It's better than tuned o3 with low compute, worse than tuned o3 with high compute.
There's also a point on the figure marked "Kaggle SOTA", around 60%. I can't find any explanation for that, but I guess it's the best individual Kaggle solution.
The Kaggle solutions would probably score higher with more compute, but nobody has any incentive to spend >$1M on approaches that obviously don't generalize. OpenAI, by contrast, did have an incentive to spend that much tuning and testing o3, since it's possible the approach will generalize to a practically useful domain (though that hasn't been demonstrated yet). Even if it ultimately doesn't, they're getting spectacular publicity now from that promise.