undefined | Better HN

0 pointsImnimo1y ago0 comments

>Achieving human-level performance in the ARC benchmark, as well as top human performance in GPQA, Codeforce, AIME, and Frontier Math strongly suggests the model can potentially solve any problem at the human level if it possesses essential knowledge about it.

The article notes, "o3 still fails on some very easy tasks". What explains these failures if o3 can solve "any problem" at the human level? Do these failed cases require some essential knowledge that has eluded the massive OpenAI training set?

0 comments

nopinsight1y ago

Great point. I'd love to see what these easy tasks are and would be happy to revise my hypothesis accordingly. o3's intelligence is unlikely to be a strict superset of human intelligence. It is certainly superior to humans in some respects and probably inferior in others. Whether it's sufficiently generally intelligent would be both a matter of definition and empirical fact.

1 more reply

j / k navigate · click thread line to collapse

0 comments

nopinsight1y ago

1 more reply

j / k navigate · click thread line to collapse