
ZeroCool2u 1y ago

The private evaluation set is kept hidden from both the public and OpenAI, so companies can't train on those problems and cheat their way to a high score by overfitting.


jsheard 1y ago

If the models run on OpenAI's servers, then surely OpenAI could still see the questions being fed into them if it wanted to cheat. That could only be prevented by making the evaluation a one-time deal that can't be repeated, or by having OpenAI distribute its models for evaluators to run themselves, which I doubt it's inclined to do.
