nmca 1y ago

We worked with ARC to run inference on the semi-private tasks last week, after o3 was trained, using an inference-only API that was sent the prompts but not the answers and did no durable logging.

idontknowmuch 1y ago

What's your opinion on the veracity of this benchmark - given o3 was fine-tuned and others were not? Can you give more details on how much data was used to fine-tune o3? It's hard to put this into perspective given this confounder.
