Something I missed until I scrolled back to the top and reread the page was this:
> OpenAI's new o3 system - trained on the ARC-AGI-1 Public Training set
So yeah, the results were specifically from a version of o3 trained on the public training set.
On the one hand, I think that's a completely fair thing to do. It's reasonable that you should teach your AI the rules of the game, so to speak. There really aren't any spoken rules, though, just pattern observation. So if you want to teach the AI how to play the game, you have to train it.
On the other hand, though, I don't think either the o1 models or Claude were trained on the dataset, in which case it isn't a completely fair competition. If I had to guess, o1 could probably get to around 60% if it were trained on the public dataset as well.