FrontierMath did this a year ago. Where is the novelty here?
> a solution is known, but is guaranteed to not be in the training set for any AI.
Wrong, as the questions were posed to commercial AI models, and those models can solve them.
This paper violates basic benchmarking principles.