
data_maan 3mo ago

> these are problems of some practical interest, not just performative/competitive maths.

FrontierMath did this a year ago. Where is the novelty here?

> a solution is known, but is guaranteed to not be in the training set for any AI.

Wrong, as the questions were posed to commercial AI models and they can solve them.

This paper violates basic benchmarking principles.


offnominal 3mo ago

> Wrong, as the questions were posed to commercial AI models and they can solve them.

Why does this matter? As far as I can tell, because the solution is not known, this only affects the time constant (i.e., the problems were known for longer than a week). It doesn't seem that I should care about that.

data_maan (OP) 3mo ago

Because the companies have the data and can solve them. Once the questions have been given to a company with the necessary manpower, one can no longer guarantee that the solution is unknown and not contained in the training sample.
