
0 points · data_maan · 3mo ago

On their website https://1stproof.org/#about they claim: "This project represents our preliminary efforts to develop an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions."

Sounds to me like a benchmark in all but name. And they failed pretty terribly at achieving what they set out to do.


bwfan123 · 3mo ago

> And they failed pretty terribly at achieving what they set out to do.

Why the angst? If the AI can autonomously solve these problems, isn't that a huge step forward for the field?

data_maan (OP) · 3mo ago

It's not angst. It's intense frustration that 1) they are not doing the science correctly, and 2) others (e.g. FrontierMath) have already done everything they claim to be doing, so we won't learn anything new here, yet somehow 1stproof gets all the credit.

bawolff · 3mo ago

Are they really trying to do science, or are they just trying to determine pragmatically whether current AI is useful to a research mathematician in their day-to-day job?

bwfan123 · 3mo ago

> are not doing the science correctly

What do you mean? These are top-notch mathematicians who are genuinely trying to see how these tools can help solve cutting-edge research problems. Not toy problems like those in AIME/AMC/IMO, etc., or other similar benchmarks, which are easily gamed.

> that others (e.g. FrontierMath) already did everything they claim to be doing

You're kidding, right? The FrontierMath benchmark [1] is produced by a startup whose incentives are dubious, to say the least.

[1] https://siliconreckoner.substack.com/p/the-frontier-math-sca...

Unlike the AI hypesters, these are real mathematicians trying to inject some realism and really test the boundaries of these tools. I see this as a welcome and positive development which is a win-win for the ecosystem.
