0 points · khafra · 9mo ago · 0 comments

My two claims:

1. OpenAI has been doing verifier-guided training since last year.

2. No SOTA model was trained without verified reward training for math and programming.

I supported the first claim with a document describing what OpenAI was doing last year; the extrapolation should have been straightforward, but it's easy for people who aren't tracking AI progress to underestimate the rate at which it occurs. So, here's some support for my second claim:

https://arxiv.org/abs/2507.06920 https://arxiv.org/abs/2506.11425 https://arxiv.org/abs/2502.06807

troupo · 8mo ago

> the extrapolation should have been straightforward,

Indeed. "By late next month you'll have over four dozen husbands" https://xkcd.com/605/

> So, here's some support for my second claim:

I don't think any of these links support the claim that "No SOTA model was trained without verified reward training for math and programming":

https://arxiv.org/abs/2507.06920: "We hope this work contributes to building a scalable foundation for reliable LLM code evaluation"

https://arxiv.org/abs/2506.11425: A custom agent with a custom environment and a custom training dataset on ~800 predetermined problems. Also "Our work is limited to Python"

https://arxiv.org/abs/2502.06807: The only one that somewhat obliquely refers to your claim.
