1. Why SWE-bench Verified no longer measures frontier coding capabilities (openai.com) | 10 | tedsanders | 1mo ago | 0
2. METR estimates that GPT-5.2 has a 50%-time-horizon of around 6.6 hrs (twitter.com) | 2 | tedsanders | 1mo ago | 0