
tux3 1y ago

Although the answer isn't sent, so it would take a very deliberate effort to fish those questions out of the API chatter and find the right domain expert with 4-10 hours to spend on cracking each one.

Just letting the AI train on its own wrong output wouldn't help. The benchmark already gives them lots of time for trial and error.


youoy 1y ago

Why do people still insist that this is unlikely? It's like assuming that the company that paid 15M for chat.com does not have some spare change to pay some graduate students/postdocs to solve some math problems. The publicity from solving such a benchmark would definitely raise the valuation, so it would 100% be worth it for them...

llm_trw 1y ago

Any benchmark which isn't dynamically generated is useless for that very reason.

rl3 1y ago

Simple: I highly doubt they're willing to risk a scandal that would further tarnish their brand. It's still reeling from last year's drama, in addition to a spate of high-profile departures this year. Not to mention a few articles with insider sources that aren't exactly flattering.

aiono 1y ago

I doubt it would be seen as a scandal. They can simply generate training data for these questions just like they do for other problems. The only difference is that the pay rate is probably much higher for this kind of training data than for most other areas.

IAmGraydon 1y ago

You’re not thinking about the other side of the equation. If they win (becoming the first to excel at the benchmark), they potentially make billions. If they lose, they’ll be relegated to the dustbin of LLM history. Since there is an existential threat to the brand, there is almost nothing that isn’t worth risking to win. Risking a scandal to avoid irrelevance is an easy asymmetrical bet. Of course they would take the risk.


EGreg 1y ago

Parallel construction

Doesn't cause too much scandal lol
