pydry 10mo ago

Really? This paper cut through the same kind of bullshit with puzzles: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinkin...

What do you think is so difficult about doing the same thing with coding problems?

simonw 10mo ago

I don't understand the connection between that paper and my comment.

pydry (OP) 10mo ago

They used puzzles to create an environment that exposes LLMs to problems and tests their performance, and that is immune to benchmark hacking.

Your comment was about how this was unreasonably hard (for coding challenges).

Anecdotally, I've seen LLMs do all sorts of amazing shit that was obviously drawn from their training set, and fall flat on their faces doing simple coding tasks that are novel enough not to appear in the training set.

simonw 10mo ago

That Apple paper mainly demonstrated that "reasoning" LLMs - with no access to additional tools - can't solve problems that deliberately exceed their token context length.

I don't think it has much relevance at all to a conversation about how good LLMs are at solving programming problems by running tools in a loop.

I keep seeing this idea that LLMs can't handle problems that aren't in their training data and it's frustrating because anyone who has spent significant time working with these systems knows that it obviously isn't true.
