0 points · danpalmer · 1y ago · 0 comments

Is there any confirmation of the assumptions other than the LLM behaviour? Without that, it still feels like circular reasoning.

I think a similar claim could be levelled against other benchmarks or LLM evaluation tasks. One could say that the Turing test was designed to assess human intelligence, and LLMs pass it, therefore LLMs have human intelligence. This is generally considered to be false now, because we can plainly see that LLMs do not have intelligence in the same way as humans (yet? debatable, not the point), and instead we have concluded that the Turing test was not the right benchmark. That's not to diminish its importance: it was hugely important as a part of AI education and possibly even AI development for decades.

ARC does seem to be pushing the boundaries; I'm just not convinced that it's testing a provable step change.

JFingleton · 1y ago

I'm not sure that's quite correct about the Turing test. From Wikipedia:

"Turing did not explicitly state that the Turing test could be used as a measure of "intelligence", or any other human quality. He wanted to provide a clear and understandable alternative to the word "think", which he could then use to reply to criticisms of the possibility of "thinking machines" and to suggest ways that research might move forward."
