Better HN

danielmarkbruce · 1y ago

It's highly challenging for LLMs because it has nothing to do with language. LLMs and their training processes are heavily optimized for language and the way it's presented.

This benchmark has done a wonderful job of marketing itself by picking a great name. But it's largely irrelevant for LLMs, even though it's difficult.

Consider how much of the model is just noise for a task like this, given how little information each token carries and how high-dimensional LLM embeddings are.
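A rough back-of-the-envelope version of this point, as a sketch rather than a measurement. It assumes each ARC grid cell is serialized as a single token (ARC cells take one of 10 color values, 0-9), uses GPT-2's 50,257-token vocabulary as a stand-in for a text vocabulary, and picks a hypothetical embedding width of 4096. Information per token is bounded above by the entropy of a uniform choice over the possible values:

```python
import math

# Assumption: one ARC grid cell -> one token; ARC cells use 10 colors (0-9).
ARC_COLORS = 10
# Comparison assumption: a GPT-2-sized text vocabulary (50,257 tokens).
TEXT_VOCAB = 50_257
# Hypothetical embedding width, chosen for illustration only.
EMBED_DIM = 4096


def max_bits_per_token(num_values: int) -> float:
    """Upper bound on information per token: entropy of a uniform choice."""
    return math.log2(num_values)


arc_bits = max_bits_per_token(ARC_COLORS)
text_bits = max_bits_per_token(TEXT_VOCAB)

# Crude ratio: embedding dimensions available per bit of token information.
arc_dims_per_bit = EMBED_DIM / arc_bits
text_dims_per_bit = EMBED_DIM / text_bits

print(f"ARC cell token: at most {arc_bits:.2f} bits")
print(f"text token:     at most {text_bits:.2f} bits")
print(f"embedding dims per bit (ARC):  {arc_dims_per_bit:.0f}")
print(f"embedding dims per bit (text): {text_dims_per_bit:.0f}")
```

Under these assumptions an ARC cell token carries under a quarter of the maximum information of a text token, so the same embedding width is spread over far less signal.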

computerex · 1y ago

The benchmark is designed to test for AGI and intelligence, specifically the ability to solve novel problems.

If the hypothesis is that LLMs are the “computer” that drives AGI, then of course the benchmark is relevant to testing for AGI.

I don’t think you understand the benchmark and its motivation. ARC-AGI benchmark problems are extremely simple for humans, but LLMs fail spectacularly at them. Why they fail is irrelevant; the fact that they fail means we don’t have AGI.
