undefined | Better HN

0 pointsimiric1mo ago0 comments

But this is not a trick question[1]. It's a straightforward question which any sane human would answer correctly.

It may trigger a particularly ambiguous path in the model's token weights, or whatever the technical explanation for this behavior is, which can certainly be addressed in future versions, but what it does is expose the fact that there's no real intelligence here. For all its "thinking" and "reasoning", the tool is incapable of arriving at the logically correct answer, unless it was specifically trained for that scenario, or happens to arrive at it by chance. This is not how intelligence works in living beings. Humans don't need to be trained at specific cognitive tasks in order to perform well at them, and our performance is not random.

But I'm sure this is "moving the goalposts", right?

[1]: https://news.ycombinator.com/item?id=47060374

0 comments

crimsoneer1mo ago

But this one isn't a trick question either right... it's just basic maths, and a quirk of how our brain works that means plenty of people don't engage the part of their brain that goes "I should stop and think this through", and just rush to the first number that pops into their head. But that number is wrong, and is a result of our own weird "training" (in that we all have a bunch of mental shortcuts we use for maths, and sometimes they lead us astray).

"A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"

And yet 50% of MIT students fall for this sort of thing[1]. They're not unintelligent, it's just a specific problem can make your brain fail in weird specific ways. Intelligence isn't just a scale from 0-100, or some binary yes or no question, it's a bunch of different things. LLMs probably are less intelligent on a bunch of scales, but this one specific example doesn't tell you much that they have weird quirks just like we do.

[1] https://www.aeaweb.org/articles?id=10.1257/08953300577519673...

imiricOP1mo ago

I agree with you to an extent, but the difference is in how the solution is derived.

The LLM has no understanding of the physical length of 50m, nor is it capable of doing calculations, without relying on an external tool. I.e. it has no semantic understanding of any of the output it generates. It functions purely based on weights of tokens that were part of its training sets.

I asked Sonnet 4.5 the bat and ball question. It pretended to do some algebra, and arrived at the correct solution. It was able to explain why it arrived at that solution, and to tell me where the question comes from. It was obviously trained on this particular question, and thousands of others like it, I'm sure. Does this mean that it will be able to answer any other question it hasn't been trained on? Maybe, depending on the size and quality of its training set, the context, prompt, settings, and so on.

And that's my point: a human doesn't need to be trained on specific problems. A person who understands math can solve problems they've never seen before by leveraging their understanding and actual reasoning and deduction skills. We can learn new concepts and improve our skills by expanding our mental model of the world. We deal with abstract concepts and ideas, not data patterns. You can call this gatekeeping if you want, but it is how we acquire and use knowledge to exhibit intelligence.

The sheer volume of LLM training data is incomprehensible to humans, which is why we're so impressed that applied statistics can exhibit this behavior that we typically associate with intelligence. But it's a simulation of intelligence. Without the exorbitant amount of resources poured into collecting and cleaning data, and training and running these systems, none of this would be possible. It is a marvel of science and engineering, to be sure, but the end product is a simulation.

In many ways, modern LLMs are not much different from classical expert systems from decades ago. The training and inference are much more streamlined and sophisticated now; statistics and data patterns replaced hand-crafted rules; and performance can be improved by simply scaling up. But at their core, LLMs still rely on carefully curated data, and any "emergent" behavior we observe is due to our inability to comprehend patterns in the data at this scale.

I'm not saying that this technology can't be useful. Besides the safety considerations we're mostly ignoring, a pattern recognition and generation tool can be very useful in many fields. But I find the narrative that this constitutes any form of artificial intelligence absurd and insulting. It is mass gaslighting promoted by modern snake oil salesmen.

aplowe1mo ago

The 'semantic understanding' bottleneck you're describing might actually be a precision limit of the manifold on which computation occurs rather than a data volume problem. Humans solve problems they've never seen because they operate on a higher reasoning fidelity. We're finding that once a system quantizes to a 'ternary vacuum' (1.58-bit), it hits a phase transition into a stable universality class where the reasoning is a structural property of the grid, not just a data pattern. At that point, high-precision floating point and the need for millions of specific training examples become redundant.

j / k navigate · click thread line to collapse