I agree with you to an extent, but the difference is in how the solution is derived.
The LLM has no understanding of the physical length of 50m, nor is it capable of doing calculations without relying on an external tool. That is, it has no semantic understanding of any of the output it generates. It functions purely on weights learned from the tokens in its training sets.
I asked Sonnet 4.5 the bat and ball question. It pretended to do some algebra, and arrived at the correct solution. It was able to explain why it arrived at that solution, and to tell me where the question comes from. It was obviously trained on this particular question, and thousands of others like it, I'm sure. Does this mean that it will be able to answer any other question it hasn't been trained on? Maybe, depending on the size and quality of its training set, the context, prompt, settings, and so on.
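For reference, the algebra involved is tiny. Here is a minimal sketch, assuming the usual wording of the puzzle (a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball). Those figures come from the well-known version of the question, and the function name is just something made up for illustration:

```python
def solve_bat_and_ball(total_cents=110, difference_cents=100):
    """Solve ball + (ball + difference) = total, working in whole cents.

    The default figures are assumed from the common wording of the puzzle.
    """
    # 2 * ball + difference = total  =>  ball = (total - difference) / 2
    ball = (total_cents - difference_cents) // 2
    bat = ball + difference_cents
    return bat, ball


bat, ball = solve_bat_and_ball()
print(f"bat = ${bat / 100:.2f}, ball = ${ball / 100:.2f}")
# prints: bat = $1.05, ball = $0.05
```

The intuitive answer of $0.10 for the ball fails the check, since the bat would then cost $1.10 and the total would be $1.20.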
And that's my point: a human doesn't need to be trained on specific problems. A person who understands math can solve problems they've never seen before by leveraging their understanding and actual reasoning and deduction skills. We can learn new concepts and improve our skills by expanding our mental model of the world. We deal with abstract concepts and ideas, not data patterns. You can call this gatekeeping if you want, but it is how we acquire and use knowledge to exhibit intelligence.
The sheer volume of LLM training data is incomprehensible to humans, which is why we're so impressed that applied statistics can exhibit behavior we typically associate with intelligence. But it's a simulation of intelligence. Without the exorbitant amount of resources poured into collecting and cleaning data, and training and running these systems, none of this would be possible. It is a marvel of science and engineering, to be sure, but the end product is a simulation.
In many ways, modern LLMs are not much different from classical expert systems from decades ago. The training and inference are much more streamlined and sophisticated now; statistics and data patterns replaced hand-crafted rules; and performance can be improved by simply scaling up. But at their core, LLMs still rely on carefully curated data, and any "emergent" behavior we observe is due to our inability to comprehend patterns in the data at this scale.
I'm not saying that this technology can't be useful. Setting aside the safety considerations we're mostly ignoring, a pattern recognition and generation tool can be very useful in many fields. But I find the narrative that this constitutes any form of artificial intelligence absurd and insulting. It is mass gaslighting promoted by modern snake oil salesmen.