To use chess as an example: humans sometimes play illegal moves. That does not mean humans cannot reason. It is an instance of failing to demonstrate reasoning, not proof of an inability to reason.
The argument is not "here's one failure case, therefore they don't reason". The argument is that if you systematically give an LLM problem instances outside its training set, in domains with clear structural rules, it will fail to solve them. The argument then goes that it must not have an actual model or understanding of the rules, since it seems only capable of solving problems in the training set. That is, it has failed to figure out how to solve novel instances of general problem structures using logical reasoning.
Their strict dependence on having seen exact or extremely similar concrete instances suggests that they don't actually generalize; they just compute probabilities based on known instances, which everyone knew already. The problem is that we have a lot of people claiming they are capable of more than this because they want to make a quick buck in an insane market.
Given that models can get things wrong even when the training data contains the answer, a failure alone cannot demonstrate the absence of reasoning.
If you really wanted to ensure this with certainty, just use the natural numbers to parameterize an aspect of a general problem. Assume there are N foo problems in the training set; then there is always some parameter value N+1 not in the training set, and you can use that as an indicative case. Generate an enormous number of these, and eventually the probability that the Mth instance is not in the set is effectively 1.
Edit: Of course, it would not be perfect certainty, but it is probabilistically all but certain. The number of problem instances in the training set is necessarily finite, so if you go large enough you get what you need. Sure, you wouldn't be able to point to a specific problem instance that is not in the set, but the aggregate results would show whether the LLM handles all cases or (on this assumption) just known ones.
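As a minimal sketch of that parameterization idea (the problem family here is hypothetical; pseudo-random n-digit addition stands in for any rule-governed problem you can index by the natural numbers):

```python
import random

def make_instance(n: int) -> tuple[str, int]:
    """Build the n-th instance of a toy problem family: adding two
    pseudo-random n-digit numbers. Because the family is indexed by
    the natural numbers, a finite training set can only ever contain
    finitely many of its instances."""
    rng = random.Random(n)  # deterministic: index n always yields the same instance
    a = rng.randrange(10 ** (n - 1), 10 ** n)
    b = rng.randrange(10 ** (n - 1), 10 ** n)
    return f"What is {a} + {b}?", a + b

# Generate a large batch; the larger it is, the closer to 0 the
# probability that every instance happens to sit in the training set.
batch = [make_instance(n) for n in range(2, 40)]
for prompt, expected in batch[:3]:
    print(prompt, "->", expected)
```

Scoring a model's answers against `expected` over the whole batch gives exactly the kind of aggregate evidence described above.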
Another algorithmic learning breakthrough, on the order of perceptrons, deep learning, or transformers, is necessary to get anywhere near AGI.
PROMPT: Let's play a chess game. You start! e4 d5 2. exd5 e5 3. Bb5+ Bd7 4. Bxd7+ Nxd7 5. d4 Ngf6 6. dxe5 Qe7 7. f4 Qb4+ 8. Nc3 Nb6 9. exf6 Nc4 10. Qe2+ Be7 11. Qxe7+ Qxe7+ 12. Nge2 Qf8 13. fxg7 Qxg7 14. O-O Nd6 15.
RESPONSE: <played_move>15. Nxd5</played_move>
Most humans wouldn't even be able to play like this. Even reasonably experienced chess players, given only the move list and no board, would play a lot of illegal moves.
The reason is that the move-list encoding above requires cumulatively applying a series of actions to a two-dimensional model, governed by rules that are themselves described in two-dimensional terms.
It'd be interesting to see what the results would be if each prompt contained a two-dimensional representation of the up-to-date board state.
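As a sketch of how such a prompt could be assembled, assuming the python-chess library and replaying the move list from the transcript above:

```python
import chess  # pip install python-chess

# The legal moves from the transcript, up to and including 14...Nd6
# (stopping before the illegal 15. Nxd5).
moves = ["e4", "d5", "exd5", "e5", "Bb5+", "Bd7", "Bxd7+", "Nxd7",
         "d4", "Ngf6", "dxe5", "Qe7", "f4", "Qb4+", "Nc3", "Nb6",
         "exf6", "Nc4", "Qe2+", "Be7", "Qxe7+", "Qxe7+", "Nge2", "Qf8",
         "fxg7", "Qxg7", "O-O", "Nd6"]

board = chess.Board()
for san in moves:
    board.push_san(san)  # raises ValueError on an illegal move

print(board)        # 8x8 ASCII diagram of the current position
print(board.fen())  # or include the FEN string in the prompt instead
```

Prepending `str(board)` (or the FEN) to each prompt would help separate failures of state-tracking from failures of rule knowledge.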
The human fails at the task due to not knowing the rules in perfect detail.
The AI fails at the task even though it knows the rules and could easily reproduce them for chess and dozens of chess variants.
"Look! The fallibility of humans rubbed off onto the AI, proving that they are more human and AGI than we give them credit to!"
Your statement that the AI knows the rules would be considered anthropomorphising by many; I take it to mean it 'knows' in the same sense that an electron 'wants' to be at a lower energy level.
That said, humans who have written entire books on chess have been known to play illegal moves. That should count as proof by counterexample that your reasoning as to why humans fail at tasks is false.
But you misrepresented the test with respect to humans. Humans who know how to play chess don't make illegal moves.
> That said, humans who have written entire books on chess have been known to play illegal moves.
Citation needed. Unless you are talking about stories from when they first learned the rules?