When you ask an LLM "what is 2 + 2?" and it says "2 + 2 = 4", it looks like it's recognizing two numbers and an addition operation, and performing a calculation. It's not. It's finding a common response in its training data and returning that. That's why you get hallucinations on any uncommon math question, like multiplying two random 5-digit numbers. It isn't carrying out the logical operations; it's trying to extract an answer by next-token prediction. That's not reasoning.
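You can see the gap with a quick probe that compares the model's answer to exact computation. A minimal sketch, where `ask_llm` is a hypothetical placeholder for whatever model API you're testing:

```python
import random

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client call."""
    raise NotImplementedError

# Exact arithmetic is one multiply; the model instead predicts digit tokens.
a = random.randint(10_000, 99_999)
b = random.randint(10_000, 99_999)
expected = a * b  # the actual logical operation, computed exactly
reply = ask_llm(f"What is {a} * {b}?")
print(f"{a} * {b} = {expected}; model said: {reply}")
```

Common products like 2 * 2 are everywhere in the training data; a random pair of 5-digit numbers almost certainly isn't, which is exactly where the prediction breaks down.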
When you ask "will water freeze at 27F?" and it replies "No, the freezing point of water is 32F", what's happening is that it's not recognizing the 27 and 32 are numbers, that a freezing point is an upper threshold, and that any temperature lower than that threshold will therefore also be freezing. It's looking up the next token and finding nothing about how 27F is below freezing.
Again, it's not reasoning. It's not exercising any logic. Its huge training data set and tuned proximity matching help it find likely responses, and when the answer seems right, that's because the token relationships already existed in the training data.
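Mechanically, "finding likely responses" means picking the highest-probability next token from a learned distribution. A toy sketch with made-up logits for the prompt "2 + 2 =":

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution over tokens.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["4", "5", "22", "fish"]   # toy vocabulary
logits = [9.1, 2.3, 1.7, -4.0]     # "4" dominates because it dominated the data

probs = softmax(logits)
print(max(zip(probs, vocab)))  # "4" wins on probability, not on arithmetic
```

The model outputs "4" because "4" had the strongest association, not because it added anything.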
That it occasionally breaks the rules of chess just shows it has no concept of those rules, only that the next token for a chess move is usually legal, because most of its chess training data consists of legal games, not illegal moves. I'm unsurprised that it can beat an average player when it doesn't break the rules: most chess information in the world is about better-than-average play.
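For contrast, here's what actually having the rules looks like, assuming the python-chess package (`pip install chess`):

```python
import chess

board = chess.Board()

# Legality is checked against the rules of the game, not against
# how often a move-shaped token followed similar text in a corpus.
print(chess.Move.from_uci("e2e4") in board.legal_moves)  # True
print(chess.Move.from_uci("e2e5") in board.legal_moves)  # False: pawns can't jump three squares
```

A model that never consults anything like `board.legal_moves` can only approximate legality statistically, which is why the occasional illegal move is unsurprising.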
If an LLM came up with a proof no one had seen before, and the proof checks out, that wouldn't prove it's reasoning either: it would still be next-token prediction that produced it. It found token relationships no one had noticed before, but those relationships were inherent in the training data, and there's no reflective intelligence doing logic.
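"Checks out" is the key phrase: the verification is done by a proof checker, which really does apply rules. A one-line Lean 4 example of machine-checked logic, fittingly on the arithmetic from above:

```lean
-- The Lean kernel verifies this by reducing both sides, a genuine
-- logical operation, regardless of what process generated the proof.
theorem two_add_two : 2 + 2 = 4 := rfl
```

The generator can be pure pattern-matching and the result still stands, because the logic lives in the checker, not the generator.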
When we discuss things like reinforcement learning and chain-of-thought reasoning, what we're really talking about are ways of restricting or strengthening those token relationships. It's back-tuning of the training data. Still not doing logic.
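A toy caricature of that "strengthening" step (real RLHF uses policy gradients over model weights, but the effect on the token distribution is directionally the same):

```python
def reinforce_step(logits, rewarded_index, reward, lr=0.1):
    # "Strengthening a token relationship" is nudging a number upward:
    # tokens that earned reward become more probable next time.
    updated = list(logits)
    updated[rewarded_index] += lr * reward
    return updated

logits = [1.0, 1.0, 1.0]
logits = reinforce_step(logits, rewarded_index=0, reward=5.0)
print(logits)  # [1.5, 1.0, 1.0]: the rewarded token is now favored
```

The tuning reshapes which continuations come out; at no point does a rule of logic get executed.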