But does it matter if it "really, really" reasons in the human sense, if it's able to prove some famous math theorem or come up with a novel result in theoretical physics?
While beyond current motels, that would be the final test of AGI capability.
I've seen one Twitter thread from a mathematician who used an llm to come up with a new math result. Both coming up with the theorem statement and a unique proof,iirc.
Though to be clear, this wasn't a one shot thing - it was iirc a few months of back and forth chats with plenty of wrong turns too.
Then he used it as a random text generator, LLM is by far the most configurable and best random test generators we have. You can use that to generate random theorem noise and then try to work with that to find actual theorems, still doesn't replace mathematicians though.