So back to the questions: "What is knowing?" and "Is talking like someone with a theory of mind the same thing as having a theory of mind?"
If your argument is that the only way to answer this is to have a first-person experience of that consciousness, then it's not a scientific question. No one will ever have such an experience for an LLM or any other AI. It's like asking "What's happening right now outside the observable universe?" If it can't impact us, it's irrelevant to science. If that ever changes it will become relevant, but until then it's not a scientific question. Similarly, no person can ever have a first-person experience of the consciousness of an LLM, so anything that requires being the LLM isn't relevant.
So that means the only relevant question is what distinction outside observers can make between an agent talking like it has a theory of mind and an agent actually having one. And given a high enough accuracy and fidelity of responses, I think we're forced to conclude one of two things: 1. Something that can simulate having a theory of mind sufficiently well does actually have a theory of mind. OR 2. I am the only person on the planet with a theory of mind, and all of you are just simulating having one but don't actually have one.
It's the "Searle's Chinese Room" and "what is consciousness" discussion all over again. And from a scientific point of view, either you fall back on "it must be implemented identically to me to count" (which is as wrong as saying an object must flap its wings to fly), or you have to conclude that the room plus the person combined are knowledgeable and conscious.
But:
- In this context, following the whole second half of the 20th century in which cognitive science and psychology moved past behaviorism and sought explanations of the _mechanisms_ underlying mental phenomena, a scientific discussion doesn't have to restrict itself to considering only what the LLM says. Neither we nor the LLM are black boxes. Evidence of _how_ we do what we do is part of scientific inquiry.
- But the LLM does _not_ reproduce all the behaviors of an agent with a theory of mind. A two-year-old with a developing theory of mind may try to hide food they don't want to eat. A four-year-old playing hide-and-seek picks locations where they think their play-partner won't look. They take _actions_ appropriate to their goals and context, actions that require considering the goals of others. The LLM shows elaborate behaviors in one dimension, in which it has been extensively trained. It has no capacity to do anything else, or even to receive exposure to non-linguistic contexts.
I am in no way arguing that only meat-based minds can "know". I'm saying that the data, training regime, and model structure used for LLMs specifically are extremely impoverished, in that we show the model language but no other representation of the things language refers to. Similarly, image-generating AIs know what images look like, but they don't know how bodies or physical objects interact, because they have never been exposed to them. Of _course_ we get LLMs that hallucinate and image generators that produce messed-up bodies.
On the other hand, there are some pretty cool reinforcement-learning results where agents show what looks like cooperation, develop adversarial strategies, etc. There are experiments where software agents collaboratively invent a language to refer to objects in their (virtual) environment in order to accomplish simple tasks. I think there are a lot of near- and medium-term possibilities coming from multi-modal models (i.e., models trained jointly on related text, images, audio, and video) and RL which could yield knowledge of a kind that LLMs simply do not have.
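To make the emergent-communication point concrete, here is a minimal toy sketch (my own illustration, not taken from any particular paper) of a Lewis-style referential game: a tabular speaker and listener, trained only on a shared success signal, usually converge on a consistent object-to-symbol "language". Real experiments use neural agents and richer environments; the constants and REINFORCE-style update here are just for illustration.

```python
# Toy referential game: a speaker sees an object, emits a discrete symbol,
# and a listener must pick that object. Both agents start random and only
# receive a shared reward, yet an object<->symbol mapping tends to emerge.
import numpy as np

rng = np.random.default_rng(0)
N_OBJECTS, VOCAB, LR, EPISODES = 5, 5, 0.1, 20000

speaker_logits = np.zeros((N_OBJECTS, VOCAB))    # policy: symbol given target object
listener_logits = np.zeros((VOCAB, N_OBJECTS))   # policy: object given heard symbol

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(EPISODES):
    target = rng.integers(N_OBJECTS)

    # Speaker samples a symbol for the target object.
    p_sym = softmax(speaker_logits[target])
    symbol = rng.choice(VOCAB, p=p_sym)

    # Listener guesses which object the symbol refers to.
    p_obj = softmax(listener_logits[symbol])
    guess = rng.choice(N_OBJECTS, p=p_obj)

    reward = 1.0 if guess == target else 0.0

    # REINFORCE-style update with a constant 0.5 baseline, so wrong guesses
    # push the sampled actions' probabilities down and correct ones up.
    adv = reward - 0.5
    grad_s = -p_sym; grad_s[symbol] += 1.0
    grad_l = -p_obj; grad_l[guess] += 1.0
    speaker_logits[target] += LR * adv * grad_s
    listener_logits[symbol] += LR * adv * grad_l

# After training, the speaker's most likely symbol per object is (usually) a bijection:
print("object -> symbol:", speaker_logits.argmax(axis=1))
print("symbol -> object:", listener_logits.argmax(axis=1))
```

The point isn't that this toy setup "knows" anything; it's that the agents' symbols are grounded in a task and an environment, which is exactly the kind of grounding a text-only LLM never gets.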
That presupposes that our existing tools for detecting the presence of ToM are 100% accurate. Might it be possible that they are imprecise and it’s only now that their critical flaws have been exposed?
And what is “knowing”? If I know that a Mæw tends to nạ̀ng bn a S̄eụ̄̀x, isn’t that the first thing I’ve learned? And couldn’t I continue to learn other properties of Mæws? How many do I need to learn to “know” what a Mæw is?
As for how you would test it, I think one-shot learning would get one closer to proving understanding.
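Roughly, the kind of probe I have in mind looks like the sketch below: teach the model a made-up word from a single example, then check whether it applies the concept in situations the example never mentioned. `query_model` is a placeholder for whatever LLM interface you actually use, and the novel word and probe questions are illustrative, not a validated test battery.

```python
# One-shot concept probe: a single lesson defining a novel word, followed by
# questions that can only be answered by applying the rule, not by pattern-
# matching the surface features of the lone example.

ONE_SHOT_LESSON = (
    "A 'blicket' is any container that becomes unusable if you turn it upside down.\n"
    "Example: an open mug of coffee is a blicket; a sealed thermos is not.\n"
)

PROBES = [
    "Is a cardboard box full of loose marbles, with no lid, a blicket?",
    "Is an empty glass jar with its lid screwed on a blicket?",
    "Name one object in a typical kitchen that is a blicket, and explain why.",
]

def query_model(prompt: str) -> str:
    """Placeholder: send `prompt` to the LLM under test and return its reply."""
    raise NotImplementedError

def run_probe() -> None:
    for question in PROBES:
        prompt = ONE_SHOT_LESSON + "\nQuestion: " + question + "\nAnswer:"
        print(question)
        print(query_model(prompt))
        print()

# The interesting signal is whether the answers track the *rule* (orientation
# sensitivity) rather than surface features of the single example (mugs, coffee).
```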
And why do you think "feeling of a cat" cannot be encoded as a stream of tokens?