I'm more shocked that so many people seem unable to come to grips with the fact that something can be a next token predictor and demonstrate intelligence. That's what blows my mind: people unable to see that something can be more than the sum of its parts. To them, if something is a token predictor clearly it can't be doing anything impressive - even while they watch it do impressive things.
Except LLMs have not shown much intelligence. Wisdom yes, intelligence no. LLMs are language models, not 'world' models. It's the difference of being wise vs smart. LLMs are very wise as they have effectively memorized the answer to every question humanity has written. OTOH, they are pretty dumb. LLMs don't "understand" the output they produce.
> To them, if something is a token predictor clearly it can't be doing anything impressive
Shifting the goal posts. Nobody said that a next token predictor can't do impressive things, but at the same time there is a big gap between impressive things and other things like "replace every software developer in the world within the next 5 years."
Why is that wrong? I mean, I support that thesis.
> since being a next-token-predictor is compatible with being intelligent.
No. My argument is that, by definition, that is wrong. It's wisdom vs intelligence. Street-smart vs book-smart. I think we all agree there is a distinction between wisdom and intelligence. I would define wisdom as being able to recall pertinent facts and experiences. Intelligence is measured in novel situations: it's the ability to act as if one had wisdom.
A next token predictor by definition is recalling. The intelligence of an LLM is good enough to match questions to potentially pertinent definitions, but it ends there.
It feels like there is intelligence, for sure. In part that's because it is hard to comprehend what it would be like to know the entirety of every written word with perfect recall - hence essentially no situation is novel. LLMs fail on anything outside of their training data. The "outside of the training data" is the realm of intelligence.
I don't know why it's so important to argue that LLMs have this intelligence. It's just not there by definition of "next token predictor", which is what an LLM is at its core.
For example, a human being probably could pass through a lot of life by responding with memorized answers to every question that has ever been asked in written history. They don't know a single word of what they are saying, their mind perfectly blank - but they're giving very passable and sophisticated answers.
> When mikert89 says "thinking machines have been invented",
Yeah, absolutely they have not. Unless we want to reductio ad absurdum the definition of thinking.
> they must become "more than a statistical token predictor"
Yup. As I illustrated by breaking down the components of "smart" into the broad components of 'wisdom' and 'intelligence', through that lens we can see that next token predictor is great for the wisdom attribute, but it does nothing for intelligence.
> dgfitz's argument is wrong and BoiledCabbage is right to point that out.
Why exactly? You're stating a priori that the argument is wrong without saying why.
I think there may be some terminology mismatch, because under the statistical definitions of these words, which are the ones used in the context of machine learning, this is very much a false assertion. A next-token predictor is a mapping that takes prior sentence context and outputs a vector of logits to predict the next most likely token in the sequence. It says nothing about the mechanisms by which this next token is chosen, so any form of intelligent text can be output.
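To make that concrete, here is a minimal sketch in Python. Everything in it is made up for illustration (the ten-digit `VOCAB`, the "sum of digits mod 10" task, and all function names); it is not how any real LLM works. The point is only that two very different mechanisms, one recalling and one computing, both satisfy the same "context in, logits out" definition.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy vocabulary: the ten digit tokens.
VOCAB: List[str] = [str(d) for d in range(10)]

# A next-token predictor is just: context tokens -> one logit per vocab entry.
Predictor = Callable[[Sequence[str]], List[float]]


def lookup_predictor(table: Dict[Tuple[str, ...], str]) -> Predictor:
    """Mechanism 1: pure recall from a table of previously seen contexts."""
    def predict(context: Sequence[str]) -> List[float]:
        target = table.get(tuple(context))  # None if the context was never seen
        return [1.0 if tok == target else 0.0 for tok in VOCAB]
    return predict


def computing_predictor(context: Sequence[str]) -> List[float]:
    """Mechanism 2: computes the answer (sum of the digits, mod 10)."""
    target = str(sum(int(tok) for tok in context) % 10)
    return [1.0 if tok == target else 0.0 for tok in VOCAB]


def greedy_next_token(predictor: Predictor, context: Sequence[str]) -> str:
    """Pick the token with the highest logit (first one wins on ties)."""
    logits = predictor(context)
    best_index = max(range(len(VOCAB)), key=logits.__getitem__)
    return VOCAB[best_index]


memorized = lookup_predictor({("1", "2"): "3"})

seen_context = ["1", "2"]
unseen_context = ["4", "5"]
```

On `seen_context` both predictors pick the same token. On `unseen_context` the lookup version has nothing to recall and emits all-zero logits, while the computing version still produces the right digit. Both are next-token predictors under the definition above.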
A predictor is not necessarily memorizing either, in the same way that a line of best fit is not a hash table.
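A rough sketch of that distinction in plain Python, with made-up data points on the line y = 2x + 1 (the data and the helper name `fit_line` are just for illustration): the hash table can only return what it stored, while the fitted line gives an answer for an input it never saw.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative "training data": four points on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Memorization: a hash table of exactly the pairs that were seen.
table = dict(zip(xs, ys))

# Prediction: a two-parameter model fitted to the same pairs.
slope, intercept = fit_line(xs, ys)


def predict(x: float) -> float:
    return slope * x + intercept


unseen_x = 10.0
recalled = table.get(unseen_x)   # nothing stored for this input
predicted = predict(unseen_x)    # the fitted line extrapolates
```

Here `recalled` is `None` because 10.0 was never stored, whereas `predicted` comes out at (approximately) 21.0. Whether an LLM's generalization looks more like this than like a lookup is the actual point under dispute; the sketch only shows that "predictor" does not imply "memorizer".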
> Why exactly? You're stating a priori that the argument is wrong without saying why.
Because you can prove that for any human, there exists a next-token predictor that universally matches word-for-word their most likely response to any given query. This is indistinguishable from intelligence. That's a theoretical counterexample to the claim that next-token prediction alone is incapable of intelligence.