What exactly do you mean by that? I've seen this exact comment stated many times, but I always wonder:
What limitations of AI chat bots do you currently see that are due to them using next token prediction?
It’s kind of like you’re saying “prove god doesn’t exist” when it’s supposed to be “prove god exists.”
If a problem isn’t documented, LLMs simply have nowhere to go. They can’t really handle the knowledge boundary [1] at all: since they have no reasoning ability, they just hallucinate or run around in circles, trying the same closest solution over and over.
It’s awesome that they frequently get some stuff right and can work fast like a computer, but it’s very obvious that there really isn’t anything in there that we would call “reasoning.”
I don't want to directly address your claim about lack of generalization, because there's a more basic issue with the GP statement. I will say, though, that today's models do seem to generalize quite a bit better than you make it sound.
But more importantly, you and GP don't mention any evidence that this is due specifically to using next token prediction as a mechanism.
Why would it not be possible for a highly generalizing model to use next token prediction for its output?
That doesn't follow to me at all, which is why the GP statement reads so weird.
Again, inverted burden of proof. We don’t have to prove that next token prediction is unable to do things that it currently cannot do, when there is no compelling roadmap that would lead us to believe it ever will do those things.
It’s perhaps a lot like Tesla’s “we can do robocars with just cameras” manifesto. They are just saying that they can do it because humans use eyes and nothing else. But they haven’t actually shown their technology working as well as even impaired human driving, so the burden of proof is on them to prove naysayers wrong. Put up or shut up: their system is approaching a decade behind what they promised.
To my knowledge Tesla is still failing simple collision avoidance tests while their competitors are operating revenue service.
https://www.carscoops.com/2025/06/teslas-fsd-botches-another...
This other article, which is critical of the test methodology, still ends up pointing out (defending?) that it’s not reasonable to expect Tesla to train the system on unrealistic scenarios:
https://www.forbes.com/sites/bradtempleton/2025/03/17/youtub...
That really gets back to my exact point: AI implemented the way it is today (e.g. next token prediction) can’t handle anything it has no training data for, while the human brain is amazingly good at making new connections without needing to be fed thousands of examples of that new discovery.
If you're saying "X can't do Y because Z" you do need to say what the connection between Y and Z is. You do need to define what Y is. That's got nothing to do with a burden of proof; it's just speaking in an understandable manner.
The Tesla tangent is totally unhelpful because I know exactly how to make those connections in that example.
The issue is that it uses next token prediction for its training. It doesn't matter how it outputs things, but it matters how it's trained.
As long as these models are trained to be next token predictors, you will always be able to find flaws in them that are related to them being next token predictors, so understanding that this is how they work really makes them much easier to use.
Because it is so easy to get the model to make errors that stem from it being trained to just predict tokens, people argue that this is proof they aren't really thinking. For example, take any extremely common piece of text and alter it slightly: the model will typically still output the same follow-up as for the text it has seen millions of times, even though that no longer makes logical sense. That is due to them being next token predictors instead of reasoning machines.
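To make that failure mode concrete, here is a toy sketch. It is a count-based n-gram predictor with backoff, nothing like a real transformer, and `train`, `predict_next`, and the riddle corpus are all made up for illustration. It only shows how a pure "most likely next token" rule can repeat a memorized follow-up after the prompt has been changed; it says nothing about how often real LLMs actually do this.

```python
from collections import Counter, defaultdict

def train(corpus, max_n=4):
    """Count next-token frequencies for every context of 1..max_n-1 tokens."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            for n in range(1, max_n):
                if i - n < 0:
                    break
                context = tuple(tokens[i - n:i])
                counts[context][tokens[i]] += 1
    return counts

def predict_next(counts, prompt, max_n=4):
    """Pick the most frequent next token, backing off to shorter contexts."""
    tokens = prompt.split()
    for n in range(max_n - 1, 0, -1):
        context = tuple(tokens[-n:])
        if len(context) == n and context in counts:
            return counts[context].most_common(1)[0][0]
    return None

# Hypothetical training data: one riddle seen many times.
riddle = "which weighs more , a pound of feathers or a pound of bricks ?"
corpus = [riddle + " neither"] * 1000
counts = train(corpus)

# Altered prompt: the logically correct answer is no longer "neither".
altered = "which weighs more , a pound of feathers or two pounds of bricks ?"

original_answer = predict_next(counts, riddle)
altered_answer = predict_next(counts, altered)
print(original_answer, altered_answer)
```

The altered prompt was never seen in full, so the predictor backs off to the longest suffix it does know ("of bricks ?") and emits the familiar continuation anyway.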
You might say it's unfair to abuse their weaknesses as next token predictors, but then you admit that being a next token predictor interferes with their ability to reason, which was the argument you said you don't understand.
LLM research is trying out a lot of different things that move away from just training on next token prediction, and I buy the argument that not doing anything else would be limiting.
The model is still fundamentally a next token predictor.