A calculator can do very complex sums very quickly, but we don't tend to call it "smart" because we don't think it's operating intelligently according to some internal model of the world. I think the "LLMs are AGI" crowd would say that LLMs are, but it's perfectly coherent to think the output of LLMs is consistent/impressive/useful and still maintain that they aren't "smart" in any meaningful way.
Okay, but you have to actually address why you think LLMs lack an "internal model of the world".
You can train one on 1930s text, and then teach it Python in-context.
They've produced multiple novel mathematical proofs now; Terence Tao is impressed with them as research assistants.
You can very clearly ask them questions about the world, and they'll produce answers that match what you'd get from a "model" of the world.
What are weights, if not a model of the world? It's got a very skewed perspective, certainly, since it's terminally online and has never touched grass, but it still very clearly has a model of the world.
I'd dare say it's probably a more accurate model than the average person has, too, thanks to having Wikipedia and such baked in.
There's obviously a lot more of a case for suggesting LLMs are generally intelligent than a calculator, but for me, I think the key point is that understanding them as "next token generators" is a lot more helpful to explain things like hallucinations and some of the other issues/loops they get into.
For me, if understanding models as "generally intelligent agents operating with an internal model of the world" explained their behaviour better than "next token generators", I'd think calling them "smart" would have some justification[0]. I'm just a person on the internet though, and defining intelligence is pretty rarely clear, even without bringing LLMs into the mix.
[0] In case it's interesting to anyone, I'm basically giving a half-baked version of how Daniel Dennett defined intention: https://en.wikipedia.org/wiki/Intentional_stance
Now we have these LLMs that provide some simulation of reasoning merely through prediction of token patterns, and that is indeed unexpected and astonishing. However, the AI promoters want to suggest that this simulation of reasoning is human-level reasoning, or is evolving toward it, which is the same as mistaking game engine physics for real physics. The failure cases (e.g. the "walk vs drive to a car wash next door" question, or the failure to generate an image of a full glass of wine), even if patched away, are enough to reveal the token predictor underneath.
It's not like a calculator, because an LLM can solve very broad classes of problems - you'd struggle to define problems which an LLM can't solve (given some fine-tuning, harness, KB, etc).
All this talk about "smartness" isn't even particularly cute...
I definitely buy this, at least somewhat. Personally I think it'd be a lot more helpful to talk about how "generalisable" a tool is, rather than "general intelligence". LLMs can definitely solve a much broader class of problems than a calculator.
I don't know that "artificial general intelligence" or even "general intelligence" has a very good definition. Personally, I feel like "solving problems generally" doesn't capture what I mean when I use those kinds of terms. For one, it makes a Swiss Army knife seem more intelligent than a cat, which seems the opposite of what I'd want a good definition of general intelligence to do.
So can computer programs. Are computer programs intelligent?
If you make a program which can solve many different classes of problems, that's called AI.