That may change, particularly if the intelligence of LLMs proves to be analogous to our own in some deep way—a point that is still very much undecided. However, if the similarities are there, so is the potential for knowledge. We have a complete mechanical understanding of LLMs and can pry apart their structure, which we cannot yet do with the brain. And some of the smartest people in the world are engaged in making LLMs smaller and more efficient; it seems possible that the push for miniaturization will rediscover some tricks also discovered by the blind watchmaker. But these things are not a given.
I would push back on this a little bit. While it has not helped us to understand our own intelligence, it has made me question whether such a thing even exists. Perhaps there are no simple and beautiful natural laws, like those that exist in Physics, that can explain how humans think and make decisions. When CNNs learned to recognize faces through a series of hierarchical abstractions that make intuitive sense, it's hard to deny the similarities to what we're doing as humans. Perhaps it's all just emergent properties of some messy evolved substrate.
The big lesson from AI development in the last 10 years, for me, has been "I guess humans really aren't so special after all", which is similar to what we've been through with Physics. Theories often made the mistake of giving human observers some kind of special importance, which was later discovered to be the reason those theories did not generalize.
Instead, I would take the opposite view.
How wonderful is it that, with naturally evolved processes and neural structures, we have been able to create what we have. Van Gogh’s paintings came out of the human brain. The Queens of the Skies - hundreds of tons of metal and composites - flying across continents in the form of a Boeing 747 or an A380 - were designed by the human brain. We went to space, have studied nature (and have conservation programs for organisms we have found to need help), took pictures of the Pillars of Creation that are so incredibly far… all with such a “puny” structure a few cm in diameter? I think that’s freaking amazing.
Isn't Physics trying to describe the natural world? I'm guessing you are taking two positions here that are causing me confusion with your statement: 1) that our minds can be explained strictly through physical processes, and 2) our minds, including our intelligence, are outside of the domain of Physics.
If you take 1) to be true, then it follows that Physics, at least theoretically, should be able to explain intelligence. It may be intractably hard, like it might be intractably hard to have physics describe and predict the motions of more than two planetary bodies.
I guess I'm saying that Physical laws ARE natural laws. I think you might be thinking that natural laws refer solely to all that messy, living stuff.
> Perhaps there are no simple and beautiful natural laws, like those that exists in Physics, that can explain how humans think and make decisions...Perhaps it's all just emergent properties of some messy evolved substrate.
Yeah, it is very likely that there are not laws that will do this, it's the substrate. The fruit fly brain (let alone human) has been mapped, and we've figured out that it's not just the synapse count, but the 'weights' that matter too [0]. Mind you, those weights adjust in real time when a living animal is out there.
You'll see in literature that there are people with some 'lucky' form of hydranencephaly where their brain is as thin as paper. But they vote, get married, have kids, and for some strange reason seem to work in mailrooms (not a joke). So we know it's something about the connectome that's the 'magic' of a human.
My pet theory: We need memristors [2] to better represent things. But that takes redesigning the computer from the metal on up, so is unlikely to occur any time soon with this current AI craze.
> The big lesson from the AI development in the last 10 years from me has been "I guess humans really aren't so special after all" which is similar to what we've been through with Physics.
Yeah, biologists get there too, just the other way about, with animals and humans. Like, dogs make vitamin C internally, and humans have that gene too, it's just dormant, ready for evolution (or genetic engineering) to reactivate. That said, these neuroscience issues with us and the other great apes are somewhat large and strange. I'm not big into that literature, but from what little I know, the exact mechanisms and processes that get you from tool-using orangutans to tool-using humans, well, those seem to be a bit strange and harder to grasp for us. Again, not in that field though.
In the end though, humans are special. We're the only ones on the planet that ever really asked a question. There's a lot to us and we're actually pretty strange in the end. There's many centuries of work to do with biology, we're just at the wading stage of that ocean.
[0] https://en.wikipedia.org/wiki/Drosophila_connectome
I was reading a reddit post the other day where the guy lost his crypto holdings because he input his recovery phrase somewhere. We question the intelligence of LLMs because they might open a website, read something nefarious, and then do it. But here we have real humans doing the exact same thing...
> I guess humans really aren't so special after all
No they are not. But we are still far from getting there with the current LLMs and I suspect mimicking the human brain won't be the best path forward.
> one, that the facility of LLMs is fantastic and useful
I didn't see where he was disagreeing with this. I'm assuming this was the part you were saying he doesn't hold, because it is pretty clear he holds the second thought.
| is it likely that programs will be devised that surpass human capabilities? We have to be careful about the word “capabilities,” for reasons to which I’ll return. But if we take the term to refer to human performance, then the answer is: definitely yes.
I have a difficult time reading this as saying that LLMs aren't fantastic and useful.
| We can make a rough distinction between pure engineering and science. There is no sharp boundary, but it’s a useful first approximation. Pure engineering seeks to produce a product that may be of some use. Science seeks understanding.
This seems to be the core of his conversation. That he's talking about the side of science, not engineering.
It is as if a biochemist looks at a human brain, and concludes there is no 'intelligence' there at all, just a whole lot of electro-chemical reactions. It fully ignores the potential for emergence.
Don't misunderstand me, I'm not saying 'AGI has arrived', but I'd say even current LLMs most certainly do have interesting lessons for the science of human language development and evolution. What can the success of transfer learning in these models contribute to the debates on universal language faculties? How do invariants correlate across LLM systems and humans?
There's two kinds of emergence, one scientific, the other a strange, vacuous notion in the absence of any theory and explanation.
The first case is emergence when we for example talk about how gas or liquid states, or combustibility emerge from certain chemical or physical properties of particles. It's not just that they're emergent, we can explain how they're emergent and how their properties are already present in the lower level of abstraction. Emergence properly understood is always reducible to lower states, not some magical word if you don't know how something works.
In these AI debates that's however exactly how "emergence" is used, people just assert it, following necessarily from their assumptions. They don't offer a scientific explanation. (the same is true with various other topics, like consciousness, or what have you). This is pointless, it's a sort of god of the gaps disguised as an argument. When Chomsky talks about science proper, he correctly points out that these kinds of arguments have no place in it, because the point of science is to build coherent theories.
People's illusions and willingness to debase their own authority and control to take shortcuts to optimise towards lowest effort / highest yield (not dissimilar to something you would get with... auto regressive models!) was an astonishing insight to me.
At some point you have to wonder: is an LLM making your hiring decision really better than rolling a dice? At least the dice doesn't give you the illusion of rationality, it doesn't generate a neat sounding paragraph "explaining" why candidate A is the obvious choice. The LLM produces content that looks like reasoning but has no actual causal connection to the decision - it's a mimicry of explanation without true substance of causation.
You can argue that humans do the same thing. But post-hoc reasoning is often a feedback loop for the eventual answer. That's not the case for LLMs.
However, a paper published last year (Mission: Impossible Language Models, Kallini et al.) proved that LLMs do NOT learn impossible languages as easily as they learn possible languages. This undermines everything that Chomsky says about LLMs in the linked interview.
Also, GPT-2 actually seems to do quite well on some of the tested languages, including word-hop, partial reverse, and local-shuffle. It doesn't do quite as well as plain English, but GPT-2 was designed to learn English, so it's not surprising that it would do a little better. For instance, the tokenization seems biased towards English. They show "bookshelf" becoming the tokens "book", "sh", and "lf" – which in many of the languages get spread throughout a sentence. I don't think a system designed to learn shuffled-English would tokenize this way!
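For anyone who hasn't seen what a perturbed "language" looks like, here is a minimal sketch of a local shuffle. It is an assumption about the general idea, not the paper's actual procedure: the function name `local_shuffle`, the window size and the seeding scheme are all made up for illustration, and it splits on whitespace rather than using GPT-2's tokenizer.

```python
import random

def local_shuffle(tokens, window=3, seed=0):
    """Shuffle tokens inside consecutive fixed-size windows.

    Hypothetical illustration only: window size and seeding are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed, so the result is reproducible
    out = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]  # slicing copies, input is untouched
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

sentence = "the bookshelf in the hall is full".split()
print(local_shuffle(sentence))
```

Words never leave their window, so the scrambling stays local. If "bookshelf" were first split into sub-word pieces, the same shuffle could move those pieces apart, which is the tokenization concern raised above.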
The AI works on English, C++, Smalltalk, Klingon, nonsense, and gibberish. Like Turing's paper this illustrates the difference between, "machines being able to think" and, "machines being able to demonstrate some well understood mathematical process like pattern matching."
https://en.wikipedia.org/wiki/Computing_Machinery_and_Intell...
Science progresses in a manner that when you see it happen in front of you it doesn't seem substantial at all, because we typically don't understand implications of new discoveries.
So far, in the last few years, we have discovered the importance of the role of language behind intelligence. We have also discovered quantitative ways to describe how close one concept is from another. More recently, from the new reasoning AI models, we have discovered something counterintuitive that's also seemingly true for human reasoning--incorrect/incomplete reasoning can often reach the correct conclusion.
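One common way to put a number on "how close one concept is to another" is cosine similarity between vectors. The sketch below is only an illustration: the three-dimensional vectors are hand-written, hypothetical values, whereas real models learn vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "concept vectors", invented for this example.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine_similarity(vectors["cat"], vectors["dog"]))  # close to 1
print(cosine_similarity(vectors["cat"], vectors["car"]))  # noticeably smaller
```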
People are waiting for this Prometheus-level moment with AI where it resembles us exactly but exceeds our capabilities, but I don't think that's necessary. It parallels humanity explaining Nature in our own image as God and claiming it was the other way around.
First, they have to implement "intelligence" for LLMs, then we can compare. /s
LLM designs to date are purely statistical models. A pile, a morass of floating point numbers and their weighted relationships, along with the software and hardware that animates them and the user input and output that makes them valuable to us. An index of the data fed into them, different from a Lucene or SQL DB index made from compsci algorithms & data structure primitives. Recognizable to Asimov's definition.
And these LLMs feature no symbolic reasoning whatsoever within their computational substrate. What they do feature is a simple recursive model: Given the input so far, what is the next token? And they are thus enabled after training on huge amounts of input material. No inherent reasoning capabilities, no primordial ability to apply logic, or even infer basic axioms of logic, reasoning, thought. And therefore unrecognizable to Chomsky's definition.
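The "given the input so far, what is the next token?" loop can be sketched with a toy model. This is a deliberately crude stand-in, not how an LLM is built: it is a bigram counter that looks only at the last token, while a real LLM conditions on the whole context with a neural network. The corpus and the function names are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_new_tokens=5):
    """Repeatedly append the most frequent follower of the last token."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:  # nothing ever followed this token in training
            break
        out.append(followers.most_common(1)[0][0])
    return out

corpus = "the cat sat on the mat the cat sat on the rug".split()
model = train_bigram(corpus)
print(generate(model, "the"))
```

The output is fed straight back in as input, one token at a time, and there is no step anywhere in the loop that applies a rule of logic.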
So our LLMs are a mere parlor trick. A one-trick pony. But the trick they do is oh-so vastly complicated, and very appealing to us, of practical application and real value. It harkens back to the question: What is the nature of intelligence? And how to define it?
And I say this while thinking of the marked contrast in apparent intelligence between an LLM and, say, a 2-year-old child.
They may not be doing strict formal logic, but they are definitely compressing information into, and operating using, symbols.
Sentence parsing with multiple potential verb-noun-adjective interpretations is an old example; Chomsky made "fruit flies like a banana" famous for a reason.
(without the weights and that specific sentence programmed in, I would be interested exactly how the symbol models cope with that, and the myriad other examples)
LLMs seem to have proven themselves to be more than a one-trick pony. There is actually some semblance of reasoning, structuring, etc., no matter whether it happens directly within the LLM or is supported by computer code. E.g. it can be argued that the latest LLMs like Gemini 2.5 and Claude 4 do in fact do complex reasoning.
We have always taken for granted you need intelligence for that, but what if you don't? It would greatly change our view on intelligence and take away one of the main factors that we test for in e.g. animals to define their "intelligence".
They most definitely don't. We attach symbolic meaning to their output because we can map it semantically to the input we gave it. Which is why people are often caught by surprise when these mappings break down.
LLMs can emulate reasoning, but the failure modes show that they don't reason. We can get them to coincidentally emulate reasoning well enough, for long enough, to fool us, investors and the media. But doubling down on it, hoping that this problem goes away with scale or fine-tuning, is proving more and more reckless.
Even more strangely, the act of giving a statistical model symbolic input allows it to build a context which then shapes the symbolic output in a way that depends on some level of "understanding" instructions.
We "train" this model on raw symbolic data and it extracts the inherent semantic structure without any human ever embedding in the code anything resembling letters, words, or the like. It's as if Chomsky's elusive universal language is semantic structure itself.
Chomsky vs Norvig
I dunno if people knew it at that time, but those two views are completely equivalent.
Yes, because anthropomorphism is hardwired into our biology. Just two dots and an arc triggers a happy feeling in all humans. :)
> of practical application and real value
That is debatable. So far no groundbreaking useful applications have been found for LLMs. We want to believe, because they make us feel happy. But the results aren't there.
That part is unusually good btw. It's actually elegiac.
Are we seriously saying that his ideas are not taken seriously? His theory of grammar/language construction was a major contributor to modern programming languages, for one.
And there's a fact here that's very hard to dispute, this method works. I can give a computer instructions and it "understands" them in a way that wasn't possible before LLMs. The main debate now is over the semantics of words like "understanding" and whether or not an LLM is conscious in the same way as a human being (it isn't).
I'm surprised that he doesn't mention "universal grammar" once in that essay. Maybe it so happens that humans do have some innate "universal grammar" wired in by instinct but it's clearly not _necessary_ to be able to parse things. You don't need to set up some explicit language rules or generative structure, enough data and the model learns to produce it. I wonder if anyone has gone back and tried to see if you can extract out some explicit generative rules from the learned representation though.
Since the "universal grammar" hypothesis isn't really falsifiable, at best you can hope for some generalized equivalent that's isomorphic to the platonic representation hypothesis and claim that all human language is aligned in some given latent representation, and that our brains have been optimized to be able to work in this subspace. That's at least a testable assumption, by trying to reverse engineer the geometry of the space LLMs have learned.
(I'm not that familiar with LLM/ML, but it seems like trained behavioral response rather than intelligent parsing. I believe this is part of why it hallucinates? It doesn't understand concepts, it just spits out words - perhaps a parrot is a better metaphor?)
No, not "obviously". They work well for languages like English or Chinese, where word order determines grammar.
They work less well where context is more important. (e.g. Grammatical gender consistency.)
I'm not sure there's much evidence for this one way or another at this point.
I'm not actually comfortable saying that LLMs aren't conscious. I think there's a decent chance they could be in a very alien way.
I realize that this is a very weird and potentially scary claim for people to parse but you must understand how weird and scary consciousness is.
The problem is... that there is a whole amount of "smart" activities humans do without being conscious of it.
- Walking, riding a bike, or typing on a keyboard happen fluidly without conscious planning of each muscle movement.
- You can finish someone's sentence or detect if a sentence is grammatically wrong, often without being able to explain the rule.
- When you enter a room, your brain rapidly identifies faces, furniture, and objects without you consciously thinking, “That is a table,” or “That is John.”
During Covid I gave a lecture on Python on Zoom in a non-English language. It was a beginner's topic about dictionary methods. I was attempting to multi-task and had other unrelated tasks open on a second computer.
Midway through the lecture I noticed to my horror that I had switched to English without the audience noticing.
Going back through the recording I noticed the switch was fluid and my delivery was reasonable. What I talked about was just as good as something presented by an LLM these days.
So this brings up the question - why aren't we p-zombies all the time instead of 99% of the time?
Are there any tasks that absolutely demand human consciousness as we know it?
Presumably long-term planning is something for which active human consciousness is needed.
Perhaps there is some need for consciousness when one is in the "conscious mastery" phase of acquiring a skill.
This goes for any skill such as riding a bicycle/playing chess/programming at a high level.
Once one reaches the "unconscious mastery" stage, the rider can concentrate on the higher meta game.
Is/was the same true for ASCII/Smalltalk/binary? They are all another way to translate language into something the computer "understands".
Perhaps the fact that it hasn't would lead some to question the validity of their claims. When a scientist makes a claim about how something works, it's expected that they prove it.
If the technology is as you say, show us.
That's converting characters into a digital representation. "A" is represented as 01000001. The tokenization process for an LLM is similar, but it's only the first step.
An LLM isn't just mapping a word to a number, you're taking the entire sentence, considering the position of the words and converting it into vectors within a 1,000+ dimensional space. Machine learning has encoded some "meaning" within these dimensions that goes far far beyond something like an ASCII string.
And the proof here is that the method actually works, that's why we have LLMs.
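To make the ASCII-versus-embedding contrast concrete, here is a small sketch. The ASCII half is exact. The embedding half is a toy under loud assumptions: the three-word vocabulary is hypothetical, the vectors are random rather than learned, `DIM` is 8 instead of 1,000+, and adding a token vector to a position vector is just one common scheme.

```python
import random

# ASCII: one character always maps to the same fixed 8-bit pattern.
print(format(ord("A"), "08b"))

# Toy embedding: every token id and every position gets its own vector,
# and a word's representation is the sum of the two.
DIM = 8                    # assumption: tiny, for readability
rng = random.Random(0)     # random stand-ins for learned weights

vocab = {"dogs": 0, "chase": 1, "cats": 2}   # hypothetical vocabulary
token_vectors = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in vocab]
position_vectors = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]

def embed(sentence):
    """Return one DIM-dimensional vector per word (token vector + position vector)."""
    rows = []
    for pos, word in enumerate(sentence.split()):
        tok = token_vectors[vocab[word]]
        rows.append([t + p for t, p in zip(tok, position_vectors[pos])])
    return rows

a = embed("dogs chase cats")
b = embed("cats chase dogs")
print(a[0] == b[2])  # "dogs" at position 0 vs "dogs" at position 2
print(a[1] == b[1])  # "chase" at position 1 in both sentences
```

Unlike the ASCII code for "A", the vector for "dogs" depends on where the word sits in the sentence. In a trained model the numbers would also be tuned so that related words end up near each other.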
> The fact that we have figured out how to translate language into something a computer can "understand" should thrill linguists.
I think they are really excited by this. There seems no deficiency of linguists using these machines. But I think it is important to distinguish the ability to understand language and translate it. Enough that you yourself put quotes around "understanding". This can often be a challenge for many translators, not knowing how to properly translate something because of underlying context.
Our communication runs far deeper than the words we speak or write on a page. This is much of what linguistics is about, this depth. (Or at least that's what they've told me, since I'm not a linguist) This seems to be the distinction Chomsky is trying to make.
> The main debate now is over the semantics of words like "understanding" and whether or not an LLM is conscious in the same way as a human being (it isn't).
Exactly. Here, I'm on the side of Chomsky and I don't think there's much of a debate to be had. We have a long history of being able to make accurate predictions while erroneously understanding the underlying causal nature.
My background is physics, and I moved into CS (degrees in both), working on ML. I see my peers at the top like Hinton[0] and Sutskever[1] making absurd claims. I call them absurd, because it is a mistake we've made over and over in the field of physics[2,3]. One of those lessons you learn again and again, because it is so easy to make the mistake. Hinton and Sutskever say that this is a feature, not a bug. Yet we know it is not enough to fit the data. Fitting the data allows you to make accurate, testable predictions. But it is not enough to model the underlying causal structure. Science has a long history demonstrating accurate predictions with incorrect models. Not just in the way of the Relativity of Wrong[4], but more directly. Did we forget that the Geocentric Model could still be used to make good predictions? Copernicus did not just face resistance from religious authorities, but also academics. The same is true for Galileo, Boltzmann, Einstein and many more. People didn't reject their claims because they were unreasonable. They rejected the claims because there were good reasons to. Just... not enough to make them right.
[0] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...
[1] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s
[2] https://www.youtube.com/watch?v=hV41QEKiMlM
[3] Think about what Fermi said in order to understand the relevance of this link: https://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness...
[4] https://hermiene.net/essays-trans/relativity_of_wrong.html
No, there is no understanding at all. Please don't confuse codifying with understanding or translation. LLMs don't understand their input, they simply act on it based on the way they are trained on it.
"And there's a fact here that's very hard to dispute, this method works. I can give a computer instructions and it "understands" them "
No, it really does not understand those instructions. It is at best what used to be called an "idiot savant". Mind you, people used to describe others like that - who is the idiot?
Ask your favoured LLM to write a programme in a less used language - ooh let's try VMware's PowerCLI (it's PowerShell so quite popular) and get it to do something useful. It won't because it can't, but it will still spit out something. PowerCLI is not extant across Stack Overflow and co much, but it is PS based, so the LLMs will hallucinate madder than a hippie on a new super weed.
so what they don't "understand", by your very specific definition of the word "understanding"? the person you're replying to is talking about the fact that they can say something to their computer in the form of casual human language and it will produce a useful response, where previously that was not true. whether that fits your suspiciously specific definition of "understanding" does not matter a bit.
so what they are over-confident with areas outside of their training data? provide more training data, improve the models, reduce the hallucination. it isn't an issue with the concept, it's an issue with the execution. yes you'll never be able to reduce it to 0%, but so what? humans hallucinate too. what are we aiming for? omniscience?
These days, Chomsky is working on Hopf algebras (originally from quantum physics) to explain language structure.
It's like wondering how well your shoes fit your feet, forgetting that shoes are made and chosen to fit your feet in the first place.
Chomsky also talks about these kinds of things in detail in Hauser, Chomsky and Fitch (2002) where they describe them as "third factors" in language acquisition.
This quote brought to mind the very different technological development path of the spider species in Adrian Tchaikovsky's Children of Time. They used pheromones to 'program' a race of ants to do computation.
Sounds like "ineffable nature" mumbo-jumbo.
Chomsky made interesting points comparing the performance of AI and of other biological organisms with that of humans, but his conclusion is not correct. We already know that a cheetah runs faster than a human and an elephant is far stronger than a human. Bats can navigate in the dark with echolocation, and dolphins can hunt in packs with highly precise, synchronized coordination, to devastating effect compared with solo hunting.
Whether we like it or not, humans are at the top, contrary to what Chomsky claims. Through scientific discovery (understanding) and design (engineering) that exploit the laws of nature, humans can surpass, and have surpassed, all of the cognitive capabilities of these petty animals, and we're mostly responsible for their inevitable demise and extinction. Humans now need to collectively and consciously reverse the extinction of these "superior" cognitive animals in order to preserve them, for better or worse. No other earthbound creature can do that to us.
Many of the comments herein lack that feature and seem to convey that the author might be full of him(her)self.
Also, some of the comments are a bit pejorative.
OTOH, consider LLMs as a roomful of monkeys that can communicate with each other and look at words, sentences and paragraphs on posters around the room, with a human in the room who gives them a banana when they type out a new word, sentence or paragraph.
You may eventually get a roomful of monkeys that can respond to a new sentence you give them with what seems an intelligent reply. And since language is the creation of humans, it represents an abstraction of the world made by humans.
I happen to agree with his view, so I came armed to agree and read this with a view in mind which I felt was reinforced. People are overstating the AGI qualities and misapplying the tool, sometimes the same people.
In particular, the lack of theory and scientific method means both that we're not learning much and that we've reified the machine.
I was disappointed nothing was said of Norbert Wiener. A man who invented cybernetics and had the courage to stand up to the military industrial complex.
But what we're good at is using all of our capabilities to transform the world around us according to an internal model that is partially shared between individuals. And we have complete control over that internal model, diverging from reality and converging towards it on whims.
So we can't produce and manipulate text faster, but the end game is rarely to produce and manipulate text. Mostly it's about sharing ideas and facts (aka internal models) and the control is ultimately what matters. It can help us, just like a calculator can help us solve an equation.
EDIT
After learning to draw, I have that internal model that I switch to whenever I want to sketch something. It's like a special mode of observation, where you no longer simply see, but pickup a lot of extra details according to all the drawing rules you internalized. There's not a lot, they're just intrinsically connected with each other. The difficult part is hand-eye coordination and analyzing the divergences between what you see and the internal model.
I think that's why a lot of artists are disgusted with AI generators. There are no internal models. Trying to extract one from a generated picture is a futile exercise. Same with generated texts. Alterations from the common understanding follow no pattern.
A calculator is consistent and doesn’t “hallucinate” answers to equations. An LLM puts an untrustworthy filter between the truth and the person. Google was revolutionary because it increased access to information. LLMs only obscure that access, while pretending to be something more.
Also I used it for a few programming tasks I was pretty sure were in the datasets (how to draw charts with Python and manipulate pandas frames). I know the domain, but wasn't in the mood to analyse the docs to get the implementation information. But the information I was seeking was just a few lines of sample code. In my experience, anything longer is pretty inconsistent, with worthless explanations.
Learning language from small data.
"To characterize a structural analysis of state violence as “apologia” reveals more about prevailing ideological filters than about the critique itself. If one examines the historical record without selective outrage, the pattern is clear—and uncomfortable for all who prefer myths to mechanisms." the fake academic facade, the us diabolism, the unwillingness to see complexity and responsibility in other its all with us forever ..
> We can make a rough distinction between pure engineering and science. There is no sharp boundary, but it’s a useful first approximation. Pure engineering seeks to produce a product that may be of some use. Science seeks understanding. If the topic is human intelligence, or cognitive capacities of other organisms, science seeks understanding of these biological systems.
If you take this approach, of course it follows that we should laugh at Tom Jones.
But a more differentiated approach is to recognize that science also falls into (at least) two categories; the science that we do because it expands our capability into something that we were previously incapable of, and the one that does not. (we typically do a lot more of the former than the latter, for obvious practical reasons)
Of course it is interesting from a historical perspective to understand the seafaring exploits of Polynesians, but as soon as there was a better way of navigating (i.e. by stars or by GPS) the investigation of this matter was relegated to the second type of science, more of a historical kind of investigation. Fundamentally we investigate things in science that are interesting because we believe the understanding we can gain from it can move us forwards somehow.
Could it be interesting to understand how Hamilton was thinking when he came up with imaginary numbers? Sure. Are a lot of mathematicians today concerning themselves with studying this? No, because the frontier has been moved far beyond.*
When you take this view, it's clear that his statement
> These considerations bring up a minor problem with the current LLM enthusiasm: its total absurdity, as in the hypothetical cases where we recognize it at once. But there are much more serious problems than absurdity.
is not warranted. Consider the following, in his own analogy:
> These considerations bring up a minor problem with the current GPS enthusiasm: its total absurdity, as in the hypothetical cases where we recognize it at once. But there are much more serious problems than absurdity. One is that GPS systems are designed in such a way that they cannot tell us anything about navigation, planning routes or other aspects of orientation, a matter of principle, irremediable.
* I'm making a simplifying assumption here that we can't learn anything useful for modern navigation anymore from studying Polynesians or ants; this might well be untrue, but that is also the case for learning something about language from LLMs, which according to Chomsky is apparently impossible and not even up for debate.
What do you think about his argument about “not being able to distinguish possible language from impossible”?
And why is it inherent in ML design?
Does he assume that there could be an instrument/algorithm that could do that with a higher level of certainty than an LLM/some ML model?
I mean, certainly they can be used to make a prediction/answer to this question, but he argues that this answer has no credibility? I mean, an LLM is literally a model, i.e. a probability distribution over what is language and what is not, so what gives?
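To illustrate what "a probability distribution over what is language" means in practice, here is a toy sketch that scores sentences by log-probability. Everything in it is an assumption made for illustration: a bigram model with add-one smoothing stands in for an LLM, and the two training sentences and function names are invented. It says nothing about whether a real LLM separates possible from impossible languages in Chomsky's sense.

```python
import math
from collections import Counter

def train(sentences):
    """Count contexts and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(sentence.split())
        unigrams.update(tokens[:-1])             # every token that acts as a context
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)

def log_prob(sentence, model):
    """Sum of log P(next | previous), with add-one smoothing for unseen pairs."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, nxt)] + 1) / (unigrams[prev] + vocab_size))
    return total

model = train(["the dog chased the cat", "the cat saw the dog"])
print(log_prob("the dog saw the cat", model))   # unseen sentence, familiar word order
print(log_prob("cat the saw dog the", model))   # same words, scrambled
```

The first sentence never appears in the training data but still scores higher than the scrambled one. The model hands out a graded score, not a yes/no verdict, so any "this is not language" answer depends on where you draw the threshold.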
Current models are probably tuned more “strictly” to follow existing languages closely, ie that will say “no-no” to some yet-unknown language, but isn’t this improvable in theory?
Or is he arguing precisely that this “exterior” is not directly correlated with “internal processes and faculties” and cannot make such predictions in principle?
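To make "a model is a probability distribution over strings" concrete, here is a minimal sketch using a toy bigram model with add-one smoothing. It is nowhere near an LLM, and the corpus, sentences and function names are all made up for illustration. It only shows what it means for a model to score one word order as more likely than another.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams over whitespace-tokenized sentences (toy corpus, hypothetical)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        for a, b in zip(tokens, tokens[1:]):
            contexts[a] += 1       # how often `a` appears as a left context
            bigrams[(a, b)] += 1   # how often `b` follows `a`
    return contexts, bigrams, vocab

def perplexity(sentence, model):
    """Per-transition perplexity under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for a, b in zip(tokens, tokens[1:]):
        p = (bigrams[(a, b)] + 1) / (contexts[a] + len(vocab))
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the mouse ate the cheese",
]
model = train_bigram(corpus)

natural = perplexity("the dog chased the mouse", model)
scrambled = perplexity("mouse the chased dog the", model)
print(natural < scrambled)  # the familiar word order is scored as more likely
```

Note what this does not show: the model prefers whatever order it was trained on, so by itself it says nothing about which orders are learnable by humans. That gap is, as far as I can tell, the thing being argued about.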
While there are some things in this I find myself nodding along to, I can't help but feel it's a really old take that is super vague and hand-wavy. The truth is that all of the progress on machine learning is absolutely science. We understand extremely well how to make neural networks learn efficiently; it's why the data leads anywhere at all. Backpropagation and gradient descent are extraordinarily powerful. Not to mention all the "just engineering" of making chips crunch incredible amounts of numbers.
Chomsky is extremely ungenerous to the progress and also pretty flippant about what this stuff can do.
I think we should probably stop listening to Chomsky; he hasn't said anything here that he hasn't already said a thousand times for decades.
Are LLMs still the same black box as they were described as a couple years ago? Are their inner workings at least slightly better understood than in the past?
Running tens of thousands of chips crunching a bajillion numbers a second sounds fun, but that's not automatically "engineering". You can have the same chips crunching numbers with the same intensity just to run an algorithm to find a large prime number. Chips crunching numbers isn't automatically engineering IMO. More like a side effect of engineering? Or a tool you use to run the thing you built?
What happens when we build something that works, but we don't actually know how? We learn about it through trial and error, rather than foundational logic about the technology.
Sorta reminds me of the human brain, psychology, and how some people think psychology isn't science. The brain is a black box kind of like an LLM? Some people will think it's still science, others will have less respect.
This perspective might be off base. It's under the assumption that we all agree LLMs are a poorly understood black box and no one really knows how they truly work. I could be completely wrong on that, would love for someone else to weigh in.
Separately, I don't know the author, but agreed it reads more like a pop sci book. Although I only hope to write as coherently as that when I'm 96 y/o.
Not if some properties are unexpectedly emergent. Then it is science. For instance, why should a generic statistical model be able to learn how to fill in blanks in text using a finite number of samples? And why should a generic blank-filler be able to produce a coherent chat bot that can even help you write code?
Some have even claimed that statistical modelling shouldn't be able to produce coherent speech, because it would need impossible amounts of data, or the optimisation problem might be too hard, or because of Goedel's incompleteness theorem somehow implying that human-level intelligence is uncomputable, etc. The fact that we have a talking robot means that those people were wrong. That should count as a scientific breakthrough.
That's not a good argument. Neuroscience was constructed by (other) brains. The brain is trying to explain itself.
> The truth is that all of the progress on machine learning is absolutely science.
But not much if you're interested in finding out how our brain works, or how language works. One of the interesting outcomes of LLMs is that there apparently is a way to represent complex ideas and their linguistic connection in a (rather large) unstructured state, but it comes without thorough explanation or relation to the human brain.
> Chomsky is [...] pretty flippant about what this stuff can do.
True, that's his style, being belligerently verbose, but others have been pretty much fawning and drooling over a stochastic parrot with a very good memory, mostly with dollar signs in their eyes.
This is not relevant. An observer who deceives for purposes of “balancing” other perceived deceptions is as untrustworthy and objectionable as one who deceives for other reasons.
To be fair the article is from two years ago, which when talking about LLMs in this age arguably does count as "old", maybe even "really old".
I've been saying this my whole life, glad it's finally catching on
I remember having thoughts like this until I listened to him talk on a podcast for 3 hours about chatGPT.
What was most obvious is Chomsky really knows linguistics and I don't.
"What Kind of Creatures Are We?" is a good place to start.
We should take having Chomsky still around to comment on LLMs as one of the greatest intellectual gifts.
Much of my reaction before listening to his thoughts on LLMs was me projecting my disdain for his politics.
It is not science, which is the study of the natural world. You are using the word "science" as an honorific, meaning something like "useful technical work that I think is impressive".
The reason you are so confused is that you can't distinguish studying the natural world from engineering.
What is elegant as a model is not always what works, and working towards a clean model to explain everything from a model that works is fraught, hard work.
I don’t think anyone alive will realize true “AGI”, but it won’t matter. You don’t need it, the same way particle physics doesn’t need elegance.
https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chat...
"NC: Credit for the article should be given to the actual author, Jeffrey Watumull, a fine mathematician-linguist-philosopher. The two listed co-authors were consultants, who agree with the article but did not write it."
He alludes to quite a bit here - impossible languages, intrinsic rules that don’t actually express in the language, etc - that leads me to believe there’s a pretty specific sense by which he means “understanding,” and I’d expect there’s a decent literature in linguistics covering what he’s referring to. If it’s a topic of interest to you, chasing down some of those leads might be a good start.
(I’ll note as several others have here too that most of his language seems to be using specific linguistics terms of art - “language” for “human language” is a big tell, as is the focus on understanding the mechanisms of language and how humans understand and generate languages - I’m not sure the critique here is specifically around LLMs, but more around their ability to teach us things about how humans understand language.)
If we define "understanding" like "useful", as in, not an innate attribute, but something in relation to a goal, then again, a good imitation, or a rudimentary model can get very far. ChatGPT "understood" a lot of things I have thrown at it, be that algorithms, nutrition, basic calculations, transformation between text formats, where I'm stuck in my personal development journey, or how to politely address people in the email I'm about to write.
>What if our „understanding“ is just unlocking another level in a model?
I believe that it is - that understanding is basically an illusion. Impressions are made up from perceptions and thinking, and extrapolated over the unknown. And just look how far that got us!
I would say that it is to what extent your mental model of a certain system is able to make accurate predictions of that system's behavior.
I agree with the rest of these comments though, listening to Chomsky wax about the topic-du-jour is a bit like trying to take lecture notes from the Swedish Chef.
I'll be liberally borrowing, and using that simile! It's hilarious. Bork, bork, bork!
The best thing is you can be right, and the other side can't take offense. It's the Muppets after all. It's brilliant!
If you're only going to read one part, I think it is this:
| I mentioned insect navigation, which is an astonishing achievement. Insect scientists have made much progress in studying how it is achieved, though the neurophysiology, a very difficult matter, remains elusive, along with evolution of the systems. The same is true of the amazing feats of birds and sea turtles that travel thousands of miles and unerringly return to the place of origin.
| Suppose Tom Jones, a proponent of engineering AI, comes along and says: “Your work has all been refuted. The problem is solved. Commercial airline pilots achieve the same or even better results all the time.”
| If even bothering to respond, we’d laugh.
| Take the case of the seafaring exploits of Polynesians, still alive among Indigenous tribes, using stars, wind, currents to land their canoes at a designated spot hundreds of miles away. This too has been the topic of much research to find out how they do it. Tom Jones has the answer: “Stop wasting your time; naval vessels do it all the time.”
| Same response.
It is easy to look at metrics of performance and call things solved. But there's much more depth to these problems than our abilities to solve some task. It's not about just the ability to do something, the how matters. It isn't important that we are able to do better at navigating than birds or insects. Our achievements say nothing about what they do. This would be like saying we developed a good algorithm only by looking at its ability to do some task. Certainly that is an important part, and even a core reason for why we program in the first place! But its performance tells us little to nothing about its implementation. The implementation still matters! Are we making good use of our resources? Certainly we want to be efficient, in an effort to drive down costs. Are there flaws or errors that we didn't catch in our measurements? Those things come at huge costs and fundamentally limit our programs in the first place. The task performance tells us nothing about the vulnerability to hackers nor what their exploits will cost our business.
That's what he's talking about.
Just because you can do something well doesn't mean you have a good understanding. It's natural to think the two relate because understanding improves performance, and that's primarily how we drive our education. But this is not a necessary condition and we have a long history demonstrating that. I'm quite surprised this concept is so contentious among programmers. We've seen the follies of using test driven development. Fundamentally, that is the same. There's more depth than what we can measure here and we should not be quick to presume that good performance is the same as understanding[0,1]. We KNOW this isn't true[2].
I agree with Chomsky, it is laughable. It is laughable to think that the man in The Chinese Room[3] must understand Chinese. 40 years in, on a conversation hundreds of years old. Surely we know you can get a good grade on a test without actually knowing the material. Hell, there's a trivial case of just having the answer sheet.
[0] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...
[1] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s
Quoting Chomsky:
> These considerations bring up a minor problem with the current LLM enthusiasm: its total absurdity, as in the hypothetical cases where we recognize it at once. But there are much more serious problems than absurdity.
> One is that the LLM systems are designed in such a way that they cannot tell us anything about language, learning, or other aspects of cognition, a matter of principle, irremediable... The reason is elementary: The systems work just as well with impossible languages that infants cannot acquire as with those they acquire quickly and virtually reflexively.
Response from o3:
LLMs do surface real linguistic structure:
• Hidden syntax: Attention heads in GPT-style models line up with dependency trees and phrase boundaries—even though no parser labels were ever provided. Researchers have used these heads to recover grammars for dozens of languages.
• Typology signals: In multilingual models, languages that share word-order or morphology cluster together in embedding space, letting linguists spot family relationships and outliers automatically.
• Limits shown by contrast tests: When you feed them “impossible” languages (e.g., mirror-order or random-agreement versions of English), perplexity explodes and structure heads disappear—evidence that the models do encode natural-language constraints.
• Psycholinguistic fit: The probability spikes LLMs assign to next-words predict human reading-time slow-downs (garden-paths, agreement attraction, etc.) almost as well as classic hand-built models.
These empirical hooks are already informing syntax, acquisition, and typology research—hardly “nothing to say about language.”
It's completely irrelevant because the point he's making is that LLMs operate differently from human languages as evidenced by the fact that they can learn language structures that humans cannot learn. Put another way, I'm sure you can point out an infinitude of similarities between human language faculty and LLMs but it's the critical differences that make LLMs not useful models of human language ability.
> When you feed them “impossible” languages (e.g., mirror-order or random-agreement versions of English), perplexity explodes and structure heads disappear—evidence that the models do encode natural-language constraints.
This is confused. You can pre-train an LLM on English or an impossible language and they do equally well. On the other hand humans can't do that, ergo LLMs aren't useful models of human language because they lack this critical distinctive feature.
It's impressive that LLMs can learn languages that humans cannot. In what frame is this a negative?
Separately, "impossible language" is a pretty clear misnomer. If an LLM can learn it, it's possible.
This is what Chomsky always wanted ai to be... especially language ai. Clever solutions to complex problems. Simple once you know how they work. Elegant.
I sympathize. I'm a curious human. We like elegant, simple revelations that reveal how our complex world is really simple once you know its secrets. This aesthetic has also been productive.
And yet... maybe some things are complicated. Maybe LLMs do teach us something about language... that language is complicated.
So sure. You can certainly critique "ai blogosphere" for exuberance and big speculative claims. That part is true. Otoh... linguistics is one of the areas that ai based research may turn up some new insights.
Overall... what wins is what is most productive.
It certainly teaches us many things. But an LLM trained on about as many words as a toddler hears while learning to understand, parse and apply language (or, more generally, an AI trained on that quantity of sound) would not perform well with current architectures. They need orders of magnitude more training material to get even close. Basically, current AI learns slowly, but of course it’s much faster in wall clock time because it’s all computer.
What I mean is: what makes an ALU (CPU) better than a human at arithmetic? It’s just faster and makes fewer errors. Similarly, what makes Google or Wikipedia better than an educated person? It’s just storing and helping you access stored information, it’s not magic (anymore). You can manually do everything mechanically, if you’re willing to waste the time to prove a point.
An LLM does many things better than humans, but we forget they’ve been trained on all written history and have hundreds of billions of parameters. If you compare what an LLM can do with the same amount of training to a human, the human is much better even at picking up patterns – current AI's strongest skill. The magic comes from the unseen vast amounts of training data. This is obvious when using them – stray just slightly outside of the training zone to unfamiliar domains and “ability” drops rapidly. The hard part is figuring out these fuzzy boundaries. How far does interpolating training data get you? What are the highest-level patterns encoded in the training data? And most importantly, to what extent do those patterns apply to novel domains?
Alternatively, you can use LLMs as a proxy for understanding the relationship between domains, instead of letting humans label them and decide the taxonomy. One such example is the relationship between detecting patterns and generating text and images – it turns out to be more or less reversible through the same architecture. More such remarkable similarities and anti-similarities are certainly on the horizon. For instance, my gut feeling says that small talk is closer to driving a car but very different from puzzle solving. We don’t really have a (good) taxonomy over human or animal brain processes.
In this article he is very focused on science and works hard to delineate science (research? deriving new facts?) from engineering (clearly product oriented). In his opinion ChatGPT falls on the engineering side of this line: it's a product of engineering, OpenAI is concentrating on marketing. For sure there was much science involved but the thing we have access to is a product.
IMHO Chomsky is asking: while ChatGPT is a fascinating product, what is it teaching us about language? How is it advancing our knowledge of language? I think Chomsky is saying "not much."
Someone else mentioned embeddings and the relationship between words that they reveal. Indeed, this could be a worthy area of further research. You'd think it would be a real boon when comparing languages. Unfortunately the interviewer didn't ask Chomsky about this.
Understanding Linguistics before LLMs:
“We think Birds fly by flapping their wings”
Understanding Linguistics Theories after LLMs:
“Understanding the physics of Aerofoils and Bernoulli’s principle mean we can replicate what birds do”
By whom?
As this is Hacker News, it is worth mentioning that he developed the concept of context-free grammars. That is something many of us encounter on a regular basis.
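For anyone who hasn't run into one since their compilers course, here is a minimal sketch of a recognizer for a tiny context-free grammar, `S -> "(" S ")" S | empty`, which generates balanced parentheses. The grammar choice and the function names `match_s` and `is_balanced` are mine, picked only for illustration; this is the standard recursive-descent idea, not anything taken from Chomsky's own work.

```python
def match_s(text, pos=0):
    """Match S -> "(" S ")" S | empty starting at pos.

    Returns the position just past the matched S, or None on failure.
    """
    if pos < len(text) and text[pos] == "(":
        after_inner = match_s(text, pos + 1)          # inner S
        if (after_inner is None
                or after_inner >= len(text)
                or text[after_inner] != ")"):
            return None                                # missing closing ")"
        return match_s(text, after_inner + 1)          # trailing S
    return pos                                         # empty production

def is_balanced(text):
    """True if the whole string is derivable from S."""
    return match_s(text) == len(text)

print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
```

The nesting is the point: no finite-state machine can check arbitrarily deep matching like this, which is why the context-free level of the hierarchy shows up in every parser generator.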
No matter what personality flaws he might have and how misguided some of his political ideas might be, he is one of the big thinkers of the 20th century. Very much unlike Trump.
"The first principle is that you must not fool yourself, and you are the easiest person to fool."
The same way that reddit comments aren't a formal debate.
Mocking is absolutely useful. Sometimes you debate someone like Graham Hancock and force him to confirm that he has no evidence for his hypotheses, then when you discuss the debate, you mock him relentlessly for having no evidence for his hypotheses.
> Yet here is Chomsky addressing a lay audience that has no linguistics background
So not a formal debate or paper where I would expect anyone to hold to debate principles.
> instead of even attempting to summarize the arguments for his position..
He makes a very clear, simple argument, accessible to any layperson who can read. If you are studying insects, what you are interested in is how insects do it, not what other mechanisms you can come up with to "beat" insects. This isn't complicated.
Where is the research on impossible language that infants can't acquire? A good popsci article would give me leads here.
Even assuming Chomsky's claim is true, all it shows is that LLMs aren't an exact match for human language learning. But even an inexact model can still be a useful research tool.
>That’s highly unlikely for reasons long understood, but it’s not relevant to our concerns here, so we can put it aside. Plainly there is a biological endowment for the human faculty of language. The merest truism.
Again, a good popsci article would actually support these claims instead of simply asserting them and implying that anyone who disagrees is a simpleton.
I agree with Chomsky that the postmodern critique of science sucks, and I agree that AI is a threat to the human race.
I searched for an actual paper by that guy because you’ve mentioned his real name. I found “Modern language models refute Chomsky’s approach to language”. After reading it seems even more true that Chomsky’s Tom Jones is a strawman.
Chomsky introduced his theory of language acquisition, according to which children have an inborn quality of being biologically encoded with a universal grammar
https://psychologywriting.com/skinner-and-chomsky-on-nature-...
Yes, maybe we can reproduce that learning process in LLMs, but that doesn't mean the LLMs imitate only the nurture part (might as well be just finetuning), and not the nature part.
An airplane is not an explanation for a bird's flight.
Isn't AI optimism an ideological motivation? It's a spectrum, not a mental model.
They're firmly on one extreme end of the spectrum. I feel as though I'm somewhere in between.
For Chomsky specifically, the entire existence of LLMs, however it's framed, is a massive middle finger to him and a strike-through on a large part of his academic career. As much as I find his UG theory and its supporters irritating, it might feel a bit unfair to someone his age.
Usually what happens is the information bubble bursts, and gets corrected, or it just fades out.
I was quite dismissive of him on LLMs until I realized the utter hubris and stupidity of dismissing Chomsky on language.
I think it was someone asking if he was familiar with the Wittgenstein Blue and Brown books, and of course he was, because he was already an assistant professor at MIT when they came out.
I still chuckle at my own intellectual arrogance and stupidity when thinking about how I was dismissive of Chomsky on language. I barely know anything and I was being dismissive of one of the unquestionable titans and historic figures of a field.
https://www.scientificamerican.com/article/evidence-rebuts-c...
But at least he admits that:
An equivalent observation might be that the only people who seem really, really excited about current AI products are grifters who want to make money selling it. Which looks a lot like Blockchain to many.
The perception on the left is that once again, corporations are foisting products on us that nobody wants, with no concern for safety, privacy, or respect for creators.
For better or worse, the age of garage-tech is mostly dead and Tech has become synonymous with corporatism. This is especially true with GenAI, where the resources to construct a frontier model (or anything remotely close to it) are far outside what a hacker can afford.
Not that I am an LLM zealot. Frankly, the trajectory it clearly puts humans on makes me question our future in this timeline. But even as a non-zealot, merely an amused but bored middle-class rube who is aware of the serious issues with it (privacy, detailed personal profiling that surpasses existing systems, energy use, and the actual power of those who wield it), I can see it being implemented everywhere, and I watch that with a mix of glee and annoyance.
I know for a fact it will break things and break things hard, and it will be the people who know how things actually work who will need to fix them.
I will be very honest though. I think Chomsky is stuck in his internal model of the world and unable to shake it off. Even his arguments fall flat, because they don't fit the domain well. It seems like they should, given that he practically made his name on syntax theory (which suggests his thoughts should translate well into it), and yet... they don't.
I have a minor pet theory on this, but I am still working on putting it into some coherent words.
Perhaps it is more important to know the limitations of tools rather than dismiss their utility entirely due to the existence of limitations.
> Perhaps it is more important to know the limitations of tools rather than dismiss their utility entirely due to the existence of limitations.
Well, yes. And "reasoning" is only something LLMs do coincidentally to their function as sequence continuation engines. Like performing accurate math on rational numbers, it can happen if you put in a lot of work and accept a LOT of expensive computation. Even then there exist computations that just are not reasonable or feasible.
Reminding folks to dismiss the massive propaganda engine pushing this bubble isn't "dismissing their utility entirely".
These are not reasoning machines. Treating them like they are will get you hurt eventually.
Who would make such a claim? LLMs are of course incredible, but it seems obvious that their mechanism is quite different than the human brain.
I think the best you can say is that one could motivate lines of inquiry in human understanding, especially because we can essentially do brain surgery on an LLM in action in a way that we can’t with humans.
> Again, we’d laugh. Or should.
Should we? This reminds me acutely of imaginary numbers. They are a great theory of numbers that can list many numbers that do 'exist' and many that can't possibly 'exist'. And we did laugh when imaginary numbers were first introduced - the name itself was intended as a derogatory term for the concept. But who's laughing now?
The term “imaginary number” was coined by René Descartes as a derogatory term, and the ill intent behind it has stuck ever since. I suspect his purpose was theological rather than mathematical and we are all the worse for it.
Care to expand on how his theories can be taught in such a binary way?
This leads to people who agree hiring each other and departments ‘circling the wagons’ on these issues. You’ll see this referred to as east vs west coast, but it’s not actually that clearly geographically delineated.
So anyways, these are open questions that people do seriously discuss and study, but the politics of academia make it difficult and unfortunately this often trickles down to students.
Same thing happened in Astronomy. Students of Fred Hoyle can't work in some institutions. &c &c.
But generally speaking Chomsky's ideas, and in particular Universal Grammar, are no longer in vogue.
Chomsky talks about how the current approach can't tell you about what humans are doing, only approximate it; the example he has given in the past is taking thousands of hours of footage of falling leaves and then training a model to make new leaf falling footage versus producing a model of gravity, gas mechanics for the air currents, and the air resistance of leaves. The latter representation is distilled down into something that tells you about what is happening at the end of some scientific inquiry, and the former is an opaque simulation for engineering purposes if all you wanted was more leaf falling footage.
So I interpret Chomsky as meaning "Look, these things can be great for an engineering purpose but I am unsatisfied in them for scientific research because they do not explain language to me" and mostly pushing back against people implying that the field he dedicated much of his life to is obsolete because it isn't being used for engineering new systems anymore, which was never his goal.
I’m perfectly willing to bet that there are LLMs that can pass a Turing test, even against a mind like Chomsky.
He wants to understand how human language works. If I get him right — and I'm absolutely sure that I don't in important ways — then LLMs are not that interesting because both (1) and (2) above are true of them.
I understand his diction is a bit impenetrable but I believe the intention is to promote literacy and specificity, not just to be a smarty-pants.