The answers to "why" that Chomsky pushes so hard for are very valuable to adult language learners. There are basic syntactic rules for generating broadly correct language. Having these rules discovered and explained in the simplest possible form is something statistical models cannot replace. Neural networks, much like native speakers, can say "well, this just sounds right," but adult learners need a mathematical theory of how and why they can generate sentences. Yes, this changes with time and circumstances, but the simple rules and theories are there if we put the effort in to look for them.
There are many languages with a very small corpus of training data. LLMs fail miserably at communicating in them or explaining anything about their grammar, but if we look hard for the underlying theories Chomsky was after, we can make leaps and bounds in understanding how to use those languages.
https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf
To Explain Or To Predict?
Nice quote
We note that the practice in applied research of concluding that a model with a higher predictive validity is “truer,” is not a valid inference. This paper shows that a parsimonious but less true model can have a higher predictive validity than a truer but less parsimonious model.
Hagerty+Srinivasan (1991)
*like TFA it's a sorta review of Breiman
Unfortunately, studying the behavior of a system doesn't necessarily provide insight into why it behaves that way; it may not even provide a good predictive model.
It's crazy how wrong Chomsky was about machine learning. Maybe the real truth is that humans are stochastic parrots with an underlying probability distribution, and because gradient descent is so good at reproducing probability distributions, LLMs are incredibly good at reproducing language.
In more detail: Chomsky is/was not concerned with the models themselves, but rather with the distinction between statistical modelling in general, and "clean slate" models in particular, on the one hand, and structural models discovered through human insight on the other.
By "clean slate" I mean models that start with as little linguistically informed structure as possible. E.g., Norvig mentions hybrid models: these can start out as classical rule-based models, whose probabilities are then learnt. A randomly initialised neural network would be as clean as possible.
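For concreteness, a minimal sketch of such a hybrid (assuming a toy grammar and invented counts, not anything from Norvig's essay): the rule skeleton is hand-written by a linguist, while the rule probabilities are estimated from observed data, as in a probabilistic context-free grammar.

```python
from collections import Counter

# Hand-written rule skeleton (the "human insight" part):
# each left-hand side maps to its possible expansions.
rules = {
    "S":  [("NP", "VP")],
    "NP": [("Det", "N"), ("Name",)],
    "VP": [("V", "NP"), ("V",)],
}

# A tiny invented "treebank" of observed expansions (the statistical part).
observed = [
    ("NP", ("Det", "N")), ("NP", ("Det", "N")), ("NP", ("Name",)),
    ("VP", ("V", "NP")), ("VP", ("V",)), ("VP", ("V", "NP")),
]

# Maximum-likelihood estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
counts = Counter(observed)
lhs_totals = Counter(lhs for lhs, _ in observed)
probs = {(lhs, rhs): counts[(lhs, rhs)] / lhs_totals[lhs]
         for (lhs, rhs) in counts}

print(probs[("NP", ("Det", "N"))])  # 2/3: learnt, not hand-assigned
print(probs[("VP", ("V", "NP"))])   # 2/3
```

The structure stays human-designed; only the numbers come from data, which is what makes it less of a clean slate than a random network.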
Could it be this?
Not sure the approach holds.
Mine is as ludicrous a suggestion as it is to damn by association.
Chomsky is wrong by the standards of his time and is making things worse rather than better.
It was very much the opposite of Chomsky's ideology as well. So it additionally means he's fake, BOTH on his morals and his politics/activism, from both sides (i.e., both helping a paedophile and helping/entertaining a billionaire).
So it's (yet another) case of an important figure who supposedly stands for something not just demonstrating that he stands for nothing at all, but being a disgusting human being as well.
https://hn.algolia.com/?query=Chomsky%20and%20the%20Two%20Cu...
The oldest submission is from 15 years ago, that is, 2010.
I resubmitted it thinking that, with the success of LLMs, it was worth a revisit from a "how real-world scientific progress works" point of view.
The title should say (2011), otherwise the whole piece is confusing.
> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.
He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.
I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.
There's no point minimizing his intelligence and achievements, though.
His linguistics work (eg: grammars) is still relevant in computer science, and his cynical view of the West has merit in moderation.
The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP.
As to politics, I don't think it's hard to find critics of the West's atrocities with less history of denying or excusing the West's enemies' atrocities. He's certainly not always wrong, but he's a net unfortunate choice of figurehead.
The question then becomes one of actual novelty versus the learned joint probabilities of internalised sentences/phrases/etc.
Generation or regurgitation? Is there a difference to begin with..?
If we instead evaluate models by perplexity, defined in the usual NLP way, then the probability of a sequence still approaches zero as its length increases, but it does so smoothly and never reaches exactly zero. This makes it useful for sequences of arbitrary length. The latter metric seems so obviously better that rejecting all statistical approaches on the basis of the former strikes me as ridiculous. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries did judge correctly that I get that benefit, that LLMs exist, etc.
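A quick numeric sketch of that point, with an assumed toy per-token probability (not taken from any real model): the joint probability of a sequence decays toward zero as the sequence grows, yet never equals zero, while the length-normalised perplexity stays constant.

```python
import math

p = 0.1  # assumed average per-token probability (toy number)

for n in (1, 10, 100):
    log_prob = n * math.log(p)            # log P(sequence) = sum of token log-probs
    seq_prob = math.exp(log_prob)         # shrinks toward 0, never exactly 0
    perplexity = math.exp(-log_prob / n)  # exp of average negative log-prob
    print(n, seq_prob, perplexity)        # perplexity is 10 at every length
```

So "every novel sentence has vanishing probability" is true but harmless once you normalise by length, which is exactly what perplexity does.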
In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative.
It's of rather limited use for natural languages.