The answers to "why" that Chomsky pushes so hard for are very valuable to adult language learners. There are basic syntactic rules for generating broadly correct language. Having these rules discovered and explained in the simplest possible form is something statistical models cannot replace. Neural networks, much like native speakers, can say "well, this just sounds right," but adult learners need a mathematical theory of how and why they can generate sentences. Yes, this changes with time and circumstances, but the simple rules and theories are there if we put the effort in to look for them.
There are many languages with a very small corpus of training data. LLMs fail miserably at communicating in them or explaining anything about their grammar, but if we look hard for the underlying theories Chomsky was after, we can make leaps and bounds in understanding how to use those languages.
https://www.stat.berkeley.edu/~aldous/157/Papers/shmueli.pdf
To Explain Or To Predict?
Nice quote
We note that the practice in applied research of concluding that a model with a higher predictive validity is “truer,” is not a valid inference. This paper shows that a parsimonious but less true model can have a higher predictive validity than a truer but less parsimonious model.
Hagerty+Srinivasan (1991)
*like TFA it's a sorta review of Breiman
Unfortunately, studying the behavior of a system doesn't necessarily provide insight into why it behaves that way; it may not even provide a good predictive model.
It's crazy how wrong Chomsky was about machine learning. Maybe the real truth is that humans are stochastic parrots with an underlying probability distribution, and because gradient descent is so good at reproducing probability distributions, LLMs are incredibly good at reproducing language.
In more detail: Chomsky is/was not concerned with the models themselves, but rather with the distinction between statistical modelling in general, and "clean slate" models in particular, on the one hand, and structural models discovered through human insight on the other.
By "clean slate" I mean models that start with as little linguistically informed structure as possible. E.g., Norvig mentions hybrid models: these can start out as classical rule-based models, whose probabilities are then learnt. A randomly initialised neural network would be as clean as possible.
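For concreteness, a minimal sketch of such a hybrid (assuming a toy grammar and invented counts, not anything from Norvig's essay): the rule skeleton is hand-written by a linguist, while the rule probabilities are estimated from observed data, as in a probabilistic context-free grammar.

```python
from collections import Counter

# Hand-written rule skeleton (the "human insight" part):
# each left-hand side maps to its possible expansions.
rules = {
    "S":  [("NP", "VP")],
    "NP": [("Det", "N"), ("Name",)],
    "VP": [("V", "NP"), ("V",)],
}

# A tiny invented "treebank" of observed expansions (the statistical part).
observed = [
    ("NP", ("Det", "N")), ("NP", ("Det", "N")), ("NP", ("Name",)),
    ("VP", ("V", "NP")), ("VP", ("V",)), ("VP", ("V", "NP")),
]

# Maximum-likelihood estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
counts = Counter(observed)
lhs_totals = Counter(lhs for lhs, _ in observed)
probs = {(lhs, rhs): counts[(lhs, rhs)] / lhs_totals[lhs]
         for (lhs, rhs) in counts}

print(probs[("NP", ("Det", "N"))])  # 2/3: learnt, not hand-assigned
print(probs[("VP", ("V", "NP"))])   # 2/3
```

The structure stays human-designed; only the numbers come from data, which is what makes it less of a clean slate than a random network.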
Could it be this?
Not sure the approach holds.
Mine is as ludicrous a suggestion as it is to damn by association.
Chomsky is wrong by the standards of his time and is making things worse rather than better.
It was very much the opposite of Chomsky's ideology as well. So it additionally means he's fake, BOTH on his morals and his politics/activism, from both sides (i.e., both helping a paedophile and helping/entertaining a billionaire).
So it's (yet another) case of an important figure who supposedly stands for something not just demonstrating that he stands for nothing at all, but being a disgusting human being as well.
https://hn.algolia.com/?query=Chomsky%20and%20the%20Two%20Cu...
The oldest submission is from 15 years ago, that is, 2010.
I resubmitted it thinking that, with the success of LLMs, it was worth a revisit from a "how real-world scientific progress works" point of view.
The title should say (2011), otherwise the whole piece is confusing.
> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.
He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.
I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.
There's no point minimizing his intelligence and achievements, though.
His linguistics work (eg: grammars) is still relevant in computer science, and his cynical view of the West has merit in moderation.
The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP.
As to politics, I don't think it's hard to find critics of the West's atrocities with less history of denying or excusing the West's enemies' atrocities. He's certainly not always wrong, but he's a net unfortunate choice of figurehead.
The question then becomes one of actual novelty versus the learned joint probabilities of internalised sentences/phrases/etc.
Generation or regurgitation? Is there a difference to begin with..?
If we instead evaluate models by perplexity, defined in the usual NLP way, then the probability of a sequence still approaches zero as its length increases, but it does so smoothly and never reaches exactly zero. This makes it useful for sequences of arbitrary length. The latter metric seems so obviously better that rejecting all statistical approaches on the basis of the former strikes me as ridiculous. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries did judge correctly that I get that benefit, that LLMs exist, etc.
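A quick numeric sketch of that point, with an assumed toy per-token probability (not taken from any real model): the joint probability of a sequence decays toward zero as the sequence grows, yet never equals zero, while the length-normalised perplexity stays constant.

```python
import math

p = 0.1  # assumed average per-token probability (toy number)

for n in (1, 10, 100):
    log_prob = n * math.log(p)            # log P(sequence) = sum of token log-probs
    seq_prob = math.exp(log_prob)         # shrinks toward 0, never exactly 0
    perplexity = math.exp(-log_prob / n)  # exp of average negative log-prob
    print(n, seq_prob, perplexity)        # perplexity is 10 at every length
```

So "every novel sentence has vanishing probability" is true but harmless once you normalise by length, which is exactly what perplexity does.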
In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative.
It's of rather limited use for natural languages.