> Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction.

Chomsky's arguments about "poverty of the stimulus" rely on non-probabilistic grammars. Norvig discusses this here: https://norvig.com/chomsky.html
> In 1967, Gold's Theorem showed some theoretical limitations of logical deduction on formal mathematical languages. But this result has nothing to do with the task faced by learners of natural language. In any event, by 1969 we knew that probabilistic inference (over probabilistic context-free grammars) is not subject to those limitations (Horning showed that learning of PCFGs is possible).
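To make the PCFG idea concrete, here's a toy sketch of the kind of probabilistic inference Horning's result concerns: computing the total probability a probabilistic context-free grammar assigns to a sentence (the "inside" probability, via a CKY-style dynamic program). The grammar, words, and probabilities below are made up purely for illustration, not taken from Horning or Norvig.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form.
# Each nonterminal maps to a list of (right-hand side, probability);
# the probabilities for each nonterminal sum to 1.
RULES = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "N"), 0.6), (("dogs",), 0.4)],
    "Det": [(("the",), 1.0)],
    "N":   [(("dog",), 0.7), (("cat",), 0.3)],
    "VP":  [(("V", "NP"), 0.5), (("barks",), 0.5)],
    "V":   [(("sees",), 1.0)],
}

def inside_prob(words, rules, start="S"):
    """Sum of probabilities of all parses of `words`, by CKY."""
    n = len(words)
    chart = defaultdict(float)  # (i, j, A) -> P(A derives words[i:j])
    # Lexical rules: fill in single-word spans.
    for i, w in enumerate(words):
        for lhs, prods in rules.items():
            for rhs, p in prods:
                if rhs == (w,):
                    chart[(i, i + 1, lhs)] += p
    # Binary rules: combine adjacent spans, shortest first.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, prods in rules.items():
                    for rhs, p in prods:
                        if len(rhs) == 2:
                            b, c = rhs
                            chart[(i, j, lhs)] += (
                                p * chart[(i, k, b)] * chart[(k, j, c)]
                            )
    return chart[(0, n, start)]

# P("the dog barks") = 1.0 * (0.6 * 1.0 * 0.7) * 0.5 = 0.21
print(inside_prob(["the", "dog", "barks"], RULES))
```

The point of the sketch is that grammaticality becomes graded: a string the grammar can't derive gets probability 0, and among derivable strings some are more probable than others, which is exactly the kind of signal a learner can do inference over.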
If I recall correctly, human toddlers hear about 3-13 million spoken words per year, and the higher ranges are correlated with better performance in school. Which:
- Is a lot, in an absolute sense.
- But is still much less training data than LLMs require.
Adult learners moving between English and Romance languages can get a pretty decent grasp of the language (roughly C1 or C2 reading ability) with about 3 million words of reading. That obviously exploits transfer learning and prior knowledge, because the same progress is harder in a less closely related language.
So yeah, humans are impressive. But Chomsky doesn't really seem to have the theoretical toolkit to deal with probabilistic or statistical learning. And LLMs are closer to statistical learning than to Chomsky's formal models.