> Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction.

Chomsky's arguments about "poverty of the stimulus" rely on non-probabilistic grammars. Norvig discusses this here: https://norvig.com/chomsky.html
> In 1967, Gold's Theorem showed some theoretical limitations of logical deduction on formal mathematical languages. But this result has nothing to do with the task faced by learners of natural language. In any event, by 1969 we knew that probabilistic inference (over probabilistic context-free grammars) is not subject to those limitations (Horning showed that learning of PCFGs is possible).
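To make the PCFG idea concrete, here's a toy sketch of the kind of probabilistic inference Horning's result concerns: computing the total probability a probabilistic context-free grammar assigns to a sentence (the "inside" probability, via a CKY-style dynamic program). The grammar, words, and probabilities below are made up purely for illustration, not taken from Horning or Norvig.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form.
# Each nonterminal maps to a list of (right-hand side, probability);
# the probabilities for each nonterminal sum to 1.
RULES = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "N"), 0.6), (("dogs",), 0.4)],
    "Det": [(("the",), 1.0)],
    "N":   [(("dog",), 0.7), (("cat",), 0.3)],
    "VP":  [(("V", "NP"), 0.5), (("barks",), 0.5)],
    "V":   [(("sees",), 1.0)],
}

def inside_prob(words, rules, start="S"):
    """Sum of probabilities of all parses of `words`, by CKY."""
    n = len(words)
    chart = defaultdict(float)  # (i, j, A) -> P(A derives words[i:j])
    # Lexical rules: fill in single-word spans.
    for i, w in enumerate(words):
        for lhs, prods in rules.items():
            for rhs, p in prods:
                if rhs == (w,):
                    chart[(i, i + 1, lhs)] += p
    # Binary rules: combine adjacent spans, shortest first.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, prods in rules.items():
                    for rhs, p in prods:
                        if len(rhs) == 2:
                            b, c = rhs
                            chart[(i, j, lhs)] += (
                                p * chart[(i, k, b)] * chart[(k, j, c)]
                            )
    return chart[(0, n, start)]

# P("the dog barks") = 1.0 * (0.6 * 1.0 * 0.7) * 0.5 = 0.21
print(inside_prob(["the", "dog", "barks"], RULES))
```

The point of the sketch is that grammaticality becomes graded: a string the grammar can't derive gets probability 0, and among derivable strings some are more probable than others, which is exactly the kind of signal a learner can do inference over.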
If I recall correctly, human toddlers hear about 3-13 million spoken words per year, and the higher ranges are correlated with better performance in school. Which:
- Is a lot, in an absolute sense.
- But is still much less training data than LLMs require.
Adult learners moving between English and Romance languages can get a pretty decent grasp of the language (roughly C1 or C2 reading ability) with about 3 million words of reading. That obviously exploits transfer learning and prior knowledge, because the same progress is harder in a less closely related language.
So yeah, humans are impressive. But Chomsky doesn't really seem to have the theoretical toolkit to deal with probabilistic or statistical learning. And LLMs are closer to statistical learning than to Chomsky's formal models.