This reminds me of the story of Adam learning names, or how some languages can express a lot more in fewer words. And it makes sense that LLMs look intelligent to us.
My kid loves repeating the names of things he learned recently. For the past few weeks, after learning 'spider' and 'snake' and 'dangerous', he keeps finding spiders around. There are no snakes, so he makes up snakes from curly drawn lines and tells us they are dangerous.
I think we learn fast because of stereo (3D) vision. I have no idea how these models learn, and I don't know if 3D vision would make multimodal LLMs better and require exponentially fewer examples.