Why do you believe that a system should not need to consume millions of documents in order to be able to make predictions?
In your example, the concepts of driving, night, and vision all need to be clearly understood, as well as how they relate to each other. 'Common sense' is a good example of something that takes years to develop in humans, and that develops to varying extents: driving at night versus during the day is one case, while driving drunk versus sober is a different one where humans routinely make poor decisions or hold incorrect beliefs.
It's estimated that humans are exposed to around 11 million bits of information per second.
Assuming humans do not process any data while they sleep (which is almost certainly false): newborns are awake for around 8 hours per day, so they 'consume' roughly 40GB of data per day, ramping up to around 60GB by the time they're 6 months old. That means that in the first month alone, a newborn has processed around 1TB of input.
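As a rough sanity check on those figures (treating the 11 million bits/second estimate as a flat sensory bandwidth, and assuming roughly 12 waking hours per day at 6 months, which is my own guess chosen to match the numbers above):

```python
# Back-of-envelope: raw sensory input per waking day at ~11 Mbit/s.
BITS_PER_SECOND = 11e6   # estimated human sensory bandwidth
BYTES_PER_GB = 1e9

def daily_gb(awake_hours):
    """Gigabytes of raw sensory input for a given number of waking hours."""
    return BITS_PER_SECOND / 8 * awake_hours * 3600 / BYTES_PER_GB

print(daily_gb(8))   # newborn, ~8 waking hours    -> ~39.6 GB/day
print(daily_gb(12))  # ~6 months, ~12 waking hours -> ~59.4 GB/day
```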
By the age of six months, they've taken in somewhere between 6 and 10TB, and they haven't even said their first word yet. Most babies have experienced more than 20TB of sensory input by the time they do.
Often, children are unable to reason even at a very basic level until they have been exposed to more than 100TB of sensory input. GPT-3, by contrast, was trained on a corpus of around 570GB of text.
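Extending the same back-of-envelope arithmetic to the cumulative totals (the day counts and per-day averages here are my own rough assumptions, picked to line up with the figures above rather than taken from any developmental study):

```python
# Cumulative sensory input vs. GPT-3's training corpus, same assumptions as above.
def cumulative_tb(days, avg_gb_per_day):
    """Total terabytes of sensory input over a number of days."""
    return days * avg_gb_per_day / 1000

print(cumulative_tb(30, 40))         # first month              -> ~1.2 TB
print(cumulative_tb(180, 50))        # six months (avg 40-60GB) -> ~9 TB
print(cumulative_tb(365, 60))        # ~first word (~1 year)    -> ~22 TB
print(cumulative_tb(4.5 * 365, 60))  # ~basic reasoning         -> ~99 TB

GPT3_CORPUS_TB = 0.57                # ~570GB of filtered text
print(100 / GPT3_CORPUS_TB)          # ~100TB is ~175x GPT-3's corpus
```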
We are simply orders of magnitude away from being able to make a meaningful comparison between GPT-3 and humans, let alone determine conclusively that our 'intelligence' is of a different category from the 'intelligence' displayed by GPT-3.