To be clear, I was not writing about cognition here.
All I'm saying is that even with stereo inputs, we're doing more than computing depth from the baseline between the left and right images. Close one eye and you can still estimate relative object positions, because you have learned that roads are mostly planar and that cars don't float but stand on the road. You know the expected size of a car compared to, say, a human, and if the car appears visually smaller than the human, it must be farther away.
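That size cue follows directly from the pinhole camera model: apparent size scales with real size divided by distance, so a known-size object's pixel height gives a rough monocular distance. A minimal sketch, with purely illustrative numbers (focal length and object heights are my assumptions, not from any real camera):

```python
def distance_from_apparent_size(real_height_m, pixel_height, focal_px):
    """Pinhole model: image height h = f * H / Z, so Z = f * H / h."""
    return focal_px * real_height_m / pixel_height

f = 1000.0                 # assumed focal length in pixels
car_h, human_h = 1.5, 1.7  # rough real-world heights in metres

# The car appears smaller in the image than the human (30 px vs 85 px)...
z_car = distance_from_apparent_size(car_h, pixel_height=30, focal_px=f)
z_human = distance_from_apparent_size(human_h, pixel_height=85, focal_px=f)

print(z_car, z_human)  # → 50.0 20.0, i.e. the car is much farther away
```

So a smaller apparent size for a known-larger object forces a larger distance estimate, which is exactly the prior you use with one eye closed.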
Lidar _also_ doesn't know what the glass feels like.
Yes, I agree with you: lidar and most current vision sensors also suffer from this.