We are no more in touch with physical reality than an LLM is, unless we are in the habit of pressing our brains against things. Everything is interpreted through a symbolic map.
Models see when photons hit camera sensors; you see when photons hit your retina. Both are a kind of sight.
When photons hit the retina, the same kind of photochemical transduction happens, but the signal does not stop at measurement. It flows through a living system that integrates it with memory, emotion, context, and self-awareness. The brain does not just register and store the light; it constructs an experience of seeing, a subjective phenomenon: qualia.
Once models start continuously learning from subjective visual experience, hit me up and I'll agree that models "see objects". Until a direct, raw, unlabelled stream of photosensor data about the world around them can actually make a model learn anything, they are not even close to "seeing".
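For concreteness, here is a minimal sketch of what "learning from a raw, unlabelled stream" could mean in practice: self-supervised next-frame prediction, where the stream supervises itself and no human labels enter anywhere. It assumes PyTorch; the tiny model, the shapes, and the random tensors standing in for camera frames are all illustrative choices, not anything the argument above prescribes.

```python
# Illustrative sketch: self-supervised learning on an unlabelled frame stream.
# The only training signal is the stream itself (predict the next frame).
import torch
import torch.nn as nn

class TinyFramePredictor(nn.Module):
    """Predicts the next frame from the current one; no labels involved."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),  # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),   # 32 -> 64
        )

    def forward(self, frame):
        return self.decoder(self.encoder(frame))

model = TinyFramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in for an unlabelled camera stream: pairs of consecutive frames.
# In a real setup these would come straight off the sensor.
for step in range(100):
    frame_t = torch.rand(8, 3, 64, 64)                      # batch of current frames
    frame_t1 = frame_t + 0.01 * torch.randn_like(frame_t)   # the "next" frames
    pred = model(frame_t)
    loss = loss_fn(pred, frame_t1)   # supervision comes from the stream itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is only that the loss is computed against the incoming stream, with no labelling step anywhere; whether that counts as "seeing" is exactly what is in dispute above.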