If sensor data were the problem, computers could easily outperform humans, since we have sensors that generate far more detailed data than the human senses: high-resolution cameras, multi-spectral and thermal imaging, X-rays, radar, etc.
The actual difference is that when shown a picture and told "this is a cat", humans already know what to look for. Even a human who has never seen a cat before will not, for example, examine the background of the photo or the floor the cat is lying on. They will also instinctively draw analogies to similar animals they already know, and infer a lot of correct information about that "cat" without needing to be told explicitly.