The bottleneck for self-driving technology isn't sensors - it's AI. Building a car that collects enough sensory data to enable self-driving is easy. Building a car AI that actually drives well in a diverse range of conditions is hard.
I think there is a good chance that what we currently call "AI" is fundamentally not technologically capable of human levels of driving in diverse conditions. It can support and it can take responsibility in certain controlled (or very well known) environments, but we'll need fundamentally new technology to make the jump.
Modern cars can have 360° vision at all times, by default, with multiple overlapping camera FoVs. That overlap is exactly what humans use to get near-field 3D vision. And far-field 3D vision?
The depth-discrimination ability of binocular vision falls off with distance squared. At far ranges, humans no longer see enough difference between the two images to get a reliable depth estimate. Notably, cars can space their cameras apart much further, so their far range binocular perception can fare better.
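A rough sketch of that distance-squared falloff, assuming an idealized pinhole stereo pair where depth is Z = f * B / d, so a small disparity error dd gives a depth error of about Z^2 * dd / (f * B). The focal length, disparity error, and the 1.2 m car baseline are made-up illustrative numbers, not the specs of any real car or of the human eye.

```python
def depth_error_m(distance_m, baseline_m, focal_px=1000.0, disparity_err_px=0.25):
    """Approximate depth uncertainty of an ideal stereo pair.

    From Z = f * B / d, a small disparity error dd gives
    dZ ~= Z**2 * dd / (f * B).
    focal_px and disparity_err_px are illustrative assumptions.
    """
    return distance_m ** 2 * disparity_err_px / (focal_px * baseline_m)


EYE_BASELINE_M = 0.065  # roughly a human interpupillary distance
CAR_BASELINE_M = 1.2    # hypothetical spacing between two car cameras

for z in (10, 50, 100):
    eye = depth_error_m(z, EYE_BASELINE_M)
    car = depth_error_m(z, CAR_BASELINE_M)
    print(f"{z:>4} m: 6.5 cm baseline +/-{eye:.2f} m, 1.2 m baseline +/-{car:.2f} m")
```

Under these assumptions the wider baseline shrinks the error by the ratio of the two baselines (about 18x), but the error still grows with distance squared for both.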
How do humans get that "3D" at far distances then? The answer is, like it usually is when it comes to perception, postprocessing. The human brain estimates depth based on the features it sees. Not unlike an AI that was trained to predict depth maps from a single 2D image.
If you think that perceiving "inertia and movement" is vital, then you'd be surprised to learn that an IMU that beats a human on that can be found in an average smartphone. It's not even worth mentioning - even non-self-driving cars have that for GPS dead reckoning.
A lot of the problems with driving aren't driving problems. They are "other people are stupid" problems and "nature is random" problems. A good driver has a lot of ability to predict what other drivers are going to do. For example, people commonly swerve slightly in the direction they are going to turn, even before putting on a signal. A person swerving in a lane is likely going to continue with dumb actions and do something worse soon. Clouds in the distance may be a sign of rain, and of bad road conditions and slower traffic ahead.
Very little of this has to do with the quality of our sensors. Current sensors themselves are probably far beyond what we actually need. It's compute speed (efficiency really) and preemption that give humans an edge, at least when we're paying attention.
Between brightly sunlit snow and a starlit night, we can cover more than 45 stops with the same pair of eyeballs; the very best cinematographic cameras reach something like 16.
In a way it's not a fair comparison, since we're taking into account retinal adaptation, eyelids/eyelashes, pupil constriction. But that's the point - human vision does not use cameras.
Indeed. And the comparison is unnecessarily unfair.
You're comparing the dynamic range of a single exposure on a camera vs. the adaptive dynamic range in multiple environments for human eyes. Cameras do have comparable features: adjustable exposure times and apertures. Additionally, cameras can sense IR, which might be useful for driving in the dark.
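To put numbers on "single exposure vs. adaptive range", here is a minimal sketch. It relies only on the definition of a stop (one doubling of light) plus a simplified assumption that changing exposure time slides a camera's single-frame window by log2(longest / shortest) stops. The 12-stop sensor and the exposure times are hypothetical, and aperture, gain, noise, and motion blur are ignored.

```python
import math


def stops_to_contrast_ratio(stops):
    """Each stop is a doubling of light, so N stops span a 2**N : 1 ratio."""
    return 2.0 ** stops


def bracketed_range_stops(single_exposure_stops, shortest_exposure_s, longest_exposure_s):
    """Range covered across frames when only exposure time is varied.

    Simplified model: the single-frame window slides by
    log2(longest / shortest) stops. Any one frame still only
    captures its own window.
    """
    return single_exposure_stops + math.log2(longest_exposure_s / shortest_exposure_s)


print(f"16 stops = {stops_to_contrast_ratio(16):,.0f}:1")
print(f"45 stops = {stops_to_contrast_ratio(45):.2e}:1")

# Hypothetical 12-stop sensor, exposure times from 1/8000 s to 1/30 s.
total = bracketed_range_stops(12, 1 / 8000, 1 / 30)
print(f"adaptive range across frames: about {total:.1f} stops")
```

The point of the sketch is only that the two figures being compared measure different things: one is a per-frame window, the other is a window plus how far it can be moved.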
A system that replicates the human eye's rapid aperture adjustment and integration of images taken at quickly changing aperture/filter settings is very much not what Tesla is putting in their cars.
But again, the argument is fine in principle. It's just that you can't buy a camera that performs like the human visual system today.
That means that to view some things better, you have to accept being completely blind to others. That is not a substitute for dynamic range.
Most are minor, but even so - beating that shouldn't be a high bar.
There is no good reason not to use LIDAR with other sensing technologies, because cameras-only just makes the job harder.
They get into fewer accidents, mile for mile and road type for road type, and the ones they get into trend towards less severe. Why?
Because self-driving cars don't drink and drive.
This is the critical safety edge a machine holds over a human. A top-tier human driver in top shape outperforms this generation of car AIs. But a car AI outperforms the bottom-of-the-barrel human driver - the driver who might be tired, distracted and under the influence.
Generally you are comparing apples and oranges if you are comparing the safety records of e.g. Waymos to that of the general driving population.
Waymos drive under incredibly favorable circumstances. They also will simply stop or fall back on human intervention if they don't know what to do – failing in their fundamental purpose of driving from point A to point B. To actually get comparable data, you'd have to let Waymos or Teslas do the same type of drives that human drivers do, under the same circumstances and without the option of simply stopping when they are unsure, which they simply are not capable of doing at the moment.
That doesn't mean that this type of technology is useless. Modern self-driving and adjacent tech can make human drivers much safer. I imagine it would be quite easy to build some AI tech that has a decent success rate in recognizing inebriated drivers and stopping the cars until they have talked to a human to get cleared for driving. I personally love intelligent lane and distance assistance technology (if done well, which Tesla doesn't in my view). Cameras and other assistive technology are incredibly useful when parking even small cars and I'd enjoy letting a computer do every parking maneuver autonomously until the end of my days. The list could go on.
Waymos have cumulatively driven about 100 million miles without a safety driver as of July 2025 (https://fifthlevelconsulting.com/waymos-100-million-autonomo...) over a span of about 5 years. This is such a tiny fraction of miles driven by US (not to speak of worldwide) drivers during that time, that it can't usefully be expressed. And they've driven these miles under some of the most favorable conditions available to current self-driving technology (completely mapped areas, reliable and stable good weather, mostly slow, inner city driving, etc.). And Waymo themselves have repeatedly said that overcoming the limitations of their tech will be incredibly hard and not guaranteed.
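For a sense of scale, a back-of-the-envelope sketch. The 100 million miles and the 5-year span come from the comment above; the US total is an assumption (roughly 3.2 trillion vehicle miles per year, an approximate figure not taken from the comment) and should be swapped for an official number before drawing conclusions.

```python
WAYMO_DRIVERLESS_MILES = 100e6  # from the comment above, as of July 2025
YEARS = 5                       # span given in the comment above
US_MILES_PER_YEAR = 3.2e12      # ASSUMPTION: approximate US vehicle miles per year


def share_of_total(fleet_miles, total_miles_per_year, years):
    """Fraction of all miles driven in the period that the fleet accounts for."""
    return fleet_miles / (total_miles_per_year * years)


share = share_of_total(WAYMO_DRIVERLESS_MILES, US_MILES_PER_YEAR, YEARS)
print(f"share of US miles over {YEARS} years: {share:.6%}")
```

Under that assumed total, the fleet accounts for a few millionths of the miles driven in the same period.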
Most non-impaired humans outperform the current gen. The study I saw had FSD at 10x fatalities per mile vs non-impaired drivers.
Not true. Humans also interpret the environment in 3D space. See a Tesla fail against a Wile E. Coyote-inspired mural which humans perceive:
Teslas "interpret the environment in 3D space" too - by feeding all the sensor data into a massive ML sensor fusion pipeline, and then fusing that data across time too.
This is where the visualizers, both the default user screen one and the "Terminator" debugging visualizer, get their data from. They show plain and clear that the car operates in a 3D environment.
You could train those cars to recognize and avoid Wile E. Coyote traps too, but do you really want to? The expected amount of walls set in the middle of the road with tunnels painted onto them is very close to zero.
Let’s also not forget murals like that do exist in real life. And those aren’t foam.
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg...
Additionally, as the other commenter pointed out, trucks often have murals painted on them, either as art or adverts.
https://en.wikipedia.org/wiki/Truck_art_in_South_Asia
https://en.wikipedia.org/wiki/Dekotora
Search for “truck ads” and you’ll find a myriad companies offering the service.
Even at that point, why would you possibly use only cameras though, when you can get far better data by using multiple complementary systems? Humans still crash plenty often, in large part because of how limited our "camera" system can be.
Even if what you're saying is true, which it's not, cameras are so inferior to eyes it's not even funny.
Any 2 cameras separated by a few inches.
> dynamic range of an eye
Many cameras nowadays match or exceed the eye in dynamic range. Especially if you consider that cameras can vary their exposure from frame to frame, similar to the eye, but much faster.
The human skull only has two eye sockets, and it can only get so wide. But cars can carry a lot of cameras, and maintain a large fixed distance between them.
Our cameras (also called eyes) have way better dynamic range, focus speed, resolution and movement detection capabilities, backed by a reduced-bandwidth peripheral vision which is also capable of detecting movement.
No camera, including professional/medium-format still cameras, is that capable. I think one of the car manufacturers made a combined tele/wide lens system for a single camera which can see both at the same time, but that's it.
Dynamic range, focus speed, resolution, FoV and motion detection are still lacking.
...and that's when we imagine that we only use our eyes.
That’s the mistake Elon Musk made and the same one you’re making here.
Not to mention that humans driving with cameras only is absolutely pathetic. The number of completely avoidable accidents that occur doesn’t exactly inspire confidence that all my car needs to be safe and get me to my destination is a couple of cameras.
Elon Musk is right. You can't cram 20 radars, 50 LIDARs and 100 cameras into a car and declare self-driving solved. No amount of sensors can redeem a piss poor driving AI.
Conversely, if you can build an AI that's good enough, then you don't need a lot of sensors. All the data a car needs to drive safely is already there - right in the camera data stream.
So far, every self-driving accident where the self-driving car was found to be at fault follows the same pattern: the car had all the sensory data it needed to make the right call, and it didn't make the right call. The bottleneck isn't in sensors.
You keep insisting that cameras are good enough, but it’s empirically impossible to say that cameras alone collect enough data, since safe autonomous driving AI has not been achieved yet.
The minimum setup without lidar would be cameras, radar, ultrasonic, GPS/GNSS + IMU.
Redundancy is key. With lidar added, multiple sensors cover each other’s weaknesses. If lidar is blinded by fog, radar steps in.
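A minimal sketch of what "radar steps in" could look like, assuming each sensor reports a distance, a noise estimate, and a health flag. It uses plain inverse-variance weighting over whichever sensors are currently healthy. The `RangeReading` type, the `healthy` flag, and all the numbers are hypothetical; this is not how any particular vendor's stack works.

```python
from typing import List, NamedTuple, Optional


class RangeReading(NamedTuple):
    sensor: str
    distance_m: float
    sigma_m: float        # assumed 1-sigma noise of this reading
    healthy: bool = True  # False when a self-check reports degradation (e.g. fog)


def fuse_ranges(readings: List[RangeReading]) -> Optional[float]:
    """Inverse-variance weighted average over healthy sensors.

    Returns None when no usable reading is left, so the caller can
    fall back to a safe behavior instead of trusting stale data.
    """
    usable = [r for r in readings if r.healthy and r.sigma_m > 0]
    if not usable:
        return None
    weights = [1.0 / r.sigma_m ** 2 for r in usable]
    weighted_sum = sum(w * r.distance_m for w, r in zip(weights, usable))
    return weighted_sum / sum(weights)


# Foggy scene: lidar flags itself as degraded, radar and camera still report.
foggy = [
    RangeReading("lidar", 30.0, 0.1, healthy=False),
    RangeReading("radar", 42.0, 0.5),
    RangeReading("camera", 45.0, 3.0),
]
print(fuse_ranges(foggy))
```

In the foggy example the lidar reading is dropped and the result sits close to the radar value, because the radar's assumed noise is much lower than the camera's.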