Egomotion is very useful, but it relies on being able to reliably extract features from objects, which isn't always possible. Smooth, monochromatic walls do exist, and it's imperative that a car be able to avoid them. A human can (almost always) figure out their shape and distance from visual cues, but our brains throw far more computational horsepower at the task than even Tesla's new computer has available. Perhaps knowing when it doesn't know is sufficient for their purposes, though, and that's probably an easier task.
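As a toy illustration of the "knowing when it doesn't know" idea: you could count trackable features per frame and simply refuse to trust the egomotion estimate below some threshold. Just a sketch with made-up thresholds, using OpenCV's Shi-Tomasi corner detector (not anyone's actual pipeline):

```python
# Rough sketch: flag frames where feature-based egomotion is likely
# to fail because the scene is texture-poor (e.g., a smooth wall).
# Thresholds here are invented for illustration.
import cv2

def egomotion_usable(gray_frame, min_features=200):
    """Return (usable, n_features) for a grayscale uint8 frame."""
    corners = cv2.goodFeaturesToTrack(
        gray_frame,
        maxCorners=500,     # cap on detected Shi-Tomasi corners
        qualityLevel=0.01,  # relative corner-quality threshold
        minDistance=10,     # min pixel spacing between corners
    )
    n = 0 if corners is None else len(corners)
    # Too few trackable features -> the egomotion estimate is
    # unreliable; a planner could treat this as "I don't know".
    return n >= min_features, n
```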
An interesting intermediate case between a pure video system and lidar is a structured-light sensor like the Kinect, where you project a pattern of features onto the object in infrared. It doesn't work so well in sunlight, but I'd be interested to learn whether anyone has ever tried combining that approach with egomotion.
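For intuition, the depth recovery works like stereo: the projector and IR camera sit a fixed baseline apart, so each projected feature's pixel shift (disparity) encodes depth. Toy numbers below, loosely Kinect-v1-ish but assumed, not spec-sheet values:

```python
# Toy structured-light triangulation: projector + camera form a
# stereo pair, so depth = focal_length * baseline / disparity.
def depth_from_disparity(disparity_px, focal_px=580.0, baseline_m=0.075):
    """Depth in meters; focal/baseline are rough, assumed values."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example: a projected dot shifted by 10 px -> ~4.35 m away.
print(depth_from_disparity(10.0))
```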