A 2.5D model probably requires less data?
The value provided was, IMO, quite minimal. It was not easier to use or discernibly better in any user-facing way; the 3D stuff just felt like a gimmick. 3D photos taken by the phone and viewed on its screen did not feel more lifelike. The color depth and image quality were poor even by the standards of other phones of its era.
While it was very cool and felt very futuristic, it did not feel worth the cost.
I’m curious how big of a step forward this is from the previous state of the art, and at what computational cost.
Also curious whether the technique scales well to multiple cameras with overlapping fields of view. That is to say, I assume accuracy can be increased through sensor fusion in the basic sense of averaging out errors, but what I really mean is molding a cohesive 3D view of a 360° environment and understanding that an object at the edge of one camera's frame is the same object, seen from a different perspective, at the edge of another camera's frame.
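A rough sketch of the basic-averaging half of that, in Python/NumPy with made-up intrinsics and extrinsics: back-project each camera's depth map into a shared world frame, then take a confidence-weighted average wherever views overlap. (Associating objects across views is the hard part this doesn't touch.)

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """Back-project a per-pixel depth map (meters) into world-frame 3D points.
    K is the 3x3 intrinsic matrix, cam_to_world the 4x4 camera pose."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T            # camera-frame rays per pixel
    pts_cam = rays * depth.reshape(-1, 1)      # scale rays by depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]     # into the shared world frame

def fuse(depths, confidences, Ks, poses, voxel=0.1):
    """Naive fusion: pool world-frame points from all cameras and take a
    confidence-weighted average per voxel, so overlapping views of the
    same surface collapse into one estimate."""
    pts = np.concatenate([backproject(d, K, T) for d, K, T in zip(depths, Ks, poses)])
    w = np.concatenate([c.reshape(-1) for c in confidences])
    keys = np.floor(pts / voxel).astype(np.int64)
    fused = {}
    for key, p, wi in zip(map(tuple, keys), pts, w):
        s_p, s_w = fused.get(key, (np.zeros(3), 0.0))
        fused[key] = (s_p + wi * p, s_w + wi)
    return np.array([s_p / s_w for s_p, s_w in fused.values()])
```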
Obviously this seems like it should be extremely useful for AutoPilot. Compared to the relatively inaccurate positional information of adjacent cars on the AutoPilot guidance display we have today, this seems like a big step forward.
I think it’s interesting how the RNN is identifying specific types of objects and then depth mapping them. I assume it can’t just depth map the whole image without that first classification step? I’m thinking of the Smart Summon application, where depth mapping everything around you is pretty crucial and obviously not entirely working at this point.
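For what it's worth, dense monocular depth models do exist that produce per-pixel depth for the whole frame with no classification step at all. A minimal sketch using the publicly available MiDaS model (not the network discussed here), assuming a local image file frame.jpg:

```python
import cv2
import torch

# Dense monocular depth: one relative-depth value per pixel, no detector in the loop.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img))                       # (1, h, w) relative depth
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()                                # per-pixel, whole image
```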
A few of the models I've shrunk down and posted: https://sketchfab.com/darkphibre
I've used the original Kinect, Kinect v2, and Intel RealSense D435, and it was much more accurate than all of those.
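If you want to put numbers on "more accurate", the usual approach is to compare each sensor's depth map against a reference scan using standard depth-error metrics. A rough sketch, with made-up valid-range values:

```python
import numpy as np

def depth_error_metrics(pred, gt, valid_min=0.2, valid_max=10.0):
    """Standard depth accuracy metrics over the sensor's valid range.
    pred, gt: (H, W) depth maps in meters (gt from a reference scan)."""
    mask = (gt > valid_min) & (gt < valid_max) & (pred > 0)
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)                 # mean absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))                # root-mean-square error (m)
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)    # fraction within 25% of gt
    return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": delta1}
```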
I hope the researchers are advocating for its ethical use.