Are people really trying to replace the vision models in robots with LLMs, and to have an LLM drive the robot agentically?? This seems really silly to me, but perhaps I am mistaken.
The only other thing I could think of is real-time translation during special ops with parabolic microphones and AR goggles...