For example, I tried asking ChatGPT-4o to commentate a soccer game, but it hallucinated pretty badly, because the model couldn’t see any new video coming in after my instruction.
So when using ChatGPT-4o, you’ll have to point the camera first and then ask your question; asking the question first and then pointing the camera won’t work.
(I was able to play with the model early because I work at OpenAI.)