* "Poetic typography" sample: I pasted the prompt and got an image with the typical lack of coherent text, just mangled letters.
* "Visual Narratives: Robot Writer's Block" - mangled letters as well.
* "Visual Narratives: Sally the mailwoman" - it didn't follow the instructions about camera angle, and Sally looks different in each subsequent photo.
* "Meeting Notes with multiple speakers" - I uploaded the exact same audio file and used the input 'How many speakers in this audio and what happened?'. GPT-4o went off about audio sample rates, speaker diarization models, torchaudio, and how its execution environment was broken so it couldn't proceed.