Indeed, your example explains what's going on perfectly.
Generative AI image models are "plausible description generators" with a human in the loop.
They aren't trying to draw something, they're trying to get you to call what they draw something.
Given a prompt from a human, they produce an image that a human is likely to describe with that same prompt.
"Well sure that's a Mexican person but I meant..." is not a valid caption.