These look like a digital Rorschach test, with the minor difference that a human "knows" the image is not really an object and is being asked to interpret it in some way, whereas the machine has neither "knowledge" nor "understanding" but has an imperative to match this input to something in its repertoire of learned categories.
If you asked me, as a human, to match these synthetic images to the most likely real-world object, my responses would also look "confused."
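
To make the "imperative to match" point concrete, here is a rough sketch. It assumes the classifier ends in a softmax over a fixed label set with no "none of the above" option, which is my assumption and not something stated about the model in question. The labels and raw scores are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forced_choice(logits, labels):
    """Return the top label and its probability; there is no way to abstain."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Hypothetical labels and scores for a synthetic, meaningless image.
LABELS = ["cat", "guitar", "school bus"]
label, confidence = forced_choice([0.3, 4.1, -1.2], LABELS)
print(label, round(confidence, 3))
```

Because the probabilities must sum to 1 across the known labels, one of them wins no matter how unlike any real object the input is, and the winner can come out looking very confident.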