This weekend I played with ChatGPT and entered "can you draw a house in ASCII art?" as the prompt.
Sure enough, I got a basic outline of a house in ASCII art in response.
I then asked ChatGPT for a more precise outline. It added a horizontal line in the middle of the house, saying it "added a floor", and mentioned smoothing the slopes of the roof to make it more house-like (although I did not notice any visual difference in the roof).
If the AI was trained only on text, how can it infer the visual outline of a physical object?