First, it creates a random, blurry 10x10 pixel image and asks a neural net: "Could this be a duck wearing a hat on Mars?" The neural net replies, "No, because all the pictures I've ever seen of Mars have lots of red in them." So the system tweaks the pixels to make them redder, puts some pixels with a plausible duck color in the center, and so on, then asks again.
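To make the "ask, tweak, ask again" loop concrete, here is a toy sketch in Python. Everything in it is made up for illustration: `mars_score` is a hypothetical stand-in for the neural net (it only rewards redness), and random trial and error stands in for whatever update rule a real system would use.

```python
import random

def mars_score(image):
    """Hypothetical stand-in for the neural net: higher when pixels are redder."""
    total = 0.0
    for row in image:
        for r, g, b in row:
            total += r - (g + b) / 2
    return total / (len(image) * len(image[0]))

def random_image(size, rng):
    """A size x size grid of random (r, g, b) pixels, each channel in [0, 1)."""
    return [[(rng.random(), rng.random(), rng.random()) for _ in range(size)]
            for _ in range(size)]

def tweak(image, score, steps, rng):
    """Change one random pixel at a time; keep the change only if the score rises."""
    best = score(image)
    for _ in range(steps):
        y = rng.randrange(len(image))
        x = rng.randrange(len(image[0]))
        old = image[y][x]
        image[y][x] = (rng.random(), rng.random(), rng.random())
        new = score(image)
        if new > best:
            best = new            # the "net" likes it better: keep the tweak
        else:
            image[y][x] = old     # otherwise undo it
    return best

rng = random.Random(0)
img = random_image(10, rng)
before = mars_score(img)
after = tweak(img, mars_score, 2000, rng)
print(f"score before: {before:.3f}, after: {after:.3f}")
```

The point is only the shape of the loop: propose a change, ask the scorer, and keep what it prefers.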
After it has a 10x10 image that is a plausible duck on Mars, the system scales the image up to 20x20 pixels and then uses four different neural nets, one per corner, to ask "Does this look like the upper/lower left/right corner of a duck wearing a hat on Mars?" Each neural net is specialized for just one corner of the image.
You keep repeating this scale-up-and-check step with more neural nets until you have a pretty 1000x1000 (or whatever size) image.
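The whole start-small, scale-up, fix-each-corner routine can be sketched as a loop. This is a toy under loud assumptions: `redness` is a hypothetical stand-in for every corner's neural net, the same four-quadrant split is reused at every size, and pixels are improved by random trial and error. Since doubling from 10 never lands exactly on 1000, the loop just stops at the first size that reaches the target.

```python
import random

def upscale2x(image):
    """Double width and height by repeating every pixel."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def corners(size):
    """(top, left, height, width) for the four quadrants of a size x size image."""
    half = size // 2
    rest = size - half
    return [(0, 0, half, half), (0, half, half, rest),
            (half, 0, rest, half), (half, half, rest, rest)]

def redness(image, top, left, height, width):
    """Hypothetical stand-in for one corner's neural net: mean red value."""
    total = sum(image[y][x][0]
                for y in range(top, top + height)
                for x in range(left, left + width))
    return total / (height * width)

def refine_region(image, region, score, steps, rng):
    """Randomly tweak pixels inside one region, keeping tweaks that score higher."""
    top, left, height, width = region
    best = score(image, *region)
    for _ in range(steps):
        y = top + rng.randrange(height)
        x = left + rng.randrange(width)
        old = image[y][x]
        image[y][x] = (rng.random(), rng.random(), rng.random())
        new = score(image, *region)
        if new > best:
            best = new
        else:
            image[y][x] = old

def generate(target, rng, steps=200):
    """Start at 10x10, then repeatedly double the size and refine each corner."""
    size = 10
    image = [[(rng.random(), rng.random(), rng.random()) for _ in range(size)]
             for _ in range(size)]
    refine_region(image, (0, 0, size, size), redness, steps, rng)
    while size < target:
        image = upscale2x(image)
        size *= 2
        for region in corners(size):
            refine_region(image, region, redness, steps, rng)
    return image

final = generate(40, random.Random(0))
print(len(final), "x", len(final[0]))
```

Each pass only has to fix up detail at the new size, because the overall layout was already settled at the smaller one.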