CLIP+VQGAN generation, IIRC, works by using CLIP in place of the adversarial network as the critic, so the feedback understands text prompts. Then, rather than retraining the generator, it optimizes the image's latent code for a while towards the prompted target and decodes whatever that converges to.
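As far as I know, in the usual VQGAN+CLIP setups the generator and CLIP weights stay frozen and only the latent gets nudged toward the prompt. Here is a toy sketch of that loop, with random linear maps standing in for both models. `W_gen`, `W_enc` and `text_emb` are made-up stand-ins, not the real networks or any real API, so treat it as an illustration of the idea only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins (assumptions, not the real models):
#   W_gen    ~ frozen generator: latent -> "image"
#   W_enc    ~ frozen image encoder: "image" -> embedding
#   text_emb ~ the encoded text prompt
W_gen = rng.normal(size=(32, 8))
W_enc = rng.normal(size=(16, 32))
text_emb = rng.normal(size=16)
text_emb /= np.linalg.norm(text_emb)

A = W_enc @ W_gen  # latent -> embedding, fixed throughout


def score(z):
    """Cosine similarity between the 'image' embedding and the prompt."""
    e = A @ z
    return float(e @ text_emb / np.linalg.norm(e))


def score_grad(z):
    """Gradient of score with respect to the latent z only."""
    e = A @ z
    n = np.linalg.norm(e)
    de = text_emb / n - (e @ text_emb) * e / n**3
    return A.T @ de


def optimize_latent(z, steps=200, lr=0.1):
    """Gradient ascent on z; a step is kept only if the score improves."""
    best = score(z)
    for _ in range(steps):
        candidate = z + lr * score_grad(z)
        s = score(candidate)
        if s > best:
            z, best = candidate, s
        else:
            lr *= 0.5  # step overshot, so shrink it
    return z


z0 = rng.normal(size=8)
z1 = optimize_latent(z0.copy())
print("similarity before:", score(z0), "after:", score(z1))
```

The model weights never change here; the only thing being updated is `z`, which is the part the real thing would decode into the final image.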
GANs are a silly idea that shouldn't work but somehow do. There are some attempts to replace the idea: https://www.microsoft.com/en-us/research/blog/unlocking-new-...
Makes sense to me as far as avoiding a sort of maximized sunset that is always there and is SUNSET rather than a nice sunset... but also avoiding watering it down and getting a way too subtle sunset.
It's not AI, but I've been watching some folks try to solve vehicle routing problems, and you get the "this looks like it was maximized for X" kind of solution, when X is maybe not what's important and customer perception is unpredictable. I kinda want to just come up with 3 solutions and let someone click one at random... in fact I see some software do that at times.