In fact, I find that with a strict feedback loop set up (i.e. lots of lint rules, a strict type checker, and fast unit tests), the agent almost always generates what I want.
As someone else said, each individual step might be pretty stupid, but with a fast iteration loop it can keep running until everything passes cleanly. My recommendation is to spell out what really counts as "done" in your AGENTS.md/CLAUDE.md, e.g. lint, type check, and unit tests all passing.
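
To make that concrete, here is a rough sketch of a single "done" gate script the agent can run in a loop. The specific commands (`ruff check .`, `mypy .`, `pytest -q`) are only my assumption of a typical Python setup, and `failed_checks` / `main` are made-up names for illustration; swap in whatever your project actually uses.

```python
import subprocess
import sys

# Assumed check commands for a typical Python project; replace with your own.
DEFAULT_CHECKS = {
    "lint": ["ruff", "check", "."],
    "types": ["mypy", "."],
    "tests": ["pytest", "-q"],
}


def failed_checks(checks):
    """Run each command and return the names of those that did not exit 0."""
    failures = []
    for name, cmd in checks.items():
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            ok = result.returncode == 0
        except OSError:
            # A tool that cannot be started counts as a failure, not a pass.
            ok = False
        if not ok:
            failures.append(name)
    return failures


def main(checks=DEFAULT_CHECKS):
    """Return 0 only when every check passes, 1 otherwise."""
    failures = failed_checks(checks)
    if failures:
        print("not done, failing checks: " + ", ".join(failures), file=sys.stderr)
        return 1
    print("done: all checks pass")
    return 0
```

If you save this as, say, `check_done.py` (hypothetical name) and end it with `raise SystemExit(main())`, the "done" line in AGENTS.md/CLAUDE.md can be as short as: a task is finished only when `python check_done.py` exits 0.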