Modern agents can write tests that are meaningful and require the agent to pass them with any change to avoid regressions. Humans can review the code/test the downstream application to ensure it works as intended.
It rarely takes a single prompt to get something done, but the agents can figure out as long as the human is specific about what constitutes accurate.