Of more concern to me is that when it's unleashed on the ephemera of coding (Jira tickets, bug reports, update logs) it generates so much noise you need another AI to summarize it for you.
- Proliferation of utils/helpers when equivalent ones are already defined in the codebase. Particularly a problem for larger codebases.
- Tests with bad mocks and bail-outs due to missing things in the agent's runtime environment ("I see that X isn't available, let me just stub around that...")
- Overly defensive off-happy-path handling, returning null or the semantic "empty" response when the correct behavior is to throw an exception that will be properly handled somewhere up the call chain
- Locally optimal design choices with very little "thought" given to ownership or separation of concerns
All of these can pretty quickly turn into a maintainability problem if you aren't keeping a close eye on things. But broadly I agree that, line for line, frontier LLM code is generally better than what humans write and miles better than what a stressed-out human developer with a short deadline usually produces.
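To make the defensive-handling item in the list above concrete, here's a minimal Python sketch of the two styles. Every name in it (`USERS`, `UserNotFound`, `get_user_defensive`, `get_user_strict`, `greet`, `handle_request`) is hypothetical and made up for illustration; it isn't taken from any real codebase or agent output.

```python
class UserNotFound(Exception):
    """Hypothetical domain error raised when a lookup finds no user."""


# Hypothetical in-memory store standing in for a real data source.
USERS = {1: "ada"}


def get_user_defensive(user_id):
    # The pattern described above: swallow the failure and return an
    # "empty" value, so every caller now has to remember to check for None.
    try:
        return USERS[user_id]
    except KeyError:
        return None


def get_user_strict(user_id):
    # The alternative: raise a meaningful exception and let a handler
    # further up the call chain decide what to do with it.
    try:
        return USERS[user_id]
    except KeyError as exc:
        raise UserNotFound(f"no user with id {user_id}") from exc


def greet(user_id):
    # Intermediate code stays on the happy path; no None checks needed.
    return f"hello, {get_user_strict(user_id)}"


def handle_request(user_id):
    # One place near the top of the call chain handles the failure.
    try:
        return 200, greet(user_id)
    except UserNotFound as exc:
        return 404, str(exc)
```

With the defensive version, a `greet` built on `get_user_defensive` would happily produce "hello, None" for a missing user unless someone adds a check at every call site. With the strict version the failure surfaces once, in `handle_request`.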
But of course it doesn't do that because we can't trust it the way we do a traditional compiler. Someone has to validate its output, meaning it most certainly IS meant for humans. Maybe that will change someday, but we're not there yet.