The reason Claude Code is “good” is that it can run tests, compile the code, run a linter, etc. If you actually pay attention to what it’s doing, at least in my experience, it constantly fucks up, but it can sort of correct itself by taking feedback from outside tools. Eventually it proclaims “Perfect!” (which annoys me to no end) and spits out code that at least looks like it satisfies what you asked for. Then, if you just ignore the tests that mock out all the useful behavior, the amateur-hour mistakes in data access patterns, and the security vulnerabilities, it’s amazing!