SHOW AN EXAMPLE OF YOU ACTUALLY DOING WHAT YOU SAY!
It’s not perfect so I still review the code myself, but it helps decrease the number of defects I have to then have the AI correct.
In addition to that, we do use CodeRabbit plugin on GitHub to perform a 4th code review. And we tell all of our agents to not get into gold-plating mode.
You can choose not to use modern tools like these to write software. You can also choose to write software in binary.
by level of compute spend, it might look like:
- ask an LLM in the same query/thread to write code AND tests (not good)
- ask the LLM in different threads (meh)
- ask the LLM in a separate thread to critique said tests (too brittle, testing guidelines, testing implementation and not out behavior, etc). fix those. (decent)
- ask the LLM to spawn multiple agents to review the code and tests. Fix those. Spawn agents to critique again. Fix again.
- Do the same as above, but spawn agents from different families (so Claude calls Gemini and Codex).
—-
these are usually set up as /slash commands like /tests or /review so you aren’t doing this manually. since this can take some time, people might work on multiple features at once.