This is a beautiful articulation of a major pet peeve when using these coding tools. One of my first review steps is just looking for all the extra optional arguments it's added instead of designing something good.
To solve this permanently, use a linter and apply a "ratchet" in CI so that the LLM cannot paper over warnings with ignore comments.
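A minimal sketch of what that ratchet could look like: a CI step that counts linter-suppression comments and fails if the count ever goes up. The file names, baseline handling, and suppression patterns here are assumptions for illustration; adapt them to whatever linters you actually run.

```python
import pathlib
import re

# Patterns for common suppression comments (adjust to your linters).
SUPPRESSION = re.compile(r"#\s*(noqa|type:\s*ignore)|//\s*eslint-disable")

def count_suppressions(root: str) -> int:
    """Count linter-ignore comments across a source tree."""
    total = 0
    for path in pathlib.Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".py", ".js", ".ts"}:
            total += len(SUPPRESSION.findall(path.read_text(errors="ignore")))
    return total

def check_ratchet(current: int, baseline: int) -> bool:
    """The ratchet only turns one way: equal or fewer suppressions pass CI."""
    return current <= baseline
```

In CI you would store the baseline count in the repo, fail the build when `check_ratchet` returns False, and let developers lower (but never raise) the baseline as suppressions are cleaned up.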
I inherited some vibe-coded scripts that dealt with AWS services like Bedrock and S3. These scripts needed to create various AWS SDK clients, and those clients needed to know which account/region to use.
Had this been well designed, there would have been some function/module responsible for deciding which account/region to use. This decision point might be complex: it might consider environment variables, configuration files, and command-line arguments, and it might need to impose some precedence among those options. Whatever the details, the decision would be authoritative once made. The rest of the codebase should expect a clear decision and just do what was decided.
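A sketch of that "single authoritative decision point" design, assuming a CLI > environment > config-file precedence (the precedence order and helper names are hypothetical, not taken from the actual scripts):

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AwsContext:
    """Immutable once resolved; the rest of the program just uses it."""
    account: str
    region: str

def resolve_aws_context(
    cli_account: Optional[str] = None,
    cli_region: Optional[str] = None,
    config: Optional[dict] = None,
) -> AwsContext:
    """Decide account/region exactly once, with explicit precedence:
    command-line flags, then environment variables, then config file."""
    config = config or {}
    account = (cli_account
               or os.environ.get("AWS_ACCOUNT_ID")
               or config.get("account"))
    region = (cli_region
              or os.environ.get("AWS_REGION")
              or config.get("region"))
    if not account or not region:
        raise ValueError("account/region must be decided at startup")
    return AwsContext(account=account, region=region)
```

Downstream modules then take an `AwsContext` as a required parameter instead of sprouting nullable account/region arguments with their own fallback lookups.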
Instead, the coding assistant added optional account/region arguments in many submodules. These arguments were nullable, and when left unspecified, "convenient" fallback logic did its own environment lookups and the like. The result was many "works on my machine" failures, because command-line arguments affected only certain portions of the program, environment variables others, and config files still others.
This is grim stuff. It's a ton of code that should not exist, spreading the decision all over the code.
The question “what is this trying to do?” has never been harder to answer. It creates scenarios where 99% is correct, but the most important area is subtly broken. I prefer the human failure mode, where 60-80% will be correct and the problematic areas start to smell more and more, gradually.
In my experience LLMs, at times, may hide the truth from you in a haystack made of needles.
Harder to catch because nothing is factually wrong. You have to ask: could this output have been produced without actually reading my codebase?
I will often go back, after the fact, and ask for refactors and documentation.
It works. Probably a lot slower than using agents, but I test every step, and it is a lot faster than I would do it, unassisted.
And if you can define "quality" in a way the agent can check against it, it will follow the instructions.
Relatedly, it seems to me that there are two types of tests: the ones created in a TDD style, which can be modified freely, and the ones that come from acceptance criteria, which should only be changed very carefully.
I use a test harness, step through the code, look at debug logs, and abuse the code as much as possible.
Kind of a pain, but I find unit tests are a bit of a "false hope" kind of thing: https://littlegreenviper.com/testing-harness-vs-unit/
https://testing.googleblog.com/2025/10/simplify-your-code-fu...
The key insight of FCIS is that complicated logic entangled with large dependencies leads to a large test suite that runs slowly. The solution is to isolate the complicated logic in a functional core and test it separately from the simpler, more sequential tests of the imperative shell.
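To make the split concrete, here is a minimal functional-core / imperative-shell sketch; the domain (discount pricing) is invented purely for illustration:

```python
from dataclasses import dataclass

# --- functional core: pure, all the branching logic, trivially testable ---
@dataclass(frozen=True)
class Order:
    subtotal: float
    loyalty_years: int

def discounted_total(order: Order) -> float:
    """Business rules live here, with no I/O and no dependencies."""
    rate = 0.0
    if order.subtotal >= 100:
        rate += 0.05  # volume discount
    if order.loyalty_years >= 3:
        rate += 0.05  # loyalty discount
    return round(order.subtotal * (1 - rate), 2)

# --- imperative shell: sequential I/O, thin enough to test sparsely ---
def process_order(read_order, write_receipt) -> None:
    order = read_order()              # I/O in (DB, HTTP, file, ...)
    total = discounted_total(order)   # pure decision
    write_receipt(total)              # I/O out
```

The core gets many fast, dependency-free unit tests; the shell gets a handful of sequential integration tests with the I/O functions stubbed out.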
For instance, two things I'm currently working on: - A reasonably complicated indie game project I've been doing solo for four years. - A basic web API exposing data from a legacy database for work.
I can see how the API could be developed mostly by agents - it's a pretty cookie cutter affair and my main value in the equation is just my knowledge of the legacy database in question.
But for the game... man, there's a lot of stuff in there that's very particular when it comes to performance and the logic flow. An example: entities interacting with each other. You have to worry about stuff like the ordering of events within a frame, what assumptions each entity can make about the other's state, when and how they talk to each other given there's job based multi-threading, and a lot of performance constraints to boot (thousands of active entities at once). And that's just a small example from a much bigger iceberg.
I'm pretty confident that if I leaned into using agents on the game I'd spend more time re-explaining things to them than I do just writing the code myself.
Got it, good idea.
You can ask the agent to make 10 different solutions in the time it takes you to make 0.5.
Then you review them based on whatever criteria you feel is right and either throw them all away and do it yourself (maybe with inspiration from the other solutions) or pick one to progress further.
It's basically like hiring a new developer for one task and letting them go right after. They don't know your conventions, your history, or why things are the way they are. The only thing they have is what they can see in the code. Your code quality is basically the prompt now.
This is my take on how to not write slop.
In a blameless postmortem style process, you would look at not just the mistake itself but the factors influencing the mistake and how to mitigate them. E.g., doctor was tired AND the hospital demanded long hours AND the industry has normalized this.
So yes, the programmers need to hold the line AND ALSO the velocity of the tool makes it easy to get tired AND its confidence and often-good results promote laziness (or maybe folks just don’t know better) AND it can thrash your context and bounce you around the codebase, making it hard to remember the subtleties AND on and on.
Anyway, strong agree on “dude, review better” as a key part of the answer. Also work on all this other stuff and understand the cost of VeLOciTy…
And you’re completely right, humans are still the ones in control here. It’s entirely possible to use AI without lowering your standards.
Holy fuck Batman
The useful part is not just asking it to write code, but giving it context: how the codebase got here, what constraints are intentional, where the sharp edges are, and what direction we want to take.
With that guidance, it can be excellent. Without it, it tends to produce changes that make sense in isolation but not in the system.