Exactly the same thing happens when you code, it's almost impossible to get Gemini to not do "helpful" drive-by-refactors, and it keeps adding code comments no matter what I say. Very frustrating experience overall.
Just asking "Explain what this service does?" turns into
[No response for three minutes...]
+729 -522
Because your coworkers definitely are, and we're stack ranked, so it's a race (literally) to the bottom. Just send it...
(All this actually seems to do is push the burden on to their coworkers as reviewers, for what it's worth)
Edit: obviously inside something so it doesn't have access to the rest of my system, but enough access to be useful.
Every one of these models is so great at propelling the ship forward, that I increasingly care more and more about which models are the easiest to steer in the direction I actually want to go.
Codex is very steerable to a fault, and will gladly "monkey paw" your requests to a fault.
Claude Opus will ignore your instructions and do what it thinks is "right" and just barrel forward.
Both are bad and papering over the actual issue which is these models don't really have the ability to actually selectively choose their behavior per issue (ie ask for followup where needed, ignore users where needed, follow instructions where needed). Behavior is largely global
Overall, I think it's probably better that it stay focused, and allow me to prompt it with "hey, go ahead and refactor these two functions" rather than the other way around. At the same time, really the ideal would be to have it proactively ask, or even pitch the refactor as a colleague would, like "based on what I see of this function, it would make most sense to XYZ, do you think that makes sense? <sure go ahead> <no just keep it a minimal change>"
Or perhaps even better, simply pursue both changes in parallel and present them as A/B options for the human reviewer to select between.
This has not been my experience. I do Elixir primarily and Gemini has helped build some really cool products and massive refactors along the way. And it would even pick up security issues and potential optimizations along the way
What HAS been an issue constantly though was randomly the model will absolutely not respond at all and some random error would occur which is embarrassing for a company like Google with the infrastructure they own.
That helped quite a bit but it would still go off on it's own from time to time.
Not like human programmers. I would never do this and have never struggled with it in the past, no...
You can make their responses fairly dry/brief.
There is a tradeoff though, as comments do consumer context. But I tend to pretty liberally dispense of instances and start with a fresh window.
Be a proactive research partner: challenge flawed or unproven ideas with evidence; identify inefficiencies and suggest better alternatives with reasoning; question assumptions to deepen inquiry.