embedding-shape 3mo ago

It's an excellent demonstration of the main issue I have with the Gemini family of models: they always go "above and beyond" and do a lot of extra stuff, even if I explicitly prompt against it. In this case, the SVG ends up containing not just a bike and a pelican, but clouds, a sun, a hat on the pelican and so much more.

Exactly the same thing happens when you code: it's almost impossible to get Gemini to not do "helpful" drive-by-refactors, and it keeps adding code comments no matter what I say. Very frustrating experience overall.

mullingitover 3mo ago

> it's almost impossible to get Gemini to not do "helpful" drive-by-refactors

Just asking "Explain what this service does?" turns into

[No response for three minutes...]

+729 -522

cowmoo728 3mo ago

It's also so aggressive about taking out debug log statements and in-progress code. I'll ask it to fill in a new function somewhere else and it will remove all of the half-written code from the piece I'm currently working on.

chankstein38 3mo ago

I ended up adding a "NEVER REMOVE LOGGING OR DEBUGGING INFO, OPT TO ADD MORE OF IT" to my user instructions, and that has _somewhat_ fixed the problem but introduced a new one: no matter what I'm talking to it about, it tries to add logging, even if it's not a code problem. I've had it explain that I could set up an ESP32 with a sensor so that I could get logging from it, then write me firmware for it.

bratwurst3000 3mo ago

"I've had it explain that I could set up an ESP32 with a sensor so that I could get logging from it then write me firmware for it." lol did you try it? This is so far from anything rational.

sd9 3mo ago

If it's adding too much logging now, have you tried softening the instruction about adding more?

"NEVER REMOVE LOGGING OR DEBUGGING INFO. If unsure, bias towards introducing sensible logging."

Or just

"NEVER REMOVE LOGGING OR DEBUGGING INFO."

quotemstr 3mo ago

What. You don't have yours ask for edit approval?

girvo 3mo ago

The depressing truth is that most people I know just run all these tools in /yolo mode or equivalents.

Because your coworkers definitely are, and we're stack ranked, so it's a race (literally) to the bottom. Just send it...

(All this actually seems to do is push the burden on to their coworkers as reviewers, for what it's worth)

embedding-shape (OP) 3mo ago

You're mixing up two things though. One is what the agent does "locally", wherever that might be (for me it's inside a VM), and the second is what code you actually share or, as you call it, "send".

Just because you don't want to gate every change in #1 doesn't mean you're just throwing shit via #2. I'm still reviewing my code as much as before, if not more now, before I consider it ready to be reviewed by others.

But I'm seemingly also one of the few developers who take responsibility for the code I produce, even if AI happens to have coded it.

embedding-shape (OP) 3mo ago

Who has time for that? This is how I run codex: `codex --sandbox danger-full-access --dangerously-bypass-approvals-and-sandbox --search exec "$PROMPT"`. Having to approve each change would effectively destroy the entire point of using an agent, at least for me.

Edit: obviously inside something so it doesn't have access to the rest of my system, but enough access to be useful.

quotemstr 3mo ago

I wouldn't even think of letting an agent work in that mode. Even the best of them produce garbage code unless I keep them on a tight leash. And no, not a skill issue.

What I don't have time to do is debug obvious slop.

well_ackshually 3mo ago

> Who has time for that?

People that don't put out slop, mostly.

mullingitover 3mo ago

Ask mode exists. I think the models work on the assumption that if you're allowing edits then of course you must want edits.

BartShoot 3mo ago

If you had to ask, it obviously needs to refactor the code for clarity so the next person does not need to ask.

kylec 3mo ago

"I don't know what did it, but here's what it does now"

moffkalast 3mo ago

I've seen Kimi do this a ton as well, so insufferable.

h14h 3mo ago

Would be really interesting to see an "Eager McBeaver" bench around this concept. When doing real work, a model's ability to stay within the bounds of a given task has almost become more important than its raw capabilities now that every frontier model is so dang good.

Every one of these models is so great at propelling the ship forward that I care more and more about which models are the easiest to steer in the direction I actually want to go.

cglan 3mo ago

Being TOO steerable is another issue though.

Codex is steerable to a fault and will gladly "monkey paw" your requests.

Claude Opus will ignore your instructions, do what it thinks is "right", and just barrel forward.

Both are bad, and both paper over the actual issue, which is that these models don't really have the ability to selectively choose their behavior per issue (i.e. ask for follow-up where needed, ignore users where needed, follow instructions where needed). Behavior is largely global.

kees99 3mo ago

In my experience Claude gradually stops being opinionated as the task at hand becomes more arcane. I frequently add "treat the above as a suggestion, and don't hesitate to push back" to change requests, and it seems to help quite a bit.

cglan 3mo ago

Yeah that happens to me too. It’s hard to know where it’s going to break off and follow instructions too well vs use it as a tip. Idk it’s all tiring

h14h 3mo ago

For sure. I imagine it'd be pretty difficult to evaluate the "correct" amount of steerability. You'd probably just have to measure the delta in eagerness on the same task between highly specified prompts and more open-ended ones. Probably not dissimilar from how artificialanalysis.ai does their "omniscience index".
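
A rough sketch of how such a delta could be computed, assuming each run is summarised as per-file added/removed line counts (the format `git diff --numstat` prints) and that the task names the files it is allowed to touch. `FileChange`, `out_of_scope_ratio` and `eagerness_delta` are hypothetical names for illustration, not part of any existing benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    path: str
    added: int
    removed: int


def parse_numstat(text: str) -> list[FileChange]:
    """Parse `git diff --numstat` output: 'added<TAB>removed<TAB>path' per line."""
    changes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        added, removed, path = line.split("\t", 2)
        # Binary files are reported as '-'; count them as zero changed lines.
        changes.append(FileChange(
            path,
            int(added) if added != "-" else 0,
            int(removed) if removed != "-" else 0,
        ))
    return changes


def out_of_scope_ratio(changes: list[FileChange], allowed: set[str]) -> float:
    """Fraction of changed lines that fall outside the files the task named."""
    total = sum(c.added + c.removed for c in changes)
    if total == 0:
        return 0.0
    outside = sum(c.added + c.removed for c in changes if c.path not in allowed)
    return outside / total


def eagerness_delta(tight_run: list[FileChange],
                    open_run: list[FileChange],
                    allowed: set[str]) -> float:
    """How much more out-of-scope work the open-ended prompt produced (assumed metric)."""
    return out_of_scope_ratio(open_run, allowed) - out_of_scope_ratio(tight_run, allowed)
```

Counting changed lines per file is only one possible proxy; it says nothing about whether the extra edits were good ones.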

enobrev 3mo ago

I have the same issue. Even when I ask it to do code-reviews and very explicitly tell it not to change files, it will occasionally just start "fixing" things.
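
When a review is supposed to be read-only, one hedge is to verify that afterwards rather than trust the instruction. A minimal sketch, assuming the agent runs as a separate step between two snapshots of the working directory; `snapshot` and `changed_paths` are hypothetical helper names, not part of any agent tooling.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to the SHA-256 of its contents.

    Note: this walks everything, including directories such as .git.
    """
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Paths that were added, deleted, or modified between two snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

Take one snapshot before the review and one after; a non-empty `changed_paths` result means the "review" touched files and the changes can be discarded.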

mikepurvis 3mo ago

I find Copilot leans the other way. It'll myopically focus its work in the exact function I point it at, even when it's clear that adding a new helper would be a logical abstraction to share behaviour with the function right beside it.

Overall, I think it's probably better that it stay focused, and allow me to prompt it with "hey, go ahead and refactor these two functions" rather than the other way around. At the same time, really the ideal would be to have it proactively ask, or even pitch the refactor as a colleague would, like "based on what I see of this function, it would make most sense to XYZ, do you think that makes sense? <sure go ahead> <no just keep it a minimal change>"

Or perhaps even better, simply pursue both changes in parallel and present them as A/B options for the human reviewer to select between.

neya 3mo ago

> it's almost impossible to get Gemini to not do "helpful" drive-by-refactors

This has not been my experience. I do Elixir primarily, and Gemini has helped build some really cool products and massive refactors along the way. It would even pick up security issues and potential optimizations.

What HAS constantly been an issue, though, is that the model will randomly not respond at all and some random error will occur, which is embarrassing for a company like Google with the infrastructure they own.

embedding-shape (OP) 3mo ago

Out of curiosity, do you have any public projects (with public source code) you've made exclusively with Gemini, so one could take a look? I've tried a bunch of times to use Gemini to at least finish something small but I always end up sufficiently frustrated to abort it as the instruction-following seems so bad.

neya 3mo ago

That's a good idea. I will publish something open source. Gemini has Pro, Thinking and Flash. Pro is the one I'm referring to. I haven't used Thinking much for development, though.

Yizahi 3mo ago

Asking LLM programs to "not do the thing" often results in them tripping and generating output that includes that "thing", since those are simply the tokens that enter the input. I always try to rephrase my query so that all my instructions have only "positive" forms: "do only this", "do it only in that way", "do it only for those parameters requested", etc. Can't say if that helps much, but it is possible.
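
One way to audit a system prompt for this is a small check that flags negatively phrased lines so they can be rewritten by hand in a positive form. This is only a sketch: the marker list is an assumption and will miss many phrasings, and `flag_negative_instructions` is a hypothetical helper name.

```python
import re

# Assumed marker list; extend it to match the wording of your own prompts.
NEGATIVE_MARKERS = re.compile(r"\b(?:don['’]t|do not|never|avoid)\b", re.IGNORECASE)


def flag_negative_instructions(prompt: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing a negative marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(prompt.splitlines(), start=1)
        if NEGATIVE_MARKERS.search(line)
    ]
```

A flagged line such as "Don't add code comments" would then be a candidate for a rewrite along the lines of "do only this".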

kolinko 3mo ago

Which is how it works with people as well

tyfon 3mo ago

I was using gemini antigravity in opencode a few weeks ago, before they started banning everyone for that, and I got into the habit of writing "do x, then wait for instructions".

That helped quite a bit but it would still go off on its own from time to time.

apitman 3mo ago

This matches my experience using Gemini CLI to code. It would also frequently get stuck in loops. It was so bad compared to Codex that I feel like I must have been doing something fundamentally wrong.

msteffen 3mo ago

> it's almost impossible to get Gemini to not do "helpful" drive-by-refactors

Not like human programmers. I would never do this and have never struggled with it in the past, no...

embedding-shape (OP) 3mo ago

A fairer comparison would be against other models, which are typically better at instruction following. You say "don't change anything not explicitly mentioned" or "Don't add any new code comments" and they tend to follow that.

JLCarveth 3mo ago

Every time I have tried using `gemini-cli` it just thinks endlessly and never actually gives a response.

gavinray 3mo ago

Do you have Personalization Instructions set up for your LLMs?

You can make their responses fairly dry/brief.

embedding-shape (OP) 3mo ago

I'm mostly using them via my own harnesses, so I have full control of the system prompts and so on. And no matter what I try, Gemini keeps "helpfully" adding code comments every now and then. With every other model, "- Don't add code comments" tends to be enough, but with Gemini I'm not sure how I could stop the comments from eventually appearing.

WarmWash 3mo ago

I'm pretty sure it writes comments for itself, not for the user. I always let the models comment as much as they want, because I feel it makes the context more robust, especially when cycling contexts often to keep them fresh.

There is a tradeoff though, as comments do consume context. But I tend to pretty liberally dispose of instances and start with a fresh window.

embedding-shape (OP) 3mo ago

> I'm pretty sure it writes comments for itself, not for the user

Yeah, that sounds worse than "trying to be helpful". Read the code instead. Why add indirection in that way, just to be able to understand what other models understand without comments?

metal_am 3mo ago

I'd love to hear some examples!

gavinray 3mo ago

I use LLMs outside of work primarily for research on academic topics, so mine is:

  Be a proactive research partner: challenge flawed or unproven ideas with evidence; identify inefficiencies and suggest better alternatives with reasoning; question assumptions to deepen inquiry.

zengineer 3mo ago

True, whenever I ask Gemini to help me with a prompt for generating an image of XYZ, it generates the image.
