Workaccount2 4mo ago

It doesn't matter, the real benchmark is taking the community temperature on the model after a few weeks of usage.

ramesh31 4mo ago

>"It doesn't matter, the real benchmark is taking the community temperature on the model after a few weeks of usage."

Indeed. It's almost impossible to truly know a model before spending a few million tokens on a real-world task. At this point it will take a step-change advancement for me to trust anything but Claude.

epolanski 4mo ago

Imho Gemini 2.5 was by far the better model on non-trivial tasks.

oezi 4mo ago

To this day, I still don't understand why Claude gets more acclaim for coding. Gemini 2.5 consistently outperformed Claude and ChatGPT mostly because of the much larger context.

WhyOhWhyQ 4mo ago

I'm not sure about this. I used Gemini and Claude for about 12 hours a day for a month and a half straight in an unhealthy programmer bender, and Claude was FAR superior. It was not really that close. Going to be interesting to test Gemini 3 though.

viraptor 4mo ago

Different styles of usage? I see Gemini praised for being able to take the whole project as input and make changes on request. Which is cool and all but... I never do that. Claude for me is better for specific modifications to specific parts of the app. There's a lot of context behind what's "better".

jarjoura 4mo ago

Gemini 2.5 and now 3 seem to continue their trend of being horrific at agentic tasks, but they almost always impress me on the first single-shot request.

Claude Sonnet is way better about following up and making continuous improvements during a long running session.

For some reason Gemini will hard freeze on the most random queries, and when it does manage to continue past the first call, it only keeps a weird summarized version of its previous run available to itself, even though the full run is in the payload. It's a weird model.

My take is that it's world-class at one-shotting, and if a task benefits from that, absolutely use it.

dist-epoch 4mo ago

Gemini 2.5 couldn't apply an edit to a file if its life depended on it.

So unless you love copy/pasting code, Gemini 2.5 was useless for agentic coding.

Great for taking its output and asking Sonnet to apply it though.

decide1000 4mo ago

I use Gemini CLI, Claude Code and Codex daily. If I present the same bug to all 3, Gemini is often the one missing a part of the solution or drawing the wrong conclusion. I am curious about G3.

nhumrich 4mo ago

The secret sauce isn't Claude the model, but Claude Code the tool. Harness > model.

artdigital 4mo ago

Claude doesn’t gaslight me, or flat out refuse to do something I ask it to because it believes it won’t work anyway. Gemini does.

Gemini also randomly reverts everything because of some small mistake it found, and makes assumptions without checking whether they are true (e.g. this lib absolutely HAS TO HAVE a login() method; if we get a compile error, it’s my env setup’s fault).

It’s just not a pleasant model to work with.
