Better HN

0 points · atonse · 7mo ago · 0 comments

For day-to-day coding, I've found Anthropic to be killing it with Sonnet 3.7 and now Sonnet 4, and Claude Code feels like it has even bigger advantages over using the same models in Cursor (and I can't explain why).

I don't even try to use the OpenAI models because the difference has felt like night and day.

Hopefully GPT-5 helps them catch up. Although I'm sure there are 100 people who each have their own personal "hopefully GPT-5 fixes my personal issue with GPT-4".

IdealeZahlen 7mo ago

Whatever the benchmarks might say, there's something about Claude that seems to consistently deliver quite reliable (although not always perfect) outputs across various coding tasks. I wonder what that 'secret sauce' might be and whether GPT-5 has figured it out too.

weego 7mo ago

Agreed. I always give my one-pager product briefs to AI to break down into phases and tasks, and then progress trackers. I explicitly prompt for verbose phases, tasks and test plans.

Yesterday, without much prompting, Claude 4.1 gave me 10 phases, each with 5-12 tasks that could genuinely be used to kanban out a product step by step.

Claude 3.7 Sonnet was effectively the same, with fewer granular suggestions for programming strategies.

Gemini 2.5 gave me a one-pager back with some trivial bullet points in 3 phases, no tasks at all.

o3 did the same as Gemini, just less coherently.

Claude just has whatever the thing is for now.

unshavedyak 7mo ago

How are you having Claude track these phases/tasks? E.g., are you having it write to a TASKS.md and update it after each phase?
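
To make the question concrete, here is a minimal sketch of the kind of TASKS.md tracking I have in mind. The file layout ("## Phase" headings with checkbox items) and the helper names `mark_done` and `progress` are my own assumptions for illustration, not something described in the parent comment.

```python
import re

# Hypothetical TASKS.md layout: "## <phase>" headings followed by
# "- [ ]" (open) or "- [x]" (done) checkbox items.
SAMPLE = """\
## Phase 1
- [x] Set up repo
- [ ] Add CI

## Phase 2
- [ ] Build login page
"""


def mark_done(markdown: str, task: str) -> str:
    """Return the markdown with the first open checkbox for `task` ticked."""
    pattern = re.compile(r"^- \[ \] " + re.escape(task) + r"$", re.MULTILINE)
    # A function replacement avoids any escape handling in the task text.
    return pattern.sub(lambda m: "- [x] " + task, markdown, count=1)


def progress(markdown: str) -> dict[str, tuple[int, int]]:
    """Map each phase heading to a (done, total) pair of task counts."""
    result: dict[str, tuple[int, int]] = {}
    phase = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            phase = line[3:].strip()
            result[phase] = (0, 0)
        elif phase is not None and re.match(r"- \[[ x]\] ", line):
            done, total = result[phase]
            # line[3] is the character inside the brackets.
            result[phase] = (done + (line[3] == "x"), total + 1)
    return result


if __name__ == "__main__":
    updated = mark_done(SAMPLE, "Add CI")
    print(progress(updated))
```

The idea would be that the agent rewrites the file after each phase and the counts give you a quick progress check.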

SequoiaHope 7mo ago

If you have any examples of these one-pagers I’d love to see them!

concinds 7mo ago

Gemini Pro or Flash?

dudeinhawaii 7mo ago

My experience has been that Claude Code is exceptional at tool use (and thus at working with agentic IDEs) but... not the smartest coder. It will happily reinvent the wheel, create silos, or generate terrible code that you'll only discover weeks or months later. I've had to roll back weeks of code to discover major edge regressions that Claude had introduced.

Now, someone will say 'add more tests'. Sure. But that's a bandaid.

I find that the 'smarter' models like Gemini and o3 output better quality code overall, and if you can afford to send them the entire context in a non-agentic way, then they'll generate something dramatically superior to the agentic code artifacts.

That said, sometimes you just want speed to prove out a concept, and Claude is exceptional there. Unfortunately, proofs of concept often... become productionized rather than developers taking a step back to "do it right".

dagss 7mo ago

I disagree that tests are bandaids. Humans need tests to avoid introducing regressions. If you avoid tests, you are giving the AI a much harder task than what human programmers usually have.

atonse (OP) 7mo ago

That's been my experience too. Even though Gemini also does seem to do the fancy one-shot demo code well, in day-to-day coding, Claude seems to do a much better job of just understanding how programming actually works, what to do, what not to do, etc.

deadbabe 7mo ago

The secret is just better context engineering. There is no other “secret” sauce; all these models are built on the same concepts.

bamboozled 7mo ago

Claude is fast too. Gemini isn’t as good and just gets hung up on things Claude doesn’t.

NitpickLawyer 7mo ago

Colleagues were saying that Horizon Alpha and Beta were looking better than Claude 4 for frontend stuff, especially with newer frameworks. I think the idea of having full + mini + nano is really good, as long as the smaller ones can reasonably handle small-ish tasks. You'd have your architecture / planning / whatever sessions with the large one, scoping out regular tasks for the -mini version and the really easy ones for -nano.

4.1 was almost usable in that fashion. I had 4.1-nano working in Cline with really trivial stuff (add logging, take this example and adapt it in this file, etc.) and it worked pretty well most of the time.
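
A rough sketch of that routing idea, purely for illustration: the tier names, the `kind` label, and the `route` helper are made up here, and nothing in it calls a real model API. It only shows the shape of "plan with the large model, hand regular tasks to mini and trivial ones to nano".

```python
from dataclasses import dataclass

# Hypothetical tier names mirroring the full / mini / nano split.
# These are labels for illustration, not real model identifiers.
TIERS = {"plan": "full", "regular": "mini", "trivial": "nano"}


@dataclass
class Task:
    description: str
    kind: str  # "plan", "regular", or "trivial", assigned during planning


def route(task: Task) -> str:
    """Pick a tier for a task; unknown kinds fall back to the largest tier."""
    return TIERS.get(task.kind, "full")


if __name__ == "__main__":
    backlog = [
        Task("Scope out the new billing feature", "plan"),
        Task("Implement the invoice endpoint", "regular"),
        Task("Add logging to the importer", "trivial"),
    ]
    for task in backlog:
        print(f"{route(task):>4}  {task.description}")
```

Falling back to the largest tier for anything unlabeled is a deliberately conservative choice: a misrouted easy task only costs a bit more, while a misrouted hard task costs a bad result.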

jstummbillig 7mo ago

Well, since (like you pointed out) using the Anthropic models in different settings is not that exciting anymore, the difference is what Claude Code does. It's a good product.

mlsu 7mo ago

Claude Code is good because the Anthropic models are trained/finetuned to be good at using it.

jstummbillig 7mo ago

Sure, that might be part of it.

pawelduda 7mo ago

Yup, Claude has been kicking GPT's ass for months now

octo888 7mo ago

Killing it - at what type of coding task? What "bigger advantages" specifically? What is night and day?

atonse (OP) 7mo ago

Refactors, building non-trivial features (you can first write out a spec and have it follow that), understanding my code, writing tests, writing good quality documentation. Reasoning about my existing data model and where to plug into it.

On and on and on. Coming up with test plans, edge cases, accounting for the edge cases in its programming. Programming defensively. Fixing bugs.

octo888 7mo ago

Thanks for the detail!
