simonw 2mo ago

Depends on the participants. If they're cutting-edge LLM users then yes, I think so. If they continue to use LLMs like they would have back in the first half of 2025 I'm not sure if a difference would be noticeable.

mkozlows 2mo ago

I'm not remotely cutting edge (just switched from Cursor to Codex CLI, have no fancy tooling infrastructure, am not even vaguely considering git worktrees as a means of working), but Opus 4.5 and 5.2 Codex are both so clearly more competent than previous models that I've started just telling them to do high-level things rather than trying to break things down and give them subtasks.

If people are really set in their ways, maybe they won't try anything beyond what old models can do, and won't notice a difference, but who's had time to get set in their ways with this stuff?

christophilus 2mo ago

I mostly agree, but today Opus 4.5 via Claude Code did some pretty dumb stuff in my codebase: N queries where one would do, deep array comparison where a reference equality check would suffice, a very complex web of nested conditionals that a competent developer would never have written, some edge cases where the backend endpoints didn’t properly verify user permissions before overwriting data, etc.

It’s still hit or miss. The product “worked” when I tested it as a black box, but the code had a lot of rot in it already.

Maybe that stuff no longer matters. Maybe it does. Time will tell.
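For readers unfamiliar with the first pattern mentioned above ("N queries where one would do"), here is a minimal sketch in Python using the standard-library `sqlite3` module. The `users` table, its rows, and the function names are hypothetical and chosen only for illustration; this is not the code from the commenter's codebase.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "ann"), (2, "bob"), (3, "cy")],
    )
    return conn


def names_n_queries(conn, ids):
    # The pattern being criticized: one query per id.
    names = {}
    for user_id in ids:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is not None:
            names[user_id] = row[0]
    return names


def names_one_query(conn, ids):
    # Same result with a single query using an IN clause.
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping; the difference is the number of round trips to the database, which tends to matter once the list of ids is large or the database is remote.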

remich 2mo ago

I have had a lot of success lately when working with Opus 4.5 using both the Beads task tracking system and the array of skills under the umbrella of Bad Dave's Robot Army. I don't have a link handy, but you should be able to find it on GitHub. I use the specialized skills for different review tasks (like Architecture Review, Performance Review, Security Review, etc.) on every completed task in addition to my own manual review, and I find that that helps to keep things from getting out of hand.

ManuelKiessling 2mo ago

As someone who’s responsible for some very clean codebases and some codebases that grew over many years, warts and all, I always wonder if being subjected to large amounts of not-exactly-wonderful code has the same effect on an LLM that it arguably also has on human developers (myself included, occasionally): that they subconsciously lower their normally high bar for quality a bit, as in “well, there are quite a few smells here, let’s go with the flow a bit and not overdo the quality”.

mkozlows 2mo ago

I don't think they generally one-shot the tasks, but they do them well enough that you can review the diff, request changes, and reach a good outcome more quickly than if you were spoon-feeding them little tasks and checking them as you go (as you used to have to do).

nineteen999 2mo ago

Also not a cutting-edge user, but I do run my own LLMs at home and have been spending a lot of time with Claude CLI over the last few months.

It's fine if you want Claude to design your APIs without any input, but you'll have less control, and when you dig down into the weeds you'll realise it's created a mess.

I like to take both a top-down and a bottom-up approach: design the low-level API with Claude fleshing out how it's supposed to work, then design the high-level functionality, and then tell it to stop implementing when it hits a problem reconciling the two and the lower-level API needs revision.

That's at least for things I'd like to stand the test of time; if it's just a throwaway script or tool, I care much less as long as it gets the job done.

drbojingle 2mo ago

What's the difference between using LLMs now vs. the first half of 2025 among the best users?

simonw (OP) 2mo ago

Coding agents and much better models. Claude Code or Codex CLI plus Claude Opus 4.5 or GPT 5.2 Codex.

The latest models and harnesses can crunch on difficult problems for hours at a time and get to working solutions. Nothing could do that back in ~March.

I shared some examples in this comment: https://news.ycombinator.com/item?id=46436885

William_BB 2mo ago

OK, I'll bite.

Every single example you gave is in hobby-project territory: relatively self-contained, maintainable by 3-4 devs max, within 1k-10k lines of code. I've been successfully using coding agents to create such projects for the past year and it's great, I love it.

However, lots of us here work on codebases that are 100x or 1000x the size of the projects you and Karpathy are talking about, with years of domain-specific code. From personal experience, coding agents simply don't work at that scale the same way they do for hobby projects. Over the past year or two, I did not see any significant improvement from any of the newest models.

Building a slightly bigger hobby project is not even close to making these agents work at industrial scale.

epolanski 2mo ago

Cool, but most developers do mundane stuff like glueing APIs and implementing business logic, which require oversight and review.

Those crunching hard problems will still review what's produced in search of issues.

mkozlows 2mo ago

I was going back and looking at timelines, and was shocked to realize that Claude Code and Cursor's default-to-agentic-mode changes both came out in late February. Essentially the entire history of "mainstream" agentic coding is ten months old.

(This helps me better understand the people who are confused/annoyed/dismissive about it, because I remember how dismissive people were about Node, about Docker, about Postgres, about Linux when those things were new too. So many arguments where people would passionately talk about how all those things were irredeemably stupid and only suitable for toy/hobby projects.)

drbojingle 2mo ago

Are there techniques though? Tech pairing? Something we know now that we didn't then? Or just better models?

