undefined | Better HN

0 pointsmaccard3mo ago0 comments

I feel like we’ve been hearing this for 4 years now. The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.

0 comments

bigiain3mo ago

> I feel like we’ve been hearing this for 4 years now.

I feel we were hearing very similar claims 40 years ago, about how the next version of "Fourth Generation Languages" were going to enable business people and managers to write their own software without needing pesky programmers to do it for them. They'll "just" need to learn how to specify the problem sufficiently well.

(Where "just" is used in it's "I don't understand the problem well enough to know how complicated or difficult what I'm about to say next is" sense. "Just stop buying cigarettes, smoker!", "Just eat less and exercise more, fat person!", "Just get a better paying job, poor person!", "Just cheer up, depressed person!")

elAhmo3mo ago

Both is true, models have also been significantly improved in the last year alone, let's not even talk about 4 years ago. Agents, tooling and other sugar on top is just that - enabling more efficient and creative usage, but let's not undermine how much better models today are compared to what was available in the past.

fragmede3mo ago

The code that's generated when given a long leash is still crap. But damned if I didn't use a JIRA mcp and a gitlab mcp, and just have the corporate AI just "do" a couple of well defined and well scoped tickets, including interacting with JIRA to get the ticket contents, update its progress, push to gitlab, and open an MR. Then, the corporate CodeRabbit does a first pass code review against the code so any glaring errors are stomped out before a human can review it. What's more scary though is that the JIRA tickets were created by a design doc that was half AI generated in the first place. The human proposed something, the AI asked clarifying questions, then broke the project down into milestones and then tickets, and then created the epic and issues on JIRA. One of my tradie friends taking an HVAC class tells me that there are a couple of programmers in his class looking to switch careers. I don't know what the future brings, but those programmers (sorry, "software developers") may have the right idea.

llmslave23mo ago

Yes we get it, there is a ton of "work" being done in corporate environments, in which the slop that generative AI churns out is similar to the slop that humans churn out. Congrats.

majormajor3mo ago

How do you judge model improvements vs tooling improvements?

If not working at one of the big players or running your own, it appears that even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.

dwohnitmok3mo ago

> even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.

No, the APIs for these models haven't really changed all that much since 2023. The de facto standard for the field is still the chat completions API that was released in early 2023. It is almost entirely model improvements, not tooling improvements that are driving things forward. Tooling improvements are basically entirely dependent on model improvements (if you were to stick GPT-4, Sonnet 3.5, or any other pre-2025 model in today's tooling, things would suck horribly).

dwohnitmok3mo ago

> The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.

I disagree. This almost entirely model capability increases. I've stated this elsewhere: https://news.ycombinator.com/item?id=46362342

Improved tooling/agent scaffolds, whatever, are symptoms of improved model capabilities, not the cause of better capabilities. You put a 2023-era model such as GPT-4 or even e.g. a 2024-era model such as Sonnet 3.5 in today's tooling and they would crash and burn.

The scaffolding and tooling for these models have been tried ever since GPT-3 came out in 2020 in different forms and prototypes. The only reason they're taking off in 2025 is that models are finally capable enough to use them.

kasey_junk3mo ago

Yet when you compare the same model in 2 different agents you can easily see capability differences. But cross (same tier) model in the same agent is much less stark.

My personal opinion is that there was a threshold earlier this year where the models got basically competent enough to be used for serious programming work. But all the major on the ground improvements since then has gone from the agents, and not all agents are equal, while all sota models are effectively.

dwohnitmok3mo ago

> Yet when you compare the same model in 2 different agents you can easily see capability differences.

Yes definitely. But this is to be expected. Heck take the same person and put them in two different environments and they'll have very different performance!

> But cross (same tier) model in the same agent is much less stark.

Unclear what you mean by this. I do agree that the big three companies (OpenAI, Anthropic, Google DeepMind) are all more or less neck and neck in SOTA models, but every new generation has been a leap. They just keep leaping over each other.

If you compare e.g. Opus 4.1 and Opus 4.5 in the same agent harness, Opus 4.5 is way better. If you compare Gemini 3 Pro and Gemini 2.5 Pro in the same agent harness, Gemini 3 is way better. I don't do much coding or benchmarking with OpenAI's family of models, but anecdotally have heard the same thing going from GPT-5 to GPT-5.2.

The on the ground improvements have been coming primarily from model improvements, not harness improvements (the latter is unlocked by the former). Again, it's not that there were breakthroughs in agent frameworks that happened; all the ideas we're seeing now have all been tried before. Models simply weren't capable enough to actually use them. It's just that more and more (pre-tried!) frameworks are starting to make sense now. Indeed, there are certain frameworks and workflows that simply did not make sense with Q2-Q3 2025 models that now make sense with Q4 2025 models.

1 more reply

j / k navigate · click thread line to collapse

0 comments

bigiain3mo ago

> I feel like we’ve been hearing this for 4 years now.

elAhmo3mo ago

fragmede3mo ago

llmslave23mo ago

Yes we get it, there is a ton of "work" being done in corporate environments, in which the slop that generative AI churns out is similar to the slop that humans churn out. Congrats.

majormajor3mo ago

How do you judge model improvements vs tooling improvements?

If not working at one of the big players or running your own, it appears that even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.

dwohnitmok3mo ago

> even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.

dwohnitmok3mo ago

> The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.

I disagree. This almost entirely model capability increases. I've stated this elsewhere: https://news.ycombinator.com/item?id=46362342

kasey_junk3mo ago

Yet when you compare the same model in 2 different agents you can easily see capability differences. But cross (same tier) model in the same agent is much less stark.

dwohnitmok3mo ago

> Yet when you compare the same model in 2 different agents you can easily see capability differences.

Yes definitely. But this is to be expected. Heck take the same person and put them in two different environments and they'll have very different performance!

> But cross (same tier) model in the same agent is much less stark.

1 more reply

j / k navigate · click thread line to collapse