Either I've wasted significant chunks of the past ~3 years of my life or you're missing something here. Up to you to decide which you believe.
I agree that it's hard to take solid measurements due to non-determinism. The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well and figure out what levers they can pull to help them perform better.
I talk to extremely experienced programmers, whose opinions I valued for many years before the current LLM boom, who are now flying with LLMs - I trust their aggregate judgement.
Meanwhile my own https://tools.simonwillison.net/colophon collection has grown to over 120 tools in just a year and a half, most of which I wouldn't have built at all - and that's a relatively small portion of what I've been getting done with LLMs elsewhere.
Hard to measure productivity on a "wouldn't exist" to "does exist" scale.
What in the wooberjabbery is this even.
A list of single-commit, LLM-generated stuff. Vibe-coded shovelware like animated-rainbow-border [1] or unix-timestamp [2].
Calling these "tools" seems to be overstating it.
1: https://gist.github.com/simonw/2e56ee84e7321592f79ceaed2e81b...
2: https://gist.github.com/simonw/8c04788c5e4db11f6324ef5962127...
But it was already a warning before LLMs because, as you wrote, people are bad at measuring productivity (among many things).
I think many in the industry have absolutely no clue what they're doing and are bad at evaluating productivity, often prioritising short-term delivery over long-term maintenance.
LLMs can absolutely be useful, but I'm very concerned that some people just use them to churn out code instead of thinking more carefully about what to build and how to build it. I wish we had at least as much discussion about those things as we have about whether Opus, Sonnet, GPT5 or Gemini is the best model.
I mean, we do. I think programmers are more interested in long-term maintainable software than its users are. Generally that makes sense: a user doesn't really care how much effort it takes to add features or fix bugs; those are things programmers care about. Moreover, the cost of mistakes in most software is so low that most people don't seem interested in paying extra for more reliable software. The few areas of software that require high reliability are the ones that are regulated, or that are sold by companies offering SLAs or other such reliability agreements.
My observation over the years is that maintainability and reliability matter much more to the programmers who comment in online forums than they do to users. It usually comes with the pride programmers take in their work, but my observation is that there is little market demand for it.
It's quite possible you do. Do you have any hard data justifying the claims of "this works better", or is it just a soft fuzzy feeling?
> The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well
It's actually really easy to judge if a team is performing well.
What is hard is finding what actually makes the team perform well. And that is just as much magic as "if you just write the correct prompt, everything will just work".
---
wait. why are we fighting again? :) https://dmitriid.com/everything-around-llms-is-still-magical...
In this video (https://www.youtube.com/watch?v=EO3_qN_Ynsk) they present a slide by the company DX that surveyed 38,880 developers across 184 organizations and found the surveyed developers claiming an average time savings of 4 hours per developer per week. So all of these LLM workflows are only making the average developer about 10% more productive in a 40-hour work week, with a bunch of developers getting less. Few developers are attaining productivity gains higher than that.
In this video by Stanford researchers actively researching productivity using GitHub commit data for private and public repositories (https://www.youtube.com/watch?v=tbDDYKRFjhk), there are a few very important data points:
1. They've found zero correlation between how productive respondents claim to be and how productive they actually measure as being, meaning people are poor judges of their own productivity. This does undercut the claims in my previous point, but only if you assume people are on average wildly more productive than they claim.
2. They have measured an actual increase in rework and refactoring commits in those repositories as AI tools come into wider use in those organizations. So even though teams can ship things faster, they are observing an increased number of pull requests that have to fix those earlier pushes.
3. They have measured pretty good productivity gains on greenfield, low-complexity systems, but once you move towards higher-complexity or brownfield systems, the measured gains drop off sharply, and in some cases turn negative with AI tools.
This goes hand in hand with this research paper: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... in which experienced devs on significant long-term projects lost productivity when using AI tools, while being thoroughly convinced the tools were making them more productive.
Yes, all of these studies have flaws and nitpicks we could go over, which I'm not interested in rehashing. However, there are far more data and studies showing AI giving a very marginal productivity boost compared to what people claim than the other way around. I'm legitimately interested in other studies that can show significant productivity gains on brownfield projects.
https://www.youtube.com/watch?v=tbDDYKRFjhk&t=4s is one of the largest studies I've seen so far, and it shows that when the codebase is small or engineered for AI use, >20% productivity improvements are normal.