If people are really set in their ways, maybe they won't try anything beyond what old models can do, and won't notice a difference, but who's had time to get set in their ways with this stuff?
It’s still hit or miss. The product “worked” when I tested it as a black box, but the code had a lot of rot in it already.
Maybe that stuff no longer matters. Maybe it does. Time will tell.
It's fine if you want Claude to design your APIs without any input, but you'll have less control, and when you dig down into the weeds you'll realise it's created a mess.
I like to take both a top-down and bottom-up approach - design the low-level API with Claude fleshing out how it's supposed to work, then design the high-level functionality, and then tell it to stop implementing when it hits a problem reconciling the two and the lower-level API needs revision.
At least for things I'd like to stand the test of time; if it's just a throwaway script or tool, I care much less as long as it gets the job done.
The latest models and harnesses can crunch on difficult problems for hours at a time and get to working solutions. Nothing could do that back in ~March.
I shared some examples in this comment: https://news.ycombinator.com/item?id=46436885
Every single example you gave is in hobby-project territory. Relatively self-contained, maintainable by 3-4 devs max, within 1k-10k lines of code. I've been successfully using coding agents to create such projects for the past year and it's great, I love it.
However, lots of us here work on codebases that are 100x or 1000x the size of the projects you and Karpathy are talking about. Years of domain-specific code. From personal experience, coding agents simply don't work at that scale the same way they do for hobby projects. Over the past year or two, I did not see any significant improvement from any of the newest models.
Building a slightly bigger hobby project is not even close to making these agents work at industrial scale.
Those crunching hard problems will still review what's produced in search of issues.
(This helps me better understand the people who are confused/annoyed/dismissive about it, because I remember how dismissive people were about Node, about Docker, about Postgres, about Linux when those things were new too. So many arguments where people would passionately talk about how all those things were irredeemably stupid and only suitable for toy/hobby projects.)