I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.
It's a really weird way to open up an article concluding that LLMs make one a worse programmer: "I definitely know how to use this tool optimally, and I conclude the tool sucks". Ok then. Also: the piano is a terrible, awful instrument; what a racket it makes.
I find it funny when people ask me if it's true that they can build an app using an LLM without knowing how to code. I think of how it took me months before I started feeling like I "got it" with fitting LLMs into my coding process. So, not only do you need to learn how to code, but getting to the point that the LLM feels like a natural extension of you has its own timeline on top.
What does this even mean?
In the first one and a half years after ChatGPT was released, they lied to me 100% of the time when I used them, so I completely missed this honeymoon phase. The first time one answered without problems was about 2 months ago. That was also the first time one of them (ChatGPT) answered better than Google/Kagi/DDG could. Even yesterday, I tried to get Claude Opus to tell me when the next concert at Arena Wien is, and it failed miserably. I tried other models from Anthropic too, and all failed. It successfully parsed the venue's page of upcoming events, then failed miserably. Sometimes it answered with events from the past, sometimes with events in October. The closest was 21 August. When I asked what’s on 14 August, it said sorry, I was right. When I asked about “events”, it simply ignored all of the movie nights. When I asked about them specifically, it was as if I had started a new conversation.
The only time they produced anything comparable in quality to my code was when they were given a ton of example tests that looked almost the same. Even then, they made mistakes… when basically I had to change two lines, so copy pasting would have been faster.
There was an AI advocate here who was so confident in his AI skill that he did exactly what most people here try to avoid: he recorded how he works with AIs. Here is the catch: he showed the same thing. There were already examples, and he needed minimal modifications for the new code. And even then, copy pasting would have been quicker and would have contained fewer mistakes… which he kept in the code, because it didn’t fail right away.
I feel like every time I have a prompt or use a new tool, I'm experimenting with how to make fire for the first time. It's not to say that I'm bad at it. I'm probably better than most people. But knowing how to use this tool is by far the largest challenge, in my opinion.
That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.
Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.
I am still hesitant to use AI to solve problems for me. Either it hallucinates and misleads me, or it does a great job and I worry that my ability to reason through complex problems with rigor will degenerate. Once my ability to solve complex problems has degenerated, my patience has diminished and my attention span has been destroyed, I will be reliant on a service that other entities own just to function in my daily life. Genuine question - are people comfortable with this?
Getting 80% of the benefit of LLMs is trivial. You can ask it for some functions or to write a suite of unit tests and you’re done.
The last 20%, while possible to attain, is ultimately not worth it for the amount of time you spend in context hells. You can just do it yourself faster.
> Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.
Do you find that these details translate between models? Sounds like it doesn't translate across codebases for you?
I have mostly moved away from this sort of fine-tuning approach because of my experience a while ago with OpenAI's ChatGPT 3.5 and 4. Extra work on my end that was necessary with the older model wasn't with the new one, and sometimes it counterintuitively caused worse performance by pointing the model at the way I'd do it rather than the way it might have the best luck with. ESPECIALLY for the sycophantic models, which will heavily index on "if you suggested that this thing might be related, I'll figure out some way to make sure it is!"
So more recently I generally stick to the "we'll handle a lot of the prompt nitty gritty for you" IDE or CLI agent stuff, but I find they still fall apart with large complex codebases and also that the tricks don't translate across codebases.
Because for all our posturing about being skeptical and data driven we all believe in magic.
Those "counterintuitive non-trivial workflows"? They work about as well as just prompting "implement X" with no rules, agents.md, careful lists etc.
Because 1) literally no one actually measures whether magical incantations work and 2) it's impossible to make such measurements due to non-determinism.
Either I've wasted significant chunks of the past ~3 years of my life or you're missing something here. Up to you to decide which you believe.
I agree that it's hard to take solid measurements due to non-determinism. The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well and figure out what levers they can pull to help them perform better.
He is actually recommending Copilot for price/performance reasons and his closing statement is "Don’t fall for the hype, but also, they are genuinely powerful tools sometimes."
So, it just seems like he never really tried to learn how to engineer better prompts that these more advanced models can use.
I've tried a few things that have mostly been positive. I started with Copilot's in-line "predictive text on steroids", which works really well. It's definitely faster and more accurate than me typing in a traditional IntelliSense IDE. For me, this level of AI is can't-lose: it's very easy to see if a few lines of prediction are what you want.
I then did Cursor for a while, and that did what I wanted as well. Multi-file edits can be a real pain. Sometimes, it does some really odd things, but most of the time, I know what I want, I just don't want to find the files, make the edits on all of them, see if it compiles, and so on. It's a loop that you have to do as a junior dev, or you'll never understand how to code. But now I don't feel I learn anything from it, I just want the tool to magically transform the code for me, and it does that.
Now I'm on Claude. Somehow, I get a lot fewer excursions from what I wanted. I can do much more complex code edits, and I barely have to type anything. I sort of tell it what I would tell a junior dev. "Hey let's make a bunch of connections and just use whichever one receives the message first, discarding any subsequent copies". If I was talking to a real junior, I might answer a few questions during the day, but he would do this task with a fair bit of mess. It's a fiddly task, and there are assumptions to make about what the task actually is.
Somehow, Claude makes the right assumptions. Yes, indeed I do want a test that can output how often each of the incoming connections "wins". Correct, we need to send the subscriptions down all the connections. The kinds of assumptions a junior would understand and come up with himself.
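For anyone wondering what that task looks like in code, here is a minimal sketch of the "first copy wins" idea, not the code Claude actually produced. It assumes every message carries a unique id that is identical across connections; the class name `FirstWinsDeduplicator` and the connection names are made up for illustration.

```python
from collections import Counter


class FirstWinsDeduplicator:
    """Deliver each message once, from whichever connection sees it first.

    Assumption (not from the original comment): every message has a unique
    id that is the same on all redundant connections.
    """

    def __init__(self):
        self._seen = set()     # ids of messages already delivered
        self.wins = Counter()  # connection name -> times it delivered first

    def on_message(self, connection, message_id, payload):
        """Return the payload for the first copy, None for later duplicates."""
        if message_id in self._seen:
            return None
        self._seen.add(message_id)
        self.wins[connection] += 1
        return payload


# Simulated arrival order on two redundant connections.
arrivals = [
    ("conn-a", 1, "tick"),
    ("conn-b", 1, "tick"),  # duplicate of message 1, discarded
    ("conn-b", 2, "tock"),
    ("conn-a", 2, "tock"),  # duplicate of message 2, discarded
    ("conn-a", 3, "tick"),
]

dedup = FirstWinsDeduplicator()
delivered = []
for connection, message_id, payload in arrivals:
    result = dedup.on_message(connection, message_id, payload)
    if result is not None:
        delivered.append(result)

print(delivered)         # one copy of each message, in first-arrival order
print(dict(dedup.wins))  # how often each connection "won"
```

The `wins` counter is the part that backs the "how often does each connection win" test. A real version would also need to expire old ids from `_seen` so the set doesn't grow forever.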
I spend a lot of time with the LLM critiquing, rather than editing. "This thing could be abstracted, couldn't it?" and then it looks through the code and says "yeah I could generalize this like so..." and it means instead of spending my attention on finding things in files, I look at overall structure. This also means I don't need my highest level of attention, so I can do this sort of thing when I'm not even really able to concentrate, eg late at night or while I'm out with the kids somewhere.
So yeah, I might also say there's very little learning curve. It's not like I opened a manual or tutorial before using Claude. I just started talking to it in natural language about what it should do, and it's doing what I want. Unlike seemingly everyone else.
The blogging output on the other hand ...
That is not what that paper said, lol.
I read these articles and I feel like I am taking crazy pills sometimes. The person, enticed by the hype, makes a transparently half-hearted effort for just long enough to confirm their blatantly obvious bias. They then act like they now have ultimate authority on the subject to proclaim their pre-conceived notions were definitely true beyond any doubt.
Not all problems yield well to LLM coding agents. Not all people will be able or willing to use them effectively.
But I guess "I gave it a try and it is not for me" is a much less interesting article compared to "I gave it a try and I have proved it is as terrible as you fear".
I have also written C++ code that has to run for years, meaning there can be absolutely no memory leaks or bugs whatsoever, or the TV stops working. I wouldn't have a language model write any of that, at least not without testing the hell out of it and making sure it makes sense to me.
It's not all or nothing here. These things are tools and should be used as such.
Ahh, sweet summer child, if I had a nickel for every time I've heard "just hack something together quickly, that's throwaway code", that ended up being a critical lynchpin of a production system - well, I'd probably have at least like a buck or so.
Obviously, to emphasize, this kind of thing happens all the time with human-generated code, but LLMs make the issue a lot worse because they let you generate a ton of eventual mess so much faster.
Also, I do agree with your primary point (my comment was a bit tongue in cheek) - it's very helpful to know what should be core and what can be thrown away. It's just in the real world whenever "throwaway" code starts getting traction and getting usage, the powers that be rarely are OK with "Great, now let's rebuild/refactor with production usage in mind" - it's more like "faster faster faster".
I then manually declare some functions, JSDoc comments for the return types, imports and stop halfway. By then the agent is able to think, ha!, you plan to replace all the api calls to this composable under the so and so namespace.
It's iterations and context. I don't use them for everything but I find that they help when my brain bandwidth begins to lag or I just need some boilerplate code before engineering specific use cases.
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
Learning how to use LLMs in a coding workflow is trivial to start, but you find you get a bad taste early if you don't learn how to adapt both your workflow and its workflow. It is easy to get a trivially good result and then be disappointed in the followup. It is easy to try to start on something it's not good at and think it's worthless.
The pure dismissal of Cursor, for example, means that the author didn't learn how to work with it. Now, it's certainly limited and some people just prefer Claude Code. I'm not saying that's unfair. However, it requires a process adaptation.
Not everyone with a different opinion is dumber than you.
With LLMs, the point is to eliminate tedious work in a trivial way. If it’s tedious to get an LLM to do tedious work, you have not accomplished anything.
If the work is not trivial enough for you to do yourself, then using an LLM will probably be a disaster, as you will not be able to judge the final output yourself without spending nearly the same amount of time it takes for you to develop the code on your own. So again, nothing is gained, only the illusion of gain.
The reason people think they are more productive using LLMs to tackle non-trivial problems is because LLMs are pretty good at producing “office theatre”. You look like you’re busy more often because you are in a tight feedback loop of prompting and reading LLM output, vs staring off into space thinking deeply about a problem and occasionally scribbling or typing something out.
We are learning that this is not going to be magic. There are some cases where it shines. If I spend the time, I can put out prototypes that are magic and I can test with users in a fraction of the time. That doesn't mean I can use that for production.
I can try three or four things during a meeting where I am generally paying attention, and look afterwards to see if any of them is worth pursuing.
I can have it work through drudgery if I provide it an example. I can have it propose a solution to a problem that is escaping me, and I can use it as a conversational partner for the best rubber duck I've ever seen.
But I'm adapting myself to the tool and I'm adapting the tool to me through learning how to prompt and how to develop guardrails.
Outside of coding, I can write chicken scratch and provide an example of what I want, and have it write a proposal for a PRD. I can have it break down a task, generate a list of proposed tickets, and after I've gone through them have it generate them in Jira (or anything else with an API). But the more I invest into learning how to use the tool, the less I have to clean up after.
Maybe one day in the future it will be better. However, the time invested into the tool means that 40 bucks of investment (20 into Cursor, 20 into GPT) can add a 10-15% boost in productivity. Putting 200 into Claude might get you another 10% and it can get you 75% in greenfield and prototyping work. I bet that agency work can be sped up as much as 40% for that 200 bucks investment into Claude.
That's a pretty good ROI.
And maybe some workloads can do even better. I haven't seen it yet but some people are further ahead than me.
Improving LLM output through better inputs is neither an illusion, nor as easy as learning how to google (entire companies are being built around improving LLM outputs and measuring that improvement).
Keep in mind that the first reasoning model (o1) was released less than 8 months ago and Claude Code was released less than 6 months ago.
I have used neural networks since the 1980s, and modern LLM tech simply makes me happy, but there are strong limits to what I will use the current tech for.
Pseudo-random number generators remain one of the most amazing things in computing IMO. Knuth volume 2. One of my favourite books.
LLMs will always suck at writing code that has not been written millions of times before. As soon as you venture slightly offroad, they falter.
That right there is your learning curve! Getting LLMs to write code that's not heavily represented in their training data takes experience and skill and isn't obvious to learn.
People are claiming that it takes time to build the muscles and train the correct footing to push, while I'm here learning mechanical theory and drawing up levers. If one of them manages to push the rock for one meter, he comes clamoring, ignoring the many who were injured by doing so, saying that one day he will be able to pick the rock up and throw it at the moon.
Also I started in the pre-agents era and so I ended up with a pair-programming paradigm. Now every time I conceptualize a new task in my head -- whether it is a few lines of data wrangling within a function, or generating an entire feature complete with integration tests -- I instinctively do a quick prompt-vs-manual coding evaluation and seamlessly jump to AI code generation if the prompt "feels" more promising in terms of total time and probability of correctness.
I think one of the skills is learning this kind of continuous evaluation and the judgement that goes with it.
Effective LLM usage these days is about a lot more than just the prompts.
It helps dramatically with finding bugs and issues. Perhaps that's trivial to you, but it feels novel as we've only had effective agents in the last couple weeks.
I recently started a fresh project, and until I got to the desired structure I only used AI to ask questions or get suggestions. I organized and wrote most of the code myself.
Once it started to get into the shape that felt semi-permanent to me, I started a lot of queries like:
```
- Look at existing service X at folder services/x
- see how I deploy the service using k8s/services/x
- see how the docker file for service X looks like at services/x/Dockerfile
- now, I started service Y that does [this and that]
- create all that is needed for service Y to be skaffolded and deployed, follow the same pattern as service X
```
And it would go, read the existing stuff for X, then generate all of the deployment/monitoring/readme/docker/k8s/helm/skaffold for Y.
With next to no mistakes. Both Claude and Gemini are more than capable of such a task. I had both of them generate 10-15 files with no errors, with the code being deployable right after (of course the service will just answer and not do much more than that).
Then, I will take over again for a bit, do some business logic specific to Y, then again leverage AI to fill in missing bits, review, suggest stuff etc.
It might look slow, but it actually cuts out the most boring and most error-prone steps when developing a medium to large k8s-backed project.
Whipping up greenfield projects is almost magical, of course. But that’s not most of my work.
My personal experience has been that AI has trouble keeping the scope of the change small and targeted. I have only been using Gemini 2.5 Pro though, as we don’t have access to other models at my work. My friend tells me he uses Claude for coding and Gemini for documentation.
Most people I've seen espousing LLMs and agentic workflows as a silver bullet have limited experience with the frameworks and languages they use with these workflows.
My view currently is one of cautious optimism; that LLM workflows will get to a more stable point whereby they ARE close to what the hype suggests. For now, that quote that "LLMs raise the floor, not the ceiling" I think is very apt.
LinkedIn is full of BS posturing, ignore it.
If you go by MBA types on LinkedIn who aren’t really developers or haven’t been in a long time, now they can vibe out some React components or a Python script so it’s a revolution.
I tend to strongly agree with the "unpopular opinion" about the IDEs mentioned versus CLI (specifically, aider.chat and Claude Code).
Assuming (this is key) you have mastery of the language and framework you're using, working with the CLI tool using 25-year-old XP practices is an incredible accelerant.
Caveats:
- You absolutely must bring taste and critical thinking, as the LLM has neither.
- You absolutely must bring systems thinking, as it cannot keep deep weirdness "in mind". By this I mean the second and third order things that "gotcha" about how things ought to work but don't.
- Finally, you should package up everything new about your language or frameworks since a few months or year before the knowledge cutoff date, and include a condensed synthesis in your context (e.g., Swift 6 and 6.1 versus the 5.10 and 2024's WWDC announcements that are all GPT-5 knows).
For this last one I find it useful to (a) use OpenAI's "Deep Research" to first whitepaper the gaps, then another pass to turn that into a Markdown context prompt, and finally bring that over to your LLM tooling to include as needed when doing a spec or in architect mode. Similarly, (b) use repomap tools on dependencies if creating new code that leverages those dependencies, and have that in context for that work.
I'm confused why these two obvious steps aren't built into leading agentic tools, but maybe handling the LLM as a naive and outdated "Rain Man" type doesn't figure into mental models at most KoolAid-drinking "AI" startups, or maybe vibecoders don't care, so it's just not a priority.
Either way, context based development beats Leeroy Jenkins.
It seems to me that currently there are 2 schools of thought:
1. Use repomap and/or LSP to help the models navigate the code base
2. Let the models figure things out with grep
Personally, I am 100% a grep guy, and my editor doesn't even have LSP enabled. So, it is very interesting to see how many of these agentic tools do exactly the same thing.
And Claude Code /init is a great feature that basically writes down the current mental model after the initial round of grep.
On the management side, however, we have all sorts of AI mandates, workshops, social media posts hyping our AI stuff, our whole "product vision" is some AI-hallucinated nightmare that nobody understands, you'd genuinely think we've been doing nothing but AI for the last decade the way we're contorting ourselves to shove "AI" into every single corner of the product. Every day I see our CxOs posting on LinkedIn about the random topic-of-the-hour regarding AI. When GPT-5 launched, it was like clockwork, "How We're Using GPT-5 At $COMPANY To Solve Problems We've Never Solved Before!" mere minutes after it was released (we did not have early access to it lol). Hilarious in retrospect, considering what a joke the launch was like with the hallucinated graphs and hilarious errors like in the Bernoulli's Principle slide.
Despite all the mandates and mandatory shoves coming from management, I've noticed the teams I'm close with (my team included) are starting to push back themselves a bit. They're getting rid of the spam generating PR bots that have never, not once, provided a useful PR comment. People are asking for the various subscriptions they were granted to be revoked because they're not using them and it's a waste of money. Our own customers' #1 piece of feedback is to focus less on stupid AI shit nobody ever asked for, and to instead improve the core product (duh). I'm even seeing our CTO who was fanboy number 1 start dialing it back a bit and relenting.
It's good to keep in mind that HN is primarily an advertisement platform for YC and their startups. If you check YC's recent batches, you would think that the 1 and only technology that exists in the world is AI, every single one of them mentions AI in one way or another. The majority of them are the lowest effort shit imaginable that just wraps some AI APIs and is calling it a product. There is a LOT of money riding on this hype wave, so there's also a lot of people with vested interests in making it seem like these systems work flawlessly. The less said about LinkedIn the better, that site is the epitome of the dead internet theory.
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
How much of your workflow or intuition from 6 months ago is still relevant today? How long would it take to learn the relevant bits today?
Keep in mind that Claude Code was released less than 6 months ago.
If I was starting from fresh today I expect it would take me months of experimentation to get back to where I am now.
Working thoughtfully with LLMs has also helped me avoid a lot of the junk tips ("Always start with 'you are the greatest world expert in X', offer to tip it, ...") that are floating around out there.
Speaking mostly from experience of building automated, dynamic data processing workflows that utilize LLMs:
Things that work with one model, might hurt performance or be useless with another.
Many tricks that used to be necessary in the past are no longer relevant, or only applicable for weaker models.
This isn't me dismissing anyone's experience. It's ok to do things that become obsolete fairly quickly, especially if you derive some value from it. If you try to stay on top of a fast moving field, it's almost inevitable. I would not consider it a waste of time.
The reach is big enough to not care about our feelings. I wish it wasn't this way.
I recall The Mythical Man-Month stating a rough calculation that the average software developer writes about 10 net lines of new, production-ready code per day. For a tool like this going up an order of magnitude to about 100 lines of pretty good internal tooling seems reasonable.
OP sounds a few cuts above the 'average' software developer in terms of skill level. But here we also need to point out a CLI log viewer and querier is not the kind of thing you actually needed to be a top tier developer to crank out even in the pre-LLM era, unless you were going for lnav [1] levels of polish.
[1]: https://lnav.org/
I would rather qualify this statement a bit more - I would say "you can safely ignore them if you are not building anything greenfield or building tools for yourself". In my experiments over the last month or so, they have been very efficient for building new components (small and medium). Making them efficient for an existing code base is a bit more tricky - you need to make sure the output adheres to the way things are coded already, avoid leaking .env contents to the LLM, build a context from the existing components so that it does not re-read the code every time (leading to cost and time escalations), and so on.
My main issue so far has been understanding the code that is generated. As of now that is the biggest bottleneck to increasing productivity - i.e. it takes a long time to review the code and push. In my usual workflow, by the time the code complexity in the system has increased I would have built a sufficient mental model to handle that complexity. I would know the inner workings of the code. However, if AI generates a large piece of code, getting into that code takes a long time.
The other thing I disagree with is the coverage of gemini-cli: if you use gemini-cli for a single long work session, then you must set your Google API key as an environment variable when starting gemini-cli, otherwise after a short while you end up using Gemini-2.5-flash, and that leads to unhappy results. So, use gemini-cli for free for short and focused 3 or 4 minute work sessions and you are good, or pay for longer work sessions, and you are good.
I do have a random off topic comment: I just don’t get it: why do people live all day in an LLM-infused coding environment? LLM based tooling is great, but I view it as something I reach for a few times a day for coding and that feels just right. Separately, for non-coding tasks, reaching for LLM chat environments for research and brainstorming is helpful, but who really needs to do that more than once or twice a day?
The current state of LLM-driven development is already several steps down the path of an end-game where the overwhelming majority of code is written by the machine; our entire HCI for "building" is going to be so far different to how we do it now that we'll look back at the "hand-rolling code era" in a similar way to how we view programming by punch-cards today. The failure modes, the "but it SUCKS for my domain", the "it's a slot machine" etc etc are not-even-wrong. They're intermediate states except where they're not.
The exceptions to this end-game will be legion and exist only to prove the end-game rule.
Do they? I’ve found Clojure-MCP[1] to be very useful. OTOH, I’m not attempting to replace myself, only augment myself.
I like your phrasing of “OTOH, I’m not attempting to replace myself, only augment myself.” because that is my personal philosophy also.
I work mostly in C/C++.
The most valuable benefit of using these kinds of tools, for me, is being able to easily find help when I have to work on boring/tedious tasks or when I want to have a Socratic conversation about a design idea with a not-so-smart but extremely knowledgeable colleague.
But for anything requiring a brain, it is almost useless.
* I let the AI do something
* I find bad bug or horrifying code
* I realize I gave it too much slack
* hand code for a while
* go back to narrow prompts
* get lazy, review code a bit less, add more complexity
* GOTO 1, hopefully with a better instinct for where/how to trust this model
Then over time you hone your instinct on what to delegate and what to handle yourself. And how deeply to pay attention.
It makes your existing strength and mobility greater, but don't be surprised if you fly into space that you will suffocate,
or if you fly over an ocean and run out of gas, that you'll sink to the bottom,
or if you fly the suit in your fine glassware shop with patrons in the store, that you're going to break and burn everything/everyone in there.
In case it matters, I was using Copilot, which is 'free' for me because my day job is open source, and the model was Claude Sonnet 3.7. I've not yet heard anyone else say the same as me, which is kind of peculiar.
I haven't found that to be true with my most recent usage of AI. I do a lot of programming in D, which is not popular like Python or JavaScript, but Copilot knows it well enough to help me with things like templates, metaprogramming, and interoperating with GCC-produced DLLs on Windows. This is true in spite of the lack of a big pile of training data for these tasks. Importantly, it gets just enough things wrong when I ask it to write code for me that I have to understand everything well enough to debug it.
Either I'm extremely lucky or I was lucky to find the guy who said it must all be test driven and guided by the usual principles of DRY etc. Claude Code works absolutely fantastically nine times out of 10, and when it doesn't we just roll back the three hours of nonsense it did and either postpone the feature or give it extra guidance.
If there's a test suite for the thing to run it's SO much less likely to break other features when it's working. Plus it can read the tests and use them to get a good idea about how everything is supposed to work already.
Telling Claude to write the test first, then execute it and watch it fail, then write the implementation has been giving me really great results.
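As a rough illustration of that test-first loop (the `slugify` function and its behavior are invented for this sketch, not taken from any real project): the tests are written first and run while the function is still missing, so they visibly fail, and only then is the implementation filled in.

```python
import re


# Step 1: the tests, written before any implementation exists.
# Run at this point, they fail with NameError because slugify is not
# defined yet; that failure is what you want to see first.
def test_lowercases_and_joins_words_with_hyphens():
    assert slugify("Hello, World!") == "hello-world"


def test_collapses_repeated_separators():
    assert slugify("  a  --  b  ") == "a-b"


def test_empty_input_gives_empty_slug():
    assert slugify("") == ""


# Step 2: the implementation, added only after watching the tests fail.
def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

The functions are named so that a runner like pytest would collect them, but they can also just be called directly. Having the failing run in the transcript gives the agent (and you) evidence that the test actually exercises the new code.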
Almost like hiring and scaling a team? There are also benchmarks that specifically measure this, and it's in theory a very temporary problem (Aider Polyglot Benchmark is one such).
It’s mostly on point though. Although, in recent years I’ve been assigned to manage and plan projects at work, and the skills I’ve learnt from that greatly help to get effective results from an LLM I think.
It’s not perfect but it’s okay.
Like if you need to crap out a UI based on a JSON payload, make a service call, add a server endpoint, LLMs will typically do this correctly in one shot. These are common operations that are easily extrapolated from their training data. Where they tend to fail are tasks like business logic which have specific requirements that aren’t easily generalized.
I’ve also found that writing the scaffolding for the code yourself really helps focus the agent. I’ll typically add stubs for the functions I want, and create overall code structure, then have the agent fill the blanks. I’ve found this is a really effective approach for preventing the agent from going off into the weeds.
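Here's a small sketch of what that kind of hand-written scaffolding can look like. The domain (`LineItem`, order totals) is purely hypothetical; the point is that the types, signatures, docstrings and overall call structure are fixed by hand, and only the stub bodies are left for the agent.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    sku: str
    unit_price_cents: int
    quantity: int


def subtotal_cents(items: list[LineItem]) -> int:
    """Sum unit_price_cents * quantity over all items."""
    raise NotImplementedError  # stub: left for the agent to fill in


def apply_discount(subtotal: int, percent: int) -> int:
    """Reduce subtotal by percent (0-100), rounding down to whole cents."""
    raise NotImplementedError  # stub: left for the agent to fill in


def order_total_cents(items: list[LineItem], discount_percent: int = 0) -> int:
    """Entry point. The structure is decided by hand; only the helpers are blank."""
    return apply_discount(subtotal_cents(items), discount_percent)
```

Because the stubs raise `NotImplementedError`, anything the agent forgets to fill in fails loudly instead of silently returning a wrong value.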
I also find that if it doesn’t get things right on the first shot, the chances are it’s not going to fix the underlying problems. It tends to just add kludges on top to address the problems you tell it about. If it didn’t get it mostly right at the start, then it’s better to just do it yourself.
All that said, I find enjoyment is an important aspect as well and shouldn’t be dismissed. If you’re less productive, but you enjoy the process more, then I see that as a net positive. If all LLMs accomplish is to make development more fun, that’s a good thing.
I also find that there's use for both terminal based tools and IDEs. The terminal REPL is great for initially sketching things out, but IDE based tooling makes it much easier to apply selective changes exactly where you want.
As a side note, got curious and asked GLM-4.5 to make a token field widget with React, and it did it in one shot.
It's also strange not to mention DeepSeek and GLM as options given that they cost orders of magnitude less per token than Claude or Gemini.
I use clojure for my day-to-day work, and I haven't found this to be true. Opus and GPT-5 are great friends when you start pushing limits on Clojure and the JVM.
> Or 4.1 Opus if you are a millionaire and want to pollute as much possible
I know this was written tongue-in-cheek, but at least in my opinion it's worth it to use the best model if you can. Opus is definitely better on harder programming problems.
> GPT 4.1 and 5 are mostly bad, but are very good at following strict guidelines.
This was interesting. At least in my experience GPT-5 seemed about as good as Opus. I found it to be _less_ good at following strict guidelines though. In one test Opus avoided a bug by strictly following the rules, while GPT-5 missed.
I'm sorry, but I disagree with this claim. That is not my experience, nor that of many others. It's true that you can make them do something without learning anything. However, it takes time to learn what they are good and bad at, what information they need, and what nonsense they'll do without express guidance. It also takes time to know what to look for when reviewing results.
I also find that they work fine for languages without static types. You need tests, yes, but you need them anyway.
Some comments here are reminiscent of antiquated discourse: "how many angels dance on the head of a pin?"
We somehow are trying to agree on some factual ramp-up time required for a dev to become competent coding with LLMs. This is inherently subjective! Why bother?
Perhaps certain LLMs are blessed with disproportionately more angels (nee "bugs") in the machines.
I enjoyed reading the article:
"The model looks good, but Google’s enshittification has won and it looks like no competent software developers are left. I would know, many of my friends work there."
Yikes!
Credit to the author for having the courage to post publicly.
That was an unnecessary guilt-shaming remark.
It becomes farcical when not only are you missing the big thing but you're also proud of your ignorance and this guy is both.
When it is mentioned that LLMs "have terrible code organization skills", I think they are referring mainly to the size of the context. It is not the same to develop a module with hundreds of LoCs, one with thousands or one with tens of thousands of LoCs.
I don't really buy the skill-degradation argument; I am not aware of a study that validates it. On the other hand, it is true that agents are constantly evolving, and I don't see any difficulties that cannot be overcome with the current evolutionary race, given that, in the end, coding is one of the most accessible functions for artificial intelligence.