The current state of LLM-driven development

(blog.tolki.dev)

222 points | Signez | 9mo ago | 233 comments

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.

It's a really weird way to open up an article concluding that LLMs make one a worse programmer: "I definitely know how to use this tool optimally, and I conclude the tool sucks". Ok then. Also: the piano is a terrible, awful instrument; what a racket it makes.

credit_guy 9mo ago

Fully agree. It takes months to learn how to use LLMs properly. There is an initial honeymoon where the LLMs blow your mind out. Then you get some disappointments. But then you start realizing that there are some things that LLMs are good at and some that they are bad at. You start creating a feel for what you can expect them to do. And more importantly, you get into the habit of splitting problems into smaller problems that the LLMs are more likely to solve. You keep learning how to best describe the problem, and you keep adjusting your prompts. It takes time.

physPop 9mo ago

It really doesn't take that long. Maybe if you're super junior and never coded before? In that case I'm glad it's helping you get into the field. Also, if it's taking you months, whole new models will get released in the meantime and you'll need to learn their quirks again.

gexla 9mo ago

Love this, and it's so true. A lot of people don't get this, because it's so nuanced. It's not something that's slowing you down. It's not learning a technical skill. Rather, it's building an intuition.

I find it funny when people ask me if it's true that they can build an app using an LLM without knowing how to code. I think of this... that it took me months before I started feeling like I "got it" with fitting LLMs into my coding process. So, not only do you need to learn how to code, but getting to the point that the LLM feels like a natural extension of you has its own timeline on top.

ruszki 9mo ago

> There is an initial honeymoon where the LLMs blow your mind out.

What does this even mean?

In the first year and a half after ChatGPT was released, the models lied to me 100% of the time I used them, so I completely missed this honeymoon phase. The first time one answered without problems was about 2 months ago, and that was also the first time one of them (ChatGPT) answered better than Google/Kagi/DDG could. Even yesterday, I tried to get Claude Opus to tell me when the next concert in Arena Wien is, and it failed miserably. I tried other models from Anthropic too, and all failed. It successfully parsed the venue's page of upcoming events, then failed miserably. Sometimes it answered with events from the past, sometimes with events in October. The closest was 21 August. When I asked what’s on 14 August, it apologized and said I was right. When I asked about “events”, it simply ignored all of the movie nights. When I asked about them specifically, it was as if I had started a new conversation.

The only time they produced anything comparable in quality to my code was when they were given a ton of example tests that looked almost the same. Even then, they made mistakes… when I basically had to change two lines, so copy-pasting would have been faster.

There was an AI advocate here who was so confident in his AI skills that he showed exactly what most people here try to avoid: he recorded how he works with AIs. Here is the catch: he showed the same thing. There were already examples, and he needed minimal modifications for the new code. Even then, copy-pasting would have been quicker and would have contained fewer mistakes… which he kept in the code, because it didn’t fail right away.

thefourthchime 9mo ago

I'm glad you feel like you've nailed it. I've been using models to help me code for over two years, and I still feel like I have no idea what I'm doing.

I feel like every time I have a prompt or use a new tool, I'm experimenting with how to make fire for the first time. It's not to say that I'm bad at it. I'm probably better than most people. But knowing how to use this tool is by far the largest challenge, in my opinion.

throwawaybob420 9mo ago

Months? That’s actually an insanely long time

otabdeveloper4 9mo ago

I dunno, man. I think you could have spent that time, you know, learning to code instead.

SkyPuncher 9mo ago

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.

Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.

uvdn7 9mo ago

Is the non-trivial amount of time significantly less than you trying to ramp up yourself?

I am still hesitant to use AI to solve problems for me. Either it hallucinates and misleads me, or it does a great job and I worry that my ability to reason through complex problems with rigor will degenerate. Once my ability to solve complex problems has degenerated, my patience has diminished, and my attention span has been destroyed, I will be reliant on a service that other entities own just to function in my daily life. Genuine question - are people comfortable with this?

deadbabe 9mo ago

He’s not wrong.

Getting 80% of the benefit of LLMs is trivial. You can ask it for some functions or to write a suite of unit tests and you’re done.

The last 20%, while possible to attain, is ultimately not worth it for the amount of time you spend in context hells. You can just do it yourself faster.

majormajor 9mo ago

> That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.

> Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.

Do you find that these details translate between models? Sounds like it doesn't translate across codebases for you?

I have mostly moved away from this sort of fine-tuning approach because of my experience a while ago with OpenAI's ChatGPT 3.5 and 4. Extra work on my end that was necessary with the older model wasn't with the new one, and sometimes counterintuitively caused worse performance by pointing it at the way I'd do it rather than the way it might have the best luck with. ESPECIALLY for the sycophantic models, which will heavily index on "if you suggested that this thing might be related, I'll figure out some way to make sure it is!"

So more recently I generally stick to the "we'll handle a lot of the prompt nitty gritty for you" IDE or CLI agent stuff, but I find they still fall apart with large complex codebases, and the tricks still don't translate across codebases.

troupo 9mo ago

> I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.

Because for all our posturing about being skeptical and data driven we all believe in magic.

Those "counterintuitive non-trivial workflows"? They work about as well as just prompting "implement X" with no rules, agents.md, careful lists etc.

Because 1) literally no one actually measures whether magical incantations work and 2) it's impossible to make such measurements due to non-determinism.

simonw 9mo ago

The problem with your argument here is that you're effectively saying that developers (like myself) who put effort into figuring out good workflows for coding with LLMs are deceiving themselves, and are effectively wasting their time.

Either I've wasted significant chunks of the past ~3 years of my life or you're missing something here. Up to you to decide which you believe.

I agree that it's hard to take solid measurements due to non-determinism. The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well and figure out what levers they can pull to help them perform better.

roxolotl 9mo ago

On top of this, a lot of “learning to work with LLMs” is breaking down tasks into small pieces with clear instructions and acceptance criteria. That’s just part of working efficiently, but maybe people don’t want to be bothered to do it.

prerok 9mo ago

I agree with your assessment about this statement. I actually had to reread it a few times to actually understand it.

He is actually recommending Copilot for price/performance reasons and his closing statement is "Don’t fall for the hype, but also, they are genuinely powerful tools sometimes."

So, it just seems like he never really tried to engineer better prompts that these more advanced models can use.

rocqua 9mo ago

The OP's point seems to be: it's very quick for LLMs to be a net benefit to your skills, if it is a benefit at all. That is, he's only speaking of the very beginning part of the learning curve.

edfletcher_t137 9mo ago

The first two points directly contradict each other, too. Learning a tool should have the outcome that one is productive with it. If getting to "productive" is non-trivial, then learning the tool is non-trivial.

enraged_camel 9mo ago

Agreed. This is an astonishingly bad article. It's clear that the only reason it made it to the front page is because people who view AI with disdain or hatred upvoted it. Because as you say: how can anyone make authoritative claims about a set of tools not just without taking the time to learn to use them properly, but also believing that they don't even need to bother?

hislaziness 9mo ago

Would it be more appropriate to compare LLMs to Autotunes rather than pianos?

lordnacho 9mo ago

I've said it before, I feel like I'm some sort of lottery winner when it comes to LLM usage.

I've tried a few things that have mostly been positive. Starting with Copilot in-line "predictive text on steroids" which works really well. It's definitely faster and more accurate than me typing on a traditional intellisense IDE. For me, this level of AI is can't-lose: it's very easy to see if a few lines of prediction are what you want.

I then did Cursor for a while, and that did what I wanted as well. Multi-file edits can be a real pain. Sometimes, it does some really odd things, but most of the time, I know what I want, I just don't want to find the files, make the edits on all of them, see if it compiles, and so on. It's a loop that you have to do as a junior dev, or you'll never understand how to code. But now I don't feel I learn anything from it, I just want the tool to magically transform the code for me, and it does that.

Now I'm on Claude. Somehow, I get a lot fewer excursions from what I wanted. I can do much more complex code edits, and I barely have to type anything. I sort of tell it what I would tell a junior dev. "Hey let's make a bunch of connections and just use whichever one receives the message first, discarding any subsequent copies". If I was talking to a real junior, I might answer a few questions during the day, but he would do this task with a fair bit of mess. It's a fiddly task, and there are assumptions to make about what the task actually is.

Somehow, Claude makes the right assumptions. Yes, indeed I do want a test that can output how often each of the incoming connections "wins". Correct, we need to send the subscriptions down all the connections. The kinds of assumptions a junior would understand and come up with himself.

I spend a lot of time with the LLM critiquing, rather than editing. "This thing could be abstracted, couldn't it?" and then it looks through the code and says "yeah I could generalize this like so..." and it means instead of spending my attention on finding things in files, I look at overall structure. This also means I don't need my highest level of attention, so I can do this sort of thing when I'm not even really able to concentrate, eg late at night or while I'm out with the kids somewhere.

So yeah, I might also say there's very little learning curve. It's not like I opened a manual or tutorial before using Claude. I just started talking to it in natural language about what it should do, and it's doing what I want. Unlike seemingly everyone else.
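The redundant-connection task described above (several connections, the first copy of each message wins, later copies are dropped, and a count of how often each connection "wins") could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `first_wins`, the connection names, and the idea that every message carries a unique id are all hypothetical and not taken from the comment.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Tuple


def first_wins(
    arrivals: Iterable[Tuple[str, Hashable, object]],
    wins: Counter,
) -> Iterator[object]:
    """Yield each message once, from whichever connection delivered it first.

    `arrivals` is a stream of (connection_name, message_id, payload) tuples
    in arrival order. Later copies of the same message_id are discarded.
    Assumption: every message carries a unique, hashable id.
    """
    seen = set()
    for conn, msg_id, payload in arrivals:
        if msg_id in seen:
            continue  # another connection already delivered this message
        seen.add(msg_id)
        wins[conn] += 1  # record which connection "won" this message
        yield payload


# Simulated arrival order across three redundant connections (made-up data).
arrivals = [
    ("a", 1, "tick-1"), ("b", 1, "tick-1"), ("c", 1, "tick-1"),
    ("b", 2, "tick-2"), ("a", 2, "tick-2"),
    ("a", 3, "tick-3"), ("c", 3, "tick-3"),
]
wins = Counter()
delivered = list(first_wins(arrivals, wins))
```

In a long-running process the `seen` set would need some eviction policy (for example a bounded window of recent ids); that is left out here to keep the sketch short.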

bgwalter 9mo ago

Pianists' results are well known to be proportional to their talent/effort. In open source hardly anyone is even using LLMs and the ones that do have barely any output, in many cases less output than they had before using LLMs.

The blogging output on the other hand ...

FeepingCreature 9mo ago

> In open source hardly anyone is even using LLMs and the ones that do have barely any output, in many cases less output than they had before using LLMs.

That is not what that paper said, lol.

stillpointlab 9mo ago

I agree with you and I have seen this take a few times now in articles on HN, which amounts to the classic "We've tried nothing and we're all out of ideas" Simpsons joke.

I read these articles and I feel like I am taking crazy pills sometimes. The person, enticed by the hype, makes a transparently half-hearted effort for just long enough to confirm their blatantly obvious bias. They then act like they now have ultimate authority on the subject to proclaim their pre-conceived notions were definitely true beyond any doubt.

Not all problems yield well to LLM coding agents. Not all people will be able or willing to use them effectively.

But I guess "I gave it a try and it is not for me" is a much less interesting article compared to "I gave it a try and I have proved it is as terrible as you fear".

throwawaybob420 9mo ago

Judging from all the comments here, it’s going to be amazing seeing the fallout of all the LLM generated code in a year or so. The amount of people who seemingly relish the ability to stop thinking and let the model generate giant chunks of their code base, is uh, something else lol.

thefourthchime 9mo ago

It entirely depends on the exposure and reliability the code needs. Some code is just a one-off to show a customer what something might look like. I don't care at all how well the code works or what it looks like for something like that. Rapid prototyping is a valid use case for that.

I have also written C++ code that has to run for years, meaning there can be absolutely no memory leaks or bugs whatsoever, or the TV stops working. I wouldn't have a language model write any of that, at least not without testing the hell out of it and making sure it makes sense to me.

It's not all or nothing here. These things are tools and should be used as such.

hn_throwaway_99 9mo ago

> It entirely depends on the exposure and reliability the code needs.

Ahh, sweet summer child, if I had a nickel for every time I've heard "just hack something together quickly, that's throwaway code", that ended up being a critical lynchpin of a production system - well, I'd probably have at least like a buck or so.

Obviously, to emphasize, this kind of thing happens all the time with human-generated code, but LLMs make the issue a lot worse because they let you generate a ton of eventual mess so much faster.

Also, I do agree with your primary point (my comment was a bit tongue in cheek) - it's very helpful to know what should be core and what can be thrown away. It's just in the real world whenever "throwaway" code starts getting traction and getting usage, the powers that be rarely are OK with "Great, now let's rebuild/refactor with production usage in mind" - it's more like "faster faster faster".

memorylane 9mo ago

Dunno about you, but I find thinking hard… when I offload boilerplate code to Claude, I have more cycles left over to hold the problem in my head and effectively direct the agent in detail.

mockingloris 9mo ago

This makes sense. I find that after 15 to 20 iterations, I get a better understanding of what is being done and of possible simplifications.

I then manually declare some functions, JSDoc comments for the return types, imports and stop halfway. By then the agent is able to think, ha!, you plan to replace all the api calls to this composable under the so and so namespace.

It's iterations and context. I don't use them for everything, but I find that they help when my brain bandwidth begins to lag or I just need some boilerplate code before engineering specific use cases.

└── Dey well

candiddevmike 9mo ago

Software "engineering" at its finest

dogcomplex 9mo ago

lol yep we've never had codebases hacked together by juniors before running major companies in production - nope, never

varispeed 9mo ago

I think you are overestimating the quality of code humans generate. I'd take LLM output over that of any junior- to mid-level developer (if they were given the same prompt / ask).

ebiester 9mo ago

I disagree from almost the first sentence:

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

Learning how to use LLMs in a coding workflow is trivial to start, but you find you get a bad taste early if you don't learn how to adapt both your workflow and its workflow. It is easy to get a trivially good result and then be disappointed in the followup. It is easy to try to start on something it's not good at and think it's worthless.

The pure dismissal of Cursor, for example, means that the author didn't learn how to work with it. Now, it's certainly limited and some people just prefer Claude Code. I'm not saying that's unfair. However, it requires a process adaptation.

mkozlows 9mo ago

"There's no learning curve" just means this guy didn't get very far up, which is definitely backed up by thinking that Copilot and other tools are all basically the same.

rustybolt 9mo ago

> "There's no learning curve" just means this guy didn't get very far up

Not everyone with a different opinion is dumber than you.

leptons 9mo ago

Basically, they are the same, they are all LLMs. They all have similar limitations. They all produce "hallucinations". They can also sometimes be useful. And they are all way overhyped.

deadbabe 9mo ago

If it’s not trivial, it’s worthless, because writing things out manually yourself is usually trivial, but tedious.

With LLMs, the point is to eliminate tedious work in a trivial way. If it’s tedious to get an LLM to do tedious work, you have not accomplished anything.

If the work is not trivial enough for you to do yourself, then using an LLM will probably be a disaster, as you will not be able to judge the final output yourself without spending nearly the same amount of time it takes for you to develop the code on your own. So again, nothing is gained, only the illusion of gain.

The reason people think they are more productive using LLMs to tackle non-trivial problems is because LLMs are pretty good at producing “office theatre”. You look like you’re busy more often because you are in a tight feedback loop of prompting and reading LLM output, vs staring off into space thinking deeply about a problem and occasionally scribbling or typing something out.

ebiester 9mo ago

So, I'd like you to talk to a fair number of emacs and vim users. They have spent hours and hours learning their tools, tweaking their configurations, and learning efficiencies. They adapt their tool to them and themselves to the tool.

We are learning that this is not going to be magic. There are some cases where it shines. If I spend the time, I can put out prototypes that are magic and I can test with users in a fraction of the time. That doesn't mean I can use that for production.

I can try three or four things during a meeting where I am generally paying attention, and look afterwards to see if it's worth pursuing.

I can have it work through drudgery if I provide it an example. I can have it propose a solution to a problem that is escaping me, and I can use it as a conversational partner for the best rubber duck I've ever seen.

But I'm adapting myself to the tool and I'm adapting the tool to me through learning how to prompt and how to develop guardrails.

Outside of coding, I can write chicken scratch and provide an example of what I want, and have it write a proposal for a PRD. I can have it break down a task, generate a list of proposed tickets, and after I've gone through them have it generate them in Jira (or anything else with an API). But the more I invest into learning how to use the tool, the less I have to clean up after.

Maybe one day in the future it will be better. However, the time invested into the tool means that 40 bucks of investment (20 into Cursor, 20 into GPT) can add a 10-15% boost in productivity. Putting 200 into Claude might get you another 10%, and it can get you 75% in greenfield and prototyping work. I bet that agency work can be sped up as much as 40% for that 200 bucks investment into Claude.

That's a pretty good ROI.

And maybe some workloads can do even better. I haven't seen it yet but some people are further ahead than me.

donperignon 9mo ago

LLMs are basically glorified slot machines. Some people try very hard to come up with techniques or theories about when the slot machine is hot. It’s only an illusion, let me tell you: it’s random and arbitrary. Maybe today is your lucky day, maybe not. Same with AI: learning the “skill” is as difficult as learning how to google or how to check Stack Overflow, i.e. trivial. All the rest is luck and how many coins you have in your pocket.

mikeshi42 9mo ago

There's plenty of evidence that good prompts (prompt engineering, tuning) can result in better outputs.

Improving LLM output through better inputs is neither an illusion, nor as easy as learning how to google (entire companies are being built around improving LLM outputs and measuring that improvement).

Palmik 9mo ago

Sure, but tricks & techniques that work with one model often don't translate or are actively harmful with others. Especially when you compare models from today and 6 or more months ago.

Keep in mind that the first reasoning model (o1) was released less than 8 months ago and Claude Code was released less than 6 months ago.

gloomyday 9mo ago

This is not a good analogy. The parameters of slot machines can be changed to make the casino lose money. Just because something is random, doesn't mean it is useless. If you get 7 good outputs out of 10 from an LLM, you can still use it for your benefit. The frequency of good outputs and how much babysitting it requires determine whether it is worth using or not. Humans make mistakes too, although way less often.
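The trade-off between how often the output is good and how much babysitting each attempt needs can be put into rough numbers. The sketch below assumes attempts are independent with the same success probability (so the expected number of attempts is 1 / p), which is a simplification; the function names and the minute values are hypothetical and only there for illustration.

```python
def expected_minutes_with_llm(p_good: float, minutes_per_attempt: float) -> float:
    """Expected time until the first acceptable output.

    Assumes independent attempts with the same success probability, so the
    number of attempts is geometric with mean 1 / p_good.
    `minutes_per_attempt` covers prompting plus reviewing (the babysitting).
    """
    if not 0 < p_good <= 1:
        raise ValueError("p_good must be in (0, 1]")
    return minutes_per_attempt / p_good


def worth_using(p_good: float, minutes_per_attempt: float, manual_minutes: float) -> bool:
    """True if the expected LLM time beats doing the task by hand."""
    return expected_minutes_with_llm(p_good, minutes_per_attempt) < manual_minutes


# 7 good outputs out of 10, with a made-up 5 minutes of prompting and review per attempt.
llm_minutes = expected_minutes_with_llm(0.7, 5.0)
```

With these made-up numbers the tool pays off for a task that would take 20 minutes by hand but not for one that would take 6, which is the point of the comment: the success rate alone does not settle the question.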

donperignon 9mo ago

I didn’t say it’s useless.

simonw 9mo ago

Learning how to Google is not trivial.

mark_l_watson 9mo ago

So true! About ten years ago Peter Norvig recommended the short Google online course on how to use Google Search: amazing how much one hour of structured learning permanently improved my search skills.

I have used neural networks since the 1980s, and modern LLM tech simply makes me happy, but there are strong limits to what I will use the current tech for.

donperignon 9mo ago

Do you have an entry in your CV saying: proficiency in googling? It's difficult not because it is complex; it's difficult because Google wants it to be opaque and as hard as possible to figure out.

jstummbillig 9mo ago

We know what random looks like: a coin toss, the roll of a die. Token generation is neither.

globular-toast 9mo ago

Neither are slot machines. But there is a random element and that is more than enough to keep people hooked.

Pseudo-random number generators remain one of the most amazing things in computing IMO. Knuth volume 2. One of my favourite books.

simonw 9mo ago

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. [...]

> LLMs will always suck at writing code that has not been written millions of times before. As soon as you venture slightly offroad, they falter.

That right there is your learning curve! Getting LLMs to write code that's not heavily represented in their training data takes experience and skill and isn't obvious to learn.

skydhash 9mo ago

If you have a big rock (a software project), there's quite a difference between pushing it uphill (LLM usage) and hauling it up with a winch (traditional tooling and methods).

People are claiming that it takes time to build the muscles and train the correct footing to push, while I'm here learning mechanical theory and drawing up levers. If one manages to push the rock for one meter, he comes clamoring, ignoring the many who were injured doing so, saying that one day he will be able to pick the rock up and throw it at the moon.

noidesto 9mo ago

Then there are those who are augmenting their winch with LLM usage.

simonw 9mo ago

I'd describe LLM usage as the winch and LLM avoidance as insisting on pushing it uphill without one.

donperignon 9mo ago

I’m still waiting for someone who claims that prompting is such a skill to learn to explain, just once, a single technique that is not obvious. Things like: storing checkpoints to go back to a working version (already a good practice without using an LLM, see: git), or launching 10 tabs with slightly different prompts and choosing the best, or asking the LLM to improve my prompt, or adding more context … is that a skill? I remember when I was a child that my mom thought programming a VCR to record the night show was such a feat…

keeda 9mo ago

In my experience, it's not just prompting that needs to be figured out, it's a whole new workstyle that works for you, your technologies and even your current project. As an example, I write almost all my code functional-programming style, which I rarely did before. This lets me keep my prompts and context very focused and it essentially eliminates hallucinations.

Also I started in the pre-agents era and so I ended up with a pair-programming paradigm. Now every time I conceptualize a new task in my head -- whether it is a few lines of data wrangling within a function, or generating an entire feature complete with integration tests -- I instinctively do a quick prompt-vs-manual coding evaluation and seamlessly jump to AI code generation if the prompt "feels" more promising in terms of total time and probability of correctness.

I think one of the skills is learning this kind of continuous evaluation and the judgement that goes with it.

simonw 9mo ago

See my comment here about designing environments for coding agents to operate in: https://news.ycombinator.com/item?id=44854680

Effective LLM usage these days is about a lot more than just the prompts.

AndyNemmity 9mo ago

You may not consider it a skill, but I train multiple programming agents on different production and quality code bases, and have all of them PR review a change, with a report given at the end.

It helps dramatically with finding bugs and issues. Perhaps that's trivial to you, but it feels novel as we've only had effective agents in the last couple weeks.

kodisha 9mo ago

LLM-driven coding can yield awesome results, but you will be typing a lot and, as the article states, it requires an already well-structured codebase.

I recently started a fresh project, and until I got to the desired structure I only used AI to ask questions or get suggestions. I organized and wrote most of the code myself.

Once it started to get into the shape that felt semi-permanent to me, I started a lot of queries like:

```
- Look at existing service X at folder services/x
- see how I deploy the service using k8s/services/x
- see how the docker file for service X looks like at services/x/Dockerfile
- now, I started service Y that does [this and that]
- create all that is needed for service Y to be skaffolded and deployed, follow the same pattern as service X
```

And it would go, read existing stuff for X, then generate all of the deployment/monitoring/readme/docker/k8s/helm/skaffold for Y

With next to no mistakes. Both Claude and Gemini are more than capable of doing such a task. I had both of them generate 10-15 files with no errors, with the code being deployable right after (of course the service will just answer and not do much more than that).

Then, I will take over again for a bit, do some business logic specific to Y, then again leverage AI to fill in missing bits, review, suggest stuff etc.

It might look slow, but it actually cuts the most boring and most error-prone steps when developing a medium to large k8s-backed project.
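The "follow the pattern of service X" prompt above is regular enough that it could be generated from a small template. This is a minimal sketch, not the commenter's actual tooling: the function name `scaffold_prompt` is hypothetical, and deriving the folder name by lowercasing the service name is an assumption based on the `services/x` paths in the example.

```python
def scaffold_prompt(reference: str, new: str, description: str) -> str:
    """Build a 'copy the pattern of an existing service' prompt.

    Assumption: a service named X lives in services/x, with its deployment
    config under k8s/services/x, mirroring the paths in the example prompt.
    """
    ref_dir = f"services/{reference.lower()}"
    lines = [
        f"- Look at existing service {reference} at folder {ref_dir}",
        f"- see how I deploy the service using k8s/{ref_dir}",
        f"- see how the docker file for service {reference} looks like at {ref_dir}/Dockerfile",
        f"- now, I started service {new} that does {description}",
        f"- create all that is needed for service {new} to be skaffolded and deployed, "
        f"follow the same pattern as service {reference}",
    ]
    return "\n".join(lines)


prompt = scaffold_prompt("X", "Y", "[this and that]")
```

The value is less in the string formatting than in the habit it encodes: always point the model at a concrete, working reference before asking for the new thing.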

manmal 9mo ago

My workflow with a medium sized iOS codebase is a bit like that. By the time everything works and is up to my standards, I’ve usually taken longer, or almost as long, as if I’d written everything manually. That’s with Opus-only Claude Code. It’s complicated stuff (structured concurrency and lots of custom AsyncSequence operators) which maybe CC just isn’t suitable for.

Whipping up greenfield projects is almost magical, of course. But that’s not most of my work.

randfish 9mo ago

Deeply curious to know if this is an outlier opinion, a mainstream but pessimistic one, or the general consensus. My LinkedIn feed and personal network certainly suggest that it's an outlier, but I wonder if the people around me are overly optimistic or out of sync with what the HN community is experiencing more broadly.

MobiusHorizons 9mo ago

My impression has been that in corporate settings (and I would include LinkedIn in that) AI optimism is basically used as virtue signaling, making it very hard to distinguish people who are actually excited about the tech from people wanting to be accepted.

My personal experience has been that AI has trouble keeping the scope of the change small and targeted. I have only been using Gemini 2.5 Pro though, as we don’t have access to other models at my work. My friend tells me he uses Claude for coding and Gemini for documentation.

bGl2YW5j 9mo ago

I reckon this opinion is more prevalent than the hyped blog posts and news stories suggest; I've been asking this exact question of colleagues and most share the sentiment, myself included, albeit not as pessimistic.

Most people I've seen espousing LLMs and agentic workflows as a silver bullet have limited experience with the frameworks and languages they use with these workflows.

My view currently is one of cautious optimism; that LLM workflows will get to a more stable point whereby they ARE close to what the hype suggests. For now, that quote that "LLMs raise the floor, not the ceiling" I think is very apt.

LinkedIn is full of BS posturing, ignore it.

WD-42 9mo ago

I think it’s pretty common among people whose job it is to provide working, production software.

If you go by MBA types on LinkedIn that aren’t really developers or haven’t been in a long time, now they can vibe out some React components or a Python script so it’s a revolution.

danielbln 9mo ago

Hi, my job is building working production software (these days heavily LLM assisted). The author of the article doesn't know what they're talking about.

Terretta 9mo ago

Which part of the opinion?

I tend to strongly agree with the "unpopular opinion" about the IDEs mentioned versus CLI (specifically, aider.chat and Claude Code).

Assuming (this is key) you have mastery of the language and framework you're using, working with the CLI tool in 25 year old XP practices is an incredible accelerant.

Caveats:

- You absolutely must bring taste and critical thinking, as the LLM has neither.

- You absolutely must bring systems thinking, as it cannot keep deep weirdness "in mind". By this I mean the second and third order things that "gotcha" about how things ought to work but don't.

- Finally, you should package up everything new about your language or frameworks since a few months or year before the knowledge cutoff date, and include a condensed synthesis in your context (e.g., Swift 6 and 6.1 versus the 5.10 and 2024's WWDC announcements that are all GPT-5 knows).

For this last one I find it useful to (a) use OpenAI's "Deep Research" to first whitepaper the gaps, then another pass to turn that into a Markdown context prompt, and finally bring that over to your LLM tooling to include as needed when doing a spec or in architect mode. Similarly, (b) use repomap tools on dependencies if creating new code that leverages those dependencies, and have that in context for that work.

I'm confused why these two obvious steps aren't built into leading agentic tools, but maybe handling the LLM as a naive and outdated "Rain Man" type doesn't figure into mental models at most KoolAid-drinking "AI" startups, or maybe vibecoders don't care, so it's just not a priority.

Either way, context based development beats Leeroy Jenkins.

throwdbaaway9mo ago

> use repomap tools on dependencies if creating new code that leverages those dependencies, and have that in context for that work.

It seems to me that currently there are 2 schools of thought:

1. Use repomap and/or LSP to help the models navigate the code base

2. Let the models figure things out with grep

Personally, I am 100% a grep guy, and my editor doesn't even have LSP enabled. So, it is very interesting to see how many of these agentic tools do exactly the same thing.

And Claude Code /init is a great feature that basically writes down the current mental model after the initial round of grep.
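For the grep school of thought, a minimal sketch of the kind of search helper an agent could be handed. This is an illustration only: `grep_text` and `grep_tree` are hypothetical names, and nothing here is how Claude Code or any other tool actually implements its search.

```python
import re
from pathlib import Path
from typing import Iterator


def grep_text(text: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines matching the regex."""
    rx = re.compile(pattern)
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if rx.search(line)
    ]


def grep_tree(root: Path, pattern: str, glob: str = "*.py") -> Iterator[tuple[Path, int, str]]:
    """Hypothetical helper: walk a source tree and yield every matching line."""
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        # Ignore undecodable bytes so one odd file does not abort the search.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in grep_text(text, pattern):
            yield path, lineno, line
```

Something like `list(grep_tree(Path("."), r"def \w+"))` would then list every function definition under the current directory, with no index or language server involved.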


procaryote9mo ago

LinkedIn posts seem like an awful source. The people I see posting for themselves there are either pre-successful or just very fond of personal branding.

sensanaty9mo ago

Speaking to actual humans IRL (as in, non-management colleagues and friends in the field), people are pretty lukewarm on AI, with a decent chunk of them who find AI tooling makes them less productive. I know a handful of people who are generally very bullish on AI, but even they are nowhere near the breathless praise and hype you read about here and on LinkedIn, they're much more measured about it and approach it with what I would classify as common sense. Of course this is entirely anecdotal, and probably depends where you are and what kind of business you're in, though I will say I'm in a field where AI even makes some amount of sense (customer support software), and even then I'm definitely noticing a trend of disillusionment.

On the management side, however, we have all sorts of AI mandates, workshops, social media posts hyping our AI stuff, our whole "product vision" is some AI-hallucinated nightmare that nobody understands, you'd genuinely think we've been doing nothing but AI for the last decade the way we're contorting ourselves to shove "AI" into every single corner of the product. Every day I see our CxOs posting on LinkedIn about the random topic-of-the-hour regarding AI. When GPT-5 launched, it was like clockwork, "How We're Using GPT-5 At $COMPANY To Solve Problems We've Never Solved Before!" mere minutes after it was released (we did not have early access to it lol). Hilarious in retrospect, considering what a joke the launch was like with the hallucinated graphs and hilarious errors like in the Bernoulli's Principle slide.

Despite all the mandates and mandatory shoves coming from management, I've noticed the teams I'm close with (my team included) are starting to push back themselves a bit. They're getting rid of the spam generating PR bots that have never, not once, provided a useful PR comment. People are asking for the various subscriptions they were granted to be revoked because they're not using them and it's a waste of money. Our own customers' #1 piece of feedback is to focus less on stupid AI shit nobody ever asked for, and to instead improve the core product (duh). I'm even seeing our CTO, who was fanboy number 1, start dialing it back a bit and relenting.

It's good to keep in mind that HN is primarily an advertisement platform for YC and their startups. If you check YC's recent batches, you would think that the 1 and only technology that exists in the world is AI, every single one of them mentions AI in one way or another. The majority of them are the lowest effort shit imaginable that just wraps some AI APIs and is calling it a product. There is a LOT of money riding on this hype wave, so there's also a lot of people with vested interests in making it seem like these systems work flawlessly. The less said about LinkedIn the better, that site is the epitome of the dead internet theory.

Palmik9mo ago

People that comment on and get defensive about this bit:

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

How much of your workflow or intuition from 6 months ago is still relevant today? How long would it take to learn the relevant bits today?

Keep in mind that Claude Code was released less than 6 months ago.

pyb9mo ago

A fraction of the LLM maximalists are being defensive because they don't want to consider that they may have invested too much time in those tools, considering what said tools are currently genuinely good at.

simonw9mo ago

Pretty much all of the intuition I've picked up about getting good results from LLMs has stayed relevant.

If I was starting from fresh today I expect it would take me months of experimentation to get back to where I am now.

Working thoughtfully with LLMs has also helped me avoid a lot of the junk tips ("Always start with 'you are the greatest world expert in X', offer to tip it, ...") that are floating around out there.

Palmik9mo ago

All of the intuition? Definitely not my experience. I have found that optimal prompting differs significantly between models, especially when you look at models that are 6 months old or older (the first reasoning model, o1, is less than 8 months old).

Speaking mostly from experience of building automated, dynamic data processing workflows that utilize LLMs:

Things that work with one model, might hurt performance or be useless with another.

Many tricks that used to be necessary in the past are no longer relevant, or only applicable for weaker models.

This isn't me dismissing anyone's experience. It's ok to do things that become obsolete fairly quickly, especially if you derive some value from it. If you try to stay on top of a fast moving field, it's almost inevitable. I would not consider it a waste of time.

AndyNemmity9mo ago

Hell, my workflow isn't the same as it was two weeks ago, when subagents were released.

jamboca9mo ago

I have built many pipelines integrating LLMs to drive real $ results. I think this article boils it down too simply. But I always remember: if the LLM is the most interesting part of your work, something is severely wrong and you probably aren’t adding much value. Context management based on some aspects of your input is where LLMs get good, but you need to do lots of experimentation to tune something. Most cases I have seen are about developing one pipeline to fit hundreds of extremely different cases; the LLM does not solve this problem but basically serves as an approximator that lets you discretize previously large problems into some information subspace where you can treat the infinite set of inputs as something you know. LLMs are like a lasso (a better or worse one than traditional lassos, depending on the use case), but once you get your catch you still need to process it and deal with it programmatically to solve some greater problem. I hate how so many LLM-related articles and comments say “AI is useless, throw it away, don’t use it” or “AI is the future, if we don’t do it now we’re doomed, let’s integrate it everywhere, it can solve all our problems”. Can anyone pick a happy medium? Maybe that’s what being in a bubble looks like.

spenrose9mo ago

So many articles should prepend “My experience with ...” to their title. Here is OP's first sentence: “I spent the past ~4 weeks trying out all the new and fancy AI tools for software development.” Dude, you have had some experiences and they are worth writing up and sharing. But your experiences are not a stand-in for "the current state." This point applies to a significant fraction of HN articles, to the point that I wish the headlines were flagged “blog”.

mettamage9mo ago

Clickbait gets more reach. It's an unfortunate thing. I remember Veritasium in a video even saying something along the lines of him feeling forced to do clickbaity YouTube because it works so well.

The reach is big enough to not care about our feelings. I wish it wasn't this way.

hiAndrewQuinn9mo ago

>I made a CLI logs viewers and querier for my job, which is very useful but would have taken me a few days to write (~3k LoC)

I recall The Mythical Man-Month stating a rough calculation that the average software developer writes about 10 net lines of new, production-ready code per day. For a tool like this going up an order of magnitude to about 100 lines of pretty good internal tooling seems reasonable.

OP sounds a few cuts above the 'average' software developer in terms of skill level. But here we also need to point out a CLI log viewer and querier is not the kind of thing you actually needed to be a top tier developer to crank out even in the pre-LLM era, unless you were going for lnav [1] levels of polish.

[1]: https://lnav.org/
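To make the arithmetic concrete, here is a back-of-the-envelope sketch. The ~3k LoC figure and the 10 and 100 lines/day rates come from the comment above; treating the result as "working days" is an assumption, and `days_to_write` is just an illustrative helper.

```python
def days_to_write(total_loc: int, loc_per_day: int) -> float:
    """Days needed to produce total_loc at a steady net rate of loc_per_day."""
    return total_loc / loc_per_day


TOOL_LOC = 3_000  # size of the CLI log viewer quoted from the article

for rate in (10, 100):
    print(f"{rate} LoC/day -> {days_to_write(TOOL_LOC, rate):.0f} working days")
# prints:
# 10 LoC/day -> 300 working days
# 100 LoC/day -> 30 working days
```

Even at the generous internal-tooling rate, the estimate is weeks rather than "a few days", which is the gap the comment is pointing at.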

JimDabell9mo ago

A lot of the Mythical Man-Month is timeless, but for a stat like that, it really is worth bearing in mind the book was written half a century ago about developers working on 1970s mainframes.

myhf9mo ago

Yeah, I think that metric has grown to about 20 lines per day using 2010s-era languages and methods. So maybe we could think of LLM usage as an attempt to bring it back down to 10 per day.

nvbalaji9mo ago

>>You can safely ignore them if they don’t fit your workflows at the moment

I would rather qualify this statement a bit more - I would say "you can safely ignore if you are not building anything green field or build tools for self". In my experiments in the last one month or so, it is very efficient for building new components (small & medium). Making it efficient for the existing code base is a bit more tricky - you need to make sure it adheres to the way things are coded already, not to leak .env contents to LLMs, building a context from the existing components so that it does not read code every time (leading to cost and time escalations) and so on.

My main issue so far has been understanding the code that is generated. As of now that is the biggest bottleneck to increasing productivity, i.e., it takes a long time to review the code and push. In the usual workflow of building, by the time the code complexity has increased in the system I would have built a sufficient mental model to handle that complexity. I would know the inner workings of the code. However, if AI generates a large piece of code, getting into that code takes a long time.
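On the point about not leaking .env contents: one possible guard, sketched under assumptions, is a deny-list applied before any file is added to the context sent to a model. The patterns and the names `SECRET_PATTERNS`, `is_secret` and `context_files` are hypothetical and not taken from any particular tool.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Assumed deny-list; extend it to match how your project stores secrets.
SECRET_PATTERNS = (".env", ".env.*", "*.pem", "*.key")


def is_secret(path: Path) -> bool:
    """True if the file name matches any secret-looking pattern."""
    return any(fnmatchcase(path.name, pattern) for pattern in SECRET_PATTERNS)


def context_files(root: Path) -> list[Path]:
    """Hypothetical helper: files under root that are safe to include as context."""
    return sorted(p for p in root.rglob("*") if p.is_file() and not is_secret(p))
```

Matching on the file name only is a deliberate simplification; it will not catch secrets that are pasted into ordinary source files.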

dezmou9mo ago

OP did miss the VS Code extension for Claude Code. It is still terminal based, but:

- it shows you the diff of the incoming changes in VS Code (like git)

- it knows the line you selected in the editor for context

mark_l_watson9mo ago

Interesting read, but strange to totally ignore the macOS ChatGPT app that optionally integrates with a terminal session, the currently opened VSCode editor tab, Xcode, etc. I use this combination at least 2 or 3 times a month, and even if my monthly use is less than 40 minutes total, it is a really good tool to have in your toolbelt.

The other thing I disagree with is the coverage of gemini-cli: if you use gemini-cli for a single long work session, then you must set your Google API key as an environment variable when starting gemini-cli, otherwise you end up after a short while using Gemini-2.5-flash, and that leads to unhappy results. So, use gemini-cli for free for short and focused 3 or 4 minute work sessions and you are good, or pay for longer work sessions, and you are good.

I do have a random off topic comment: I just don’t get it: why do people live all day in an LLM-infused coding environment? LLM based tooling is great, but I view it as something I reach for a few times a day for coding and that feels just right. Separately, for non-coding tasks, reaching for LLM chat environments for research and brainstorming is helpful, but who really needs to do that more than once or twice a day?

itsalotoffun9mo ago

I think we're still in the gray zone of the "Incessant Obsolescence Postulate" (the Wait Calculation). Are you better off "skilling up" on the tech as it is today, or waiting for it to just "get better" so by the time you kick off, you benefit from the solved-problems X years from now. I also think this calculation differs by domain, skill level, and your "soft skill" abilities to communicate, explain and teach. In some domains, if you're not already on this train, you won't even get hired anymore.

The current state of LLM-driven development is already several steps down the path of an end-game where the overwhelming majority of code is written by the machine; our entire HCI for "building" is going to be so far different to how we do it now that we'll look back at the "hand-rolling code era" in a similar way to how we view programming by punch-cards today. The failure modes, the "but it SUCKS for my domain", the "it's a slot machine" etc etc are not-even-wrong. They're intermediate states except where they're not.

The exceptions to this end-game will be legion and exist only to prove the end-game rule.

fnordsensei9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

Do they? I’ve found Clojure-MCP[1] to be very useful. OTOH, I’m not attempting to replace myself, only augment myself.

1: https://github.com/bhauman/clojure-mcp

mark_l_watson9mo ago

Thanks for the link! I used to use Clojure a lot professionally, but now just for fun projects, and to occasionally update my old Clojure book. I had bookmarked Clojure-MCP a while ago, but never got back to it but I will give it a try.

I like your phrasing of “OTOH, I’m not attempting to replace myself, only augment myself.” because that is my personal philosophy also.

eric-burel9mo ago

Good read. I just want to point out that LLMs seem to write better React code than code for other stacks, but as an experienced frontend developer my opinion is that they are also bad at React. Their approach is outdated, as it doesn't follow the latest guidelines. They write React as I would have written it in 2020. So as usual, you need to feed the right context to get proper results.

OldfieldFund9mo ago

I don't agree. Cursor is mind-blowingly good with the new agentic updates.

stephc_int139mo ago

I have not tried every IDE/CLI or models, only a few, mostly Claude and Qwen.

I work mostly in C/C++.

The most valuable improvement from using this kind of tool, for me, is being able to easily find help when I have to work on boring/tedious tasks or when I want to have a Socratic conversation about a design idea with a not-so-smart but extremely knowledgeable colleague.

But for anything requiring a brain, it is almost useless.

softwaredoug9mo ago

I find all AI coding goes something like this algorithm

* I let the AI do something

* I find bad bug or horrifying code

* I realize I gave it too much slack

* hand code for a while

* go back to narrow prompts

* get lazy, review code a bit less, add more complexity

* GOTO 1, hopefully with a better instinct for where/how to trust this model

Then over time you hone your instinct on what to delegate and what to handle yourself. And how deeply to pay attention.

d_silin9mo ago

Relying on LLM for any skill, especially programming, is like cutting your own healthy legs and buying crutches to walk. Plus you now have to pay $49/month for basic walking ability and $99/month for "Walk+" plan, where you can also (clumsily) jog.

aeonik9mo ago

It's more like strapping on an exoskeleton suit with a jetpack.

It makes your existing strength and mobility greater, but don't be surprised that if you fly into space you will suffocate,

or that if you fly over an ocean and run out of gas you'll sink to the bottom,

or that if you fly the suit in your fine glassware shop with patrons in the store you're going to break and burn everything/everyone in there.

derektank9mo ago

There are a lot of skills which I haven't developed because I rely on external machines to handle it for me; memorization, fire-starting, navigation. On net, my life is better for it. LLMs may or may not be as effective at replacing code development as books have been at replacing memorization and GPS has been at replacing navigation, but eventually some tool will be and I don't think I'll be worse off for developing other skills.

d_silin9mo ago

GPS is particularly good analog... Lose it for any reason and suddenly you are helpless without backup navigation aids. But compass, paper map, watch and sextant will still work!

candiddevmike9mo ago

Why would I pay you to walk with crutches when I can just get crutches and walk myself?

reitanuki9mo ago

I would actually disagree with the final conclusion here; despite claiming to offer the same models, Copilot seems very much nerfed — cross-comparing the Copilotified LLM and the same LLM through OpenRouter, the Copilot one seems to fail much harder. I'm not an expert in the details of LLMs but I guess there might be some extra system prompt, I also notice the context window limit is much lower, which kinda suggests it's been partially pre-consumed.

In case it matters, I was using Copilot that is for 'free' because my dayjob is open source, and the model was Claude Sonnet 3.7. I've not yet heard anyone else saying the same as me which is kind of peculiar.

bachmeier9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

I haven't found that to be true with my most recent usage of AI. I do a lot of programming in D, which is not popular like Python or Javascript, but Copilot knows it well enough to help me with things like templates, metaprogramming, and interoperating with GCC-produced DLL's on Windows. This is true in spite of the lack of a big pile of training data for these tasks. Importantly, it gets just enough things wrong when I ask it to write code for me that I have to understand everything well enough to debug it.

knlam9mo ago

Opening the essay with "Learning how to use LLMs in a coding workflow is trivial." and closing by suggesting Copilot as the AI agent is the worst take on LLM coding I have ever seen.

sudhirb9mo ago

I have a biased opinion since I work for a background agent startup currently - but there are more (and better!) out there than Jules and Copilot that might address some of the author's issues.

troupo9mo ago

And those mythical better tools that you didn't even bother to mention are?

Palmik9mo ago

Presumably if they did, they would be accused of promoting their startup :)


singularity20019mo ago

"LLMs won’t magically make you deliver production-ready code"

Either I'm extremely lucky or I was lucky to find the guy who said it must all be test driven and guided by the usual principles of DRY etc. Claude Code works absolutely fantastically nine out of 10 times, and when it doesn't we just roll back the three hours of nonsense it did, then postpone the feature or give it extra guidance.

simonw9mo ago

I'm beginning to suspect robust automated tests may be one of the single strongest indicators for if you're going to have a good time with LLM coding agents or not.

If there's a test suite for the thing to run it's SO much less likely to break other features when it's working. Plus it can read the tests and use them to get a good idea about how everything is supposed to work already.

Telling Claude to write the test first, then execute it and watch it fail, then write the implementation has been giving me really great results.
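A minimal sketch of that test-first loop, for illustration only. `slugify`, `slugify_stub`, `check_slugify` and `passes` are hypothetical names; in a real session the check would live in a test file and be run by the agent through the project's test runner.

```python
import re


def check_slugify(slugify) -> None:
    """Step 1: the test, written before any implementation exists."""
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  LLM  driven  dev ") == "llm-driven-dev"


def slugify_stub(text: str) -> str:
    """Step 2: a stub, so the test can be executed and seen to fail."""
    raise NotImplementedError


def slugify(text: str) -> str:
    """Step 3: the implementation, written only after watching the failure."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


def passes(check, impl) -> bool:
    """Run a check against an implementation and report pass/fail."""
    try:
        check(impl)
        return True
    except (AssertionError, NotImplementedError):
        return False


print(passes(check_slugify, slugify_stub))  # red: False
print(passes(check_slugify, slugify))       # green: True
```

Seeing the red run first is what shows the test actually exercises the new code rather than passing vacuously.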

mcprwklzpq9mo ago

Does not mention the actual open source solution that has autocomplete, chat, planner and agents, lets you bring your own keys, connect to any LLM provider, customize anything, rewrite all the prompts and tools.

https://github.com/continuedev/continue

kketch9mo ago

This article makes me wanna try building a token field in Flutter using a LLM chat or agent. Chat should be enough. A few iterations to get the behaviour and the tests right. A bit of style to make it look Apple-nice. As if a regular dev would do much better/quicker for this use case, such a bad example imo I don't buy it

joshuamoyers9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

Almost like hiring and scaling a team? There are also benchmarks that specifically measure this, and it's in theory a very temporary problem (Aider Polyglot Benchmark is one such).

bubblebeard9mo ago

Strange post. It reads in part like an incoherent rant and in part as a well made analysis.

It’s mostly on point though. Although, in recent years I’ve been assigned to manage and plan projects at work, and the skills I’ve learnt from that greatly help to get effective results from an LLM I think.

infoseek129mo ago

There are kind of a lot of errors in this piece. For instance, the problem the author had with Gemini CLI running out of tokens in ten minutes is what happens when you don’t set up (a free) API key in your environment.

Vektorceraptor9mo ago

I agree. I had a similar experience.

https://speculumx.at/pages/read_post.html?post=59

MitziMoto9mo ago

My favorite setup so far is using the Claude Code extension in VS Code. All the power of CC, but it opens files and diffs in VS Code. Easy to read and modify as needed.

philipwhiuk9mo ago

There’s an IntelliJ extension for GitHub Copilot.

It’s not perfect but it’s okay.

Tainnor9mo ago

It may not be perfect, but IntelliJ beats VS Code on so many other levels that I don't understand why everyone keeps creating clones of the latter.

joks9mo ago

Yeah for my uses it works fine. Not sure why OP thinks Copilot Chat doesn't exist anywhere but VSCode...

yogthos9mo ago

Personally, I’ve had a pretty positive experience with the coding assistants, but I had to spend some time to develop intuition for the types of tasks they’re likely to do well. I would not say that this was trivial to do.

Like if you need to crap out a UI based on a JSON payload, make a service call, add a server endpoint, LLMs will typically do this correctly in one shot. These are common operations that are easily extrapolated from their training data. Where they tend to fail are tasks like business logic which have specific requirements that aren’t easily generalized.

I’ve also found that writing the scaffolding for the code yourself really helps focus the agent. I’ll typically add stubs for the functions I want, and create overall code structure, then have the agent fill the blanks. I’ve found this is a really effective approach for preventing the agent from going off into the weeds.
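A rough sketch of what such scaffolding could look like. The order-processing domain and every name in it (`Order`, `parse_order`, `order_total_cents`, `total_for_lines`) are made up for illustration: the structure and the glue are hand-written, and only the stub bodies are left for the agent.

```python
from dataclasses import dataclass


@dataclass
class Order:
    sku: str
    quantity: int
    unit_price_cents: int


def parse_order(line: str) -> Order:
    """Parse 'sku,quantity,unit_price_cents' into an Order. (Stub for the agent.)"""
    raise NotImplementedError


def order_total_cents(order: Order) -> int:
    """Return quantity times unit price, in cents. (Stub for the agent.)"""
    raise NotImplementedError


def total_for_lines(lines: list[str]) -> int:
    """Hand-written glue that pins down how the pieces fit together."""
    return sum(order_total_cents(parse_order(line)) for line in lines)
```

The signatures, types and docstrings act as the spec, so the agent has little room to invent a different structure.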

I also find that if it doesn’t get things right on the first shot, the chances are it’s not going to fix the underlying problems. It tends to just add kludges on top to address the problems you tell it about. If it didn’t get it mostly right at the start, then it’s better to just do it yourself.

All that said, I find enjoyment is an important aspect as well and shouldn’t be dismissed. If you’re less productive, but you enjoy the process more, then I see that as a net positive. If all LLMs accomplish is to make development more fun, that’s a good thing.

I also find that there's use for both terminal based tools and IDEs. The terminal REPL is great for initially sketching things out, but IDE based tooling makes it much easier to apply selective changes exactly where you want.

As a side note, got curious and asked GLM-4.5 to make a token field widget with React, and it did it in one shot.

It's also strange not to mention DeepSeek and GLM as options given that they cost orders of magnitude less per token than Claude or Gemini.

revskill9mo ago

"If an(y) LLM could operate on your codebase without much critical issues, then your architecture is sound" - revskill

stopachka9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

I use clojure for my day-to-day work, and I haven't found this to be true. Opus and GPT-5 are great friends when you start pushing limits on Clojure and the JVM.

> Or 4.1 Opus if you are a millionaire and want to pollute as much possible

I know this was written tongue-in-cheek, but at least in my opinion it's worth it to use the best model if you can. Opus is definitely better on harder programming problems.

> GPT 4.1 and 5 are mostly bad, but are very good at following strict guidelines.

This was interesting. At least in my experience GPT-5 seemed about as good as Opus. I found it to be _less_ good at following strict guidelines though. In one test Opus avoided a bug by strictly following the rules, while GPT-5 missed.

dwheeler9mo ago

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

I'm sorry, but I disagree with this claim. That is not my experience, nor that of many others. It's true that you can make them do something without learning anything. However, it takes time to learn what they are good and bad at, what information they need, and what nonsense they'll do without express guidance. It also takes time to know what to look for when reviewing results.

I also find that they work fine for languages without static types. You need tests, yes, but you need them anyway.

gkuhl219mo ago

The following is half serious. Please enjoy.

Some comments here are reminiscent of antiquated discourse: "how many angels dance on the head of a pin?"

We somehow are trying to agree on some factual ramp-up time required for a dev to become competent coding with LLM's. This is inherently subjective! Why bother?

Perhaps certain LLMs are blessed with disproportionately more angels (nee "bugs") in the machines.

I enjoyed reading the article:

"The model looks good, but Google’s enshittification has won and it looks like no competent software developers are left. I would know, many of my friends work there."

Yikes!

Credit to the author for having the courage to post publicly.

abrookewood9mo ago

"Google’s enshittification has won and it looks like no competent software developers are left. I would know, many of my friends work there". Ouch ... I hope his friends are in marketing!

dash29mo ago

They missed OpenAI Codex, maybe deliberately? It's less llm-development and more vibe-coding, or maybe "being a PHB of robots". I'm enjoying it for my side project this week.

Mystery-Machine9mo ago

> Claude 4 Sonnet

> Or 4.1 Opus if you are a millionaire and want to pollute as much possible

That was an unnecessary guilt-shaming remark.

itsalotoffun9mo ago

Yeah, this moralizing is like side-eyeing your fellow soldiers for killing "too much" because your level of killing is fine.

weeksie9mo ago

Yet another developer who is too full of themselves to admit that they have no idea how to use LLMs for development. There's an arrogance that can set in when you get to be more senior and unless you're capable of force feeding yourself a bit of humility you'll end up missing big, important changes in your field.

It becomes farcical when not only are you missing the big thing but you're also proud of your ignorance and this guy is both.

ontigola9mo ago

I think that beyond the language used, the article does have some points I agree with. In general, LLMs code better in languages that are more easily available online, where they can be trained on a larger amount of source code. Python is not the same as PL/I (I don't know if you've tried it, but with the latter, they don't know the most basic conventions used in its development).

When it is mentioned that LLMs "have terrible code organization skills", I think they are referring mainly to the size of the context. It is not the same to develop a module with hundreds of LoCs, one with thousands or one with tens of thousands of LoCs.

I am not very much in favor of skill degradation; I am not aware of a study that validates it in this regard. On the other hand, it is true that agents are constantly evolving, and I don't see any difficulties that cannot be overcome with the current evolutionary race, given that, in the end, coding is one of the most accessible functions for artificial intelligence.

SadErn9mo ago

It's all about the Kilo Code extension.



ruszki9mo ago

> There is an initial honeymoon where the LLMs blow your mind out.

What does this even mean?

thefourthchime9mo ago

I'm glad you feel like you've nailed it. I've been using models to help me code for over two years, and I still feel like I have no idea what I'm doing.

throwawaybob4209mo ago

Months? That’s actually an insanely long time

otabdeveloper49mo ago

I dunno, man. I think you could have spent that time, you know, learning to code instead.


SkyPuncher9mo ago

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.

uvdn79mo ago

Is the non-trivial amount of time significantly less than you trying to ramp up yourself?


deadbabe9mo ago

He’s not wrong.

Getting 80% of the benefit of LLMs is trivial. You can ask it for some functions or to write a suite of unit tests and you’re done.

The last 20%, while possible to attain, is ultimately not worth it for the amount of time you spend in context hells. You can just do it yourself faster.


majormajor9mo ago

> Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.

Do you find that these details translate between models? Sounds like it doesn't translate across codebases for you?


troupo9mo ago

> I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.

Because for all our posturing about being skeptical and data driven we all believe in magic.

Those "counterintuitive non-trivial workflows"? They work about as well as just prompting "implement X" with no rules, agents.md, careful lists etc.

Because 1) literally no one actually measures whether magical incantations work and 2) it's impossible to make such measurements due to non-determinism.

simonw9mo ago

Either I've wasted significant chunks of the past ~3 years of my life or you're missing something here. Up to you to decide which you believe.

roxolotl9mo ago

prerok9mo ago

I agree with your assessment about this statement. I actually had to reread it a few times to actually understand it.

He is actually recommending Copilot for price/performance reasons and his closing statement is "Don’t fall for the hype, but also, they are genuinely powerful tools sometimes."

So it seems he never really tried to learn how to engineer better prompts for these more advanced models.

rocqua9mo ago

The OP's point seems to be: it's very quick for LLMs to become a net benefit to your skills, if they are a benefit at all. That is, he's only speaking of the very beginning of the learning curve.

edfletcher_t1379mo ago

enraged_camel9mo ago

hislaziness9mo ago

Would it be more appropriate to compare LLMs to Auto-Tune rather than pianos?

lordnacho9mo ago

I've said it before, I feel like I'm some sort of lottery winner when it comes to LLM usage.

bgwalter9mo ago

The blogging output on the other hand ...

FeepingCreature9mo ago

> In open source hardly anyone is even using LLMs and the ones that do have barely any output, In many cases less output than they had before using LLMs.

That is not what that paper said, lol.

stillpointlab9mo ago

I agree with you, and I have seen this take a few times now in articles on HN. It amounts to the classic Simpsons joke: "We've tried nothing and we're all out of ideas."

Not all problems yield well to LLM coding agents. Not all people will be able or willing to use them effectively.

But I guess "I gave it a try and it is not for me" is a much less interesting article compared to "I gave it a try and I have proved it is as terrible as you fear".

throwawaybob4209mo ago

thefourthchime9mo ago

It's not all or nothing here. These things are tools and should be used as such.

hn_throwaway_999mo ago

> It entirely depends on the exposure and reliability the code needs.

Obviously, to emphasize, this kind of thing happens all the time with human-generated code, but LLMs make the issue a lot worse because they let you generate a ton of eventual mess so much faster.

memorylane9mo ago

Dunno about you, but I find thinking hard… when I offload boilerplate code to Claude, I have more cycles left over to hold the problem in my head and effectively direct the agent in detail.

mockingloris9mo ago

This makes sense. I find that after 15 to 20 iterations, I get a better understanding of what is being done and of possible simplifications.

It's iterations and context. I don't use them for everything, but I find that they help when my brain bandwidth begins to lag or I just need boilerplate code before engineering specific use cases.

└── Dey well

candiddevmike9mo ago

Software "engineering" at its finest

dogcomplex9mo ago

lol yep we've never had codebases hacked together by juniors before running major companies in production - nope, never

varispeed9mo ago

I think you are overestimating the quality of code humans generate. I'd take LLM output over that of any junior to mid-level developer (if they were given the same prompt / ask).

ebiester9mo ago

I disagree from almost the first sentence:

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

mkozlows9mo ago

"There's no learning curve" just means this guy didn't get very far up, which is definitely backed up by thinking that Copilot and other tools are all basically the same.

rustybolt9mo ago

> "There's no learning curve" just means this guy didn't get very far up

Not everyone with a different opinion is dumber than you.

leptons9mo ago

Basically, they are the same, they are all LLMs. They all have similar limitations. They all produce "hallucinations". They can also sometimes be useful. And they are all way overhyped.

deadbabe9mo ago

If it’s not trivial, it’s worthless, because writing things out manually yourself is usually trivial, but tedious.

With LLMs, the point is to eliminate tedious work in a trivial way. If it’s tedious to get an LLM to do tedious work, you have not accomplished anything.

ebiester9mo ago

I can try three or four things during a meeting where I am generally paying attention, and look afterwards to see if any of it is worth pursuing.

But I'm adapting myself to the tool and I'm adapting the tool to me through learning how to prompt and how to develop guardrails.

That's a pretty good ROI.

And maybe some workloads can do even better. I haven't seen it yet but some people are further ahead than me.

donperignon9mo ago

mikeshi429mo ago

There's plenty of evidence that good prompts (prompt engineering, tuning) can result in better outputs.

Improving LLM output through better inputs is neither an illusion, nor as easy as learning how to google (entire companies are being built around improving llm outputs and measuring that improvement)
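One way to make "measuring that improvement" concrete, even when outputs vary from run to run, is to repeat each prompt variant many times and compare pass rates. The sketch below is only an illustration: `terse_prompt` and `detailed_prompt` are hypothetical stand-ins for real model calls, and their 50% and 80% success odds are invented numbers, not measurements.

```python
import random
from typing import Callable


def pass_rate(generate: Callable[[random.Random], str],
              check: Callable[[str], bool],
              trials: int,
              seed: int) -> float:
    """Run `generate` `trials` times; return the fraction of outputs passing `check`."""
    rng = random.Random(seed)  # fixed seed so the experiment itself is repeatable
    passed = sum(check(generate(rng)) for _ in range(trials))
    return passed / trials


# Hypothetical stand-ins for "call the model with prompt A / prompt B".
# The success probabilities are made up purely for illustration.
def terse_prompt(rng: random.Random) -> str:
    return "ok" if rng.random() < 0.5 else "broken"


def detailed_prompt(rng: random.Random) -> str:
    return "ok" if rng.random() < 0.8 else "broken"


def looks_ok(output: str) -> bool:
    # In practice this would run tests or a linter against the generated code.
    return output == "ok"


if __name__ == "__main__":
    for name, variant in [("terse", terse_prompt), ("detailed", detailed_prompt)]:
        print(name, pass_rate(variant, looks_ok, trials=2000, seed=0))
```

A single run says little about a non-deterministic system; the comparison only becomes meaningful across enough trials that the gap exceeds the noise.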

Palmik9mo ago

Sure, but tricks & techniques that work with one model often don't translate or are actively harmful with others. Especially when you compare models from today and 6 or more months ago.

Keep in mind that the first reasoning model (o1) was released less than 8 months ago and Claude Code was released less than 6 months ago.

gloomyday9mo ago

donperignon9mo ago

I didn’t say it’s useless.

simonw9mo ago

Learning how to Google is not trivial.

mark_l_watson9mo ago

I have used neural networks since the 1980s, and modern LLM tech simply makes me happy, but there are strong limits to what I will use the current tech for.

donperignon9mo ago

Do you have an entry in your CV saying: proficiency in googling? It is difficult not because it is complex; it is difficult because Google wants it to be opaque and as hard as possible to figure out.

jstummbillig9mo ago

We know what random* looks like: a coin toss, the roll of a die. Token generation is neither.

globular-toast9mo ago

Neither are slot machines. But there is a random element and that is more than enough to keep people hooked.

Pseudo-random number generators remain one of the most amazing things in computing IMO. Knuth volume 2. One of my favourite books.
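For readers wondering where the "random element" sits, here is a toy sketch of temperature sampling: the next token is a weighted draw from a probability distribution, driven by a pseudo-random number generator. The logits are invented for illustration, and `sample_token` is a hypothetical helper, not any particular model's decoder.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one token from `logits` by temperature-scaled softmax sampling."""
    if temperature <= 0:
        # Greedy decoding: always take the highest-scoring token, no randomness.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled.values()]
    # random.choices normalises the weights, so they need not sum to 1.
    return rng.choices(list(scaled), weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = {"return": 2.0, "yield": 1.0, "raise": 0.1}  # made-up scores
    first = random.Random(42)
    second = random.Random(42)
    run_a = [sample_token(toy_logits, 1.0, first) for _ in range(10)]
    run_b = [sample_token(toy_logits, 1.0, second) for _ in range(10)]
    print(run_a)
    print(run_a == run_b)  # the same seed replays the same draws
```

In this toy, the draw is weighted rather than uniform, and fixing the seed makes it repeatable.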

simonw9mo ago

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. [...]

> LLMs will always suck at writing code that has not been written millions of times before. As soon as you venture slightly offroad, they falter.

That right there is your learning curve! Getting LLMs to write code that's not heavily represented in their training data takes experience and skill and isn't obvious to learn.

skydhash9mo ago

If you have a big rock (a software project), there's quite a difference between pushing it uphill (LLM usage) and hauling it up with a winch (traditional tooling and methods).

noidesto9mo ago

Then there are those who are augmenting their winch with LLM usage.

simonw9mo ago

I'd describe LLM usage as the winch and LLM avoidance as insisting on pushing it up hill without one.

donperignon9mo ago

keeda9mo ago

I think one of the skills is learning this kind of continuous evaluation and the judgement that goes with it.

simonw9mo ago

See my comment here about designing environments for coding agents to operate in: https://news.ycombinator.com/item?id=44854680

Effective LLM usage these days is about a lot more than just the prompts.

AndyNemmity9mo ago

You may not consider it a skill, but I train multiple programming agents on different production and quality code bases, and have all of them PR-review a change, with a report given at the end.

It helps dramatically in finding bugs and issues. Perhaps that's trivial to you, but it feels novel, as we've only had effective agents in the last couple of weeks.

kodisha9mo ago

LLM-driven coding can yield awesome results, but you will be typing a lot and, as the article states, it requires an already well-structured codebase.

I recently started a fresh project, and until I got to the desired structure I only used AI to ask questions or get suggestions. I organized and wrote most of the code myself.

Once it started to get into the shape that felt semi-permanent to me, I started a lot of queries like:

```

- Look at existing service X at folder services/x

- see how I deploy the service using k8s/services/x

- see how the docker file for service X looks like at services/x/Dockerfile

- now, I started service Y that does [this and that]

- create all that is needed for service Y to be skaffolded and deployed, follow the same pattern as service X

```

And it would go, read existing stuff for X, then generate all of the deployment/monitoring/readme/docker/k8s/helm/skaffold for Y

Then, I will take over again for a bit, do some business logic specific to Y, then again leverage AI to fill in missing bits, review, suggest stuff etc.

It might look slow, but it actually cuts out the most boring and most error-prone steps when developing a medium to large k8s-backed project.
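If the same "follow the pattern of service X" prompt gets reused for every new service, it can be templated. This is a minimal sketch under assumptions: `scaffold_prompt` is a hypothetical helper, and the `services/<name>` and `k8s/services/<name>` layout is taken only from the example prompt above.

```python
def scaffold_prompt(reference: str, new_service: str, purpose: str) -> str:
    """Build a 'copy the pattern of an existing service' prompt (hypothetical helper)."""
    # Directory layout assumed from the example prompt: services/<lowercase name>
    ref_dir = f"services/{reference.lower()}"
    steps = [
        f"Look at existing service {reference} at folder {ref_dir}",
        f"see how I deploy the service using k8s/{ref_dir}",
        f"see how the docker file for service {reference} looks like at {ref_dir}/Dockerfile",
        f"now, I started service {new_service} that does {purpose}",
        f"create all that is needed for service {new_service} to be skaffolded and deployed, "
        f"follow the same pattern as service {reference}",
    ]
    return "\n".join(f"- {step}" for step in steps)


if __name__ == "__main__":
    print(scaffold_prompt("X", "Y", "[this and that]"))
```

The point is less the string formatting than the habit: always point the model at a concrete, working reference before asking it to generate a sibling.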

manmal9mo ago

Whipping up greenfield projects is almost magical, of course. But that’s not most of my work.

randfish9mo ago

MobiusHorizons9mo ago

bGl2YW5j9mo ago

Most people I've seen espousing LLMs and agentic workflows as a silver bullet have limited experience with the frameworks and languages they use with these workflows.

LinkedIn is full of BS posturing, ignore it.

WD-429mo ago

I think it’s pretty common among people whose job it is to provide working, production software.

If you go by MBA types on LinkedIn that aren’t really developers or haven’t been in a long time, now they can vibe out some react components or a python script so it’s a revolution.

danielbln9mo ago

Hi, my job is building working production software (these days heavily LLM assisted). The author of the article doesn't know what they're talking about.

Terretta9mo ago

Which part of the opinion?

I tend to strongly agree with the "unpopular opinion" about the IDEs mentioned versus CLI (specifically, aider.chat and Claude Code).

Assuming (this is key) you have mastery of the language and framework you're using, working with the CLI tool in 25 year old XP practices is an incredible accelerant.

Caveats:

- You absolutely must bring taste and critical thinking, as the LLM has neither.

- You absolutely must bring systems thinking, as it cannot keep deep weirdness "in mind". By this I mean the second and third order things that "gotcha" about how things ought to work but don't.

Either way, context-based development beats Leeroy Jenkins.

throwdbaaway9mo ago

> use repomap tools on dependencies if creating new code that leverages those dependencies, and have that in context for that work.

It seems to me that currently there are 2 schools of thought:

1. Use repomap and/or LSP to help the models navigate the code base

2. Let the models figure things out with grep

Personally, I am 100% a grep guy, and my editor doesn't even have LSP enabled. So, it is very interesting to see how many of these agentic tools do exactly the same thing.

And Claude Code /init is a great feature that basically writes down the current mental model after the initial round of grep.
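To illustrate the "grep" school, here is a rough sketch of the kind of search helper an agent could be handed instead of a repomap or LSP index. The `grep` function and `Match` type are hypothetical names for this example and do not reflect how any specific tool implements search.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple


class Match(NamedTuple):
    path: str
    line_number: int
    line: str


def grep(root: str, pattern: str, suffixes: tuple[str, ...] = (".py",)) -> Iterator[Match]:
    """Yield every line under `root` that matches `pattern`, roughly like `grep -rn`."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield Match(str(path), number, line.strip())


if __name__ == "__main__":
    # Example: list every function definition under the current directory.
    for match in grep(".", r"^\s*def \w+"):
        print(f"{match.path}:{match.line_number}: {match.line}")
```

Nothing here builds an index up front; each question is answered by a fresh scan, which is the trade-off the grep approach accepts in exchange for simplicity.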

procaryote9mo ago

LinkedIn posts seem like an awful source. The people I see posting for themselves there are either pre-successful or just very fond of personal branding.

sensanaty9mo ago

Palmik9mo ago

People that comment on and get defensive about this bit:

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

How much of your workflow or intuition from 6 months ago is still relevant today? How long would it take to learn the relevant bits today?

Keep in mind that Claude Code was released less than 6 months ago.

pyb9mo ago

simonw9mo ago

Pretty much all of the intuition I've picked up about getting good results from LLMs has stayed relevant.

If I was starting from fresh today I expect it would take me months of experimentation to get back to where I am now.

Working thoughtfully with LLMs has also helped me avoid a lot of the junk tips ("Always start with 'you are the greatest world expert in X', offer to tip it, ...") that are floating around out there.

Palmik9mo ago

Speaking mostly from experience of building automated, dynamic data processing workflows that utilize LLMs:

Things that work with one model, might hurt performance or be useless with another.

Many tricks that used to be necessary in the past are no longer relevant, or only applicable for weaker models.

AndyNemmity9mo ago

Hell, my workflow isn't the same as it was two weeks ago, when subagents were released.

jamboca9mo ago

spenrose9mo ago

mettamage9mo ago

Clickbait gets more reach. It's an unfortunate thing. I remember Veritasium in a video even saying something along the lines of him feeling forced to do clickbaity YouTube because it works so well.

The reach is big enough to not care about our feelings. I wish it wasn't this way.

hiAndrewQuinn9mo ago

>I made a CLI logs viewers and querier for my job, which is very useful but would have taken me a few days to write (~3k LoC)

[1]: https://lnav.org/

JimDabell9mo ago

A lot of the Mythical Man-Month is timeless, but for a stat like that, it really is worth bearing in mind the book was written half a century ago about developers working on 1970s mainframes.

myhf9mo ago

Yeah, I think that metric has grown to about 20 lines per day using 2010s-era languages and methods. So maybe we could think of LLM usage as an attempt to bring it back down to 10 per day.

nvbalaji9mo ago

>>You can safely ignore them if they don’t fit your workflows at the moment

dezmou9mo ago

mark_l_watson9mo ago

itsalotoffun9mo ago

The exceptions to this end-game will be legion and exist only to prove the end-game rule.

fnordsensei9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

Do they? I’ve found Clojure-MCP[1] to be very useful. OTOH, I’m not attempting to replace myself, only augment myself.

1: https://github.com/bhauman/clojure-mcp

mark_l_watson9mo ago

I like your phrasing of “OTOH, I’m not attempting to replace myself, only augment myself.” because that is my personal philosophy also.

eric-burel9mo ago

OldfieldFund9mo ago

I don't agree. Cursor is mind-blowingly good with the new agentic updates.

stephc_int139mo ago

I have not tried every IDE/CLI or models, only a few, mostly Claude and Qwen.

I work mostly in C/C++.

But for anything requiring a brain, it is almost useless.

softwaredoug9mo ago

I find all AI coding goes something like this algorithm

* I let the AI do something

* I find bad bug or horrifying code

* I realize I gave it too much slack

* hand code for a while

* go back to narrow prompts

* get lazy, review code a bit less, add more complexity

* GOTO 1, hopefully with a better instinct for where/how to trust this model

Then over time you hone your instinct on what to delegate and what to handle yourself. And how deeply to pay attention.

d_silin9mo ago

aeonik9mo ago

It's more like strapping on an exoskeleton suit with a jetpack.

It makes your existing strength and mobility greater, but don't be surprised if you fly into space and suffocate,

or if you fly over an ocean, run out of gas, and sink to the bottom,

or if you fly the suit in your fine glassware shop with patrons in the store and break and burn everything and everyone in there.

derektank9mo ago

d_silin9mo ago

GPS is a particularly good analogy... Lose it for any reason and suddenly you are helpless without backup navigation aids. But a compass, paper map, watch and sextant will still work!

candiddevmike9mo ago

Why would I pay you to walk with crutches when I can just get crutches and walk myself?

reitanuki9mo ago

bachmeier9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

knlam9mo ago

Opening the essay with "Learning how to use LLMs in a coding workflow is trivial." and closing by suggesting Copilot as the AI agent is the worst take on LLM coding I have ever seen.

sudhirb9mo ago

I have a biased opinion since I work for a background agent startup currently - but there are more (and better!) out there than Jules and Copilot that might address some of the author's issues.

troupo9mo ago

And those mythical better tools that you didn't even bother to mention are?

Palmik9mo ago

Presumably if they did, they would be accused of promoting their startup :)

singularity20019mo ago

"LLMs won’t magically make you deliver production-ready code"

simonw9mo ago

I'm beginning to suspect robust automated tests may be one of the single strongest indicators for if you're going to have a good time with LLM coding agents or not.

Telling Claude to write the test first, then execute it and watch it fail, then write the implementation has been giving me really great results.
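A minimal sketch of that test-first loop in Python. The `slugify` function and its expected behaviour are hypothetical and chosen only for illustration. The stub stands in for "the test exists, the implementation does not yet", so the first run fails and the second passes.

```python
import io
import re
import unittest


def slugify_stub(title: str) -> str:
    # Step 1: only a stub exists when the test is first written.
    raise NotImplementedError


def slugify(title: str) -> str:
    # Step 3: the implementation, written after the test was seen to fail.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def make_test_case(func):
    """Build a TestCase class that exercises `func` (hypothetical helper)."""

    class SlugifyTest(unittest.TestCase):
        def test_lowercases_and_joins_words(self):
            self.assertEqual(func("Hello, World!"), "hello-world")

        def test_collapses_repeated_separators(self):
            self.assertEqual(func("a  --  b"), "a-b")

    return SlugifyTest


def run(func) -> bool:
    """Run the suite against `func`; return True only if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test_case(func))
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite).wasSuccessful()


if __name__ == "__main__":
    print("stub passes:", run(slugify_stub))       # step 2: watch it fail
    print("implementation passes:", run(slugify))  # step 4: watch it pass
```

Seeing the failure first is what shows the test actually exercises the new code, rather than passing vacuously.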

mcprwklzpq9mo ago

https://github.com/continuedev/continue

kketch9mo ago

joshuamoyers9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

Almost like hiring and scaling a team? There are also benchmarks that specifically measure this, and it's in theory a very temporary problem (the Aider Polyglot Benchmark is one such).

bubblebeard9mo ago

Strange post. It reads in part like an incoherent rant and in part as a well made analysis.

infoseek129mo ago

Vektorceraptor9mo ago

I agree. I had a similar experience.

https://speculumx.at/pages/read_post.html?post=59

MitziMoto9mo ago

My favorite setup so far is using the Claude Code extension in VS Code. All the power of CC, but it opens files and diffs in VS Code. Easy to read and modify as needed.

philipwhiuk9mo ago

There’s an IntelliJ extension for GitHub Copilot.

It’s not perfect but it’s okay.

Tainnor9mo ago

It may not be perfect, but IntelliJ beats VS Code on so many other levels that I don't understand why everyone keeps creating clones of the latter.

joks9mo ago

Yeah for my uses it works fine. Not sure why OP thinks Copilot Chat doesn't exist anywhere but VSCode...

yogthos9mo ago

As a side note, got curious and asked GLM-4.5 to make a token field widget with React, and it did it in one shot.

It's also strange not to mention DeepSeek and GLM as options given that they cost orders of magnitude less per token than Claude or Gemini.

revskill9mo ago

"If an(y) LLM could operate on your codebase without much critical issues, then your architecture is sound" - revskill

stopachka9mo ago

> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.

I use clojure for my day-to-day work, and I haven't found this to be true. Opus and GPT-5 are great friends when you start pushing limits on Clojure and the JVM.

> Or 4.1 Opus if you are a millionaire and want to pollute as much possible

I know this was written tongue-in-cheek, but at least in my opinion it's worth it to use the best model if you can. Opus is definitely better on harder programming problems.

> GPT 4.1 and 5 are mostly bad, but are very good at following strict guidelines.

dwheeler9mo ago

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

I also find that they work fine for languages without static types. You need tests, yes, but you need them anyway.

gkuhl219mo ago

The following is half serious. Please enjoy.

Some comments here are reminiscent of antiquated discourse: "how many angels dance on the head of a pin?"

We somehow are trying to agree on some factual ramp-up time required for a dev to become competent coding with LLM's. This is inherently subjective! Why bother?

Perhaps certain LLMs are blessed with disproportionately more angels (nee "bugs") in the machines.

I enjoyed reading the article:

"The model looks good, but Google’s enshittification has won and it looks like no competent software developers are left. I would know, many of my friends work there."

Yikes!

Credit to the author for having the courage to post publicly.

abrookewood9mo ago

"Google’s enshittification has won and it looks like no competent software developers are left. I would know, many of my friends work there". Ouch ... I hope his friends are in marketing!

dash29mo ago

They missed OpenAI Codex, maybe deliberately? It's less llm-development and more vibe-coding, or maybe "being a PHB of robots". I'm enjoying it for my side project this week.

Mystery-Machine9mo ago

> Claude 4 Sonnet

> Or 4.1 Opus if you are a millionaire and want to pollute as much possible

That was an unnecessary guilt-shaming remark.

itsalotoffun9mo ago

Yeah, this moralizing is like side-eyeing your fellow soldiers for killing "too much" because your level of killing is fine.

weeksie9mo ago

It becomes farcical when not only are you missing the big thing but you're also proud of your ignorance and this guy is both.

ontigola9mo ago

SadErn9mo ago

It's all about the Kilo Code extension.
