I assume that until LLMs are 100% better than humans in all cases, as long as I have to be in the loop there will be a pretty hard upper bound on what I can do, and it seems like we’ve roughly hit that limit.
Funny enough, I get this feeling with a lot of modern technology. iPhones, all the modern messaging apps, etc. make it much too easy to fragment your attention across a million different things. It’s draining. Much more draining than the old days.
If your consciousness is driving, your brain is internally aligned. You type as you think. You can get flow state, or at least find a way to think around a problem.
If you're working with someone else and having to discuss everything as you go, then it's just a different activity. I've collaboratively written better code this way in the past. But it's slower and more exhausting.
Like pair programming, I hope people realise that there's a place for both, and doing exclusively one or the other full time isn't in everyone's best interests.
I do as well, so totally know what you're talking about. There's part of me that thinks it will become less exhausting with time and practice.
In high school and college I worked at this Italian place that did dine-in, togo, and delivery orders. I got hired as a delivery driver and loved it. A couple years in there was a spell where they had really high turnover, so the owners asked me to be a waiter for a little while. The first couple months I found the small talk and the need to always be "on" absolutely exhausting, but over time I found my routine and it became less exhausting. I definitely loved being a delivery driver far more, but eventually I did hit a point where I didn't feel completely drained after every shift of waiting tables.
I can't help but think coding with LLMs will follow a similar pattern. I don't think I'll ever like it more than writing the code myself, but I have to believe at some point I'll have done it enough that it doesn't feel completely draining.
With the rise of open source, there started to be more black-box compositing, you grabbed some big libraries like Django or NumPy and honestly just hoped there weren't any bugs, but if there were, you could plausibly step through the debugger and figure out what was going wrong and file a bug report.
Now, the LLMs are generating so many orders of magnitude more code than any human could ever have the chance to debug, you're basically just firing this stuff out like a firehose on a house fire, giving it as much control as you can muster but really just trusting the raw power of the thing to get the job done. And, bafflingly, it works pretty well, except in those cases where it doesn't, so you can't stop using the tool but you can't really ever get comfortable with it either.
When I first started dabbling in the use of LLMs for coding, I almost went overboard trying to build all kinds of tools to maximize their use: parallel autonomous worktree-based agents, secure sandboxing for agents to do as they like, etc.
I now find it much more effective to use LLMs in a targeted and minimalist manner. I still write architecturally important and tricky code by hand, using LLMs to do several review passes. When I do write code with LLMs, I almost never allow them to do it without me in the loop, approving every single edit. I limit the number of simultaneous sessions I manage to at most 3 or 4. Sometimes, I take a break of a few days from using LLMs (and often from writing any code at all), and just think and update the specs of the project(s) I'm working on at a high level, to ensure I'm not doing busy-work in the wrong direction.
I don't think I'm missing anything by this approach. If anything, I think I am more productive.
The code part is trivial and a waste of time in some ways compared to time spent making decisions about what to build. Sometimes it's even procrastination to avoid thinking about what to build, like people who polish their game engine (easy) to avoid putting in the work to plan a fun game (hard).
The more clarity you have about what you’re building, then the larger blocks of work you can delegate / outsource.
So I think one overwhelming part of LLMs is that you don’t get the downtime of working on implementation since that’s now trivial; you are stuck doing the hard part of steering and planning. But that’s also a good thing.
And when you make the decisions, it is you who is responsible for them. Whereas if you just do the coding, the decisions about the code are left largely to you; nobody much sees them, only how they affect the outcome. Now the LLM is in that role, responsible only for what the code does, not how it does it.
This is such a weird statement. Game engines are among the most complicated pieces of software in existence. Furthermore, a game that doesn't run smoothly increases the chances that your player base doesn't stick around to see what you've built.
LLMs will do pretty much exactly what you tell them, and if you don't tell them something they'll make up something based on what they've been trained to do. If you have rules for what good code looks like, and those are a higher bar than 'just what's in the training data' then you need to build a clear context and write an unambiguous prompt that gets you what you want. That's a lot of work once to build a good agent or skill, but then the output will be much better.
AI is not that, it's a casino. Every time you put words into the prompt you're left with a cortisol spike as you hope the LLM lottery gives you a good answer. You get a little dopamine spike when it does, but it's not the same as when you do it yourself because it's punctuated by anxiety, which is addictive but draining. And I personally have never gotten into a state of LLM-induced "flow", but maybe others have and can explain that experience. But to me there's too much anxiety around the LLM from the randomness of what it produces.
I'm not talking about "oh, this function is deprecated, have to use this other one," but more "this approach is wrong, maybe delete it all and try a different approach."
Because IME an AI never discards an approach; it just continues adding band-aids and conditionals to make the wrong approach work.
Like, did we think waterfall suddenly works now just because typing can be automated? No.
The result is that I could say it was code that I myself approved of. I can't imagine a time when I wouldn't read all of it; when you just let them go, the results are so awful. If you're letting them go and reviewing at the end, like a post-programming review phase, I don't even know if that's a skill that can be mastered while the LLMs are still this bad. Can you really master Where's Waldo? Everything's a mess, but you're just looking for the part of the mess that has the bug?
I'm not reviewing after I ask it to write some entire thing. I'm getting it to accomplish a minimal function, then layering features on top. If I don't understand where something is happening, or I see it's happening in too many places, I have to read the code in order to tell it how to refactor the code. I might have to write stubs in order to show it what I want to happen. The reading happens as the programming is happening.
"If you don't know what you want your code to do, the computer sure as heck won't know either." I keep this with me today. Before I run my code for the first time or turn on my hardware for the first time, I ask myself, "What _exactly_ am I expecting to see here?" and if I can't answer that it makes me take a closer and more adversarial look at my own output before running it.
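One minimal sketch of that habit, using an invented `parse_price` helper: state the expected result as an assertion before the first run, so a vague expectation gets caught before the code does.

```python
# Illustration of "what exactly am I expecting to see here?"
# parse_price is a made-up example function, not from any real project.
def parse_price(text: str) -> float:
    """Turn a price string like ' $19.99 ' into a float."""
    return float(text.strip().lstrip("$"))

# The expectation, written down before running anything:
assert parse_price(" $19.99 ") == 19.99
assert parse_price("$0.50") == 0.5
```

If you can't write the assertion, you don't yet know what the code should do, and that's the adversarial look paying off.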
But that's not what the above comment said.
> Just let it run, check debugger/stdout/localhost page and adjust: "Oh, right, the entries are missing canonical IDs, but at the same time there are already all the comments in them, forgot they would be there"
So you did have an expectation that the entries should have some canonical IDs, and anticipated/desired a certain specific behavior of the system.
Which is basically the meaning of "what will the output be?" when simplified for programming novices at university.
But I absolutely loathe reviewing these generated PRs - more so when I know the submitter themselves has barely looked at the code. Now corporate has mandated AI usage and is asking people to do 10k LOC PRs every day. Reviewing this junk has become exhausting.
I don’t want to read your code if you haven’t bothered to read it yourself. My stance is: reviewing this junk is far more exhausting. Coding is actually the fun part.
That's a big red flag if I ever saw one. Corporate should be empowering the engineering team to use AI tooling to improve their own process organically. Is this true or an exaggeration? If it's true, I'd start looking for a more balanced position at a more disciplined org.
On a different note: something I just discovered is that if you google "my condolences", the AI summary will thank you for the kindness before defining its meaning, fun.
Nitpick it to death. Ask the submitter questions about how everything works. Even if it looks good, flip a coin and reject it anyway. Drag that review time out. You don't want unlucky PRs going through, after all.
Corporate is not going to wake up and do the sensible thing on its own.
Also, there is no point in asking questions when you know that they just yoloed it and won't be able to answer anything.
We have collectively lost our common sense and reasonable people are doing unreasonable things because there's an immense amount of pressure from the top.
I’ve certainly seen my share of what I call slot-driven development, where a developer just throws things at the wall until something mostly works. And plenty of cut-and-paste development.
But it’s far from the majority. It’s usually the same few developers at a company doing it, while the people who know what they’re doing furiously work to keep things from falling apart.
If the majority of devs were doing this nothing would work. My worry is that AI lets the bad devs produce this kind of work on a massive scale that overwhelms the good devs ability to fight back or to even comprehend the system.
I think you just need to add a /complexify one with the same pattern, ask the AI to make everything as complex and long-winded as possible, LOC over clarity
For context, I started an experiment to rebuild a previous project entirely with LLMs back in June '25 ("fully vibecoded" - not even reading the source).
After iterating and finally settling on a design/plan/debug loop that works relatively well, I'm now experiencing an old problem like new: doing too much!
As a junior engineer, it's common to underestimate the scope of some task, and to pile on extra features/edge cases/etc. until you miss your deadline. A valuable lesson any new programmer/software engineer necessarily goes through.
With "agentic engineering," it's like I'm right back at square one. Code is so cheap/fast to write, I find myself doing it the "right way" from the get-go, adding more features even though I know I shouldn't, and ballooning projects until they reach a state of never launching.
I feel like a kid again (:
If I give it anything resembling anything that I'm not an expert on, it will make a mess of things.
Admittedly I'm knowledgable in most of the domains I use LLMs for, but even so, my prompts are much longer now than they used to be.
LLMs are token happy, especially Claude, so if you give it a short 1-2 sentence prompt, your results will be wildly variable.
I now spend a lot of mental energy on my prompting, and resist the urge to use less-than-professional language.
Instead of "build me an app to track fitness" it's more like:
> "We're building a companion app for novice barbell users, roughly inspired by the book 'Starting Strength.' The app should be entirely local, with no back-end. We're focusing on iOS, and want to use SwiftUI. Users should [..] Given this high-level description, let's draft a high-level design doc, including implementation decisions, open questions, etc. Before writing any code, we'll review and iterate on this spec."
I've found success in this method for building apps/tools in languages I'm not proficient in (Rust, Swift, etc.).
Is that why it's in quotes because it's the opposite of the right way?
If there's one thing I learned in a decade+ of professional programming, it's that we can't predict the future. That's it, that simple. YAGNI. (also: model the data, but I'm trying to make a point here)
We got into coding because we like to code; we invent reasons and justifications to code more, ship more, all the world's problems can be solved if only developers shipped more code.
Nirvana is reached when they that love and care about the shipping of the code know also that it's not the shipping of the code that matters.
The most important thing is shipping/getting feedback, everything else is theatre at best, or a project-killing distraction at worst.
As a concrete example, I wanted to update my personal website to show some of these fully-vibecoded projects off. That seemed too simple, so instead I created a Rotten Tomatoes-inspired web app where I could list the projects. Cool, should be an afternoon or two.
A few yak shaves later, and I'm adding automatic repo import[0] from Github...
Totally unnecessary, because I don't actually expect anyone to use the site other than me!
You know you can leave abusive relationships. Ditch the clanker and free your mind.
As somebody who has been coding for just shy of 40 years and has gone through the actual pain of learning to run a high-level and productive dev team, your experience does not match mine. Even great devs will forget some of the basics and make mistakes, and I wish every junior (hell, even seniors) were as effective as the LLMs are turning out to be. Put the LLM in the hands of a seasoned engineer who also has the skills to manage projects and mentor junior devs and you have a powerful accelerator. I'm seeing the outcome of that every day on my team. The velocity is up AND the quality is up.
This is not my experience on a team of experienced SWEs working on a product worth $100M/year.
Agents are a great search engine for a codebase and really nice for debugging but anytime we have it write feature code it makes too many mistakes. We end up spending more time tuning the process than it takes to just write the code AND you are trading human context with agent context that gets wiped.
It's clear to me as a more seasoned engineer that I can prompt the LLM to do what I want (more or less) and it will catch generally small errors in my approach before I spend time trying them. I don't often feel like I ended up in a different place than I would have on my own. I just ended up there faster, making fewer concessions along the way.
I do worry I'll become lazy and spoiled. And then lose access to the LLM and feel crippled. That's concerning. I also worry that others aren't reading the patches the AI generates like I am before opening PRs, which is also concerning.
You might think that the "constant" task switching is draining, but I don't switch that frequently. Often I keep the main focus on one task and use the waiting time to draft some related ideas/thoughts/next prompt. Or browse through the code for light review/understanding. It also helps to have one big/complex task and a few simpler things going concurrently. And since the number of details you have to keep "loaded" in your head per task is smaller, switching has less cost, I think. You can also "reload" much quicker by simply chatting with the agent for a minute or two if some detail has faded.
I think a key thing is to NOT chase after keeping the agents running at max efficiency. It's ok to let them be idle while you finish up what you're doing. (Perhaps bad for KV cache efficiency, though - I'm not sure how long they keep the cache.)
(And obviously you should run the agent in a sandbox to limit how many approvals you need to consider)
[1] I use the urgent-window hint to get a subtle hint of which workspace contains an agent ready for input.
EDIT: disclaimer - I'm relatively new to using them, and have so far not used them for super complex tasks.
I can finally do my preferred workflow: Research, (design, critique), (plan, critique, design), implement.
Design and planning has a quick enough turnaround cycle to not get annoying. By the time the agent is writing code, I have no involvement anymore. Just set it and forget it, come back in half an hour or so to see if it's done yet. Meanwhile, I look at the bigger picture and plan out my next prompt cycles as it churns out code.
For example, this project was entirely written by LLM:
https://github.com/kstenerud/yoloai
I never wrote a single line of this code (I do review it, of course, but even then the heavy lifting for that can be offloaded to an LLM so that I can focus on wider issues, which most often are architectural).
In particular, take a look at the docs/dev subdir to see the planning and design. Once the agent has that, it's MUCH harder for it to screw things up.
Is it as tight as it could be? Nope, but it has a solid architecture, does its job well, and has good debugging infrastructure so fixes are fast. I wouldn't use this approach for embedded or projects requiring maximum performance, but for regular code it's great!
yoloAI is intended to be consumed both as a library and as a standalone binary, and the public 80/20 API surface isn't yet mature enough to make a -filter list worthwhile.
This did give me ideas for some other checks I can add, though! :)
I find LLMs are great for building ideas, improving understanding and basic prototyping. This is more useful at the start of the project lifecycle, however when getting toward release it's much more about refactoring and dealing with large numbers of files and resources, making very specific changes e.g. from user feedback.
For those of us with decades of muscle memory who can fix a bug in 30 seconds with a few Vim commands, LLMs are very likely to be slower in most coding tasks, excepting prototyping and obscure bug spotting.
Another way you can read this is as a new cult member chiding himself whenever he might have an intrusive thought that Dear Leader may not be perfect, after all.
My pet theory is that we haven't figured out what the best way to use these tools is, or even seen all the options yet. But that's a bigger topic for another day.
The only time I've felt something akin to this with a compiler is when I was learning Rust. But that went away after a week or two.
Maybe the right answer is to sometimes slow down, explore and think a little more instead of just letting it try something until it (eventually, sort of) works.
I mostly use YOLO mode which means I'm not constantly watching them and approving things they want to do... but also means I'm much more likely to have 2-3 agent sessions running in parallel, resulting in constant switching which is very mentally taxing.
What impacts cognition for me, and IMO for a lot of folks, is how well we end up defining our outcomes. Agents are tremendous at working towards the outcome (hence TDD red-green works wonderfully), but if you point them to a goal slightly off, then you'll have to do the work of getting them on track, demanding cognition.
So the better you are at your initial research/plan phase, where you document all of your direction and constraints, the less effort is needed in the review.
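A concrete way to pin the outcome down before an agent starts, sketched with an invented `slugify` function: the test, written first, IS the outcome definition.

```python
# Hypothetical red-green example: the failing test is written first (red),
# then the implementation -- human- or agent-written -- is steered until it
# passes (green). The test encodes the outcome, not the approach.
def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())

def test_slugify():
    # Pinned down before any implementation existed:
    assert slugify("Hello Agent World") == "hello-agent-world"
    assert slugify("  Two   Words ") == "two-words"

test_slugify()
```

With the outcome frozen in the test, "slightly off goal" shows up as a red test immediately, instead of as a review argument later.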
The other thing impacting cognition is how many parallel threads you're running. I have defaulted to major/minor system - at any time I have 1 major project (higher cognition) and 1 minor agent (lower cognition) going. It's where managing this is comfortable.
same thing happened with crypto - the underlying technology is cool but the community is what makes it so hated
If you look at them through that lens then they are less exhausting in my opinion, but I hear ya.
I feel the burnout too. It's because of all the hype and people out there (most of whom have no programming experience at all, mind you) believing these tools can do something they cannot. Then everyone seems so intent on doing better here that they start trying to run multiple agents, etc. Ultimately this results in less productivity.
I'm going to be honest. At work, I've seen the team begin to build custom internal apps and dashboards that literally do the same thing as Jira and observability tools that we already pay for. It just happens to...OMG...put the data that used to be on two different browser tabs onto the same one! Woah! Amazing! It only took two weeks to build too! Jira is so cooked! Except. It's not. Because this little reporting app doesn't do anything and it has bugs to maintain. Oh right and it didn't go through the regular SDLC or follow any code review process so it's a violation of SOC 2. But you know what? They get a pat on the back.
This industry is as the kids like to say - cooked.
It's like with regular non-llm assisted coding. Sometimes you gotta sleep on it and make a new /plan with a fresh direction.
I imagine I will greatly reduce my job prospects as a holdout, but honestly, from what I've read I think I'd rather take a hefty pay hit and not go there. It sounds like a mental health disaster and a fast track to serious burnout.
YMMV, I realize I'm in the minority, this is unproductive ranting, yada yada yada
I use them when I find them helpful, and that’s the case in plenty of situations. Figuring out architecture and design, finding bugs, analyzing and explaining a codebase, writing little scripts and utilities (especially in areas where you lack familiarity), etc. are all pure wins, imo. They increase my productivity and quality of output without any real downside.
When it comes to writing the bulk of a codebase or doing ongoing maintenance on a nontrivial system, a lot of ymmv comes into play. There’s no real reason (yet!) to believe that if you’re not committing 10k lines of generated slop per day, you’re going to be left behind. People doing that are on a bleeding edge that may have already cut them deeper than they realize.
In short, there’s an enormous middle ground between Yegge’s Gas Town and “I refuse to use LLMs for development”. I’m enjoying working in that middle ground. It’s interesting and stimulating, it makes a lot of things easier and quicker, and I’m growing and learning. If that stops, I’ll just change what I’m doing.
Many top labs [1] [2] already have heavily automated code review, and it's not slowing down. That doesn't mean I'm trusting everything blindly, but yes, over time, it should handle more and more of the "lower level" tasks, and it's a good thing if it can.
[1] https://openai.com/index/harness-engineering/ [2] https://claude.com/blog/code-review
Further I want to vent about two things:
- Things can be improved.
- You are allowed to complain about anything, while not improving things yourself.
I think the mid 2010s really popularized self improvement in a way that you can't really argue with (if you disagree with "put in more effort and be more focused", you're obviously just lazy!). It's funny because the point of engineering is to find better solutions, but technically yes, an always valid solution is just "suck it up".
But moreover, if you do not allow these two premises, what ends up happening in practice for a lot of people is that you can interpret any slight pushback as "oh, they're just a whiner," and if they're not doing something to fix their problem this instant, that "obviously" validates your claim (and even if they are, it doesn't count, they should still not be a "debbie downer", etc.).
Sometimes a premise can sound extreme, but people forget that premises are not in a complete logical vacuum; you actually live out and believe said premises, and by taking on a certain position, it's often more about what follows downstream from the behavior than the actual words themselves.
That said, it helps to be in tune with your own body and mind. You need breaks now and then, and with AI interactions you will be "ON" more than when just working through problems on your own. The AI can work through the boilerplate, which lets your mind rest at a relatively blazing pace, leaving you to evaluate and iterate relatively quickly. You will find yourself more "worn out" from constantly thinking faster.
IIRC most people burn out after 4-6 hours of heavy thought work... take a long meal break, then consider getting back into it or not. Identify when it's okay to stop for the day... you may be getting good progress, but if you aren't in the right mindset it's you that may well be introducing mistakes into things.
Beyond this, I tend to plan/track things in TODO.md files as I work/plan through things... keeping track of what needs to be done, combined with history, and even the "why" along the way... AI makes it easy to completely swap out a backend library pretty quickly, especially with a good testing surface in place. But it helps to track why you're doing certain things... why you're making the changes you are on a technical level.
I think the exhausting part is probably more tied to the evaluation of the work the agent is doing; understanding its thought process and catching the hang-up can be tedious in the current state of AI reasoning.
Working with an agent coding all day can be exhilarating but also exhausting - maybe it’s because consequential decisions are packed more tightly together. And yes cognition still matters for now.
Another thing I found is that it is too easy to keep going. I would work for too long and get even more exhausted. It feels rude to just stop a conversation. LLMs don't really care about social norms like that, but it still felt awkward to me and I would worry about losing the context I had.
To help with that, I wrote my own little plugin that reminds me to start winding down at the end of the work day and starts prompting me (pardon my phrasing) to take the off-ramp; to relay any thoughts and todos I still have in mind and put them down to pick up the next day.
This is in no way production ready, but it might be an inspiration: https://github.com/pindab0ter/wind-down
If a problem is a continuation of the current or other chat, switch to it. If it is a new problem or sub-problem requiring something more extensive than a tiny refactor, a new chat is started.
From there,
Start in Ask mode. Ask about the existing code I'm trying to modify. If I am interfacing with someone else's code, that code is put in reach of the project and I ask questions about how it produces certain results or what a function does. Ask the foundational 'bottom-up' questions: how does this work? What routine produces x? Call out specific external sources from the web if they contain relevant information, like an API. Iterate until I feel I have a grasp of what I can build with. Not only does this help me comprehend the boundaries in terms of existing capability and/or shortcomings, it seeds the context.
Move to Plan mode. Provide a robust problem statement, incorporating findings from the Ask phase and what the desired output is. Throw in some guard rails to narrow the search path used by the LLM as it seeks the solution. Disqualify certain approaches if necessary. If the LLM's plan isn't aligned with my goals, or I remember that thing I skipped, I amend the plan. The plan prompt I typed is saved to a blank file in the text editor.
Implement.
Validate. If it works, great. Read the code and approve each change, usually I speed read this.
If it doesn't work, I tell the LLM the difference between the expected and actual result and instruct it to instrument the code to produce trace output. Then I feed the trace output back into it with explanations of where the output doesn't match my expectations (oftentimes revealing weaknesses in my problem statement). Sometimes when a corner case is problematic, several iterations are required, and then I screen for regressions. If I reach the point where I know I screwed up the planning prompt, I trash the changes, revise the copypasta saved earlier, and start a new Planning session.
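The instrument-for-trace step looks roughly like this; `merge_comments` and its field names are invented for illustration, not from any real codebase.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("trace")

def merge_comments(entries, comments):
    """Attach each comment to its entry; instrumented so the trace output
    can be compared against expectations and pasted back into the session."""
    by_id = {e["id"]: {**e, "comments": []} for e in entries}
    log.debug("merge_comments: %d entries, %d comments", len(entries), len(comments))
    for c in comments:
        target = by_id.get(c["entry_id"])
        if target is None:
            # Lines like this are what expose a bad upstream assumption.
            log.debug("dropping orphan comment entry_id=%r", c["entry_id"])
            continue
        target["comments"].append(c)
    return list(by_id.values())
```

The point is that the trace, not a guess, is what goes back into the prompt: "I expected zero orphan comments, but the log shows three, so the entry IDs must be wrong upstream."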
Another trick I learnt is you can ask Claude to ask you comprehensive questions for clarification. Usually, it will then offer you a choice of 3 options per question that it might have and you can steer it towards the right implementation.
So I'm writing code by hand today and using Claude to track down type and dependency errors. It feels good, I might do this for a while.
I have also had to step back, for my own sanity, and rethink how I am using these tools. They are very strong slot machines, especially the Claude models, which require more steering, and that's not a good match with my brain and work style. You're not alone! Keep on trying to work better :)
My limits are now many of the same things that have always been core to software dev, but are now even more obvious:
- what is the thing we are building? What is the core product or bug fix or feature?
- what are we _not_ building? What do we not care about?
- do I understand the code enough to guide design and architecture?
- can I guide dev and make good choices when it’s far outside my expertise but I know enough to “smell” when things are going off the rails
It’s a weird time
There's probably a Codex equivalent, but I don't know what it is.
I learned years ago that when I write code after 10 PM, I go backward instead of forward. It was easy to see, because the test just wouldn't pass, or I'd introduce several bugs that each took 30 minutes to fix.
I'm learning now that it's no different, working with agents.
If AI is doing the coding then it gets to solve the problems and I don’t get the satisfaction/dopamine/motivation you get when you solve a programming problem in a clever way.
It's amazing how right and wrong LLMs can be in the output produced. Personally, the variance for me is too much... I can't stand when it gets things wrong on the most basic of stuff. I much prefer doing things without output from an LLM.
Man, I envy you. For me, the joy comes from writing good code that I can be proud of. I never got ANY joy from writing a prompt.
I mean, it is a means to an end (getting the LLMs to do the boring stuff) and so it is a necessary evil. Also, the LLMs are at times amazing and at times dumb as rocks even for very similar prompts. That drives me crazy because it feels I have no control over those things.
Coldtea's law: "Never attribute to context rot that which is adequately explained by cost-cutting".