I'm a CS teacher, so this is where I see a huge danger right now and I'm explicit with my students about it: you HAVE to write the code. You CAN'T let the machines write the code. Yes, they can write the code: you are a student, the code isn't hard yet. But you HAVE to write the code.
This is the ultimate problem with AI in academia. We all inherently know that “no pain no gain” is true for physical tasks, but the same is true for learning. Struggling through the new concepts is essentially the point of it, not just the end result.
Of course this becomes a different thing outside of learning, where delivering results is more important in a workplace context. But even then you still need someone who does the high level thinking.
It was a bizarre disconnect having someone be both highly educated and yet crippled by not doing.
A good analogy here is programming in assembler. Manually crafting programs at the machine-code level was very common when I got my first computer in the 1980s, especially for games. By the late 90s that had mostly disappeared. RollerCoaster Tycoon was one of the last hugely commercially successful games coded that way. C/C++ took over, and these days most game studios license an engine and then do a lot of work in languages like C# or Lua.
I never did any meaningful amount of assembler programming. It was mostly no longer a relevant skill by the time I studied computer science (94-99). I built an interpreter for an imaginary CPU at some point using a functional programming language in my second year. Our compiler course was taught by people like Erik Meijer (who later worked on things like F# at MS), who just saw it as a great excuse to teach people functional programming instead. In hindsight, that was actually a good skill to have, as interest in functional programming heated up a lot about 10 years later.
The point of this analogy: compilers are important tools. It's more important to understand how they work than it is to be able to build one in assembler. You'll probably never do that. Most people never work on compilers. Nor do they build their own operating systems, databases, etc. But it helps to understand how they work. The point of teaching how compilers work is understanding how programming languages are created and what their limitations are.
Even while vibe-coding, I often found myself getting annoyed just having to explain things. The amount of patience I have for anything that doesn't "just work" the first time has drifted toward zero. If I can't get AI to do the right thing after three tries, "welp, I guess this project isn't getting finished!"
It's not just laziness, it's like AI eats away at your pride of ownership. You start a project all hyped about making it great, but after a few cycles of AI doing the work, it's easy to get sucked into, "whatever, just make it work". Or better yet, "pretend to make it work, so I can go do something else."
The idea was to develop a feel for cutting metal, and to better understand what the machine tools were doing.
--
My wood shop teacher taught me how to use a hand plane. I could shave off wood with it that was so thin it was transparent. I could then join two boards together with a barely perceptible crack between them. The jointer couldn't do it that well.
I've hired and trained tons of junior devs out of university. They become 20x more productive after a year of experience. I think vibe coding gets new devs to 5x productivity, which seems amazing, but then they get stuck there because they're not learning. So after year one, they're a 5x developer, not the 20x developer they should be.
I have some young friends who are 1-3 years into software careers, and I'm surprised by how little they know.
Recently in comments people were claiming that working with LLMs has sharpened their ability to organize thoughts, and that could be a real effect that would be interesting to study. It could be that watching an LLM organize a topic could provide a useful example of how to approach organizing your own thoughts.
But until you do it unassisted you haven’t learned how to do it.
I do Windows development and GDI stuff still confuses me. I'm talking about memory DCs, compatible DCs, DIBs, DDBs, DIBSECTION, BitBlt, SetDIBits, etc. AIs also suck at this stuff. I'll ask for help with a relatively straightforward task and it almost always produces code that falls apart the moment you ask it to defend the choices it made: it finds problems, apologizes, and goes in circles. One AI (I forget which) actually told me I should refer to Petzold's Programming Windows book because it was unable to help me further.
https://www.slater.dev/2025/08/llms-are-not-bicycles-for-the...
I actually fear more for the middle-of-career dev who has shunned AI as worthless. It's easier than ever for juniors to learn and be productive.
Without the clarity that comes from thinking with code, a programmer using AI is the blind leading the blind.
The social aspect of a dialogue is relaxing, but very little improvement is happening. It's like a study group where one (relatively) incompetent student tries to advise another, and then test day comes and they're outperformed by the weirdo that worked alone.
I learned more about programming in a weekend badly copying hack modules for Minecraft than I learned in 5+ years in university.
All that stuff I did by hand back then, I haven't used a single time since.
If AI is used by the student to get the task done as fast as possible the student will miss out on all the learning (too easy).
If no AI is used at all, students can get stuck for long periods, either because of mismatches between the instructional design and the specific learning context (a missing prerequisite) or because of mistakes in the instructional design itself.
AI has the potential to keep all learners within an ideal difficulty for optimal rate of learning so that students learn faster. We just shouldn't be using AI tools for productivity in the learning context, and we need more AI tools designed for optimizing learning ramps.
People said this about compilers. It depends what layer you care to learn/focus on. AI at least gives us the option to move up another level.
Your curriculum may be different from what we have around here, but here it's frankly the same stuff I was taught 30 years ago, except most of the actual computer science parts are gone, replaced with even more OOP and design pattern bullshit.
That being said, I have no idea how you'd actually go about teaching students CS these days, considering a lot of them will probably use ChatGPT or Claude regardless of what you do. That is what I see in the grade statistics around here. For the first 9 years I was a well-calibrated grader, but these past 1.5ish years it's usually either top marks or bottom marks with nothing in between. Which puts me outside where I should be, but it matches the statistical calibration for everyone here. I obviously only see the product of CS educations, but even though I'm old, I can imagine how many corners I would have cut myself if I had LLMs available back then. Not to mention all the distractions the internet has brought.
> you give it a simple task. You’re impressed. So you give it a large task. You’re even more impressed.
That has _never_ been the story for me. I've tried, and I've gotten some good pointers and hints about where to go and what to try, a result of LLMs' extensive if shallow reading, but in the sense of concrete problem solving or code/script writing, I'm _always_ disappointed. I've never gotten a satisfactory code/script result from them without a tremendous amount of pushback: "do this part again with ...", do this, don't do that.
Maybe I'm just a crank with too many preferences. But I hardly believe so. The minimum requirement should be for the code to work. It often doesn't. Feedback helps, right. But if you've got a problem where a simple, contained feedback loop isn't that easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.
> There should be a TaskManager that stores Task objects in a sorted set, with the deadline as the sort key. There should be methods to add a task and pop the current top task. The TaskManager owns the memory when the Task is in the sorted set, and the caller to pop should own it after it is popped. To enforce this, the caller to pop must pass in an allocator and will receive a copy of the Task. The Task will be freed from the sorted set after the pop.
> The payload of the Task should be an object carrying a pointer to a context and a pointer to a function that takes this context as an argument.
> Update the tests and make sure they pass before completing. The test scenarios should relate to the use-case domain of this project, which is home automation (see the readme and nearby tests).
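For what it's worth, the ownership contract in that spec can be sketched in Rust, where the move on pop plays the role of the allocator-passed copy. All names here are my own illustration, not code from the thread:

```rust
use std::collections::BTreeMap;

// Sketch only. Equal deadlines would collide in this simplified
// version; a real sorted set would need a tie-breaker in the key.
struct Task {
    deadline: u64,
    name: String,
}

struct TaskManager {
    // BTreeMap keeps keys sorted, so the smallest deadline is first.
    tasks: BTreeMap<u64, Task>,
}

impl TaskManager {
    fn new() -> Self {
        TaskManager { tasks: BTreeMap::new() }
    }

    // The manager owns the Task while it sits in the sorted set.
    fn add(&mut self, task: Task) {
        self.tasks.insert(task.deadline, task);
    }

    // Popping removes the entry and moves the Task out, transferring
    // ownership to the caller (no explicit allocator needed in Rust).
    fn pop(&mut self) -> Option<Task> {
        self.tasks.pop_first().map(|(_, task)| task)
    }
}

fn main() {
    let mut tm = TaskManager::new();
    tm.add(Task { deadline: 20, name: "water plants".into() });
    tm.add(Task { deadline: 10, name: "dim lights".into() });
    let first = tm.pop().expect("one task");
    println!("{} {}", first.deadline, first.name); // 10 dim lights
}
```

The explicit allocator in the quoted spec suggests a language like Zig; in Rust the same "manager owns it until pop, caller owns it after" rule falls out of move semantics.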
You can’t deny that someone like Ryan Dahl, the creator of Node.js, declaring that he no longer writes code is objectively contrary to your own experience. Something is different.
I think you and other deniers try one prompt, see the issues, and stop.
Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want. The first output is almost always not what you want. It is the feedback loop between you and the AI that cohesively creates something better than each individual aspect of the human-AI partnership.
I think uncritical AI enthusiasts are just essentially making the bet that the rising mountains of tech debt they are leaving in their wake can be paid off later on with yet more AI. And you know, that might even work out. Until such a time, though, and as things currently stand, I struggle to understand how one can view raw LLM code and find it acceptable by any professional standard.
I have ones that describe what kinds of functions get unit vs integration tests, how to structure them, and the general kinds of test cases to check for (they love writing way too many tests IME). It has reduced the back and forth I have with the LLM telling it to correct something.
Usually the first time it does something I don't like, I have it correct it. Once it's in a satisfactory state, I tell it to write a Cursor rule describing the situation BRIEFLY (it gets way too verbose by default) and how to structure things.
That has made writing LLM code so much more enjoyable for me.
For example, someone may ask an LLM to write a simple HTTP web server, and it can do that fine, and they consider that complex, when in reality it's really not.
Because of Beads I can have Claude do a code review for serious bugs and issues and sure enough it finds some interesting things I overlooked.
I have also seen my peers in the reverse engineering field make breakthroughs emulating runtimes that have no or limited existing runtimes, all from the ground up mind you.
I think the key is thinking of yourself as an architect / mentor for a capable and promising Junior developer.
A complete exercise in frustration that has turned me off of all agentic code bullshit. The only reason I still have Claude Code installed is because I like the `/multi-commit` skill I made.
That's exactly the point. Modern coding agents aren't smart software engineers per se; they're very very good goal-seekers whose unit of work is code. They need automatable feedback loops.
Try a gamut of sample inputs and observe where it's going awry. Describe the error to it and see what it does.
A trivial example is your happy path git workflow. I want:
- pull main
- make new branch in user/feature format
- Commit, always sign with my ssh key
- push
- open pr
but it always will
- not sign commits
- not pull main
- not know to rebase if changes are in flight
- make a million unnecessary commits
- not squash when making a million unnecessary commits
- have no guardrails when pushing to main (oops!)
- add too many comments
- write commit messages that are too long
- spam the pr comment with hallucinated test plans
- incorrectly attribute itself as coauthor in some guerrilla marketing effort (fixable with config, but whyyyyyy -- also this isn't just annoying, it breaks compliance in a lot of places and fundamentally misunderstands the whole point of authorship, which is copyright --- and AIs can't own copyright)
- not make DCO compliant commits ...
Commit spam is particularly bad for bisect bug hunting and for ref performance issues at scale. Sure, I can enforce squash-and-merge on my repo, but why am I relying on that if the AI is so smart?
All of these things are fixed with aliases / magit / cli usage, using the thing the way we have always done it.
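For reference, the happy path above fits in a few lines of git config. This is a sketch, not the commenter's setup: the key path and the use of the GitHub CLI (`gh`) for PR creation are assumptions about your environment:

```ini
# Sketch of ~/.gitconfig for the happy-path workflow above.
# Assumes an SSH signing key and the GitHub CLI (`gh`) for PRs.
[gpg]
    format = ssh
[user]
    signingkey = ~/.ssh/id_ed25519.pub
[commit]
    gpgsign = true
[alias]
    # start a feature branch off a fresh main: git start user/feature
    start = "!f() { git checkout main && git pull && git checkout -b \"$1\"; }; f"
    # push the current branch and open a PR
    pr = "!git push -u origin HEAD && gh pr create --fill"
```

With `commit.gpgsign` set, every commit is signed without the agent (or you) having to remember the `-S` flag.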
Yet. Most of my criticism comes not after running the code, but after _reading_ the code. It wrote code. I read it. And I am not happy with it. No need even to run it; it's shit at a glance.
My theory is that the people who are impressed are trying to build CRUD apps or something like that.
It is hands down good for code which is laborious or tedious to write, but once done, obviously correct or incorrect (with low effort inspection). Tests help but only if the code comes out nicely structured.
I made plenty of tools like this: a replacement REPL for MS-SQL, a caching tool in Python, a matplotlib helper. Things that I know 90% how to write anyway but don't have the time for, and that, once in front of me, are obviously correct or incorrect. NP code, I suppose: hard to produce, easy to verify.
But business critical stuff is rarely like this, for me anyway. It is complex, has to deal with various subtle edge cases, be written defensively (so it fails predictably and gracefully), well structured etc. and try as I might, I can't get Claude to write stuff that's up to scratch in this department.
I'll give it instructions on how to write some specific function, it will write this code but not use it, and use something else instead. It will pepper the code with rookie mistakes like writing the same logic N times in different places instead of factoring it out. It will miss key parts of the spec and insist it did it, or tell me "Yea you are right! Let me rewrite it" and not actually fix the issue.
I also have a sense that it got a lot dumber over time. My expectations may have changed of course too, but still. I suspect even within a model, there is some variability of how much compute is used (eg how deep the beam search is) and supply/demand means this knob is continuously tuned down.
I still try to use Claude for tasks like this, but increasingly find my hit rate so low that the whole "don't write any code yet, let's build a spec" exercise is a waste of time.
I still find Claude good as a rubber duck or to discuss design or errors - a better Stack Exchange.
But you can't split your software spec into a set of SE questions then paste the code from top answers.
> It is hands down good for code which is laborious or tedious to write, but once done, obviously correct or incorrect (with low effort inspection).
The problem here is that it fills in gaps that shouldn't be there in the first place. Good code isn't laborious. Good code is small. We learn to avoid unnecessary abstractions. We learn to minimize "plumbing" so that the resulting code contains little more than clear and readable instructions of what you intend the computer to do.
The perfect code is just as clear as the design document in describing the intentions, only using a computer language.
If someone is gaining super speeds by providing AI clear design documents compared to coding themselves, maybe they aren't coding the way they should.
You can't dispense with yourself in those scenarios. You have to read, think, investigate, break things down into smaller problems. But I employ LLM's to help with that all the time.
Granted, that's not vibe coding at all. So I guess we are pretty much in agreement up to this point. Except I still think LLMs speed up this process significantly, and the models and tools are only going to get better.
Also, there are a lot of developers that are just handed the implementation plan.
That's your job.
The great thing about coding agents is that you can tell them "change of design: all API interactions need to go through a new single class that does authentication and retries and rate-limit throttling" and... they'll track down dozens or even hundreds of places that need updating and fix them all.
(And the automated test suite will help them confirm that the refactoring worked properly, because naturally you had them construct an automated test suite when they built those original features, right?)
Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
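The "single class" from that design change might look something like this minimal Rust sketch, with retries as the example policy (the names and the retry-only scope are my invention, not the commenter's code):

```rust
// Hypothetical choke point: every API interaction funnels through
// one place, so cross-cutting policy (auth, retries, rate limiting)
// lives in exactly one class instead of dozens of call sites.
struct ApiClient {
    max_retries: u32,
}

impl ApiClient {
    // `op` stands in for a single HTTP request; auth headers and
    // rate-limit throttling would hook in here as well.
    fn call<T>(&self, mut op: impl FnMut() -> Result<T, String>) -> Result<T, String> {
        let mut attempt = 0;
        loop {
            match op() {
                Ok(v) => return Ok(v),
                Err(_) if attempt < self.max_retries => {
                    // A real client would back off / sleep here.
                    attempt += 1;
                }
                Err(e) => return Err(e),
            }
        }
    }
}

fn main() {
    let client = ApiClient { max_retries: 3 };
    let mut failures = 2; // simulate two transient failures
    let result = client.call(|| {
        if failures > 0 {
            failures -= 1;
            Err("transient".to_string())
        } else {
            Ok(42)
        }
    });
    println!("{:?}", result); // Ok(42)
}
```

The refactor the comment describes is exactly the agent-friendly kind: mechanical, widespread, and verifiable by the existing test suite.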
I dunno, maybe I have high standards but I generally find that the test suites generated by LLMs are both over and under determined. Over-determined in the sense that some of the tests are focused on implementation details, and under-determined in the sense that they don't test the conceptual things that a human might.
That being said, I've come across loads of human written tests that are very similar, so I can see where the agents are coming from.
You often mention that this is why you are getting good results from LLMs so it would be great if you could expand on how you do this at some point in the future.
> [Agents write] units of changes that look good in isolation.
I have only been using agents for coding end-to-end for a few months now, but I think I've started to realise why the output doesn't feel that great to me.
Like you said; "it's my job" to create a well designed code base.
Without writing the code myself however, without feeling the rough edges of the abstractions I've written, without getting a sense of how things should change to make the code better architected, I just don't know how to make it better.
I've always worked in smaller increments, creating the small piece I know I need and then building on top of that. That process highlights the rough edges, the inconsistent abstractions, and that leads to a better codebase.
AI (it seems) decides on a direction and then writes hundreds of LOC at once. It doesn't need to build abstractions because it can write the same piece of code a thousand times without caring.
I write one function at a time, and as soon I try to use it in a different context I realise a better abstraction. The AI just writes another function with 90% similar code.
I increasingly feel a sort of "guilt" when going back and forth between agent-coding and writing it myself. When the agent didn't structure the code the way I wanted, or it just needs overall cleanup, my frustration will get the best of me and I will spend too much time writing code manually or refactoring using traditional tools (IntelliJ). It's clear to me that with current tooling some of this type of work is still necessary, but I'm trying to check myself about whether a certain task really requires my manual intervention, or whether the agent could manage it faster.
Knowing how to manage this back and forth reinforces a view I've seen you espouse: we have to practice and really understand agentic coding tools to get good at working with them, and it's a complete error to just complain and wait until they get "good enough" - they're already really good right now if you know how to manage them.
> So I’m back to writing by hand for most things. Amazingly, I’m faster, more accurate, more creative, more productive, and more efficient than AI, when you price everything in, and not just code tokens per hour
At least he said "most things". I also did "most things" by hand, until Opus 4.5 came out. Now it's doing things in hours I would have worked an entire week on. But it's not a prompt-and-forget kind of thing, it needs hand holding.
Also, I have no idea _what_ agent he was using. OpenAI, Gemini, Claude, something local? And with a subscription, or paying by the token?
Because the way I'm using it, this only pays off because it's the $200 Claude Max subscription. If I had to pay per token (which, once again, is hugely marked up), I would have gone bankrupt.
No, it isn't. To quote your own blog, his job is to "deliver code [he's] proven to work", not to manage AI agents. The author has determined that managing AI agents is not an effective way to deliver code in the long term.
> you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made
The author has years of experience with AI assisted coding. Is there any way we can check to see if someone is actually skilled at using these tools besides whether they report/studies measure that they do better with them than without?
Or those skills are a temporary side effect of the current SOTA and will be useless in the future, so honing them is pointless right now.
Agents shouldn't make messes, if they did what they say on the tin at least, and if folks are wasting considerable time cleaning them up, they should've just written the code themselves.
Exactly.
AI assisted development isn't all or nothing.
We as a group and as individuals need to figure out the right blend of AI and human.
Well yea, but you can guard against this in several ways. My way is to understand my own codebase and look at the output of the LLM.
LLMs allow me to write code faster and it also gives a lot of discoverability of programming concepts I didn't know much about. For example, it plugged in a lot of Tailwind CSS, which I've never used before. With that said, it does not absolve me from not knowing my own codebase, unless I'm (temporarily) fine with my codebase being fractured conceptually in wonky ways.
I think vibecoding is amazing for creating quick high fidelity prototypes for a green field project. You create it, you vibe code it all the way until your app is just how you want it to feel. Then you refactor it and scale it.
I'm currently looking at 4009 lines of JS/JSX combined. I'm still vibecoding my prototype. I recently looked at the codebase, saw some ready-made improvements, and did them. But I think I'll need to start actually engineering things once I reach the 10K line mark.
Then you are not vibe coding. The core, almost exclusive requirement for "vibe coding" is that you DON'T look at the code. Only the product outcome.
This is the bit I think enthusiasts need to argue doesn't apply.
Have you ever read a 200 page vibewritten novel and found it satisfying?
So why do you think a 10 kLoC vibecoded codebase will be any good engineering-wise?
I've been coding a side-project for a year with full LLM assistance (the project is quite a bit older than that).
Basically I spent over a decade developing CAD software at Trimble and now have pivoted to a different role and different company. So like an addict, I of course wanted to continue developing CAD technology.
I pretty much know how CAD software is supposed to work. But it's _a lot of work_ to put together. With LLMs I can basically speedrun through my requirements that require tons of boilerplate.
The velocity is incredible compared to if I would be doing this by hand.
Sometimes the LLM outputs total garbage. Then you don't accept the output, and start again.
The hardest parts are never coding but design. The engineer does the design. Sometimes I agonize for weeks or months over a difficult detail (it's a side project, I have a family, etc.). Once the design is crystal clear, it's fairly obvious whether the LLM output is aligned with it or not. Once I have a good design, I can just start the feature/boilerplate speedrun.
If you have a Windows box you can try my current public alpha. The bugs are on me, not on the LLM:
https://github.com/AdaShape/adashape-open-testing/releases/t...
—
I would never use, let alone pay for, a fully vibe-coded app whose implementation no human understands.
Whether you’re reading a book or using an app, you’re communicating with the author by way of your shared humanity in how they anticipate what you’re thinking as you explore the work. The author incorporates and plans for those predicted reactions and thoughts where it makes sense. Ultimately the author is conveying an implicit mental model (or even evoking emotional states or sensations) to the reader.
The first problem is that many of these pathways and edge cases aren’t apparent until the actual implementation, and sometimes in the process the author realizes that the overall product would work better if it were re-specified from the start. This opportunity is lost without a hands on approach.
The second problem is that, the less human touch is there, the less consistent the mental model conveyed to the user is going to be, because a specification and collection of prompts does not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work.
Have you ever read a 200 page vibewritten novel and found it satisfying?
I haven't, but my son has, for two separate novels authored by GPT-4.5. (The model was asked to generate a chapter at a time. At each step, it was given the full outline of the novel, the characters, and a summary of each chapter so far.)
I suspect part of the reason we see such a wide range of testimonies about vibe-coding is some people are actually better at it, and it would be useful to have some way of measuring that effectiveness.
If you’re writing novel algorithms all day, then I get your point. But are you? Or have you ever delegated work? If you find the AI losing its train of thought all it takes is to try again with better high level instructions.
There's been such a massive leap in capabilities since Claude Code came out, which was the middle/end of 2025.
2 years ago I MAYBE used an LLM to take unstructured data and give me a JSON object of a specific structure. Only about 1 year ago did I start using LLMs for ANY type of coding, and I would generally use snippets, not whole codebases. It wasn't until September that I started really leveraging LLMs for coding.
The best case is still operationally correct but nightmare fuel on the inside. So maybe good for one-off tools where you control inputs and can vibe-check outputs without disaster if you forget to carry the one.
It wasn't fully autonomous (the reliability was a bit low -- e.g. had to get the code out of code fences programmatically), and it wasn't fully original (I stole most of it from Auto-GPT, except that I was operating on the AST directly due to the token limitations).
My key insight here was that I allowed GPT to design the APIs that it itself was going to use. This makes perfect sense to me based on how LLMs work. You tell it to reach for a function that doesn't exist, and then you ask it to make that function exist based on how it reached for it. Then the design matches its expectations perfectly.
GPT-4 now considers self modifying AI code to be extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.
It seems the last generation or two of models passed some threshold regarding self replication (which is a distinct but highly related concept), and the labs got spooked. I haven't heard anything about this in public though.
Edit: It occurs to me now that "self modification and replication" is a much more meaningful (and measurable) benchmark for artificial life than consciousness is...
BTW for reference the thing that spooked Claude's safety trigger was "Did PKD know about living information systems?"
You don't need a "fully agentic" tool like Claude Code to write code. Any of the AI chatbots can write code too, obviously doing so better since the advent of "thinking" models, and RL post-training for coding. They also all have had built-in "code interpreter" functionality for about 2 years where they can not only write code but also run and test it in a sandbox, at least for Python.
Recently at least, the quality of code generation (at least if you are asking for something smallish) is good enough that cut and pasting chatbot output (e.g. C++, not Python) to compile and run yourself is still a productivity boost, although this was always an option.
Just more FUD from devs that think they're artisans.
On a personal note, vibe coding leaves me with the same empty, hollow sort of tiredness as a day filled with meetings.
And as an added benefit: I feel accomplished and proud of the feature.
I also like to think that I'm utilising the training done on many millions of lines of code while still using my experience/opinions to arrive at something, compared to just using my fallible thinking, wherein I could have missed some interesting ideas. It's like me++. Sure, it does a lot of heavy lifting, but I never leave the steering wheel. I guess I'm still at the pre-agentic stage and not ready to let go fully.
I’m not sure if this counts as “vibe coding” per se, but I like that this mentality keeps my workday somewhat similar to how it was for decades. Finding/creating holes that the agent can fill with minimal adult supervision is a completely new routine throughout my day, but I think obsessing over maintainability will pay off, like it always has.
It's crazy to me nevertheless that some people can afford the luxury to completely renounce AI-assisted coding.
What's the luxury? AI-assisted coding is the luxury, considering how expensive it is in tokens per dollar.
My habit now: always get a 2nd or 3rd opinion before assuming one LLM is correct.
All code written by an LLM is reviewed by an additional LLM. Then I verify that review and get one of the agents to iterate on everything.
It might be my skills, but I can tell you right now I will not be as fast as the AI, especially in new codebases, other languages, or different environments, even with all the debugging and the hell that is AI pull request review.
I think the answer here is fast AI for things it can do on its own, and slow, composed, human in the loop AI for the bigger things to make sure it gets it right. (At least until it gets most things right through innovative orchestration and model improvement moving forward.)
I have AI build self-contained, smallish tasks and I check everything it does to keep the result consistent with global patterns and vision.
I stay in the loop and commit often.
Looks to me like the problem a lot of people are having is that they have AI do the whole thing.
If you ask it to "refactor code to be more modern", it might guess what you mean and do it in a way you like, but most likely it won't.
If you keep tasks small and clearly specced out it works just fine. A lot better than doing it by hand in many cases, specially for prototyping.
It'll be really interesting to see in the decades to come what happens when a whole industry gets used to releasing black boxes by vibe coding the hell out of them.
I have to go out of my way to get this out of LLMs. But with enough persuasion, they produce roughly what I would have written myself.
Otherwise they default to adding as much bloat and abstraction as possible. This appears to be the default mode of operation in the training set.
I also prefer to use it interactively. I divide the problem to chunks. I get it to write each chunk. The whole makes sense. Work with its strengths and weaknesses rather than against them.
For interactive use I have found smaller models to be better than bigger models. First of all because they are much faster. And second because, my philosophy now is to use the smallest model that does the job. Everything else by definition is unnecessarily slow and expensive!
But there is a qualitative difference at a certain level of speed, where something goes from not interactive to interactive. Then you can actually stay in flow, and then you can actually stay consciously engaged.
And I also might "vibe code" when I need to add another endpoint on a deadline to earn a living. To be fair - I review and test the code so not sure it's really vibe coding.
For me it's not that binary.
We're modern day factory workers.
We can identify 3 levels of "vibe coding":
1. GenAI Autocomplete
2. Hyperlocal prompting about a specific function. (Copilot's original pitch)
3. Developing the app without looking at code.
Level 1 is hardly considered "vibe" coding, and Level 2 is iffy.
"90% of code written by AI" in some non-trivial contexts only very recently reached level 3.
I don't think it ever reached Level 2, because that's just a painfully tedious way of writing code.
I tried a minimalist example where it totally failed a few years back, and still, ChatGPT 5 produced two examples for "async counter in Rust": one using atomics and another using tokio::sync::Mutex. Back then I learned it was wrong the hard way, by trying to profile high latency. To my surprise, here's a quote from the Tokio Mutex documentation:
Contrary to popular belief, it is ok and often preferred to use the ordinary Mutex from the standard library in asynchronous code.
The feature that the async mutex offers over the blocking mutex is the ability to keep it locked across an .await point.
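To make the docs' point concrete, here is a minimal std-only sketch (no tokio; `block_on` and `YieldOnce` are hand-rolled stand-ins for a real executor, and `bump` is a made-up example function): an ordinary `std::sync::Mutex` is fine in async code as long as the guard is dropped before any `.await` point.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hand-rolled busy-polling executor so the sketch is std-only (no tokio).
fn block_on<F: Future>(fut: F) -> F::Output {
    fn noop(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Pending once, then Ready: the smallest possible ".await point".
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

async fn bump(counter: Arc<Mutex<u64>>) {
    {
        // Ordinary std Mutex: lock, mutate, and let the guard drop here,
        // BEFORE the await below. This is the pattern the Tokio docs call
        // "ok and often preferred"; tokio::sync::Mutex is only needed when
        // a guard must stay locked across an .await.
        let mut n = counter.lock().unwrap();
        *n += 1;
    }
    YieldOnce(false).await; // guard already released: no held-lock hazard
}

fn main() {
    let counter = Arc::new(Mutex::new(0u64));
    block_on(bump(counter.clone()));
    assert_eq!(*counter.lock().unwrap(), 1);
}
```

The scoping braces around the lock are the whole trick: the guard's lifetime ends before the await, so the generated future never holds the lock while suspended.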
Once I mastered the finite number of operations and behaviors, I knew how to tell "it" what to do and it would work. The only thing different about vibe coding is the scale of operations and behaviors. It is doing exactly what you're telling it to do. Expectations also need to be aligned: don't think you can hand over architecture and design to the LLM; that's still your job. The gain is that the LLM will deal with the proper syntax, API calls, etc., and work as a research tool on steroids if you also (advice from another mentor later in life) ask good questions.
I am writing a game in MonoGame, and I am not primarily a game dev or a C# dev. I find AI fantastic here for "set up a configuration class for this project that maps key bindings", letting it handle the boilerplate and smaller configuration. It's great at "give me an A* implementation for this graph". But when the task becomes x -> y -> z without larger context and evolution, it falls flat. I still need creativity. I just don't worry too much about boilerplate, utility methods, and figuring out the specifics of wiring a framework together.
I will have a conversation with the agent. I will present it with a context, an observed behavior, and a question... often tinged with frustration.
What I get out of this interaction at the end of it is usually a revised context that leads me to figure out a better outcome. The AI doesn't give me the outcome. It gives me alternative contexts.
On the other hand, when I just have AI write code for me, I lose my mental model of the project and ultimately just feel like I'm delaying some kind of execution.
As a PRODUCT person, it writes code 100x faster than I can, and I treat anything it writes as a "throwaway" prototype. I've never been able to treat my own code as throwaway, because I can't just throw away multiple weeks of work.
It doesn't aid in my learning to code, but it does aid in me putting out much better, much more polished work that I'm excited to use.
If what you're doing is proprietary, or even a little bit novel, there is a really good chance that AI will screw it up. After all, how can it possibly know how to solve a problem it has never seen before?
It came very close to success, but there were 2 or 3 big show-stopping bugs such as it forgetting to update the spatial partitioning when the entities moved, so it would work at the start but then degrade over time.
It got stuck on the belief that the algorithm itself must be the problem, so at some point it just stuck a generic boids solution into the middle of the rest. To make it worse, that solution didn't even use the spatial partitioning; the boids were just brute-force checking all their neighbours.
Had this been a real system it might have made its way into production, which makes one think about the value of the AI code out there. As it was I pointed out that bit and asked about it, at which point it admitted that it was definitely a mistake and then it removed it.
I had previously implemented my own version of the algorithm, and it took me quite a bit of time, but during that I built up the mental code model and understood both the problem and the solution by the end. In comparison, the AI implemented it easily 10-30x faster than I did, but would never have managed to complete the project on its own. And if I hadn't previously implemented it myself and had just tried to have it do the heavy lifting, I wouldn't have understood enough of what it was doing to overcome its issues and get the code working properly.
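For anyone unfamiliar with this bug class: a spatial hash only speeds up neighbour queries if entities are re-bucketed whenever they move. A toy sketch of the step that gets forgotten (names, structure, and cell size are made up for illustration, not the commenter's code):

```rust
use std::collections::HashMap;

const CELL: f32 = 10.0; // arbitrary cell size for this toy example

fn cell_of(pos: (f32, f32)) -> (i32, i32) {
    ((pos.0 / CELL).floor() as i32, (pos.1 / CELL).floor() as i32)
}

/// Toy spatial hash mapping grid cells to entity ids.
#[derive(Default)]
struct Grid {
    cells: HashMap<(i32, i32), Vec<usize>>,
}

impl Grid {
    fn insert(&mut self, id: usize, pos: (f32, f32)) {
        self.cells.entry(cell_of(pos)).or_default().push(id);
    }

    /// The step the agent skipped: on every move, remove the entity from
    /// its old cell and add it to the new one. Skip this and queries still
    /// "work" at first, then silently degrade as positions drift away from
    /// the buckets they were filed under.
    fn update(&mut self, id: usize, old: (f32, f32), new: (f32, f32)) {
        let (oc, nc) = (cell_of(old), cell_of(new));
        if oc == nc {
            return; // same bucket, nothing to re-file
        }
        if let Some(ids) = self.cells.get_mut(&oc) {
            ids.retain(|&e| e != id);
        }
        self.cells.entry(nc).or_default().push(id);
    }

    fn in_cell(&self, pos: (f32, f32)) -> &[usize] {
        self.cells.get(&cell_of(pos)).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

fn main() {
    let mut grid = Grid::default();
    grid.insert(7, (1.0, 1.0));
    grid.update(7, (1.0, 1.0), (25.0, 1.0)); // crossed into a new cell
    assert!(grid.in_cell((1.0, 1.0)).is_empty());
    assert_eq!(grid.in_cell((25.0, 1.0)), &[7usize][..]);
}
```

The failure mode described above is exactly what happens if `update` is never called: the structure degrades over time rather than failing immediately, which is why it slipped past the initial tests.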
Nobody forces you to completely let go of the code and do pure vibe coding. You can also do small iterations.
What AI (LLMs) does is raise the level of abstraction to human language, via translation. The problem is that human language is imprecise in general. You can see this with legal or scientific writing: legalese is almost illegible to laypeople because there are precise things you need to specify, and you need to be precise in how you specify them. Unfortunately, the tech community is misleading the public, telling laypeople they can just sit back, casually tell AI what they want, and get exactly what they wanted. Such users are lying to themselves, because most likely they did not take the time to think through what they wanted, and they are rationalizing after the fact that the AI gave them exactly that.
Examples:
Thanks to Claude I've finally been able to disable the ssh subsystem of the GNOME keyring infrastructure, which opens a modal window asking for ssh passphrases. What happened before: I always had to cancel the modal, look up the passphrase in my password manager, and restart whatever had opened the modal. What I have now is either a password prompt inside a terminal or a non-modal dialog. Both ssh-add the key to an ssh agent.
However, my new Emacs windows still open at about 100x100 px on my new Debian 13 install, and nothing suggested by Claude works. I'll have to dig into it, but I'm not sure it's important enough. I usually don't create new windows after Emacs starts with the saved desktop configuration.
Then, I can reason through the AI agent's responses and decide what if anything I need to do about them.
I just did this for one project so far, but got surprisingly useful results.
It turns out that the possible bugs identified by the AI tool were not bugs based on the larger context of the code as it exists right now. For example, it found a function that returns a pointer, and it may return NULL. Call sites were not checking for a NULL return value. The code in its current state could never in fact return a NULL value. However, future-proofing this code, it would be good practice to check for this case in the call sites.
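The same "future-proof the call site" advice can be phrased in Rust, where nullability is encoded in the signature and the compiler forces the check the review suggested. This is an illustrative translation with made-up names, not the reviewed C code:

```rust
/// Hypothetical lookup: with today's data it always finds a match, but the
/// Option return type forces every call site to handle "not found" anyway,
/// which is the future-proofing the review suggested for the NULL case.
fn find_name<'a>(id: u32, table: &[(u32, &'a str)]) -> Option<&'a str> {
    table.iter().find(|&&(k, _)| k == id).map(|&(_, v)| v)
}

fn main() {
    let table = [(1u32, "alpha"), (2, "beta")];
    // The compiler will not let this call site ignore the None case.
    match find_name(2, &table) {
        Some(name) => println!("found {name}"),
        None => println!("missing"),
    }
}
```

In C, the equivalent discipline is a convention (check for NULL at every call site); here it is a type-level obligation, which is why the "could never return NULL today" argument doesn't arise.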
Option 1: The cost/benefit delta of agentic engineering never improves past net-zero, and bespoke hand-written code stays as valuable as ever.
Option 2: The cost/benefit becomes net positive, and economics of scale forever tie the cost of code production directly to the cost of inference tokens.
Given that many are saying option #2 is already upon us, I'm gonna keep challenging myself to engineer a way past the hurdles I run into with agent-oriented programming.
The deeper I get, the more articles like this feel like the modern equivalent of saying "internet connections are too slow to do real work" or "computers are too expensive to be useful for regular people".
I work on game engines which do some pretty heavy lifting, and I'd be loath to let these agents write the code for me.
They'd simply screw too much of it up and create a mess that I'm going to have to go through by hand later anyway, not just to ensure correctness but also performance.
I want to know what the code is doing, I want control over the fine details, and I want to have as much of the codebase within my mental understanding as possible.
Not saying they're not useful - obviously they are - just that something smells fishy about the success stories.
So while there’s no free lunch, if you are willing to pay - your lunch will be a delicious unlimited buffet for a fraction of the cost.
I think coding with an AI changes our role from code writer to code reviewer, and you have to treat it as a comprehensive review where you comment not just on code "correctness" but on the other aspects the author mentions: how functions fit together, codebase patterns, architectural implications. While using AI might have made me a lazier coder, it's made me a significantly more active reviewer, which I think at least helps to bridge the gap the author is referencing.
I admit I could be an outlier though.
It is quite scary that junior devs/college kids are more into vibe coding than putting in the effort to actually learn the fundamentals properly. This will create at least 2-3 generations of bad programmers down the line.
In order to get high-accuracy PRs with AI (small, tested commits that follow existing patterns efficiently), you need to spend time adding agent files (CLAUDE.md, AGENTS.md), skills, hooks, and tools specific to your setup.
This is why so much development is happening at the plugin layer right now, especially with Claude Code.
The juice is worth the squeeze. Once accuracy gets high enough you don't need to edit and babysit what is generated, you can horizontally scale your output.
Like it or not, as a friend observed, we are N months away from a world where most engineers never look at source code; and the spectrum of reasons one would want to will inexorably narrow.
It will never be zero.
But people who haven't yet typed a word of code never will.
"AI can be good -- very good -- at building parts. For now, it's very bad at the big picture."
The opener is 100% true. Our current approach with AI code is "draft a design in 15 minutes" and have AI implement it. This contrasts with the thoughtful approach a human would take with other human engineers: plan something, pitch the design, get some feedback, take some time thinking through pros and cons. Begin implementing, pivot, realizations, improvements, the design morphs.
The current vibe coding methodology is so eager to fire and forget, passing incomplete knowledge to an AI model with limited context, limited awareness, and 1% of your mental model and intent at the moment you wrote the quick spec.
This is clearly not a recipe for reliable, resilient, long-lasting code, or even efficient code. Spec-driven development doesn't work when the spec is frozen and the builder cannot renegotiate intent mid-flight.
The second point made clearer in the video is the kind of learned patterns that can delude a coder, who is effectively 'doing the hard part', into thinking that the AI is the smart one. Or into thinking that the AI is more capable than it actually is.
I say this as someone who uses Claude Code and Codex daily. The claims of the article (and video) aren't strawman.
Can we progress past them? Perhaps, if we find ways to have agents iteratively improve designs on the fly rather than sticking with the original spec that, let's be honest, wasn't given the rigor relative to what we've asked the LLMs to accomplish. If our workflows somehow make the spec a living artifact again -- then agents can continuously re-check assumptions, surface tradeoffs, and refactor toward coherence instead of clinging to the first draft.
Perhaps that is the distinction between reports of success with AI and reports of abject failure. Your description of "Our current approach" is nothing like how I have been working with AI.
When I was writing some code to do complex DMA chaining, the first step with the AI was to write an emulator function that produced the desired result, in software, from the given parameters. Then came a suite of tests with memory-to-memory operations that would produce a verifiable output. Only then did we start building the version that wrote to the hardware registers, ensuring that the hardware produced the same memory-to-memory results as the emulator. When discrepancies occurred, I checked the test case, the emulator, and the hardware, with the stipulation that the hardware was the ground truth of behaviour and the test case should represent the desired result.
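The workflow above can be sketched as one behavioural contract with two implementations, where the same check runs against both. All names here are hypothetical; the real chain parameters and register writes are hardware-specific:

```rust
/// Hypothetical parameters for one link in a DMA chain (the real ones are
/// hardware-specific; these fields are made up for the sketch).
struct ChainParams {
    src_offset: usize,
    len: usize,
}

/// One behavioural contract, two implementations: a pure-software emulator
/// that defines the desired result, and (in the real project) a version
/// that programs the hardware registers, checked against the emulator with
/// the hardware as ground truth whenever they disagree.
trait DmaChain {
    fn run(&mut self, src: &[u8], dst: &mut [u8], params: &ChainParams);
}

struct Emulator;

impl DmaChain for Emulator {
    fn run(&mut self, src: &[u8], dst: &mut [u8], params: &ChainParams) {
        // The emulator is simply "what the hardware should do", in software.
        let end = params.src_offset + params.len;
        dst[..params.len].copy_from_slice(&src[params.src_offset..end]);
    }
}

/// The shared memory-to-memory check: any implementation of the trait can
/// run the same test and be compared to a directly computed expectation.
fn check(chain: &mut dyn DmaChain) -> bool {
    let src: Vec<u8> = (0u8..32).collect();
    let mut dst = vec![0u8; 8];
    chain.run(&src, &mut dst, &ChainParams { src_offset: 4, len: 8 });
    dst == (4u8..12).collect::<Vec<u8>>()
}

fn main() {
    assert!(check(&mut Emulator));
}
```

The value of the pattern is that a discrepancy localizes the bug to one of three places: the test case, the emulator, or the hardware-backed implementation.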
I occasionally ask LLMs to one shot full complex tasks, but when I do so it is more as a test to see how far it gets. I'm not looking to use the result, I'm just curious as to what it might be. The amount of progress it makes before getting lost is advancing at quite a rate.
It's like seeing an Atari 2600 and expecting it to be a Mac. People want to fly to the moon with Atari 2600 level hardware. You can use hardware at that level to fly to the moon, and flying to the moon is an impressive achievement enabled by the hardware, but to do so you have to wrangle a vast array of limitations.
They are no panacea, but they are not nothing. They have been, and will remain, somewhere between for some time. Nevertheless they are getting better and better.
That's exactly why this whole (nowadays popular) notion of AI replacing senior devs who are capable of understanding large codebases is nonsense and will never become reality.
I disagree though. There’s no good reason that careful use of this new form of tooling can’t fully respect the whole, respect structural integrity, and respect neighboring patterns.
As always, it’s not the tool.
this is such an individualized technology that two people at the same starting point two years ago, could've developed wildly different workflows.
That's a very bad way to look at these tools. They legit know nothing, they hallucinate APIs all the time.
The only value they have at least in my book is they type super fast.
It's just a tool with a high level of automation. That becomes clear when you have to guide it to use more sane practices, simple things like don't overuse HTTP headers when you don't need them.
Good take though.
2026: "If I can just write the specs so that the machine understands them it will write me code that works."
You should never just let AI "figure it out." It's the assistant, not the driver.
his points about why he stopped using AI: these are the things us reluctant AI adopters have been saying since this all started.
I just bootstrapped a 500k LOC MVP with AI Generator, Community and Zapier integration.
www.clases.community
And it's my 3rd project that size, fully vibe coded.
This is vibe argumenting.
I have been tolerably successful. However, I have almost 30 years of coding experience, and I have the judgement for how big a component should be - when I push past that, myself _or_ with AI, things get hairy.
ymmv.
You gotta have a better argument than "AI Labs are eating their own dogfood". Are there any other big software companies doing that successfully? I bet yes, and think those stories carry more weight.
Have people always been this easy to market to?
I think the deepest I've dived in was in the last week. I wrangled some resources to build myself a completely self-hosted, agentic workflow, used several open-weight models that people around me had specifically recommended, and had a work project that was self-contained and small enough to work from scratch. There were a few moving pieces, but the models gave me what looked like a working solution within a few iterations, and I was duly impressed until I realized it wasn't quite working as expected.
As I reviewed and iterated on it more with the agents, this Rube Goldberg machine eventually started filling in gaps with print statements designed to trick me, and with sneaky block comments mentioning, in oblique terms three lines into a boring description of the intended output, that it was placeholder code not meant for production. This should have been obvious, but even at this point, four days in, I was missing more and more things, not understanding the code because I wasn't writing it. This is basically the automation blindness I feared from proprietary workflows that could be changed or taken away at any time, only it hit much faster than I had assumed. The promise of being able to work at this higher level, this new way of working, seemed less and less plausible the more I iterated; even starting over with chunks of the problem in new contexts, as many suggest, didn't really help.
I had deadlines, so I gave up and spent about half of my weekend fixing it by hand, and found it incredibly satisfying when it worked. But all-in, this took more time and effort, and perhaps more importantly caused more stress, than just writing it in the first place probably would have.
My background is in ML research, and this makes it perhaps easier to predict the failure modes of these things (though surprisingly many don't seem to), but also makes me want to be optimistic, to believe this can work, but I also have done a lot of work as a software engineer and I think my intuition remains that doing precision knowledge work of any kind at scale with a generative model remains A Very Suspect Idea that comes more from the dreams of the wealthy executive class than a real grounding in what generative models are capable of and how they're best employed.
I do remain optimistic that LLMs will continue to find use cases that better fit a niche of state-of-the-art natural language processing that is nonetheless probabilistic in nature. Many such use cases exist. Taking human job descriptions and trying to pretend they can do them entirely seems like a poorly-thought-out one, and to my mind we've poured enough money and effort into it to say it at the very least needs radically new breakthroughs to stand a chance of working as (optimistically) advertised.
I chuckled at this. This describes pretty much every large piece of software I've ever worked on. You don't need an LLM to create a giant piece of slop. To avoid it takes tons of planning, refinement, and diligence whether it's LLM's or humans writing it.
There are many instances where I get to the final part of the feature and realize I spent far more time coercing AI to do the right thing than it would have taken me to do it myself.
It is also sometimes really enjoyable and sometimes a horrible experience. Programming prior to it could also be frustrating at times, but not in the same way. Maybe it is the expectation of increased efficiency that is now demanded in the face of AI tools.
I do think AI tools are consistently great for small POCs or where very standard simple patterns are used. Outside of that, it is a crapshoot or slot machine.
"Amazingly, I’m faster, more accurate, more creative, more productive, and more efficient than AI, when you price everything in, and not just code tokens per hour."
For 99.99% of developers this just won't be true.
Homelab is my hobby where I run Proxmox, Debian VM, DNS, K8s, etc, all managed via Ansible.
For what it is worth, I hate docker :)
I wanted to set up a private torrent tracker stack that should include:
1) Jackett: For the authentication
2) Radarr: The in-house browser
3) qBittorrent: which receives the torrent files automatically from Radarr
4) Jellyfin: Of course :)
I used ChatGPT to assist me in getting the above done as simply as possible, all via Ansible:
1) Ansible playbook to setup a Debian LXC Proxmox container
2) Jackett + Radarr + qBitorrent all in one for simplicity
3) WireGuard VPN + Proton VPN: if the VPN ever goes down, the entire container network must stop (iptables) so my home IP isn't leaked.
After 3 nights I got everything working and running 24/7, but it required a lot of review so that it can still be managed 10 years down the road instead of "WTF is this???"
There were silly mistakes that make you question "Why am I even using this tool??", but then I remember: Google and search engines are dead. It would have taken me weeks to get this done otherwise; AI tools speed up that process by fetching the info I need so I can put it together.
I use AI purely to replace the broken state of search engines, even Brave and DuckDuckGo. I know what I am asking it, not just copy/pasting and hoping it works.
I have colleagues, also in the IT field, whose companies have gone fully AI: full access to their environment, they no longer do the thinking, they just press the button. These people are cooked, and not just because of the state of AI: if they ever go looking for another job, all they did for years was press a button!!
For the record, I use AI to generate code but not for "vibecoding". I don't believe when people tell me "you just prompt it badly". I saw enough to lose faith.
I also keep seeing that writing more detailed specs is the answer and retorts from those saying we’re back to waterfall.
That isn’t true. I think more of the iteration has moved to the spec. Writing the code is so quick now that you can make spec changes you wouldn’t have dared before.
You also need gates like tests and you need very regular commits.
I’m gradually moving towards more detailed specs in the form of use cases and scenarios along with solid tests and a constantly tuned agent file + guidelines.
Through this I’m slowly moving back to letting Claude lose on implementation knowing I can do scan of the git diffs versus dealing with a thousand ask before edits and slowing things down.
When this works you start to see the magic.
AI is far from perfect, but the same is true about any work you may have to entrust to another person. Shipping slop because someone never checked the code was literally something that happened several times at startups I have worked at - no AI necessary!
Vibecoding is an interesting dynamic for a lot of coders specifically because you can be good or bad at vibecoding - but the skill to determine your success isn't necessarily your coding knowledge but your management and delegation soft skills.
This is no different. And I'm not talking about vibe coding. I just mean having an LLM browser window open.
When you're losing your abilities, it's easy to think you're getting smarter. You feel pretty smart when you're pasting that code
But you'll know when you start asking "do me that thingy again". You'll know from your own prompts. You'll know when you look at older code you wrote with fear and awe. That "coding" has shifted from an activity like weaving cloth to one more like watching YouTube.
Active coding vs passive coding
Or how I would start spamming SQL scripts and randomly at some point nuke all my work (happened more than once)... luckily at least I had backups regularly but... yeah.
I'm sorry but no, LLMs can't replace software engineers.
Relevant xkcd: https://xkcd.com/568/
Even if we reach the point where it's as good as a good senior dev, we will still have to explain what we want it to do.
That's how I find it most helpful too. I give it a task and work out the spec based on the bad assumptions it makes and manually fix it.
The result stunned everyone I work with. I would never in a million years put this code on GitHub for others. It's terrible code for myriad reasons.
My lived experience was... the task was accomplished, but not in a sustainable way, over the course of perhaps 80 individual sessions, the longest being multiple solid 45-minute refactors... (codex-max)
About those. One of the things I spotted fairly quickly was the tendency of models to duplicate effort or take convoluted approaches to patch in behaviors. To get around this, every so often I would take the entire codebase, send it to Gemini-3 Pro, and ask it for improvements. Comically, every time, Gemini-3 Pro responds with "well, this code is hot garbage, you need to refactor these 20 things". Meanwhile, I'm side-eyeing like... dude, you wrote this. Never fails to amuse me.
So, in the end, the project was delivered, was pretty cool, had 5x more features than I would have implemented myself and once I got into a groove -- I was able to reduce the garbage through constant refactors from large code reviews. Net Positive experience on a project that had zero commercial value and zero risk to customers.
But on the other hand...
I spent a week troubleshooting a subtle resource leak (C#) on a commercial project, introduced during a vibe-coding session in which a new animation system was added; it somehow included a bug that caused a hard crash on re-entering a planet scene.
The bug caused an all-stop and a week of lost effort. Countless AI Agent sessions circularly trying to review and resolve it. Countless human hours of testing and banging heads against monitors.
In the end, on the maybe random 10th pass using Gemini-3-Pro it provided a hint that was enough to find the issue.
This was a monumental fail and if game studios are using LLMs, good god, the future of buggy mess releases is only going to get worse.
I would summarize this experience as lots of amazement and new feature velocity. A little too loose with commits (too much entanglement to easily unwind later) and ultimately a negative experience.
A classic Agentic AI experience. 50% Amazing, 50% WTF.
It requires refactoring at scale, but GenAI is fast so hitting the same code 25 times isn’t a dealbreaker.
Eventually the refactoring is targeted at smaller and smaller bits until the entire project is in excellent shape.
I’m still working on Sharpee, an interactive fiction authoring platform, but it’s fairly well-baked at this point and 99% coded by Claude and 100% managed by me.
Sharpee is a complex system and a lot of the inner-workings (stdlib) were like coats of paint. It didn’t shine until it was refactored at least a dozen times.
It has over a thousand unit tests, which I’ve read through and refactored by hand in some cases.
The results speak for themselves.
https://sharpee.net/ https://github.com/chicagodave/sharpee/
It’s still in beta, but not far from release status.