I'll use AI to design the implementation of a medium sized, cross cutting feature. Review all the details, maybe iterate on just that. Then implement with Claude 4.7 Max - which runs slower, but does a better job. Then review the implementation, then have Codex GPT 5.5 xhigh fast review it - which almost always finds corner cases. Have Claude fix those - Claude is better at writing intuitive maintainable code versus Codex overengineered/shortcut filled code. (Codex is better at finding/fixing bugs and doing reviews - it's annoyingly pedantic)
Then repeat with fresh Claude/Codex instances having them both review the current staged changes and getting feedback, handling the feedback. Then covering it in tests. I mean overall I still implement the feature faster than coding it manually, but I spend a majority of the time going back and forth with reviews, handling corner cases and at the finish end up with what I feel a really solid implementation of whatever feature I'm working on. The v1 feature feels more like a v3 given the amount of iteration it already went through.
Once this is done, the mechanical coding parts are mostly routine (for codex)
I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their prefered option first; but not showing the cards on my hand has been very useful when thinking through a problem with LLMs.
I find it useful to let it generate benchmarks comparing the approaches. Turns out AI is terrible at guessing whats faster or allocates less
the cynic in me would say that a good engineer should fully understand the code you write.
I'm not suggesting that AI is the problem here - you could vibe code with the AI have have it explain the reasoning and patterns - or else tell it to use 'simpler' patterns from the outset. For any one problem in software engineering, there are always multiple solutions; some slower, some faster, some more flexible etc. The code you produce should, imo, but at the level that you can understand it.
How can you reason about code you don't fully understand? How can you judge the future impact (technical debt and the cost of maintenance) of your projects?
A.I makes it easier to get yourself into problems early on.
We all do, though. It takes months for a human to really get to know a project and, unless you’re working at a small startup, you’ll probably never know most of the code outside the corner you work in.
Finally feel like I have a good workflow where I can fully benefit from these things without sacrificing my understanding of what they're doing.
What I've tried to do is make the bot write detailed spec documents, slowly building it over time as I explain the full problem.
It works for the most part but it's you have some non standard requirement, the agent seems to skip over that part of the spec document when it starts to code. Or it would have needless checks for situations that I said will never happen
I would also recommend explaining the specs and doing a lot of your back and forth with a lower end model and set it to a higher end model only once the conversation history has all the context you feel the higher end model needs.
You will outgrow it at some point.
[0] At least, in my experience, "micromanaging" the AI is what gives me the best results. Iterating on the initial design, then iterating on the plan, then reviewing the proposed code changes (including tests), then getting an independent code review from another LLM, etc. If you give an LLM too much latitude that's when the really shitty code and ill-considered breaking changes/obliteration of existing functionality starts to creep in.
I was more annoyed than anything that I didn't hit this moment until my 40s.
Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.
AI is an excellent rubber duck and test writer. Maybe I sniff my farts too much but I like my code just the way I want it lol
I have my own skill: 5 rounds of research/planning/test-planning. Interactive with me in loop for all important decisions. Starts with high level shape, then details. Planning can take 2-3 days of my time, then the implementation agent can take many hours (Opus 4.7). It splits the implementation across many phases/commits, each with its own code-review fix loop. Deep code review at the end can take another hour or two. It opens a PR, Gemini reviews, it reads out and resolves those issues.
Projects still take days or weeks, but 5x faster than doing it all myself.
Edit: the skill - https://github.com/scosman/vibe-crafting
Because this version of AI is worth 10 trillion dollars.
While the pragmatic versions from realists you can find all over this thread are ultimately probably less of a speed boost than just having your CEO/local micromanager be conveniently on vacation during critical periods when the work actually gets done.
i wonder how much the real version of AI is worth. I've got a hinch we're going to find out pretty soon.
As a result I've abandoned the idea of having LLMs generate code except for very small, localized and tightly scoped things. They really can't produce much more than a function or a small module without shitting the bed (last time I vibecoded was with Opus 4.6, Composer 2 and GPT-5.4). I use it almost entirely as another signal in analysis, which naturally makes it fit in better because all the other signals (reading the code, stepping through the code, writing the code myself) are already there so when the LLM points things out the information it actually renders can be taken in much more easily (and seen through more easily when it's false or irrelevant).
I think it's neat that people find fun ways to develop, but I think dressing up vibecoding in a fancy dress and layering SpecLang, sometimes in multiple steps, on top of it, is an exercise in trying to use the tool more instead of trying to use it in its most useful capacity.
This has been my experience every time I've suggested that there are any sort of inherent ontological/conceptual or computational limits to the sophistication of LLM mimicry.
IMO if you are not shipping out faster then the faster work gains are meaningless.
If you are shipping faster, you’re probably picking up more work and shipping everything too fast leading to burnout.
1. Some bad idea gets embedded into the context that you just can't argue away
2. Some important idea gets lost in compression and the ai wheres off into funland without recourse.
In both cases if is often better to start over or just do it yourself. I sometimes find myself asking for a summary, editing it and then using the edited one to seed a new session.
Edit: s/Finland/funland/
You say “all that time” babysitting AIs but in my experience it isn’t that much time, if anything the back and forth at the planning stages is more productive than when I’m doing it by myself because I’m being asked questions and having to think things through from different angles.
Define 'aware'. The volume of code for a feature/system to make it worth using a more complex workflow such as this one, is definitely larger than what a human can even briefly review and build a mental model about the inner workings within a reasonable amount of time. Reasonable meaning not considerable delaying the process. When deadlines loom and management adds pressure, this 'awareness' is the first thing that goes out the window.
Maybe it’s just me, but I’ve never understood how one understands from reading code. Yes you can understand what that code does, but not why it was done that way instead of a different way. In the end I only understand it deeply if I end up writing it. Chatting through it is helpful to me, but having AI crank out code loses all of that context pretty quickly.
I’m not disagreeing. Just curious how you think about this, and if there are key parts of your process that help you stay contexted in.
I do wonder how much of how people approach coding is shaped by the games they played when younger.
Keeping that many tasks in parallel, running all the time will kill you.
The majority of jobs are still paid on a 40 hour per week basis. Disappearing for a day each week (20%) won't fly when you're full time.
Now if it’s my job then I can’t have a knowledge debt and if Claude is down I’ll continue working manually because I know and understand and can continue without having to understand a lot of logic before continuing
then demand some lack-of-uptime compensation for a lack of uptime
I pretty significant number of their engineers flat out refused to work. Like publicly said so. "Increase our plan or I'm taking the week off."
Style can be as important as substance.
I still do a lot of back and forth about the plan - have it written to a file. Read through the file, make changes by hand and have claude read my changes and on and on. But starting with the basic architecture there's less ambiguity.
Ingest big project, comment on it gets expensive. I'm not sure how expensive.
Lately I’ve been experimenting with adding an explicit reward function so the models optimize for measurable output quality.
This creates a generate, critique, revise loop where candidate answers compete for a higher score. It feels promising because it reduces the amount of handholding for every task. It is also more fun because part of the review process is embedded in the scoring function, which simplifies the review effort.
This stuff works so much better when you just tell it what to do
So, people do know how to design a feature, but they also know it takes a lot of time and effort. They want AI to do that work for them.
- aimless AI wandering, leading to pretty, frankly, useless design docs
- using AI to "expand" upon a bullet pointed/shorthanded design doc. To which I feel like saying "the bullet points are already a good design doc!"
I understand that teams sometimes have specific formats that they have to make deliverables for, but having a nice 5 point bulletpoint list turn into 5 paragraphs... all for me to turn the 5 paragraphs back into 5 bullet points in my notes is depressing.
I do think you can get a lot of value in the mechanics, I just have had so much success leaving the thinking to me and the rote stuff to the AI. I'm going to have to think about the design eventually anyways right?
Often depending on how complex the feedback, I'll do it one at a time addressing each one individually. And after the feedback is addressed, I'll go back to the AI that generated the feedback and say like, "I handled 4/5 items you found, can you double check."
It's similar to handling PR feedback, where you do it, validate it, but then still have to submit it for peer review.
And maybe don't use tools that lock you into one model?
1. Have claude form the plan and converse with a simple "Note any concerns with this plan" type plan-critic agent.
2. Let it run.
3. After (with everything in context) have it make a future_recommendations.md.
4. Have it make a plan.md to implement those future recommendations, conversing with the plan critic..
5. Clear context. Repeat with 1. Do this loop a few times, with some feedback from actual review thrown in.
But, most importantly, because Claude will aggressively try to maintain code "as is", and happily build on it's previous crap, while preferring to hand roll implementations of everything, add something like this to memories/directives:
* When evaluating designs, default to "pull in the library" over "hand-roll it." Hand-rolling is much worse than a dependency.
* "Precedent" / "matches house style" / "reuses existing pattern" / "consistent with what we already do" are not valid engineering arguments.
* This project is still in the development stage with no real deployments. Mitigation costs and existing precedence are not a concern.
With these, in the last week that I've started using them (after inspecting the insane justifications for leaving crap design decisions in the plans), Claude went from junior level slop that required more oversight than it was worth to something very reasonable, using standard libraries, requiring nudges for architecture rather than pure "wtf!?".
I think they've fine tuned heavily towards "don't rewrite the codebase" tuning, which completely rational from multiple perspectives, but also not appropriate for new code.
I do enjoy a considerable daily token allowance, so this may not apply to everyone.
In my experience, even on a relatively trivial task, you can ask an LLM at least 20 times:
Is this actually done, or only partially implemented? Did you finish x, y, z?
And the LLM will say, no, I'm not done and keep working.
After that, I'll feed the branch to a different LLM, and ask if the implementation matched the design, where it's weak and needs improvements.
Same thing - that feedback will usually only be partially finished for several rounds.
When they all agree it's done - I'll finally look at the code, and there's still typically glaringly obvious problems - duplicate systems that reinvent the wheel, etc - that will take typically more than one prompt to get right...
Getting things right takes almost ~100x as long as getting things almost right with LLMs.
You can tell an LLM to "make me Rust, but easier. Make no mistakes," and it'll plan out a 100 commit process and get something that - somehow - sort of works... but isn't even close to complete.
Still, on a cost basis, you're still able to get features that would take yourself several times longer and cost orders of magnitude more money, and - if you're doing it right - they'll probably do a better job than you would've done (at least for me).
I’ve found it useful to write out a list of feedback / issues and have a bunch of sub agents work on them in worktrees with a loop bringing them all back together. That way it can work for a few hours while I just can review a bulk at a time.
We may need some sort of paradigm shift - like more powerful frameworks or even higher level languages that allow us to review less, but more functional code blocks.
Before AI, myself and everyone else I knew was drowning in tech debt. And now with AI we are treading water.
In my experience, software engineering is a matter of knowledge. Understanding it and then coming up with a solution. The latter is a flash of insight that comes mostly from experience. Then you gather more information to flesh it out, or brainstorm it with your colleagues.
What you're describing sounds more like a ritual of doing busy work than anything practical. Because tasks vary so much. A feature may be huge, but you take care of it in a day with copy pasting because you already have all the building blocks in other files. And something may be twenty lines of code, but you spent the whole week sweating on it (concurrency stuff maybe). Those ritualistic workflows sounds more like someone imagining software development than actually doing it.
Lost you in the last paragraph - features are not "copy pasting because you already have all the building blocks" and "something may be twenty lines of code". Mid sized features often mean tearing up many layers of code across the stack to add in some sort of new capability. Tearing up existing code means there are all sorts of add-on considerations in addition to feature you are working on.
What? No, it shouldn't. I've worked on a lot of codebases and if you have to do this, something is very, very wrong.
The mental model is still in my head, my brain is overloaded, but only from the amount of code reviews - like I said, I'm building v3 of a feature in the time it takes to build v1, but I am in a way doing 3x the code reviews going back and forth. That's the fall out of the iteration speed enabled by AI.
Between submitting PRs, getting feedback, iterating, re-submitting, repeat - there used to be breathing room. Now it's all compressed into an afternoon. Productivity is through the roof, but it can be draining.
If the feature isn't released, it's not a new version.
Maybe I'm too far gone down the AI rabbit hole, but that seems a really strange take to have. If you replaced 'back and forth with the AI' with 'pair programming' or 'brainstorming' this phrase would be really strange, after all these are all techniques to sharpen your ideas. Even 'rubber ducking' is widely accepted as an effective way to go through a problem, and you can definitely use AI as a rubber duck.
For me the idea of chatting with the AI about a problem/solution is just another tool to help us work. It's not the best solution because it has a lot of downsides you should be aware while using it, but that is true for any technique including 'writing the code yourself'.
I will say, it does help me get over procrastination lol. I get annoyed by the robot doing dumb shit and finish it myself.
Also I never multitask with multiple agents doing other stuff. Meh I focus on just the one task.
If only things were so! If only code was discussed, reviewed, iterated on! If only the "manager" actually read the code, provided actionable feedback, and disseminated PRs to multiple people with diverse skill sets.
(If you can't tell, I'm a jaded consultant desperately trying to make the horse drink the water.)
1. I write a list of things I want to have without AI support
2. I discuss the list with an LLM, which occasionally reveals obviously missing things I hadn't thought about or just things that would be smart to have. Or sometimes the LLM doesn't get it and wants to funnel me down a commonly walked path, which is a non-goal
3. From that list I draft an implementation plan containing things like how the code shall be structured, which language, libraries, build systems, etc to use. This may even contain some data models and considerations that are more detailed, like for example ideas about how a specific interaction shall be event sourced. I work on that, till I feel a satisfactory level of clarity has been reached
4. Actual writing of code as a back and forth between manual writing, letting an LLM write something and so on. LLMs suck at writing CSS that feels like good UX design to me, so usually templates, layout and CSS will be (re)written entirely by hand
5. Bug-hunting and guessing potential edge cases is one thing where LLMs really shine. Often if the work before that was quality the LLM has an okay time coming up with fixes that are no worse than what I would have done.
The In-Laws (1979): Getting off the plane in Tijuara: