Actually you can. If you shift the reviews far to the left, and call them code design sessions instead, and you raise problems on dailies, and you pair programme through the gnarly bits, then 90% of what people think a review should find goes away. The expectation that you'll discover bugs and architecture and design problems doesn't exist if you've already agreed with the team what you're going to build. The remaining 10% of things like var naming, whitespace, and patterns can be checked with a linter instead of a person. If you can get the team to that level you can stop doing code reviews.
You also need to build a team that you can trust to write the code you agreed you'd write, but if your reviews are there to check someone has done their job well enough then you have bigger problems.
If you can agree what to build and how to build it and then it turns out that actually is a working plan - then you are better than me. That hasn't happened in 20 years of software development. Most of what's planned falls down within the first few hours of implementation.
Iterative architecture meetings will be necessary. But that falls into the pit of weekly meetings.
I cannot write a realistic, non-hand-wavy design document without having a proof of concept working, because even if I try, I will need to convince myself that this part and this part and that part will work, and the only way to do that is to write actual code. At that point you pretty much have the code ready, so why bother writing a design doc?
Some of my best (in terms of perf consequences) design documents were either completely trivial from the code complexity point of view, so that I did not actually need to write the code to see the system working, or were written after I already had a quick and dirty implementation working.
Doing this often settles the design direction in a stable way early on. More than that, it often reveals a lot of the harder questions you’ll need to answer: domain constraints and usage expectations.
Putting this kind of work upfront can save an enormous amount of time and energy by precluding implementation work on the wrong things, and ruling out problematic approaches for both the problem at hand as well as a project’s longer term goals.
> Most of what's planned falls down within the first few hours of implementation.
Not my experience at all. We know what computers are capable of.
> Most of what's planned falls down within the first few hours of implementation.
Planning is priceless. But plans are worthless. The longer I work in this industry, the more it becomes clear that CxOs aren't great at projecting/planning, and default to copy-cat, herd behaviors when uncertain.
Agents are getting really good, and if you're used to planning and designing up front you can get a ton of value from them. The main problem with them that I see today is people having that level of trust without giving the agent the context necessary to do a good job. Accepting a zero-shotted service to do something important into your production codebase is still a step too far, but it's an increasingly small step.
> You get things like the famous Toyota Production System where they eliminated the QA phase entirely.
> [This] approach to manufacturing didn’t have any magic bullets. Alas, you can’t just follow his ten-step process and immediately get higher quality engineering. The secret is, you have to get your engineers to engineer higher quality into the whole system, from top to bottom, repeatedly. Continuously.
> The basis of [this system] is trust. Trust among individuals that your boss Really Truly Actually wants to know about every defect, and wants you to stop the line when you find one. Trust among managers that executives were serious about quality. Trust among executives that individuals, given a system that can work and has the right incentives, will produce quality work and spot their own defects, and push the stop button when they need to push it.
> I think we’re going to be stuck with these systems pipeline problems for a long time. Review pipelines — layers of QA — don’t work. Instead, they make you slower while hiding root causes. Hiding causes makes them harder to fix.
hell in one sentence
I tell every hire new and old “Hey do your thing, we trust you. Btw we have your phone number. Thanks”
Works like a charm. People even go out of their way to write tests for things that are hard to verify manually. And they verify manually what’s hard to write tests for.
The other side of this is building safety nets. Takes ~10min to revert a bad deploy.
Does it? Reverting a bad deploy is not only about running the previous version.
Did you mess up data? Did you take actions on third-party services that need to be reverted? Did it have legal repercussions?
That's cool. Expect to pay me for the availability outside work hours. And extra when I'm actually called
If we hired two programmers, the goal was to produce twice the LOC per week. Now we are doing far less than our weekly target. Does not meet expectation.
Code review benefits from someone coming in fresh, making assumptions and challenging those by looking at the code and documentation. With pair programming, you both take the same logical paths to the end result and I've seen this lead to missing things.
You conveniently brushed this under the rug of pair programming, but of the handful of companies I've worked at, only one tried it, and only as an experiment, which in the end failed because no one really wanted to work that way.
I think this "don't review" attitude is dangerous and only acceptable for hobby projects.
https://blog.barrack.ai/amazon-ai-agents-deleting-production...
This is like saying there won't be any surprises on the road you'll take if you've already set the destination point. Though most of the time, you are just given a vague description of the kind of place you want to reach, not a precise target point. And you are not necessarily starting with a map, not even an outdated one. Also, geological forces reshape the landscape at least as fast as you are able to move.
1. I don't care because the company at large fails to value quality engineering.
2. 90% of PR comments are arguments about variable names.
3. The other 10% are mistakes that have very limited blast radius.
It's just that, unless my coworker is a complete moron, then most likely whatever they came up with is at least in an acceptable state, in which case there's no point delaying the project.
Regarding knowledge share, it's complete fiction. Unless you actually make changes to some code, there's zero chance you'll understand how it works.
I regularly review code that is way more complicated than it should be.
The last few days I was going back and forth on reviews on a function that had originally cyclomatic complexity of 23. Eventually I got it down to 8, but I had to call him into a pair programming session and show him how the complexity could be reduced.
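For readers who haven't fought this battle: the usual way to bring a number like 23 down is flattening nested branches into guard clauses and early returns. A minimal sketch of the technique (the function and its rules are invented for illustration, not taken from the code under review):

```python
# Hypothetical example: the same rules written with nesting (every `if`
# adds a branch to the complexity count) and with guard clauses.

def shipping_cost_nested(order):
    # Nested version: two levels deep, four distinct paths.
    if order["country"] == "US":
        if order["weight"] > 20:
            cost = 25
        else:
            cost = 10
    else:
        if order["express"]:
            cost = 40
        else:
            cost = 30
    return cost

def shipping_cost_flat(order):
    # Guard-clause version: each early return removes a nesting level,
    # and each remaining branch is readable on its own line.
    if order["country"] == "US":
        return 25 if order["weight"] > 20 else 10
    return 40 if order["express"] else 30
```

The behavior is identical; only the shape changes, which is exactly the kind of thing that is far easier to show in a pairing session than to argue about in review comments.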
My trust in my colleagues is gone, I have no reason to believe they wrote the code they asked me to put my approval on, and so I certainly don’t want to be on a postmortem being asked why I approved the change.
Perhaps if I worked in a different industry I would feel like you do, but payments is a scary place to cause downtime.
This sort of comment is meaningless noise that people add to PRs to pad their management-facing code review stats. If this is going on in your shop, your senior engineers have failed to set a suitable engineering culture.
If you are one of the seniors, schedule a one-on-one with your manager, and tell them in no uncertain terms that code review stats are off-limits for performance reviews, because it's causing perverse incentives that fuck up the workflow.
I have been involved in enough code reviews both in a corporate environment and in open source projects to know this is an outlier. When code review is done well, both the author and reviewer learn from the experience.
I think that's why startups have such an edge over big companies. They can just build and iterate while the big company gets caught up in month-long review processes.
IMHO / IME (over 20y in dev) reviewing PRs still has value as a sanity check and a guard against (slippery slope) hasty changes that might not have received all of the prior checks you mentioned. A bit of well-justified friction w/ ROI, along the lines of "slow is smooth, smooth is fast".
Perhaps kind of a pain to inject fixes in, have to rebase the outstanding work. But I kind of like this idea of the org having responsibility to do what review it wants, without making every person have to corral all the cats to get all the check marks. Make it the org's challenge instead.
and it also works for me when working with ai. that produces much better results, too, when I first do a design session really discussing what to build. then a planning session, laying out the steps to build it ("reviewability" works wonders). and then the instruction to stop when things get gnarly and work with the hooman.
does anyone here have a good system prompt for that self observance "I might be stuck, I'm kinda sorta looping. let's talk with hooman!"?
But. The design contract needs review, which takes time.
to move to the hyperspeed timescale you need reliable models of verification in the digital realm, fully accessible by AI.
The question is really "Will up-front design and pair programming cost more than not doing up-front design and pair programming?".
In my experience, somewhat counter-intuitively, alignment and pairing is cheaper because you get to the right answer a bit 'slower' but without needing the time spent reworking things. If rework is doubling the time it takes to deliver something (which is not an extreme example, and in some orgs would be incredibly conservative) then spending 1.5 times the estimate putting in good design and pair programming time is still waaaay cheaper.
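The arithmetic above, spelled out (the multipliers are the comment's own; the 10-day estimate is an invented placeholder):

```python
# Back-of-envelope version of the argument: pairing and up-front design
# cost more at first, but avoid the rework multiplier entirely.
estimate = 10.0                       # baseline estimate, in days (made up)
solo_with_rework = estimate * 2.0     # "rework doubles the time"
paired_up_front = estimate * 1.5      # 1.5x the estimate on design/pairing

saving = solo_with_rework - paired_up_front
print(f"pairing saves {saving:.1f} days on a {estimate:.0f}-day task")
```

With those inputs, pairing wins by a quarter of the solo cost, and the gap only widens in orgs where rework is worse than 2x.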
Welcome to working with real people. They go off the rails and ignore everything you’ve agreed to during design because they get lazy or feel schedule pressure and cut corners all the time.
Sideline: I feel like AI obeys the spec better than engineers sometimes sigh.
Consequently, people tend to become invested in reviewing work only once it’s blocking their work. Usually, that’s work that they need to do in the future that depends on your changes. However, that can also be work they’re doing concurrently that now has a bunch of merge conflicts because your change landed first. The latter reviewers, unfortunately, won’t have an opinion until it’s too late.
Fortunately, code is fairly malleable. These “reviewers” can submit their own changes. If your process has a bias towards merging sooner, you may merge suboptimal changes. However, it will converge on a better solution more quickly than if your changes live in a vacuum for months on a feature branch passing through the gauntlet of a Byzantine review and CI process.
If you're aiming for a higher level, you also need to review work. If you're leading a team or above (or want to be), I assume you'll be doing a lot of reviewing of code, design docs, etc. If you're judged on the effectiveness of the team, reviews are maybe not an explicit part of some ladder doc, but they're going to be part of boosting that effectiveness.
I agree with him anyway: if every dev felt comfortable hitting a stop button to fix a bug then reviewing might not be needed.
The reality is that any individual dev will get dinged for not meeting a release objective.
Now I work at a company where reviews take minutes. We have 5 lines of technical debt per 3 lines of code written. We spend months to work on complicated bugs that have made it to production.
Everyone was very highly paid, managers measured everything (including code review turnaround), and they frequently fired bottom performers. So, tradeoffs.
At some moment I realized that reviews are holding things back most of all. I started to jump to review my team's code ASAP. I started to encourage others to go review things ASAP. It works even in relatively large companies, as long as your team has a reasonable size.
This can be learned, taught, and instilled.
We had a "support rota", i.e. one day a week you'd be essentially excused from doing product delivery.
Instead, you were the dev to deal with bug triage, any code reviews, questions about the product, etc.
Any spare time was spent looking for bugs in the backlog to further investigate / squash.
Then when you were done with your support day you were back to sprint work.
This meant there was no ambiguity of who to ask for code review, and limited / eliminated siloing of skills since everyone had to be able to review anyone else's work.
That obviously doesn't scale to large teams, but it worked wonders for a small team.
The code quality was much better than in my current workplace where the reviews are done in minutes, although the software was also orders of magnitude more complex.
A review must be useful: the time spent on reviewing, re-editing, and re-reviewing must improve the quality enough to warrant it. Even long and strict reviews are worth it if they actually produce near-bugless code.
In reality, that's rarely the case. Too often, reviewing goes down the rabbit hole of various minutiae, and the time spent reaching a mutual compromise between what the programmer wants to ship and what the reviewer can agree to pass is not worth the effort. The time would be better spent on something else if the process doesn't yield substantive quality. Iterating a review over and over and over to hone it into one interpretation of perfection will only bump the change into the next 10x bracket in the wallclock timeline mentioned in this article.
In the adage of "first make it work, then make it correct, and then make it fast", a review only needs to require that the change reaches the first step or, in other words, to prevent breaking something or the development going in an obviously wrong direction straight from the start. If the change works, maybe with caveats but still works, then all is generally fine enough that the change can be improved in follow-up commits. For this, the review doesn't need to go into thorough detail: a few comments to point the change in the right direction are often enough. That kind of review is a very efficient use of time.
Overall, in most cases a review should be a very short part of the development process. Most of the time should be spent programming and not in review churn. A review serves as a quick check-point that things are still going the right way but it shouldn't dictate the exact path that should be used in order to get there.
Amen brother
If I complete a bugfix every 30 minutes, and submit them all for review, then I really don't care whether the review completes 5 hours later. By that time I have fixed 10 more bugs!
Sure, getting review feedback 5 hours later will force me to context switch back to 10 bugs ago and try to remember what that was about, and that might mean spending a few more minutes than necessary. But that time was going to be spent _anyway_ on that bug, even if the review had happened instantly.
The key to keeping speed up in slow async communication is just working on N things at the same time.
The value of your bug fix is cashed out only when it reaches the customer, not when you have finished implementing it.
There is a cost of delay for value to reach the customer, and we want that delay to be as short as possible.
So it doesn't matter if you fix 10 bugs, because your 10th bug is going to reach production 5x10 hours (that's an exaggeration, but you get the point) after you had fixed it (which is why the article mentions latency and not touch time).
You can tell me "yes, but I also participate in the code review effort in parallel". Yes, but then you are not fixing 10 bugs; you are fixing less and reviewing more, and reviews take longer than implementation (especially with LLMs now in the loop).
It's because of the pretty counter-intuitive Little's Law : the more in-progress you have in parallel, the slower it will get for each item to be completed.
Although you'll have to mentally replace the word "agent" with "PR" for it to make sense in this context. The math is the same. It all boils down to how much those context switches cost you. If it's a large cost, then you can get a huge productivity boost by increasing review speed.
In the "show calculations" section, the amount of wasted time caused by context switching is the delta between the numbers in the phrase "T_r adjusted from 30.0 to 35 minutes". That number increases as the context-switching cost and the "average agent time" (AKA "average PR review time") go up.
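Little's Law says the average number of items in progress equals the arrival rate times the average time each item spends in the system (L = λW), so at a fixed completion rate, piling up WIP stretches cycle time linearly, and context switches add a further per-item penalty. A rough sketch of that relationship (the throughput and switch-cost numbers are invented; only the shape matters):

```python
# Little's Law: L = λ × W, so W = L / λ at a fixed completion rate.
# On top of that, assume each in-flight item costs a fixed amount of
# context-switching overhead per switch.  All numbers are illustrative.

def cycle_time(wip, throughput_per_day, switch_cost_hours=0.5):
    """Average days from 'started' to 'done' for one item."""
    base = wip / throughput_per_day           # Little's Law: W = L / λ
    switching = wip * switch_cost_hours / 8   # penalty grows with WIP
    return base + switching

for wip in (1, 5, 10):
    print(wip, round(cycle_time(wip, throughput_per_day=2), 2))
```

Even in this toy model, working 10 things in parallel makes each one take ten times longer to land than working them one at a time, before any quality effects are counted.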
Making entire classes of issues effectively impossible is definitely the ideal outcome. But, this feels much more complicated when you consider that trust doesn't always extend beyond the company's wall and you cannot always ignore that fact because the negative outcomes can be external to the company.
What if I, a trusted engineer, run `npm update` at the wrong time and malware makes its way into production and user data is stolen? A mistake to learn from, for sure, but a post-mortem is too late for those users.
I'm certainly not advocating for relying on human checks everywhere, but reasoning about where you crank the trust knob can get very complicated or costly. Occasionally a trustworthy human reviewer can be part of a very reasonable control.
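One cheap mechanical control for the `npm update` scenario is refusing floating version ranges, so resolved versions can only change through a deliberate, reviewed lockfile update. A hypothetical pre-commit-style check, sketched in Python (not a real tool; in practice you would lean on a committed lockfile and `npm ci`):

```python
# Sketch: flag package.json entries whose version spec can float
# (caret, tilde, "*", "latest"), since those are the ones a routine
# update can silently swap out underneath you.
import json

def floating_deps(package_json_text):
    manifest = json.loads(package_json_text)
    bad = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if spec.startswith(("^", "~")) or spec in ("*", "latest"):
                bad.append((name, spec))
    return bad

sample = '{"dependencies": {"left-pad": "^1.3.0", "lodash": "4.17.21"}}'
print(floating_deps(sample))  # [('left-pad', '^1.3.0')]
```

It doesn't make the supply-chain problem go away, but it moves the trust decision from "whenever someone happens to run an update" to an explicit, reviewable diff.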
The linked page in the thread is short and quite enlightening, but here is the relevant passage:
> Rule 9: Human operators have dual roles: as producers & as defenders against failure.
> The system practitioners operate the system in order to produce its desired product and also work to forestall accidents. This dynamic quality of system operation, the balancing of demands for production against the possibility of incipient failure is unavoidable. Outsiders rarely acknowledge the duality of this role. In non-accident filled times, the production role is emphasized. After accidents, the defense against failure role is emphasized. At either time, the outsider’s view misapprehends the operator’s constant, simultaneous engagement with both roles.
[1] https://news.ycombinator.com/item?id=32895812
I've always liked Tailscale as a product and now I might be a fan of their CEO too. Who knew?
I'll be sharing this post widely. Avery - if you're on here, thanks for writing this!
But for anything else, you just need an individual (not a team) who's okay (not great) at multiple things (architecting, coding, communicating, keeping costs down, testing their stuff). Let them build and operate something from start to finish without reviewing. Judge it by how well their product works.
(And in a pyramid of queues like many layers of reviews, each layer will wind up being about equally loaded, because otherwise you would get a big benefit from adding/removing capacity, so each layer will slowly be optimized towards its breaking point, yielding the 10x everywhere.)
The handover to a peer for review is a falsehood. PRs were designed for open source projects to gate keep public contributors.
Teams should be doing trunk-based development, group/mob programming and one piece flow.
Speed is only one measure and AI is pushing this further to an extreme with the volume of change and more code.
The quality aspect is missing here.
Speed without quality is a fallacy and it will haunt us.
Don’t focus on speed alone, and the need to always be busy and picking up the next item - focus on quality and throughput, keeping work in progress to a minimum (1). Deliver meaningful, reasoned changes as a team, together.
The best balance of individual initiative-velocity vs. peer review culture I've seen was at Facebook.
Needing full human attention on a complex task from a pro who can only look at your thing has a wait time. It is worse when there are only 2 or 3 such people in the world you can ask!
Not saying this is a good situation, but it's quite easy to run into it.
Most devs set aside some time at most twice a day for PRs. That's 5 hours at least.
Some PRs come in at the end of the day and will only get looked at the next day. That's more than 5 hours.
IME it's rare to see a PR get reviewed in under 5 hours.
If you work in a team of 5 people, and each one only reviews things twice a day, that's still less than 5 hours any way you slice it.
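A toy model backs up that arithmetic. Assume (all of these schedule details are invented) PRs are submitted uniformly across a 9:00–17:00 day, reviewers batch at 10:00 and 15:00, and anything submitted after the last slot waits overnight for the next morning's 10:00 slot:

```python
# Toy model of "reviewers batch PRs twice a day".  Waits are measured
# in wall-clock hours, including the overnight gap for late PRs.

def wall_clock_wait(submit_hour):
    for slot in (10, 15):
        if submit_hour <= slot:
            return slot - submit_hour
    return (24 - submit_hour) + 10  # overnight until tomorrow's 10:00

# One submission every 6 minutes across the 8-hour day (80 samples).
samples = [9 + m / 60 for m in range(0, 480, 6)]
avg = sum(wall_clock_wait(h) for h in samples) / len(samples)
print(f"average wall-clock wait: {avg:.1f} hours")
```

Under those assumptions the average wait comes out just under six hours, because the unlucky end-of-day PRs sitting overnight dominate the average, which is consistent with the "that's more than 5 hours" observation.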
> you can’t overcome latency with brute force
Curious what rang true to you if not the main point?
The flip side of that, and why the software world is not a complex network of millions of tiny startups but in fact has quite a few companies where log(organization) >= 2, is that there are a lot of tasks that are just larger than a startup, and the log of the minimum size organization that can do the job becomes 2 or 3 or 4.
There is certainly at least the possibility that AI can really enhance those startups even faster, but it also means that they'll get to the point that they need more layers more quickly, too. Since AI can help much, much more with coding than it can with the other layers (not that it can't help, but at the moment I don't think there's anybody else in the world getting the advantages from AI that programmers are getting), it can also result in the amount of time that startups can stay in the log(organization)=1 range shrink.
(Pardon the sloppy "log(organization)" notation. It should not be taken too literally.)
If I can approve something without review, it’s instant. If it requires only immediate manager, it takes a day. Second level takes at least ten days. Third level trivially takes at least a quarter (at least two if approaching the end of the fiscal year). And the largest proposals I’ve pushed through at large companies, going up through the CEO, take over a year.
I hope we shift engineers closer to users than ever before. Get them to understand user's needs and the actual product more - they'll write better plans and prompts. Review the plans.
Code review becomes less of a thing when the team's on the same page, so regularly align on what the goals are.
Accept post-merge code reviews. Things slip, normalise coming back and saying "actually, we should have done this differently". It's not a bad thing, you're learning!
I prefer to review plan (this is more to flush out my assumptions about where something fits in the codebase and verify I communicated my intent correctly).
I'll loosely monitor the process if it's a longer one - then I review the artifacts. This way I can be doing 2/3 things in parallel, using other agents or doing meetings/prod investigation/making coffee/etc.
Whenever we have to talk/write about our work, it slows things down. Code reviews, design reviews, status updates, etc. all impact progress.
In many cases, they are vital, and can’t be eliminated, but they can be streamlined. People get really hung up on tools and development dogma, but I've found that there’s no substitute for having experienced, trained, invested, technically-competent people involved. The more they already know, the less we have to communicate.
That’s a big reason that I have for preferring small meetings. I think limiting participants to direct technical members, is really important. I also don’t like regularly-scheduled meetings (like standups). Every meeting should be ad hoc, in my opinion.
Of course, I spent a majority of my career, at a Japanese company, where meetings are a currency, so fewer meetings is sort of my Shangri-La.
I’m currently working on a rewrite of an app that I originally worked on, for nearly four years. It’s been out for two years, and has been fairly successful. During that time, we have done a lot of incremental improvements. It’s time for a 2.0 rewrite.
I’ve been working on it for a couple of months, with LLM assistance, and the speed has been astounding. I’m probably halfway through it, already. But I have also been working primarily alone, on the backend and model. The design and requirements are stable and well-established. I know pretty much exactly what needs to be done. Much of my time is spent testing LLM output, and prompting rework. I’m the “review slowdown,” but the results would be disastrous, if I didn’t do it.
It’s a very modular design, with loosely-coupled, well-tested and documented components, allowing me to concentrate on the “sharp end.” I’ve worked this way for decades, and it’s a proven technique.
Once I start working on the GUI, I guarantee that the brakes will start smoking. All because of the need for non-technical stakeholder team involvement. They have to be involved, and their involvement will make a huge difference (like a Graphic UX Designer), but it will still slow things down. I have developed ways to streamline the process, though, like using TestFlight, way earlier than most teams.
"...a Pull Request is a delivery. It's like UPS standing at your door with a package. You think, 'Nice, the feature, bugfix, etc. has arrived!' And because it's a delivery, it's also an inspection. A Code Review. Like a freight delivery with a manifest and signoff. So you have to be able to conduct the inspection: to understand what you're receiving and evaluate if it's acceptable as-is. Like signing for a package, once you approve, the code is yours and your team's to keep."
The metaphor has limits. IRL I sign immediately and resolve issues post-hoc with customer service. The UPS guy is not going to stand on my porch while I check if there's actually a bootable MacBook in the box. The vast majority of the time, there's no issue. If that were the same with code, teams could adopt a similar "trust now and defer verification" approach.
The article has a section on Modularity but never defines it. I wrote a post a few weeks ago on modularity and LLMs which does provide a definition. [1].
[1] https://www.slater.dev/2026/02/relieve-your-context-anxiety-...
> Get it code reviewed by the peer next to you 300 minutes → 5 hours → half a day
If it takes 5 hours for a peer to review a simple bugfix, your operation is dysfunctional.
We talked a lot about the costs of context switches, so it's reasonable to finish your work before switching to the review.
So, 1 hour? Sure. Two hours? Ok. But five hours means you only look at your teammates code once a day.
It's ok for a process where you work on something for a week and then come back for reviews but then it's silly to complain about overhead.
To what degree do we expect intellectual peerage from someone just glancing at this problem because of a PR? I would expect that, to be a proper intellectual peer of someone who has studied the problem, you'd quite reasonably have to basically double the effort.
Having somebody else devote enough time to being up to speed enough to do code review on an area is also an investment in resilience so the team isn't suddenly in huge difficulty if the lone expert in that area leaves. It's still a problem, but at least you have one other person who's been looking at the code and talking about it with the now-departed expert, instead of nobody.
Generally if the reviewer is not familiar with the content asynchronous line by line reviews are of limited value.
https://capocasa.dev/the-golden-age-of-those-who-can-pull-it...
See recent Amazon outages caused by vibe/slop/movefast coding practices with little review.
You don't need so much code or maintenance work if you get better requirements upfront. I'd much rather implement things at the last minute knowing what I'm doing than cave in to the usual incompetent middle manager demands of "starting now to show progress". There's your actual problem.
Instead everyone wants perfect foresight, but systems are full of surprises you only find by building and the cost of pushing uncertainty into docs is that the docs rot because nobody updates them. Most "progress theater" starts as CYA for management but hardens into process once the org is too scared to change anything after the owners move on.
In software it's the opposite, in my experience.
> You don't need so much code or maintenance work if you get better requirements upfront.
Sure, and if you could wave a magic wand and get rid of all your bugs that would cut down on maintenance work too. But in the real world, with the requirements we get, what do we do?
That's been my experience as well: ten hours of doing will definitely save you an hour of planning.
If you aren't getting requirements from elsewhere, at least document the set of requirements you think you're working towards, and post them for review. You sometimes get new useful requirements very fast if you post "wrong" ones.
* Maybe you don't have privileges to delete the database
* Maybe your CI environments are actually high fidelity, and will fail when there is no DB
* Maybe destructive actions require further review
* Maybe your service isn't exposed to the public internet, and exposing to 0.0.0.0/0 isn't a problem.
* Maybe we engineer our systems to have trivial instant undo, and deleting a DB triggers an undo
Our tooling is kind of crappy. There's a lot we can do.
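The "destructive actions require further review" bullet, for instance, can be made mechanical rather than relying on a human reviewer noticing. A hypothetical sketch (the decorator and token scheme are invented for illustration): a guard that refuses to run a destructive operation unless an explicit approval token is passed in, e.g. one minted by a second engineer or a change-management system.

```python
# Sketch of "destructive actions require further review" as code.

class ApprovalRequired(Exception):
    pass

def destructive(func):
    """Wrap an operation so it refuses to run without an approval token."""
    def wrapper(*args, approval_token=None, **kwargs):
        if not approval_token:
            raise ApprovalRequired(f"{func.__name__} needs an approval token")
        return func(*args, **kwargs)
    return wrapper

@destructive
def drop_database(name):
    return f"dropped {name}"  # stand-in for the real, scary operation

# drop_database("prod")                             -> raises ApprovalRequired
# drop_database("prod", approval_token="CHG-1234")  -> runs
```

The point is that the check lives in the tooling, not in the hope that a reviewer reads the diff carefully.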
You allow self review and optional external review of code but the default is that the engineer can ship to production without a review block.
Then you either do post merge review: have a review column in jira or whatever where people are assigned to review and can complete on their own schedule in a non disruptive and non blocking way. This also avoids piling reviews onto whoever has the best rubber stamp.
Or
You switch to a quarterly system review meeting where you as a group go over and suggest improvements to make in the codebase holistically.
I've seen both of these work extremely well and with AI you can basically automate the review process to the point it's pretty much pointless having a human review step.
The only other way to avoid the issue of people sitting on thumbs waiting for review is to have everyone prioritize reviewing code ahead of producing new code. This works but is incredibly disruptive to the reviewer and has side effects like review bombing on the largest rubber stamp reviewer.
I’ll cover one of them: layers of management or bureaucracy do not reduce risk. They create inaction, which gives the appearance of reducing risk, until some startup comes and gobbles up your lunch. Upper management knows it’s all bullshit and the game-theoretic play is to say no to things, because you’re not held accountable if you say no, so they say no and milk the money printer until the company stagnates and dies. Then they repeat at another company (usually with a new title and promotion).
Solution: Feed this paper to the llm and ask it to solve your problem. Then contact me with your experience. XD
So we will need to extract the decision making responsibility from people management and let the Decision maker be exclusively focused on reviewing inputs, approving or rejecting. Under an SLA.
My hypothesis is that the future of work in tech will be a series of these input/output queue reviewers. It's going to be really boring I think. Probably like how it's boring being a factory robot monitor.
With AI my task to review is to see high level design choices and forget reviewing low level details. It’s much simpler.
This seems to check out, and it's the reason why I can't reconcile the industry's claims about worker replacement with reality. I still wonder when a reckoning will come, though. Seems long overdue in the current environment.
Never. Until 1-10 person teams start disrupting enterprises (legacy banks, payments systems, consultancies).
“Why?” you ask. Because it’s a house of cards. If engineers become redundant, then we don’t need teams. If we don’t need teams, then we don’t need team leads/PMs/POs and others; if we don’t need middle management, then we don’t need VPs and others. All of those layers will eventually catch up to what’s going on and kill any productivity gains via bureaucracy.
The worst places I’ve worked have a pattern where someone senior drives a major change without any oversight, review or understanding causing multiple ongoing issues. This problem then gets dumped onto more junior colleagues, at which point it becomes harder and more time consuming to fix (“technical debt”). The senior role then boasts about their successful agile delivery to their superiors who don’t have visibility of the issues, much to the eye-rolls of all the people dealing with the constant problems.
That's me. I'm the mad reviewer. Each time I ranted against AI on this site, it was after reviewing sloppy code.
Yes, Claude Opus is better on average than my juniors/new hires. But it will make the same mistake twice. I _need_ you to fucking review your own generated code and catch the obvious issues before you submit it to me. Please.
can't believe I was baited into reading this slop
/jk
good post actually, and a fair point
I do think many people will argue that you can just not review things though.
Try this little-known trick! You can be up to 9x more efficient if you code something else while you wait for review.
> AI
projectile vomits
Fuck engineering, let's work on methods to make artificial retard be more efficient!
Context switch alone would kill any productivity gains from this. And I’m not even touching on conflicting MRs and interdependencies yet.
1. Whoa, I produced this prototype so fast! I have super powers!
2. This prototype is getting buggy. I’ll tell the AI to fix the bugs.
3. Hmm, every change now causes as many new bugs as it fixes.
4. Aha! But if I have an AI agent also review the code, it can find its own bugs!
5. Wait, why am I personally passing data back and forth between agents
6. I need an agent framework
7. I can have my agent write an agent framework!
8. Return to step 1
the author seems to imply this is recursive when it isn't. when you have an effective agent framework you can ship more high quality code quickly.
You expect your calculator to always give correct answers, your bank to always transfer your money correctly, and so on.
Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your harddrive along with the private key"
> Pre-LLMs correct output was table stakes
We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults. Correctness isn't even on the table, outside of a few (mostly academic) contexts
So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput (good lord, we better get used to doing more code review)
This doesn’t work if you spend 3 minutes prompting and 27 minutes cleaning up code that would have taken 30 minutes to write anyway, as the article details, but that’s a different failure case imo
Hang on, you think that a queue that drains at a rate of $X/hour can be filled at a rate of 10x$X/hour?
No, it cannot: it doesn't matter how fast you fill a queue if the queue has a constant drain rate, sooner or later you are going to hit the bounds of the queue or the items taken off the queue are too stale to matter.
In this case, filling a queue at a rate of 20 items per hour (one every 3 minutes) while it drains at a rate of 1 item every 5 hours means that after a single 8-hour day you've queued 8x20 = 160 PRs and reviewed barely 2 of them.
IOW, after a single day the newest PR is looking at roughly 158 x 5 ≈ 790 hours until review. After the second day it's over 1,500 hours.
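The queue arithmetic above is easy to sanity-check with a few lines of Python. This is a toy model using the commenter's assumed rates (20 PRs/hour in, one review finished every 5 hours), not measurements from any real team:

```python
# Toy model of an unbounded review queue: PRs arrive much faster than
# they drain, so the wait time for the newest PR grows without bound.
ARRIVALS_PER_HOUR = 20   # one PR every 3 minutes (assumed)
HOURS_PER_REVIEW = 5     # one reviewer, 5 hours per review (assumed)

def wait_for_newest_pr(workdays: int, hours_per_day: int = 8) -> float:
    """Hours the most recently submitted PR waits for its review to start."""
    hours = workdays * hours_per_day
    queued = ARRIVALS_PER_HOUR * hours       # PRs submitted so far
    reviewed = hours / HOURS_PER_REVIEW      # PRs already drained
    backlog = queued - reviewed              # PRs still ahead in line
    return backlog * HOURS_PER_REVIEW        # each takes 5 hours to clear

for days in (1, 2, 5):
    print(f"day {days}: newest PR waits ~{wait_for_newest_pr(days):.0f} hours")
```

The point is the shape, not the exact numbers: with a 100:1 fill-to-drain ratio, the backlog (and thus the wait) grows roughly linearly with every day you keep generating code.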
There are some strategies that help: a lot of the AI directives need to go towards making the code actually easy to review. A lot of it it sits around clarity, granularity (code should be committed primarily in reviewable chunks - units of work that make sense for review) rather than whatever you would have done previously when code production was the bottleneck. Similarly, AI use needs to be weighted not just more towards tests, but towards tests that concretely and clearly answer questions that come up in review (what happens on this boundary condition? or if that variable is null? etc). Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions. That is, if a change is evidently risk free (in the sense of, "even if this IS broken it doesn't matter) it should be able to be rapidly approved / merged. Only things where it actually matters if it wrong should be blocked.
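To make the "tests that answer review questions" idea concrete, here's a sketch of what that could look like. `parse_discount` is an invented stand-in for whatever function the change touches; the point is that each test is named after the question a reviewer would otherwise have to ask:

```python
from typing import Optional

def parse_discount(raw: Optional[str]) -> float:
    """Parse a percentage string like '15%' into a 0..1 fraction."""
    if raw is None:  # reviewer question: what happens on null input?
        return 0.0
    value = float(raw.rstrip("%"))
    # reviewer question: what happens outside the valid range?
    return min(max(value, 0.0), 100.0) / 100.0

# Each test settles one review question by name, so the reviewer can
# read the test list instead of tracing the code.
def test_null_input_means_no_discount():
    assert parse_discount(None) == 0.0

def test_boundary_values_are_clamped():
    assert parse_discount("100%") == 1.0
    assert parse_discount("150%") == 1.0  # over the cap
    assert parse_discount("-5%") == 0.0   # under the floor
```

A reviewer who trusts these tests can approve the boundary behavior without re-deriving it from the implementation, which is exactly the throughput win the comment is describing.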
I have a feeling there are whole areas of software engineering where best practices are just operating on inertia and need to be reformulated now that the underlying cost dynamics have fundamentally shifted.
I think GP is thinking in terms of being incentivized by their environment to demonstrate an image of high personal throughput.
In a dysfunctional organization one is forced to overpromise and underdeliver, which the AI facilitates.
Generally if your job is acting as an expensive frontend for senior engineers to interact with claude code, well, speaking as a senior engineer I'd rather just use claude code directly.
We can use AI these days to add another layer.