Orchestrating AI code review at scale

(blog.cloudflare.com)

118 points by pramodbiligiri 3d ago | 47 comments

OtherShrezzing 5h ago

>Code review is a fantastic mechanism for catching bugs and sharing knowledge

"Sharing knowledge" is one of the first phrases in the article, and it is highlighted as a key benefit of code review. But the loss of human capital from this process is never examined in the post.

> Trivial reviews (typo fixes, small doc changes) cost 20 cents on average

They did around 25,000 of these runs (about 20% of the total). So CF spent about $5k in that period having language models review PRs that were under 10 lines long. I get that CF engineers are paid well, but the labour cost of having an intern or entry-level engineer spend ~30-60 s looking through each of these is likely close to $0.20, and that engineer builds some human capital while they're at it.
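The numbers in this comment can be checked with a few lines of back-of-envelope arithmetic. The 25,000 runs, the ~20% share, the $0.20 average and the 30-60 s estimate are taken from the comment; the break-even hourly rate is simply derived from them and is not Cloudflare's own accounting.

```python
# Figures quoted in the comment above.
TRIVIAL_RUNS = 25_000      # trivial reviews in the reporting period
TRIVIAL_SHARE = 0.20       # "about 20% of total"
COST_PER_TRIVIAL = 0.20    # USD per trivial review, on average


def total_spend(runs: int, cost_per_run: float) -> float:
    """Total USD spent on trivial reviews."""
    return runs * cost_per_run


def implied_total_runs(runs: int, share: float) -> float:
    """All review runs, if `runs` makes up `share` of the total."""
    return runs / share


def break_even_hourly_rate(cost_per_review: float, seconds_per_review: float) -> float:
    """Hourly labour cost at which a human pass costs the same as one model run."""
    return cost_per_review * 3600 / seconds_per_review


print(f"trivial spend: ${total_spend(TRIVIAL_RUNS, COST_PER_TRIVIAL):,.0f}")
print(f"implied total runs: {implied_total_runs(TRIVIAL_RUNS, TRIVIAL_SHARE):,.0f}")
for secs in (30, 60):
    rate = break_even_hourly_rate(COST_PER_TRIVIAL, secs)
    print(f"a {secs}s human review breaks even at ${rate:.2f}/hour")
```

With these inputs it reports $5,000 of trivial-review spend and about 125,000 runs in total, and the human comparison holds only if the reviewer's fully loaded cost is roughly $12-24 per hour (60 s and 30 s per review respectively). Plug in your own rate to test the claim.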

Dumbledumb 4h ago

This blog post is full of small inconsistencies that make it read like a low quality SEO piece.

> We also extract a shared context file (shared-mr-context.txt) from the coordinator's prompt and write it to disk. Sub-reviewers read this file instead of having the full MR context duplicated in each of their prompts. This was a deliberate decision, as duplicating even a moderately-sized MR context across seven concurrent reviewers would multiply our token costs by 7x.

No, it would not. A subagent's prompt is not 100% of its token usage, and reading "shared-mr-context.txt" does not cost zero tokens compared with creating that shared context in the first place.
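One way to make this objection concrete is a toy token model. Assume, hypothetically, that the MR context costs `context` tokens, that each of the seven reviewers spends `work` further tokens on its own job (tool calls, file reads, reasoning, output), and that reading the shared file costs `read_cost` tokens per reviewer. The ratio only reaches 7x when both `work` and `read_cost` are zero. The model ignores prompt caching and differing input/output prices, and the numbers in the example call are made up.

```python
def cost_multiplier(context: int, work: int, read_cost: int, reviewers: int = 7) -> float:
    """Total tokens with the context duplicated into every prompt, divided by
    total tokens with the context written once to a shared file.

    context   -- tokens of MR context (hypothetical)
    work      -- other tokens each reviewer spends on its own job (hypothetical)
    read_cost -- tokens each reviewer spends reading the shared file (hypothetical)
    """
    duplicated = reviewers * (context + work)
    shared = context + reviewers * (read_cost + work)
    return duplicated / shared


# Made-up sizes: 20k-token context, 30k tokens of other work, 5k tokens to read the file.
print(f"{cost_multiplier(20_000, 30_000, 5_000):.2f}x")
```

With those made-up sizes the script prints 1.32x rather than 7x; `cost_multiplier(20_000, 0, 0)` is the only kind of case that returns exactly 7.0.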

> You don't need seven concurrent AI agents burning Opus-tier tokens to review a one-line typo fix in a README.

Yeah, well, you wouldn't have anyway. Earlier in the post it says that Opus is "exclusively for the Review Coordinator".

joshuamoyers 5h ago

we’ve been struggling with review throughput. this actually seems worthwhile to build at this point. i remain fairly skeptical of agent-only workflows, but at some point they seem like the only practical solution.

we are finding lots of value in self-review. it's the “imagine you are doing a synchronous paired review with someone: anything that is difficult to explain, has a code smell, or doesn't fit the architecture of the system around you, write a comment” approach. then at the end, agents do a good job of looping over the PR comments.

the second thing would be a guided, educational code review tool. there are a few attempts at this, but none with an interface good enough to actually stick. organize hunks by semantic importance, spend some tokens exploring the surrounding systems, show how the new code, public APIs, and data model fit with the existing design, and let a human traverse larger PRs more quickly.

thank you to cloudflare for publishing this.

thih9 9h ago

> Today, when an engineer at Cloudflare opens a merge request, it gets an initial pass from a coordinated smörgåsbord of AI agents.

I’d prefer to have that happen as some sort of pre-commit hook, before a merge request is sent. The feedback loop might be a bit faster and the process might produce less noise this way.

rzmmm 10h ago

> The entire system also runs locally.

I think approaches like this don't need to run anywhere but locally, maybe integrated as a pre-push hook. The system is nondeterministic, so it's at odds with the purpose of CI.
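A local hook along these lines might look like the sketch below. Git runs `.git/hooks/pre-push` with one line per pushed ref on stdin (`<local ref> <local sha> <remote ref> <remote sha>`, with an all-zero name for a ref that does not exist on one side), and a non-zero exit aborts the push. `ai-review` is a hypothetical reviewer CLI that reads a diff on stdin, and `origin/main` is an assumed base for brand-new branches; the sketch also assumes both commits of each range exist locally.

```python
import subprocess

REVIEW_CMD = ["ai-review", "--stdin"]   # hypothetical reviewer CLI
DEFAULT_BASE = "origin/main"            # assumed base for branches new to the remote


def is_zero(sha: str) -> bool:
    """Git sends an all-zero object name for a ref missing on one side."""
    return set(sha) == {"0"}


def ranges_to_review(stdin_text: str) -> list[tuple[str, str]]:
    """Turn pre-push stdin lines into (ref, "base..tip") ranges to review."""
    ranges = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        local_ref, local_sha, _remote_ref, remote_sha = parts
        if is_zero(local_sha):
            continue                    # deleting a remote branch: nothing to review
        base = DEFAULT_BASE if is_zero(remote_sha) else remote_sha
        ranges.append((local_ref, f"{base}..{local_sha}"))
    return ranges


def main(stdin_text: str) -> int:
    """Review every pushed range; a non-zero return value aborts the push."""
    status = 0
    for ref, rev_range in ranges_to_review(stdin_text):
        diff = subprocess.run(
            ["git", "diff", rev_range], capture_output=True, text=True, check=True
        ).stdout
        if not diff:
            continue
        print(f"reviewing {ref} ({rev_range})")
        result = subprocess.run(REVIEW_CMD, input=diff, text=True)
        status = status or result.returncode
    return status
```

To install it, save the file as `.git/hooks/pre-push`, make it executable, add a `#!/usr/bin/env python3` line at the top and `import sys; raise SystemExit(main(sys.stdin.read()))` at the bottom. Because the reviewer is nondeterministic, you may prefer to print its findings and always return 0 so the hook stays advisory.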

plmpsu 10h ago

I built a more naive version for our team using Copilot and GitHub actions and it works quite well (wish I had metrics too). The team loves it.

The ROI here is so high that I don't mind using the strongest model available for the actual code review. I don't trust Sonnet and such. Just let Opus or GPT 5.5 do the whole thing and pay a bit more for less complexity.

suika 6h ago

As a solo dev, or rather, nowadays, more of a decision maker / agent overseer, I've come to enjoy letting my agents develop against a Gerrit repository and workflow. The dev agent pushes a CL, and the review agent picks it up (not just the diff, but the full repo), runs tests/reviews/review subagents, and concludes by posting a review as well as a vote. This goes back and forth with new patch sets and replies to the threads. Eventually the CL gets a +2 or whatever, and I have the final call to manually submit it. It is way slower than just pushing through development with one agent doing everything YOLO against a normal repository, but the additional time seems well spent to me (no, I don't have fancy graphs or similar analysis to prove this, just my gut feeling after looking at recent development results).
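The gate in this workflow can be modeled in a few lines. Gerrit's default Code-Review label lets a change be submitted only with at least one +2 and no -2; the sketch below applies that rule and keeps the human's final call. `Change` and the reviewer names are a hypothetical local model, not Gerrit's REST API, and clearing votes on every new patch set is a simplification (real Gerrit can copy some votes forward).

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """Hypothetical local model of a Gerrit change (CL); not Gerrit's API."""
    patch_set: int = 1
    votes: dict[str, int] = field(default_factory=dict)  # reviewer -> Code-Review vote
    submitted: bool = False

    def vote(self, reviewer: str, value: int) -> None:
        if value not in range(-2, 3):
            raise ValueError("Code-Review votes range from -2 to +2")
        self.votes[reviewer] = value

    def new_patch_set(self) -> None:
        """Simplification: a new patch set discards all previous votes."""
        self.patch_set += 1
        self.votes.clear()


def is_submittable(change: Change) -> bool:
    """At least one +2 and no -2 veto."""
    values = change.votes.values()
    return 2 in values and -2 not in values


def human_submit(change: Change, approve: bool) -> bool:
    """The human keeps the final call, but only on a submittable change."""
    if approve and is_submittable(change):
        change.submitted = True
    return change.submitted
```

The agents only ever move the change toward "submittable"; `submitted` flips solely through `human_submit`, which mirrors the manual final call described above.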

jmakov 8h ago

Every iteration, something can be found. How many times do you iterate, e.g. on performance: use an optimized struct, oh, now you can change the architecture, etc.? At that point one could just run the agents in a while loop, making changes until no comments are left.
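Because a reviewer can almost always find something, such a loop needs a cap. A minimal sketch, where `review` and `revise` are hypothetical callables standing in for the review and dev agents:

```python
from typing import Callable, TypeVar

Code = TypeVar("Code")


def iterate_until_clean(
    code: Code,
    review: Callable[[Code], list[str]],
    revise: Callable[[Code, list[str]], Code],
    max_rounds: int = 5,
) -> tuple[Code, int, bool]:
    """Alternate review and revision until the reviewer has no comments.

    Returns (final code, rounds of review run, whether it converged).
    The cap matters: a reviewer that always finds *something* never converges.
    """
    for round_no in range(1, max_rounds + 1):
        comments = review(code)
        if not comments:
            return code, round_no, True
        code = revise(code, comments)
    return code, max_rounds, False
```

Returning the convergence flag lets the caller tell "clean" apart from "gave up after `max_rounds`", which is where a human would step in.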

bob1029 9h ago

> One of the operational headaches we didn’t predict was that large, advanced models like Claude Opus 4.7 or GPT-5.4 can sometimes spend quite a while thinking through a problem, and to our users this can make it look exactly like a hung job.

I had the same problem in my recursive agent harness. It would always come back, but sometimes it took up to 10 minutes. I fixed this by adding a required "purpose" argument to every tool and to every call/return event. As the recursive evaluation proceeds, everything that happens streams incremental purpose text to the user's browser (also using the magic of JSONL). The incremental progress events contain the purpose and a detail section (the tool arguments as JSON) that the user can expand or collapse.
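A minimal sketch of the idea, writing purpose-tagged call/return events as JSON Lines to any text stream (for example a chunked HTTP response feeding the browser). The event fields other than `purpose` and `detail`, and the example tool name and path, are hypothetical; this is not the commenter's actual harness.

```python
import io
import json
import time
from typing import Any, Callable, TextIO


def emit(stream: TextIO, event: dict[str, Any]) -> None:
    """Write one JSON object per line (JSONL) and flush so the client sees it at once."""
    stream.write(json.dumps(event) + "\n")
    stream.flush()


def call_tool(stream: TextIO, name: str, fn: Callable[..., Any], purpose: str, **args: Any) -> Any:
    """Invoke a tool, requiring a purpose and streaming call/return progress events.

    The result must be JSON-serializable, since it is echoed in the return event.
    """
    if not purpose.strip():
        raise ValueError(f"tool {name!r} called without a purpose")
    emit(stream, {"type": "call", "tool": name, "purpose": purpose,
                  "detail": args, "ts": time.time()})
    result = fn(**args)
    emit(stream, {"type": "return", "tool": name, "purpose": purpose,
                  "detail": {"result": result}, "ts": time.time()})
    return result


# Usage with an in-memory stream and a made-up tool.
buf = io.StringIO()
call_tool(buf, "read_file", lambda path: f"<contents of {path}>",
          purpose="check how the retry helper is used", path="src/retry.ts")
for line in buf.getvalue().splitlines():
    event = json.loads(line)
    print(event["type"], "-", event["purpose"])
```

Making `purpose` a required parameter, rather than an optional log message, is what guarantees the user sees a human-readable reason even during a long-running step.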

34qa123 1h ago

The suits have taken over Cloudflare. All buzzwords are on the bingo card: Using Bun, modeling agent roles after management, graphs, you name it.

They apparently think they need to cash in on AI by serving models and at the same time blocking scrapers. So they need to fuel the hype by pretending to use it.

This shows how the US economy is fundamentally broken: companies that provide a useful service (in theory, if you discount SSL MITM and Turnstile gatekeeping) struggle, while quasi-religious scams like OpenAI and Anthropic get funded by mentally ill Boomers and Gen-Xers.

merrvk 3h ago

PR reviews were never the bottleneck

etothet 9h ago

What’s the over/under on when Cloudflare will acquire OpenCode (and keep it open source)?

faangguyindia 8h ago

what's the best workflow for solo devs?
