This data is great, and it is exciting to see the rapid growth of autonomous coding agents across GitHub.
One thing to keep in mind regarding merge rates is that each of these products creates the PR at a different phase of the work, so simply tracking PR creation to PR merge tells a different story for each product.
In some cases, the work to iterate on the AI-generated code (and potentially abandon it if it isn't good enough) is done in private, and only pushed to a GitHub PR once the user decides they are ready to share/merge. This is the case for Codex, for example. The merge rates for product experiences like this will look good in the stats presented here, even if many AI-generated code changes are being abandoned privately.
For other product experiences, the Draft PR is generated immediately when a task is assigned, and users can iterate on it “in the open” with the coding agent. This creates more transparency into both the success and failure cases (including logs of the agent sessions for both). This is the case for GitHub Copilot coding agent, for example. We believe this “learning in the open” is valuable for individuals, teams, and the industry. But it does lead to the merge rates reported here appearing worse, even though logically they are comparable to the “task assignment to merged PR” success rates of the other tools.
We’re looking forward to continuing to evolve the notion of the Draft PR to be even more natural for these use cases, and to enabling all of these coding agents to benefit from open collaboration on GitHub.
The current US stance seems to be captured here: https://www.copyright.gov/newsnet/2025/1060.html ("It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements").
If an entire commit is generated by AI, then it is obvious what created it: AI. Such a commit might not be covered by copyright. Is this something your team has already analysed?
Now we have text that is legally not owned by anybody. Is it "public domain", though? It is not possible to verify that, so maybe it is, but it still poses legal risks.
Whether it's committed or not is irrelevant to the conclusion there; the question is what the input was.
How would that work if it's a patch to a project with a copyleft license like the GPL, which requires all derivative work to be licensed the same?
https://open.spotify.com/episode/6o2Ik3w6c4x4DYILXwRSos?si=5...
This is not the case. The output of a compiler is 100% created by a compiler too. Copyright is based on where the creative aspect comes from.
I have had very little luck having 2025-era AIs manage the creative aspects of coding -- design, architecture, and similar -- and that's doubly true for what appears to be the relatively simplistic model in codex (as far as I can tell, codex trades off model complexity for model time; the model does a massive amount of work for a relatively small change).
However, it is much better than I am at the mechanical aspects. LLMs can fix mechanical bugs almost instantly (the sort of thing with a cut-and-paste fix in some build process from Stack Overflow), and generate massive amounts of code without typos or shallow bugs.
A good analogy is working with power tools versus hand tools: I can do much more in one step, but I'm still in creative control.
The codebase I'm working on is pretty sophisticated, and I might imagine they could implement more cookie-cutter things (e.g. a standard OAuth workflow) more automatically.
However, even there -- or in discussions with larger models about my existing codebase -- their creativity is in part based on human contributions to their training set. I'm not sure how to weigh that. An LLM OAuth workflow might be considered the creative median of a lot of human-written code.
I write a lot of AGPL code, and at least in the 3.5 era, they were clearly trained on my code, and would happily print it out more-or-less verbatim. Indeed, it was to the point where I complained to OpenAI about it at the time, but never got a response. I suspect a lot of generated code will include some fractional contribution from me now (an infinitesimal fraction most of the time, but more substantial for niche code similar to my codebase).
So in generated code, we have a mixture of at least a few different pieces:
- User's contributions, in prompt, review, etc.
- Machine contributions
- Training set contributions
We are looking into paths where we can support this more personal/private kind of PR, which would provide the foundation within GitHub for the best of both worlds here.
I just started using Codex casually a few days ago, though, and already have 3 PRs. While different tools for different purposes make sense, Codex's fully async nature is so much nicer. It does simple things, like improving consistency and making small improvements, quite well, which is really nice. Finally we have something that operates more like an appliance for certain classes of problems. Previously it felt more like a teenager with a learner's license.
I found out on Thursday that I have access to Codex with my Plus subscription. I've created and merged about a dozen PRs with it on my OSS projects since then. It's not flawless, but it's pretty good. I've done some tedious work that I had been deferring, got it to complete a few FIXMEs that I hadn't gotten around to fixing, made it write some API documentation, got it to update a README, etc. It's pretty easy to review the PRs.
What I like is that it creates and works on its own branch. I can actually check that branch out, fix a few things myself, push it and then get it to do PRs against that branch. I had to fix a few small compilation issues. In one case, the fix was just removing a single import that it somehow got wrong after that everything built and the tests passed. Overall it's pretty impressive. Very usable.
I wonder how it performs on larger code bases. I expect some issues there. I'm going to give that a try next.
Of these "In the loop", seems to be the one that doesn't work that well (yet). The main problem is latency in my opinion.
https://play.clickhouse.com/play?user=play#V0lUSCByZXBvX3N0Y...
I've also added some less popular agents like jetbrains-junie, and added a link to a random pull request for each agent, so we can look at the example PRs.
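Since the playground link above is truncated, here is a rough Python sketch of the same kind of analysis against the public ClickHouse playground. The `github_events` dataset is real, but the endpoint details, the column names (`head_ref`, `merged`, `action`), and the branch-prefix heuristic are my assumptions, not the actual query behind the link.

```python
import requests

# Assumption-heavy sketch: attribute PRs to agents by the prefix of the
# head branch name (e.g. codex/..., copilot/...) and compute merge rates.
QUERY = """
SELECT
    splitByChar('/', head_ref)[1] AS agent_prefix,
    countIf(merged) AS merged_prs,
    count() AS total_prs,
    round(merged_prs / total_prs, 3) AS merge_rate
FROM github_events
WHERE event_type = 'PullRequestEvent'
  AND action = 'closed'
  AND agent_prefix IN ('codex', 'copilot', 'cursor', 'junie')
GROUP BY agent_prefix
ORDER BY total_prs DESC
"""

# The playground exposes the standard ClickHouse HTTP interface;
# queries can be POSTed as the request body.
resp = requests.post("https://play.clickhouse.com/?user=play", data=QUERY)
print(resp.text)
```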
That "spark bar-chart" column output is one of the neatest things I've seen in a while. What a brilliant feature.
Also, of course OpenAI Codex would perform well, because the tool is heavily tailored to this type of task, whereas Cursor is a more general-purpose (in the programming domain) tool/app.
- It has non-interactive CLI functionality (with the -p "prompt" option) in addition to the default interactive TUI, making it easy to integrate into workflows (see the Python sketch below).
- It has turn-key GitHub integration (https://github.com/anthropics/claude-code-action).
- It has an internal task-tracking system that uses ReadTodo/WriteTodo tools to write JSON task lists to `$HOME/.claude/tasks/`, enabling it to stay on track better than most other tools.
- It has excellent and customisable context compaction.
- And it has a flexible permission system that can turn all permission prompts into auto-accept when running in sandboxed environments.
Together those features enable it to be just as autonomous as any GitHub AI bot action hype thing (even though that might not have been its original or primary use).
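To make the non-interactive mode from the first bullet concrete, here is a minimal Python sketch. Only the `claude -p "prompt"` invocation comes from the list above; the wrapper function, timeout, and error handling are my own illustrative scaffolding, not Claude Code's API.

```python
import subprocess

def run_claude_task(prompt: str, repo_dir: str) -> str:
    """Run Claude Code in non-interactive (print) mode on a checkout."""
    result = subprocess.run(
        ["claude", "-p", prompt],  # -p: print the result and exit, no TUI
        cwd=repo_dir,              # run inside the repo the agent should edit
        capture_output=True,
        text=True,
        timeout=600,               # assumed budget; tune for your tasks
    )
    result.check_returncode()      # surface non-zero exits as errors
    return result.stdout

if __name__ == "__main__":
    print(run_claude_task("Fix the failing unit tests and summarize the changes.", "."))
```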
https://x.com/paradite_/status/1931644656762429503
Docs: https://docs.anthropic.com/en/docs/claude-code/github-action...
It seems like Claude Code doesn't do that? Some preliminary searching reveals that PRs generated by people using Claude Code come from the user's own account but may note that they used Claude; for example, https://github.com/anthropics/claude-code/pull/1732:
> feat: add progress bar for token probability calculation
>
> - Add optional progress_cb parameter to get_token_probs function
> - Integrate `rich` progress bar in CLI showing real-time token processing progress
> - Add comprehensive tests for progress callback functionality
> - Maintain backward compatibility with optional parameter
>
> Generated with [Claude Code](https://claude.ai/code)
>
> Co-Authored-By: Claude <noreply@anthropic.com>
But you could probably filter this a bit by looking at PR commit counts?
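As a hedged sketch of that filtering idea: the commits endpoint below is the real GitHub REST API, and the trailer string is taken verbatim from the example above, but the helper function itself is hypothetical (and it ignores pagination beyond the first 100 commits).

```python
import requests

# Trailer taken verbatim from the example PR description above.
CLAUDE_TRAILER = "Co-Authored-By: Claude <noreply@anthropic.com>"

def claude_commit_ratio(owner: str, repo: str, pr_number: int,
                        token: str | None = None) -> tuple[int, int]:
    """Return (total commits, Claude-trailed commits) for one PR."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/commits"
    commits = requests.get(url, headers=headers, params={"per_page": 100}).json()
    claude = sum(CLAUDE_TRAILER in c["commit"]["message"] for c in commits)
    return len(commits), claude

# A PR where only some commits carry the trailer likely had human
# iteration on top of the agent's work.
total, claude = claude_commit_ratio("anthropics", "claude-code", 1732)
print(f"{claude}/{total} commits carry the Claude trailer")
```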
Since all agents are able to use the terminal, I suggest looking up the GitLab CLI and having the agent use that. It should work both locally and in runners.
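A tiny sketch of what that could look like, assuming the agent can run shell commands: `glab mr create` is a real subcommand, but double-check the exact flags against `glab mr create --help` for your version; the title and description here are made up.

```python
import subprocess

# Sketch: open a GitLab merge request from an agent's working branch
# using the glab CLI instead of a GitHub-specific integration.
subprocess.run(
    ["glab", "mr", "create",
     "--title", "Agent: fix flaky integration test",          # made-up example title
     "--description", "Opened by a coding agent via glab."],  # made-up example body
    check=True,
)
```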
Also, filter conditions that would be interesting: size of PR, language, files affected, distinct organizations, etc. lmk if these get added please!
or something will fix this
* Cursor agents were just introduced in beta and have privacy limitations that prevent their usage at many organizations.
* Cursor is still focused on hands-on-keyboard agentic flows, which aren't included in these counts.
Poor data: if I make one, I either:
a) Merge it (success)
b) Modify it (sometimes success, sometimes not). In one case, Codex made the wrong changes in all the right places, but it was still easier to work from that by hand.
c) Pick ideas from it (partial success)
So simple merge rates don't say much.
The denominator varies wildly based on whether or not the PR is made. If codex makes nonsense, I don't ask it to make a PR.