This data is great, and it is exciting to see the rapid growth of autonomous coding agents across GitHub.
One thing to keep in mind regarding merge rates is that each of these products creates the PR at a different phase of the work, so simply tracking PR creation to PR merge tells a different story for each product.
In some cases, the work to iterate on the AI-generated code (and potentially abandon it if it isn't good enough) is done in private, and only pushed to a GitHub PR once the user decides they are ready to share/merge. This is the case for Codex, for example. The merge rates for product experiences like this will look good in the stats presented here, even if many AI-generated code changes are being abandoned privately.
For other product experiences, the Draft PR is generated immediately when a task is assigned, and users can iterate on it “in the open” with the coding agent. This creates more transparency into both the success and failure cases (including logs of the agent sessions for both). This is the case for GitHub Copilot coding agent, for example. We believe this “learning in the open” is valuable for individuals, teams, and the industry. But it does lead to the merge rates reported here appearing worse, even though logically they are comparable to the “task assignment to merged PR” success rates of the other tools.
We’re looking forward to continuing to evolve the notion of the Draft PR to be even more natural for these use cases, and to enabling all of these coding agents to benefit from open collaboration on GitHub.
The current US stance seems to be captured here: https://www.copyright.gov/newsnet/2025/1060.html ("It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements").
If an entire commit is generated by AI, then it is obvious what created it: AI. Such a commit might not be covered by copyright. Is this something your team has already analysed?
Now we have text that is legally not owned by anybody. Is it "public domain", though? It is not possible to verify that, so maybe it is, but it still poses legal risks.
Whether it's committed or not is irrelevant to the conclusion there; the question is what the input was.
How would that work if it's a patch to a project with a copyleft license like the GPL, which requires all derivative work to be licensed the same?
https://open.spotify.com/episode/6o2Ik3w6c4x4DYILXwRSos?si=5...
This is not the case. The output of a compiler is 100% created by a compiler too. Copyright is based on where the creative aspect comes from.
I have had very little luck having 2025-era AIs manage the creative aspects of coding -- design, architecture, and similar -- and that's doubly true for what appears to be the relatively simplistic model in codex (as far as I can tell, codex trades off model complexity for model time; the model does a massive amount of work for a relatively small change).
However, it is much better than I am at the mechanical aspects. LLMs can fix mechanical bugs almost instantly (the sort of thing with a cut-and-paste fix in some build process from Stack Overflow), and generate massive amounts of code without typos or shallow bugs.
A good analogy is working with power tools versus hand tools: I can do much more in one step, but I'm still in creative control.
The codebase I'm working on is pretty sophisticated, and I might imagine they could implement more cookie-cutter things (e.g. a standard OAuth workflow) more automatically.
However, even there -- or in discussions with larger models about my existing codebase -- their creativity is in part based on human contributions to their training set. I'm not sure how to weigh that. An LLM OAuth workflow might be considered the creative median of a lot of human-written code.
I write a lot of AGPL code, and at least in the 3.5 era, they were clearly trained on my code, and would happily print it out more-or-less verbatim. Indeed, it was to the point where I complained to OpenAI about it at the time, but never got a response. I suspect a lot of generated code will include some fractional contribution from me now (an infinitesimal fraction most of the time, but more substantial for niche code similar to my codebase).
So in generated code, we have a mixture of at least a few different pieces:
- User's contributions, in prompt, review, etc.
- Machine contributions
- Training set contributions
We are looking into paths where we can support this more personal/private kind of PR, which would provide the foundation within GitHub for the best of both worlds here.
I just started using Codex casually a few days ago, though, and already have 3 PRs. While different tools for different purposes make sense, Codex's fully async nature is so much nicer. It does simple things, like improving consistency and making small improvements, quite well, which is really nice. Finally we have something that operates more like an appliance for certain classes of problems. Previously it felt more like a teenager with a learner's license.
I found out on Thursday that I have access to Codex with my Plus subscription. I've created and merged about a dozen PRs with it on my OSS projects since then. It's not flawless, but it's pretty good. I've done some tedious work that I had been deferring, got it to complete a few FIXMEs that I hadn't gotten around to fixing, made it write some API documentation, got it to update a README, etc. It's pretty easy to review the PRs.
What I like is that it creates and works on its own branch. I can actually check that branch out, fix a few things myself, push it and then get it to do PRs against that branch. I had to fix a few small compilation issues. In one case, the fix was just removing a single import that it somehow got wrong after that everything built and the tests passed. Overall it's pretty impressive. Very usable.
I wonder how it performs on larger code bases. I expect some issues there. I'm going to give that a try next.
Of these "In the loop", seems to be the one that doesn't work that well (yet). The main problem is latency in my opinion.
https://play.clickhouse.com/play?user=play#V0lUSCByZXBvX3N0Y...
I've also added some less popular agents like jetbrains-junie, and added a link to a random pull request for each agent, so we can look at the example PRs.
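Since the playground link above is truncated, here is a rough Python sketch of the same kind of analysis against the public ClickHouse playground. The `github_events` dataset is real, but the endpoint details, the column names (`head_ref`, `merged`, `action`), and the branch-prefix heuristic are my assumptions, not the actual query behind the link.

```python
import requests

# Assumption-heavy sketch: attribute PRs to agents by the prefix of the
# head branch name (e.g. codex/..., copilot/...) and compute merge rates.
QUERY = """
SELECT
    splitByChar('/', head_ref)[1] AS agent_prefix,
    countIf(merged) AS merged_prs,
    count() AS total_prs,
    round(merged_prs / total_prs, 3) AS merge_rate
FROM github_events
WHERE event_type = 'PullRequestEvent'
  AND action = 'closed'
  AND agent_prefix IN ('codex', 'copilot', 'cursor', 'junie')
GROUP BY agent_prefix
ORDER BY total_prs DESC
"""

# The playground exposes the standard ClickHouse HTTP interface;
# queries can be POSTed as the request body.
resp = requests.post("https://play.clickhouse.com/?user=play", data=QUERY)
print(resp.text)
```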
That "spark bar-chart" column output is one of the neatest things I've seen in a while. What a brilliant feature.
Also, of course OpenAI Codex would perform well, because the tool is heavily tailored to this type of task, whereas Cursor is a more general-purpose (in the programming domain) tool/app.
- It has non-interactive CLI functionality (with the -p "prompt" option) in addition to the default interactive TUI, making it easy to integrate into workflows (see the Python sketch below).
- It has turn-key GitHub integration (https://github.com/anthropics/claude-code-action).
- It has an internal task-tracking system that uses ReadTodo/WriteTodo tools to write JSON task lists to `$HOME/.claude/tasks/`, enabling it to stay on track better than most other tools.
- It has excellent and customisable context compaction.
- And it has a flexible permission system that can turn all permission prompts into auto-accept when running in sandboxed environments.
Together those features enable it to be just as autonomous as any GitHub AI bot action hype thing (even though that might not have been its original or primary use).
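To make the non-interactive mode from the first bullet concrete, here is a minimal Python sketch. Only the `claude -p "prompt"` invocation comes from the list above; the wrapper function, timeout, and error handling are my own illustrative scaffolding, not Claude Code's API.

```python
import subprocess

def run_claude_task(prompt: str, repo_dir: str) -> str:
    """Run Claude Code in non-interactive (print) mode on a checkout."""
    result = subprocess.run(
        ["claude", "-p", prompt],  # -p: print the result and exit, no TUI
        cwd=repo_dir,              # run inside the repo the agent should edit
        capture_output=True,
        text=True,
        timeout=600,               # assumed budget; tune for your tasks
    )
    result.check_returncode()      # surface non-zero exits as errors
    return result.stdout

if __name__ == "__main__":
    print(run_claude_task("Fix the failing unit tests and summarize the changes.", "."))
```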
https://x.com/paradite_/status/1931644656762429503
Docs: https://docs.anthropic.com/en/docs/claude-code/github-action...
It seems like Claude Code doesn't do that? Some preliminary searching reveals that PRs generated by people using Claude Code come from the user's own account but may note that they used Claude; for example, https://github.com/anthropics/claude-code/pull/1732:
> feat: add progress bar for token probability calculation
>
> - Add optional progress_cb parameter to get_token_probs function
> - Integrate `rich` progress bar in CLI showing real-time token processing progress
> - Add comprehensive tests for progress callback functionality
> - Maintain backward compatibility with optional parameter
>
> Generated with [Claude Code](https://claude.ai/code)
>
> Co-Authored-By: Claude <noreply@anthropic.com>
But you could probably filter this a bit by looking at PR commit counts?
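As a hedged sketch of that filtering idea: the commits endpoint below is the real GitHub REST API, and the trailer string is taken verbatim from the example above, but the helper function itself is hypothetical (and it ignores pagination beyond the first 100 commits).

```python
import requests

# Trailer taken verbatim from the example PR description above.
CLAUDE_TRAILER = "Co-Authored-By: Claude <noreply@anthropic.com>"

def claude_commit_ratio(owner: str, repo: str, pr_number: int,
                        token: str | None = None) -> tuple[int, int]:
    """Return (total commits, Claude-trailed commits) for one PR."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/commits"
    commits = requests.get(url, headers=headers, params={"per_page": 100}).json()
    claude = sum(CLAUDE_TRAILER in c["commit"]["message"] for c in commits)
    return len(commits), claude

# A PR where only some commits carry the trailer likely had human
# iteration on top of the agent's work.
total, claude = claude_commit_ratio("anthropics", "claude-code", 1732)
print(f"{claude}/{total} commits carry the Claude trailer")
```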
Since all agents are able to use the terminal, I suggest looking up the GitLab CLI and having the agent use that. It should work both locally and in runners.
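A tiny sketch of what that could look like, assuming the agent can run shell commands: `glab mr create` is a real subcommand, but double-check the exact flags against `glab mr create --help` for your version; the title and description here are made up.

```python
import subprocess

# Sketch: open a GitLab merge request from an agent's working branch
# using the glab CLI instead of a GitHub-specific integration.
subprocess.run(
    ["glab", "mr", "create",
     "--title", "Agent: fix flaky integration test",          # made-up example title
     "--description", "Opened by a coding agent via glab."],  # made-up example body
    check=True,
)
```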
Also, filter conditions that would be interesting: size of PR, language, files affected, distinct organizations, etc. lmk if these get added please!
or something will fix this
* Cursor agents were just introduced in beta and have privacy limitations that prevent their usage at many organizations.
* Cursor is still focused on hands-on-keyboard agentic flows, which aren't included in these counts.
Poor data: if I make one, I either:
a) Merge it (success)
b) Modify it (sometimes success, sometimes not). In one case, Codex made the wrong changes in all the right places, but it was still easier to work from that by hand.
c) Pick ideas from it (partial success)
So simple merge rates don't say much.
The denominator varies wildly based on whether or not the PR is made. If codex makes nonsense, I don't ask it to make a PR.