We all had that one "productive" engineer on our team who would write huge PRs with large swaths of refactoring, whether warranted or not, and that was way before anyone could imagine in their wildest dreams that neural networks could generate such huge amounts of code.
The net effect of such a "productive" engineer was always that, instead of increasing team velocity, the team would slow to a crawl. Either his PRs had to be reviewed in detail, eating up all the time, or, if you just gave a cursory LGTM, they blew up in production and forced everyone back to the drawing board. Meanwhile the project architecture would have shifted so rapidly due to his "productivity" that no one had a clear picture of the codebase, such as what's where, except that one "super smart talented productive loyal to the company goals" guy.
“Almost every software development organization has at least one developer who takes tactical programming to the extreme: a tactical tornado. The tactical tornado is a prolific programmer who pumps out code far faster than others but works in a totally tactical fashion. When it comes to implementing a quick feature, nobody gets it done faster than the tactical tornado. In some organizations, management treats tactical tornadoes as heroes. However, tactical tornadoes leave behind a wake of destruction. They are rarely considered heroes by the engineers who must work with their code in the future. Typically, other engineers must clean up the messes left behind by the tactical tornado, which makes it appear that those engineers (who are the real heroes) are making slower progress than the tactical tornado.” - John Ousterhout, A Philosophy of Software Design
But also I have no idea how that situation arises unless the slower folks are just auto-approving PRs. You kind of did that to yourself if you let the new person get away with it.
I knew one engineer who came in every Sunday night to process missed orders from an e-com system he wrote. He was unable to actually fix the problems with his code, so he just fixed them by hand. Every week... for years on end. Management thought he was a star who worked hard. The devs knew he was the worst engineer they had ever worked with. He still works at that same company 25 years later.
The gap between what management thinks and reality can be pretty large at times.
In my case, it's worse than that. They usually get promotions, raises and move up the ladder. The business only cares about one thing: making cash. This means pumping out features as soon as possible because the sales team closed a million dollar contract, which includes features we don't have.
The engineers who deliver the features are noticed by managers and win big. No one cares about code quality and half the time, the code is rewritten or thrown away anyway.
I'm sure there exist organizations where code is treated as art, but I sure as heck haven't worked in one. Over the years, I've given up trying to clean up crap code; now I just get the work done as best as I can and call it a day.
You start by rejecting those PRs, saying "write more maintainable code, not quick hacks".
Management starts pressuring the original developer "why is it not merged yet, I thought you had it working".
That developer hits back with "well, it failed code review, they want me to refactor it".
Management goes back to the reviewer, "why did you fail this? It meets coding standards right? Pipeline is green".
Reviewer says "Well, yes it technically meets coding standards but it's full of hacks and is not future proof, it will bite us."
Management says "If we coded for tomorrow we'd never get anything done. Don't be so awkward". And then code gets merged.
Then you learn to just let these people go wild. If it hurts in the future you have a nice little "I told you so". But in my experience, management doesn't actually care if it hurts us in the future, it's not their problem. They just say "Well give me bigger estimates if you need to refactor". Fair enough, it's not a big deal but it is a pointless slog of picking up the pieces.
The other way it comes about is when the original developer just isn't really that good of a developer. So you end up in such an endless feedback loop trying to get the code in a good state that you piss everyone off and it's just easier to merge it.
Some hills just aren't worth dying on. And these guys can be exploited for your own advantage if you want to get code merged quickly ;)
Restricting changes to PRs is nowhere near universal.
The problem is allowing this kind of frantic tactical development even in "peace time".
In theory, sure, but in practice, to echo the others, you often don't have a choice, because of power dynamics/politics.
It's easy to say "it's management's fault", but the principle is the same - these guys are spammers and quacks (and deserve nothing less than to be confined to the level of hell reserved for spammers), they just have to spam long enough and something will get through (volume over quality). And after their "success", i.e. fraud, they can ditch the company and move on to the next. I've seen multiple "seniors" like this, not actually very good at the work, but great at pushing half-baked slop.
For their bug bounty program, the company can just charge $5-10 per submission to guarantee that everything you send gets thoroughly reviewed by a human, which eliminates bot slop DDoS submissions overnight. If your bug and PR were actually good, you get $10 + $1000 back. If they weren't, you need to do better due diligence next time; the skilled human feedback you received on why it wasn't good was a valuable lesson for your engineering career, it only cost you the price of a Starbucks latte, and it cut out all the scammers polluting the system. This way everyone wins.
I said it before and I'll say it again: for opportunities open to the entire world on the internet, adding monetary friction is THE ONLY (anonymous) WAY to separate serious people from bad actors doing spray-and-pray, hoping they'll make some money or get that job by weaponizing AI bots. You can't rely on honor systems and a high trust society on the anonymous open internet. You need to financially gatekeep to save yourself and your sanity, and to make sure the honest, serious people you want to engage with don't end up drowning in the noise of the scammers and unscrupulous opportunists.
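To put rough numbers on the fee argument above, here is a minimal sketch of the expected payoff per submission. It assumes the fee is refunded together with the bounty on a valid report and lost otherwise; the hit rates used in the usage note are made-up illustrations, not data from any real bounty program.

```python
def expected_profit(p_valid: float, bounty: float, fee: float,
                    cost_to_produce: float = 0.0) -> float:
    """Expected profit per submission when the fee is refunded on a valid report.

    p_valid and cost_to_produce are hypothetical inputs for illustration.
    """
    # Valid report: the fee comes back and the bounty is paid.
    # Invalid report: the fee is lost.
    return p_valid * bounty - (1.0 - p_valid) * fee - cost_to_produce


def break_even_hit_rate(bounty: float, fee: float) -> float:
    """Hit rate at which a submitter with zero production cost breaks even."""
    # Solve p * bounty - (1 - p) * fee = 0 for p.
    return fee / (bounty + fee)
```

With a $1000 bounty and a $10 fee, a spray-and-pray submitter with an assumed 0.1% hit rate loses about $8.99 per submission, while a careful hunter with an assumed 50% hit rate expects about $495. The break-even hit rate is roughly 1%, so under these assumptions the fee only deters submitters whose reports are valid less often than that.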
But we can't shut ourselves down just because we refuse to apply solutions to AI slop DDoS.
The problem, as the coach pointed out, was that that kind of behavior was pathological and showed poor planning and bad project management. Cheering on someone, even someone with the best of intentions, who was working like that was sending exactly the wrong message and reinforcing the wrong behavior.
But we already knew that!
Totally.
But seriously, I guarantee you the opposite is more common: the incompetent devs who can't manage to ship anything, who keep trying to do "surgical and small edits" after 1 week of thinking about them and then have them blow up in prod for someone else to fix quickly, because if it's up to them, it'll take 2-3 sprints.
10 years ago I was a lot closer to what y'all are talking about. After having more and more colleagues I can no longer agree, and I suspect this is mostly the opinion of incompetents who try to discredit regular devs.
Another thing they always lack is the ability to see when a large change is necessary because that's just what it takes to achieve the feature in a stable manner. Sorry to say this, but starting off this discussion by trying to discredit large change sets in the age of AI is incredibly inept.
When you wrote your software well, large changes are possible and increase stability when you actually need to add a fundamental change of behavior, which can come from a minuscule requirement.
But to close off on the topic of this article: they made the right call. In the open source context you cannot have this kind of incentive anymore with openclaw continuously shitting out one PR after another
That's the level that most competent software engineers should be working at.
Delegating understanding to LLMs is a totally different thing. It's not plumbing at all. It's more like hiring an unlicensed, generalist but well-reputed handyman from Craigslist and then going out to a movie while they do the work. It could turn out fine, or not, and if it does work out, it could even save time and money if their rate is low enough.
But it's not plumbing anymore, and you should be wary about billing plumber's rates for their work or taking on liability for it if you haven't even made sure that work meets your own standards of trade and quality.
You can argue that it's "one more level of abstraction" but it's a qualitatively different kind of abstraction. And in the economy of skilled labor, and the legal landscape of accountability and liability, that difference is enormously relevant.
Your point about AI being another abstraction similar to the "mostly deterministic" C compiler also comes up often but there are many arguments against it. If you think the determinism of a compiler and an AI are similar then I'm not sure whether you know anything about how either of them work or have even compared examples of what they produce.
PS We have way too many levels of abstraction now, that doesn't mean the right answer is to add another. Even worse unlike the others, LLMs aren't deterministic.
Anyway, my point is prompts are non-deterministic and there’s no way of inferring what code output by an LLM is intended to do, because that’s not how LLMs work.
LLMs, as pushed currently, are not deterministic.
Moreover, I have yet to see a compiler whose output tries to convince me that I'm completely right and that I bring very smart, interesting points to the table. Quite the contrary, actually, though error messages generally don't tell users explicitly how stupid the proposed code is when it doesn't even pass mere syntax and fundamental logic requirements.
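On the determinism point raised in the comments above, a toy sketch of where the randomness typically comes from: picking the highest-scoring token is a pure function of the scores, while sampling with a temperature depends on a random draw. The logits, the temperature value, and the function names are illustrative assumptions, not any particular model's decoding code.

```python
import math
import random


def softmax(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Deterministic: same scores in, same index out."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sampled_pick(logits, temperature, rng):
    """Stochastic: the chosen index depends on the random draw."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

Calling `greedy_pick` repeatedly on the same scores always returns the same index, while repeated `sampled_pick` calls spread across indices in proportion to their probabilities. Hosted models are usually served with sampling enabled, which is one reason identical prompts can yield different code.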
I wouldn't advocate for using different tools, but everyone should be able to reason about the machine instructions underlying their code. Both in the immediate sense of the assembly a simple function turns into, and the tricks language runtimes use to enable their neat features.
The attitude that things are magic is poison. There is a difference between feeling confident something is comprehensible and not yet needing to go learn it, vs resigning to a position of powerlessness.
The main issue is that not everyone cares about the semantics of what they’re writing. You don’t need to know assembly to talk about C’s semantics or know C to talk about Python semantics. It does not require going up and down some abstraction tower.
A better example would be if you’d changed the behavior of the library as you did this work, and the library changes introduced hard-to-detect bugs across the application.
A PR can be huge; that's OK. For example, codebases that moved from Python 2 to Python 3 would have had huge PRs, but the cognitive load was well understood.
Aka increasing the attack surface and maintenance burden.
That naturally meant reading and understanding more code than writing. Sometimes my LOC count was even negative, and I was proud of that accomplishment.
Now with AI I write even less and I've given up on the dream to gain fulfillment that way. The ability to quickly understand large amounts of code from questionable sources, be they machine or human, should hopefully stay valuable until my retirement, especially when supported by AI? What do you think?
And as this goes on, folks who can run an LLM _and_ understand/criticize/rework/re-prompt are just going to get more and more scarce. Even using an LLM in my preferred style, where you guide the model through a long series of small steps, will fade away.
As long as LLMs keep constantly making mistakes and introducing bugs and humans keep having to verify their output and clean up after them it should mean plenty of work for the few humans alive who can actually understand the code. Future AI models being trained on an increasingly large body of vibe coded bug-filled slop will only make the problem worse.
A small number of people with skills that are in demand will tend to make good money, and jobs that make good money will attract more people into learning how to code.
The problem, as I see it, with prolific use of AI to generate code is that it goes in the exact opposite direction. More and more code is bolted on top of existing code, more and more edge-cases, patch-ups, workarounds, etc. accumulate, the codebase grows and grows. In the end, no matter how good you are at understanding "code from questionable sources", you're still a human being. The AI can generate new code at rates several orders of magnitude faster than you can ingest and understand it, and when your meat brain becomes exhausted, the machine does not tire. From a business perspective, your employer will weigh their options: they can wait for you to interpret the code and generate good code (whether by hand or by machine + human review). Or they can just keep pulling the lever on the slot machine until it works well enough to sell. And for the business exec just looking for the fastest path to paydirt, I'm afraid the latter option is going to look way more appealing.
AI evangelists often say "typing" instead of "writing code", because they don't really understand -- or it's not lucrative for them to acknowledge -- what makes writing code hard.
We don't just write code to be executed by machines, we also write it to be read by humans. Code reviews, debugging, future changes -- all of these things involve reading and understanding the code someone wrote. And until we have an AI that we can actually hold responsible for its actions, we can't delegate the understanding to it.
So all we have to do is write code without reading or understanding it! Larry Wall was right all along!
No, it's that quite literally the PR submissions are SPAMmed. Whoever made them is acting like a SPAMmer, sending out lots of garbage and hoping it sticks. I.e., they're not doing their part in reviewing what the AI found and generated.
Whoever is running the AI is spamming instead of doing the actual work. I.e., they are just pushing their work onto the people who would review legitimate submissions.
Even with AI, just tell it to make smaller self contained PRs. I do this with Claude or GPT models and they do just fine.
If you rubberstamp some people's PRs all the time, you can then get them to greenlight your unpleasant PRs via PM instantly.
The other way round, retaliation: I once added some serious review notes to the PR of a very senior engineer because it was a dangerous topic. He would then spend the next months nitpicking every single PR I created. Had to post my PR in slack whenever he was not online to get them merged. After that I never seriously reviewed his PRs again. Too much of a headache.
Do you want one big PR or 100 small ones? You can't escape the sheer volume of code it's going to produce.
If you don't ever have a massive PR from a dynamite session, then you cannot ever be better than "average and plodding". So the question is, what's the context of the massive PR and how should it be handled?
* Mature product making money, intermediate engineer just refactored everything so it's "better"? Shut the fuck up, kindly please, you will have to demonstrate that you understand why things are this way and why it's better before we even have this conversation.
* Greenfield dev, trusted engineer getting from 0 -> 1 on something big? Maybe it shouldn't be held up in committee for 2 weeks. Maybe most objections will be superficial stylistic concerns.
Obviously there are many other contexts and these are 2 extremes in a multi-dimensional space. But if the process is "we litigate every line", then that's just not an innovative place to be. Yes, most PRs should be small, targeted, easy to review and tied to a ticket but if you're innovating? By definition it's a little different.
I can fling that back to you: very often the team hates the conclusion I arrive at, which is "It worked during your initial crunch and then everyone is just afraid to change it, which means your test coverage is far from good -- why is it not enriched?"
I am not trying to be an arse on purpose but the inertia and cargo-culting and tribe-defending practices I've seen during my contracting years (10-11) made me almost physically sick. Programmers are a fiercely territorial bunch and it's often to the detriment of the organization.
Of course the reverse cases exist: where the domain is difficult and ugly hacks had to be done so the project works and makes money. Absolutely. I love receiving this knowledge and integrating it; makes for interesting engineering discussions.
> Greenfield dev, trusted engineer getting from 0 -> 1 on something big? Maybe it shouldn't be held up in committee for 2 weeks. Maybe most objections will be superficial stylistic concerns.
Yep, full agree. And often times these stylistic concerns are not even that; they are often "I suffered here at the beginning, this green-horn should suffer as well!" which is honestly pathetic and it also happens quite a lot.
That's just cope to avoid learning how to turn a big change into a well organized patch series.
I'm not saying one shouldn't learn how to stage large changes into a mature codebase. Sometimes the overhead is very worth it, maybe most times if you're close to the profit center of a faang. But one should understand multiple ways of working, for different situations.
How many times have you reviewed your old code and been appalled at the terrible quality? You personally created slop; it's no different from GenAI output except that a human had to spend precious time crafting it. You likely were indeed bottlenecked by your ability to churn out code that you just had to get to work, for one reason or another.
The real issue is in the asymmetry when one party can use automation to create more code than another party can possibly manually verify.
Anyone trying to suggest that AI hasn't sped up quality code production is just insisting on keeping their head in the sand, IMO.
https://github.com/UnsafeLabs/Bounty-Hunters
The corresponding leaderboard:
It's likely to get blacklisted by AI bots, soon enough, though.
I'm not trying to suggest they _need_ to implement it. Like I said, closing it is reasonable. Completely aside from any other considerations, one could just decide that they don't feel like dealing with it. But there are other options.
For those who encounter bugs as part of their employment, they'd now need to convince their employer to fork over money up front. For most employers, getting them to spend even insignificant money is like pulling teeth.
But even for the self-employed or hobbyists, it means gambling real money on "are they going to be a jerk about my exploit report". No offense towards Turso, but the bulk of software firms are TERRIBLE about handling reports like that. Many already have unstated policies of screwing people out of deserved bug bounties at every step.
To submit such reports today already requires you to accept that your work is, statistically, just going to be a bunch of free labour that you gave away for the betterment of the product's users. Adding a cash fee just further deters submissions, especially once people haven't gotten their money back a few times. (Consider how many "AI detection tools" are themselves incredibly unreliable machine learning or sometimes even LLM systems.)
I'd say closing a program which doesn't work anymore is a better idea.
If they have to pay for reviewer time for each of 1000 reports, then the scheme stops being viable.
If you can think of something that isn't solved by one of those two mechanisms, I'd be interested in hearing them enumerated.
It's even possible to directly link this to maintainers/employees - if you can review 10 such AI/real things per hour (likely more if it's AI slop that's easy to detect), you're generating another revenue stream. Now, I have no idea if these guys are based in SF Bay or a 3rd world country with low COL but as an "add on", $100 an hour isn't too shabby (and can be on the "low end" if one's good at spotting AI crap.)
Side note, isn't it possible to have some way to verify if the "vulns" are actual vulns or not? ...Heck why not throw an LLM at it, powered by a single $10 submission fee?
It can't be on individual maintainers to stop this; IMO it's on GitHub (and GitLab) to stop these sorts of accounts from even getting to the point of submitting PRs. It's essentially spam.
Look at the user who created the first PR they reference https://github.com/Samuelsills. This is not an account that should be allowed to do anything close to opening a PR against a well known repo.
joking, but maybe not?
I was thinking of using it for my full stack Rust apps just so everything works with cargo and I don't have to bring in SQLite separately.
> It is possible to set up automated systems to gatekeep this, but with a non-negligible dollar value attached to it, the incentive is just too great for the AIs to just keep arguing, reopening the same PR, etc.
By "bought" I don't mean they won't sponsor stuff. I mean they've got a public standard that can be trusted to some degree.
Your final example isn't exactly what I'm thinking of here. I'm thinking that a well-known identity and name within a community bypasses a lot of this BS with AI slop and communities bombarded by the slop will continue to close themselves off which will increase the value of being a known, contributing member.
Idk I need to figure out a way to articulate this better but essentially the value of being verifiably human is increasing IMO.
I think we're very close to those two lines crossing. Which is another way of saying that people might care today whether something was generated by/with AI, but I don't think they will care soon. Humans will still decide what gets created, but the how won't matter as much.
You might be right that the software equivalent of a sourdough-baking Reddit community will continue to exist. But most people will buy bread at the store and have no idea how it's made.
For example, our community [0] asks you to submit an application before you're granted an invite code. If you attend a meetup in person we'll grant a "Verified Human" badge too. This gives you the power to invite others into the fortress: you're responsible for them.
The price to pay is steep because community growth is now glacial. It really does solve the slop problem though. (I'm also no longer convinced maximizing growth is Good.) Maybe there's some in-between solution for those who dislike invite-only spaces.
Edit: it is genuinely wild, I don't know of another product category that selects so perfectly for the WORST type of person to be its enthusiast. Just every single person I see hyped about AI is fucking insufferable on at least one and usually multiple axes.
Of course, I suspect you knew that.
AI is the fucking problem. Yes, it has (some) uses. It is not nearly the number advertised. And more and more the median use case seems to be, again, overloading people actually trying to do work with an avalanche of bullshit.
The solution is exactly what the linked article says: shut it down. The AI people have ruined another good thing that was both beneficial to the project, and to a number of individuals.
China says no. What are you going to do now, sanction it? =)
At this point it's impossible, so I concur with the parent: forget about shutting it down and think of something actually realistic.
Last month I tried my hand at finding a way to tell whether an OSS project is slop or not, based on the amount of "human attention" it received vs the amount of code it contains. The idea is that a 100k LOC project which received 3 days' worth of attention from a human is most certainly slop.
The approach doesn't work very well, though¹, mostly because it's hard to gauge the amount of attention that was given. If I see one commit with +3000 LOC, I can assume it's AI-generated, but maybe you're just the type of dev that commits infrequently.
Maybe we need some sort of "proof of human attention" for digital artifacts, that guarantees that a human spent X time working on it.
¹ I wrote about it here https://pscanf.com/s/352/
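One way the attention-versus-size idea above could be approximated from commit history alone is sketched below. This is not the method from the linked write-up; the session gap, the per-session padding, and the function names are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def estimate_attention_hours(commit_times,
                             max_gap=timedelta(hours=2),
                             session_pad=timedelta(minutes=30)):
    """Rough attention estimate from commit timestamps.

    Commits closer together than max_gap are treated as one working session
    and the elapsed time between them is credited. Each session also gets a
    fixed session_pad for work done before its first commit. Both thresholds
    are hypothetical.
    """
    times = sorted(commit_times)
    if not times:
        return 0.0
    total = session_pad  # lead-in work for the first session
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap <= max_gap:
            total += gap          # same session: credit the elapsed time
        else:
            total += session_pad  # new session: credit only the padding
    return total.total_seconds() / 3600.0


def loc_per_attention_hour(total_loc, commit_times):
    """Lines of code per estimated hour of human attention."""
    hours = estimate_attention_hours(commit_times)
    return float("inf") if hours == 0 else total_loc / hours
```

The weakness described above shows up directly: a single +3000 LOC commit is credited with only the 30-minute padding, so it scores 6000 LOC per hour whether it was generated in seconds or written by hand over a week and committed once.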
I stay pretty busy[0], and have been accused of "gaming" my GH repos.
That's not the case. I'm retired, experienced, and working on software all day, every day. I just don't get paid for it.
I also don't especially care, whether or not anyone thinks I'm a bot. I eat my own dogfood. Most of my work is on modules that I use in my own projects.
Humans are bad at writing code. Garbage PRs and slop have been a problem in open source and bug bounty programs since long before AI came on the scene.
We need better AI so that there's no need to solicit external bug fixes, and better AI so other contributions can be evaluated for usefulness and quality.
What do you care if a human ever looked at it at all? It implies that humans are adding value to the process. It's possible for a human to add value. The right human can add tremendous value. But I'll take a completely autonomous AI over 99% of the human software engineers and 99% of the people contributing PRs and bugfixes.
It was hard to keep up with slop before. It's a lot harder now. AI will help weed through the garbage.
AI lets good-faith bug hunters look through more repos they are not deeply familiar with. They may recognize a bad pattern quickly, almost like a very specialized static-analysis rule. But without project context, it is not always clear whether something is a real bug, a footgun, expected behavior, or just out of scope.
The blog shows obvious slop examples, but I think borderline accepted vs rejected examples would be more useful. They would help people understand what is worth reporting and what would just drain maintainers.
It could also help to ask reporters to clarify how the bug was found so you let people set reasonable expectations: "AI-found and manually confirmed", "AI-assisted", or "no AI used".
And why would they tell the truth?
If the bug hunter is acting in good faith, they can communicate how much scrutiny they think their report deserves, which may reduce maintainer frustration.
If the bug hunter is acting in bad faith, and they claim "no AI used" but the report shows obvious AI-generated content, detectable by a classifier, maintainers can dismiss it more easily.
The project does not accept bug bounty submissions without BBBS attestation. To get it, you must first submit your report to the BBBS for review.
Now, if this is your first submission (you are unknown to the BBBS), you must submit $50 to the BBBS along with the bug report, to pay a human to spend an hour looking at your work to verify it is written in good faith. This is not a review of whether the bug is real or valuable, just a readover to verify the report is coherent and plausible. If you have done this before, you can get a free attestation based on being a member in good standing, but submitting slop (per the judgement of the BBBS reviewer or the project receiving the report) is an account ban.
The BBBS couldn't steal your work and submit it themselves if they gave you some sort of signed hash as a receipt, which as a side effect would also be a deterrent against bounty programs stealing your work.
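A minimal sketch of what such a receipt could look like, using only the Python standard library. It uses an HMAC as a stand-in for a signature, which is an assumption made to keep the example self-contained: with an HMAC only the key holder can verify the receipt, so a real service would more likely use a public-key signature that anyone can check. The field names and function names are hypothetical.

```python
import hashlib
import hmac
import json


def issue_receipt(report: bytes, submitter: str, received_at: str,
                  signing_key: bytes) -> dict:
    """Bind a submitter and a timestamp to the hash of a report."""
    payload = json.dumps(
        {
            "report_sha256": hashlib.sha256(report).hexdigest(),
            "submitter": submitter,
            "received_at": received_at,
        },
        sort_keys=True,  # stable serialization so the tag is reproducible
    ).encode()
    tag = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "tag": tag}


def verify_receipt(receipt: dict, report: bytes, signing_key: bytes) -> bool:
    """Check that the receipt is authentic and matches this exact report."""
    payload = receipt["payload"].encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, receipt["tag"]):
        return False  # receipt was forged or altered
    claimed = json.loads(payload)["report_sha256"]
    actual = hashlib.sha256(report).hexdigest()
    return hmac.compare_digest(claimed, actual)
```

The receipt contains only the hash, so holding it proves who submitted which bytes and when without disclosing the report itself.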
Submissions would only be expensive for an anonymous user; once reputation has been established, this enables the low-friction, high-trust communication under which collaboration works best.
The BBBS itself won't be overrun by slop since the price of establishing an account far exceeds what a bot might expect to make with a single malicious submission. Nor can legitimate established accounts be sold since the cost of creating them exceeds the value to be expected from abusing them. Moreover, the cost to establish a reputation as a bug bounty hunter is small in dollars compared to the cost in time and expertise that a legitimate hunter would be expected to expend in the course of their work.
The vast majority of slop would go away as the cost of a first submission is much too high. The cost to the project is close to nothing - integrating with the BBBS attestation API. The cost to a legitimate bug bounty hunter is low - some human review while establishing a reputation, which could even be made useful if it came in the form of feedback. All review is paid for by the submitter, so no one is trying to counter infinite slop with volunteer hours.
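The fee and standing rules described above can be sketched as a small in-memory registry. The $50 figure comes from the comment; the class, its method names, and the choice to model a ban as an exception are hypothetical illustrations rather than a real BBBS API.

```python
from dataclasses import dataclass, field

FIRST_SUBMISSION_FEE = 50  # dollars, the figure proposed above


@dataclass
class Registry:
    """Hypothetical bookkeeping for submitter standing."""
    good_standing: set = field(default_factory=set)
    banned: set = field(default_factory=set)

    def fee_for(self, user: str) -> int:
        """Fee owed for the next submission by this user."""
        if user in self.banned:
            raise PermissionError(f"{user} is banned")
        # Known members submit for free; unknown users pay for a human review.
        return 0 if user in self.good_standing else FIRST_SUBMISSION_FEE

    def record_review(self, user: str, is_slop: bool) -> None:
        """Update standing after a human reviewer reads the report."""
        if is_slop:
            self.good_standing.discard(user)
            self.banned.add(user)  # slop costs the account
        else:
            self.good_standing.add(user)
```

A new user pays the fee once, submits for free after a coherent report, and is locked out after a slop verdict, so a spammer has to pay the full fee again for every fresh identity.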
Moreover, the BBBS can serve as a mediator of trust, not only against AI, but as a place to receive reputational merit for high value work and trustworthy bug bounty programs.
I realize I am describing a lightweight guild, which is subject to well known political failure modes (the most significant of which is exploiting newcomers), but the concept has the advantage that guilds have functioned as successful slop gatekeepers in society for a very long time and a lot is known about how to make them work.
...large swaths of approaches on online engagement just becoming non-viable
Someone automated rewriting Bun in Rust, allegedly fixing the bugs.
>the author just injected garbage bytes manually into the database header, and then argued that this corrupted the database
>Steps to reproduce: Modified cli/main.rs to include a Vec with limited capacity. Forced a volatile write beyond the allocated bounds using std::ptr::write_volatile.
>author claims to have found a critical vulnerability that allows for the execution of arbitrary SQL statements. Imagine that? A SQL database that allows the execution of SQL statements. How can we ever recover from this.
I wonder why they are even doing this. Do any of these PRs ever win any money? It feels like they are burning down a forest thinking they'll find gold if they do it, without any evidence that there will be any gold after the forest is burnt down.
(Okay Claude is too expensive, but Deepseek can probably handle it.)
Skynet has won.
*Edit - I get it. It seems like the authentication is a challenge.
New identities are cheap.
Denominated in BTC to avoid chargebacks etc.