Stealth ads are annoying.
Someone working on AI tooling for code reviews is exactly the person I'd want an opinion from on this space; otherwise it's just opining with no validation.
I just thought it would be a good idea to share what I have learnt.
It just talks like it's very smart, and humans apparently have a bias toward persuasive communication skills. It's also very fast, which humans also read as a sign of general intelligence. But it isn't actually that smart, and that's why most LLM tools are author-focused: so that a human expert can catch errors.
The way you know fully autonomous driving is nowhere near ready is by noticing we don't even trust robots to do fully autonomous cooking and cleaning. Similarly, let's see it understand and refactor a massive codebase first.
- More than being "good enough", it is about taking responsibility.
- A human can make more mistakes than an AI and still be the more appropriate choice, because humans can be held responsible for their actions. AI, by its very nature, cannot be 'held responsible' -- this has been agreed upon based on years of research in the field of "Responsible AI".
- To completely automate anything using AI, you need a way to trivially verify whether it did the right thing or not. If the output cannot be verified trivially, you are just changing the nature of the job, and it is still a job for a human being (like the staff you mentioned who remotely control Waymos when something goes wrong).
- If an action is not trivially verifiable and the AI's output directly reaches the end user without a human in the loop, then the creator is taking a massive risk. That usually doesn't make sense for a business when it comes to mission-critical activities.
In Waymo's case, they are taking massive risks because of Google's backing. But it is not about being 'good enough'. It is about the results of the AI being trivially verifiable - which, in the case of driving, is true. You just need four yes/no answers: Did the customer reach where they wanted to go? Are they safe? Did they arrive on time? Are they happy with the experience?
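In code terms, "trivially verifiable" just means the automated action can be accepted or rejected by a handful of cheap boolean checks. A toy illustration (nothing here comes from Waymo; the field names are made up):

```python
# Toy illustration of trivial verifiability: a ride counts as successful
# only if every cheap yes/no check passes. If any check fails, the case
# gets escalated to a human instead of being silently accepted.
def ride_verified(trip):
    checks = [
        trip["reached_destination"],
        trip["passenger_safe"],
        trip["on_time"],
        trip["passenger_happy"],
    ]
    return all(checks)

trip = {"reached_destination": True, "passenger_safe": True,
        "on_time": True, "passenger_happy": True}
print(ride_verified(trip))  # True
```

The point of the sketch is that each check is cheap and unambiguous; contrast that with "is this refactor of a massive codebase correct?", which has no such shortlist of yes/no questions.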
How do you reconcile this claim with Waymo's dramatically increased rate of expansion these past few years?
https://www.businessinsider.com/robotaxis-may-mobility-tesla...
High operational costs, low revenue potential, technical difficulties, competitors exiting the space.
AI is better than humans at all those things. It's just not good at them when the context it needs to look over is more than a few thousand tokens.
Rejoice programmer, for your inability to write modular code saved your job.
As the sole developer of a non-trivial open source project, I recently started using CodeRabbit. I was very skeptical about it, but right on the first PR it actually found a bug that my CI tests did not catch, so I decided to keep it after that.
Gemini Code Assist, on the other hand: the first suggestion it made would actually have introduced a bug, so that was out immediately.
At this stage, you don't need "another set of eyes" because it is not that big of a problem to break something, as you are not going to lose massive amounts of money because of the mistake.
All these teams need is a sanity check. They also generally (even without the AI code reviewers) do not have a strong code review process.
This is why, in the article, I have clearly mentioned that these are learnings based on talking to engineers at Series-B and Series-C startups.
> most AI code review tools on the market today are fundamentally author-focused, not reviewer-focused.
This pretty much describes our experience. Our engineers create a PR and then wait for the review bot to provide feedback. The author will fix any actual issues the bot brings up, and only then will they publish the PR to the rest of the team.
From our experience there are 4 things that make the bot valuable:
1. Any general logical issues in the code are caught with relative certainty (not evaluating a variable value properly or missing a potential edge case, etc).
2. Some of the bot's comments are about the business logic in the code; asking about it and having the author provide a clearer explanation helps reviewers understand what's going on when it wasn't clear enough from the code itself.
3. We provide a frontend platform to other engineers in the company that our operations teams interact with. The engineers rarely implement more than 1-2 features a year. We gave the bot a list of coding and frontend guidelines that we enforce (capitalisation rules, title formatting, component spacing, etc) and it will remind reviewers about these requirements.
4. We told it to randomly vary its way of talking between Yoda and Dr Seuss, and some of the comments, while correct on a technical level, are absolutely hilarious and can give you a short giggle on an otherwise stressful day.
The commentary given above is invalid if, due to the preferences of the human developers or just a quirk of their working relationship, they end up with different AIs in the two instances. But I think in the long-term equilibrium this point applies.
In the maker-checker process, if we are imagining a future where AI writes and edits most of the code, AI code-review tools will need to integrate into that agentic process.
And the job of a better code-review interface (like the one that I am trying to build) would be to provide a higher level of abstraction to the user so that they can verify the output of the AI code generators more effectively.
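A minimal sketch of what that maker-checker loop might look like inside an agentic process, assuming hypothetical `generate_patch` and `review_patch` functions (none of these names come from any real tool; both are illustrative stubs standing in for model calls):

```python
# Hypothetical maker-checker loop: an AI author proposes a patch, an AI
# reviewer critiques it, and the loop iterates until the review passes or
# we give up and escalate to a human. All functions are illustrative stubs.

def generate_patch(task, feedback=None):
    # Stand-in for an LLM code generator; a real tool would call a model.
    patch = f"patch for {task!r}"
    if feedback:
        patch += " (revised: addressed review feedback)"
    return patch

def review_patch(patch):
    # Stand-in for an AI reviewer; returns (approved, feedback).
    if "revised" in patch:
        return True, None
    return False, "missing edge-case handling"

def maker_checker(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        patch = generate_patch(task, feedback)
        approved, feedback = review_patch(patch)
        if approved:
            return patch, "auto-approved"
    # Verification failed within budget: hand off to a human reviewer.
    return patch, "needs-human-review"

patch, status = maker_checker("fix null check in parser")
print(status)
```

The interesting design question is the escalation branch: a reviewer-focused interface would spend its effort on the `needs-human-review` cases, surfacing them at a higher level of abstraction instead of as raw diffs.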
But if you are not also reviewing the feature-to-main pull request, you are just inviting problems. That is a bigger CR that you should review carefully, and there is no way it could be a small CR.