I kept hearing the same pattern: some teams are shipping 10-15 AI PRs daily without issues. Others tried once, broke production, and gave up entirely.
The difference wasn't what I expected: it had nothing to do with model choice or prompt engineering.
---
One team shipped an AI-generated PR that took down their checkout flow.
Their tests and CI passed, but the AI had "optimized" their payment processing by replacing `queueAnalyticsEvent()` with a direct `analytics.track()` call. The analytics service has a 2-second timeout, so when it's slow, payment processing times out.
In prod, under real load, 95th-percentile latency went from 200ms to 8 seconds. The result: three hours of downtime and $50k in lost revenue.
Everyone on that team knew to queue analytics events asynchronously, but that rule wasn't documented anywhere. It's just something they learned when analytics had an outage years ago.
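A hypothetical sketch of the failure mode (function names, types, and timings here are my own illustration, not the team's actual code): the queued call adds essentially no latency to the payment path, while the direct, awaited call blocks payments on the analytics service's timeout.

```typescript
// Illustrative sketch only: names and the 2s timeout mirror the story above.

// Simulated slow analytics call that runs all the way to its timeout.
async function trackBlocking(event: string, timeoutMs = 2000): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, timeoutMs));
}

// Safe pattern: enqueue and return immediately; a background worker drains the queue.
const analyticsQueue: string[] = [];
function queueAnalyticsEvent(event: string): void {
  analyticsQueue.push(event);
}

// Payment path with the queued call: analytics adds ~0ms.
async function processPaymentQueued(): Promise<number> {
  const start = Date.now();
  queueAnalyticsEvent("payment_processed");
  // ...charge the card...
  return Date.now() - start;
}

// Payment path after the "optimization": the await blocks on the slow service.
async function processPaymentBlocking(timeoutMs = 2000): Promise<number> {
  const start = Date.now();
  await trackBlocking("payment_processed", timeoutMs);
  // ...charge the card...
  return Date.now() - start;
}
```

Both versions type-check, both pass tests against a fast mock analytics service; only real load exposes the difference.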
*The pattern*
Traditional CI/CD catches syntax errors, type mismatches, and test failures.
The trouble is that AI doesn't make those mistakes (or at least, tests and linters catch them before they're committed). What AI does is generate syntactically perfect code that violates your system's unwritten rules.
*The institutional knowledge problem*
Every codebase has landmines that live in engineers' heads, accumulated through incidents.
An AI can't know these, so it walks straight into them, and it falls to the code reviewer to catch it.
*What the successful teams do differently*
They write constraints in plain English, and AI enforces them semantically on every PR, e.g. "All routes in /billing/* must pass requireAuth and include the orgId claim."
AI reads your code, understands the call graph, and blocks merges that violate the rules.
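A minimal sketch of what one such semantic check might look like, assuming route metadata has already been extracted from the code (the types, sample routes, and single-rule scope are my own illustration; a real reviewer would also verify things like the orgId claim via the call graph):

```typescript
// Illustrative rule checker: assumes routes have been extracted into structured data.
interface Route {
  path: string;
  middleware: string[]; // names of middleware applied to the route
}

// Plain-English rule: "All routes in /billing/* must pass requireAuth."
function violations(routes: Route[]): string[] {
  return routes
    .filter((r) => r.path.startsWith("/billing/"))
    .filter((r) => !r.middleware.includes("requireAuth"))
    .map((r) => `${r.path} is missing requireAuth`);
}

const routes: Route[] = [
  { path: "/billing/invoices", middleware: ["requireAuth"] },
  { path: "/billing/export", middleware: [] }, // AI-generated route, auth forgotten
  { path: "/health", middleware: [] }, // outside the rule's scope
];

// violations(routes) → ["/billing/export is missing requireAuth"]
```

The point of using AI rather than a checker like this is that the rule stays in plain English: the model maps it onto whatever shape the code actually takes, instead of requiring routes to fit a predefined schema.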
*The bottleneck*
When you're shipping 10x more code, validation becomes the constraint, not generation speed.
The teams shipping AI at scale aren't waiting for better models. They're using AI to validate AI-generated code against their institutional knowledge.
The gap between "AI that generates code" and "AI you can trust in production" isn't a matter of model capability; it's a matter of bridging the institutional knowledge gap.
Some teams already run an entirely AI-driven pipeline where AI writes the code, reviews it, and humans just click Merge at the end.
Interestingly, the AI reviewer often finds errors in AI-generated code. For now, AI can review AI's code with decent results, probably better than a junior-to-mid-level dev.
It got me thinking about the future…
Qs:
1. Traditionally, code reviews are how junior developers learn. Have you noticed changes in how they learn since codegen has become a thing?
2. Society expects self-driving cars to be dramatically safer (e.g. 10x better) before accepting them. Do you expect similar standards from AI code reviewers, or is slightly better than human good enough?
3. As AI surpasses humans at writing and reviewing code (at least on technical correctness), how do you see the role of code review changing?
4. Philosophically, do you think code generation AIs will ever become so effective that specialized AI code reviewers won’t find any issues?