AI for planning, AI for RFCs, AI for writing code, AI for creating PRs. Sure, we can have harnesses and tests to ensure nothing breaks. But how do we enforce engineers to have a deep understanding of the code that they are shipping?
Our team has the usual suggestions: write a plan first, write test cases first, etc. But in this age, how do you verify that the engineer did not simply delegate these tasks to an LLM first?
Also genuinely worried about junior engineers' growth if this is the future.
Diff reading is not a practice forced on developers from above. On the contrary, it is the only way for a developer to stay competent enough to lead the next session properly.
The question here is not how to make developers appreciate the importance of diff reading. It is whether developers understand that they cannot hand off the responsibility of building a mental model of the system and still maintain effective control over the agent's behavior.
> But how do we enforce engineers to have a deep understanding of the code that they are shipping?
By not pushing an AI-first mindset. When you start tracking token usage metrics, people will optimize for that metric.
If you don't push them, and tell them to use AI as a tool, they will optimize for understanding the code.
Having said that, if you want to know whether an engineer really gave it some thought before having the AI do the work, you can ask things like "why did you decide to design it like this?" in person.
Did they do this before AI? Does the company really, truly care about software quality or are they just trying to ship features?
Things like:
- in-depth code reviews
- encouraging sharing knowledge and helping others
- dedicating time to address technical debt
- giving engineers freedom to explore technologies and solutions
- following best practices for software dev
- hiring the right people
This is one of those things you can't enforce, but your leadership can encourage it by setting examples. If your company does not care about understanding the software by carving out time and explicitly encouraging it, then employees won't either.
With AI, docs are now very cheap to produce and are no longer proof of thought on their own.
E.g., if you see something that doesn't make sense, you can't just ask the author to write a doc for it anymore, because they'll just feed that request to an LLM.
Of course that is perfect for MVPs, speed runs, and getting the general shape of something before you commit to an implementation, but for many classes of production code that is too high a risk to take.
I'd like to hear others' experiences, however. Has anyone found a way to get >10X productivity gains with AI in production code?
When you write code yourself, you only use concepts you have come across previously. But an AI-generated solution may well use concepts you have not come across before.
What do we expect a junior engineer to do then, if they do not have time to learn the concepts used? Do we really think they will reject the AI code and implement their own, more familiar, solution?
For any design docs, have them do a walkthrough with the team. If they have to speak to their plan live and answer questions, it will be obvious if they've put thought into it or not.
The same thing can be done for any PRs that have some depth to them, such as ones that touch critical logic or involve complexity.
1. Did the engineer personally understand this change?
2. Is this change allowed to affect critical parts of the system?
The first one is hard to enforce mechanically. You can require design docs, tests, PR explanations, walkthroughs, etc., but a determined person can route all of that through an LLM too.
The second one is more enforceable, and I think it matters a lot in the AI-coding world.
Not all code deserves the same review posture. A dashboard, script, prototype, migration helper, etc. should be able to move fast. But auth, billing, security-sensitive logic, and core business rules should not quietly depend on code that was “just agent output” or barely reviewed.
The pattern I’ve been experimenting with is explicit trust/review tiers in the codebase:
- low-risk / vibe-coded code can exist
- agent-touched files get marked as lower-trust
- humans can restore trust after review
- CI enforces that high-trust code cannot import lower-trust code
- critical directories can be required to stay high-trust
This doesn’t prove the engineer understood the code. Nothing really does.
But it does create review memory in the repo. If a file was touched by an agent, that state is visible in the diff. If someone promotes it back to a trusted tier, that promotion is also visible in the diff, and reviewers can ask “did you actually read this?”
I ended up building a small OSS tool around this idea called Tears: https://github.com/Thillel/tears
The slogan is a bit tongue-in-cheek, but it captures the point: vibe-code responsibly.
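For anyone wondering what the two CI rules from the list above could look like, here is a minimal Python sketch. It is not taken from the tool linked above and does not describe how that tool works. The `# trust: high` / `# trust: low` marker comment, the choice to treat unmarked files as low-trust, and every function name are assumptions made up for illustration.

```python
import ast
import re

# Hypothetical convention: a file declares its tier in a comment line,
# e.g. "# trust: high". This marker format is an assumption of this sketch.
TRUST_MARKER = re.compile(r"^#\s*trust:\s*(high|low)\s*$", re.MULTILINE)
DEFAULT_TIER = "low"  # assumption: unmarked files count as lower-trust


def tier_of(source: str) -> str:
    """Return the declared tier of a file, or the default if unmarked."""
    match = TRUST_MARKER.search(source)
    return match.group(1) if match else DEFAULT_TIER


def imported_modules(source: str) -> set[str]:
    """Collect absolute module names imported anywhere in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def find_violations(sources: dict[str, str]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs where high-trust imports low-trust.

    `sources` maps module name -> source text. Modules that are not in
    `sources` (stdlib, third-party) are ignored.
    """
    tiers = {name: tier_of(text) for name, text in sources.items()}
    violations = []
    for name, text in sources.items():
        if tiers[name] != "high":
            continue
        for dep in imported_modules(text):
            if tiers.get(dep) == "low":
                violations.append((name, dep))
    return sorted(violations)


def demoted_critical(sources: dict[str, str], critical_packages: tuple[str, ...]) -> list[str]:
    """Return modules under critical packages that are not high-trust."""
    def is_critical(name: str) -> bool:
        return any(name == pkg or name.startswith(pkg + ".") for pkg in critical_packages)

    return sorted(
        name for name, text in sources.items()
        if is_critical(name) and tier_of(text) != "high"
    )


if __name__ == "__main__":
    example = {
        "billing.charge": "# trust: high\nimport reports.export\n",
        "reports.export": "# trust: low\nimport csv\n",
        "auth.session": "# trust: low\nimport hashlib\n",
    }
    print(find_violations(example))
    print(demoted_critical(example, ("auth", "billing")))
```

In a real CI job you would build the mapping from files on disk and exit non-zero if either list is non-empty. Because the marker lives in the file itself, demoting or promoting a file shows up as a one-line change in the diff, which is the "review memory" part.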
What works in my experience:
1. Code review with "explain this to me" questions. Not gotchas, genuine "walk me through why this works." If they can't explain it, it doesn't merge.
2. On-call rotation for what you ship. Nothing motivates understanding like being woken up at 3am by your own code.
3. Pair programming on complex features. Not watching - actually driving together.
The real question is: are they shipping code they don't understand because they're lazy, or because the codebase is so complex that nobody fully understands it? If it's the latter, the problem isn't the engineers - it's the architecture.