Over the past two years I’ve built and debugged a fair number of production pipelines—mainly retrieval‑augmented generation stacks, agent frameworks, and multi‑step reasoning services. A pattern emerged: most incidents weren’t outright crashes, but silent structural faults that slowly compromised relevance, accuracy, or stability.
I began logging every recurring fault in a shared notebook. Colleagues started using the list for post‑mortems, so I turned it into a small public reference: 16 distinct failure modes (semantic drift after chunking, embedding/meaning mismatches, cross‑session memory gaps, recursion traps, etc.). The taxonomy isn’t academic; each item references a real outage or mis‑prediction we had to fix.
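To make the first item concrete: here is a hypothetical, minimal sketch (not taken from the reference itself) of how "semantic drift after chunking" shows up in practice. A naive fixed-width splitter can sever a negation or qualifier from the clause it governs, so the retrieved chunk quietly implies the opposite of the source document. The `chunk_fixed` helper and the sample policy text are illustrative assumptions, not part of the taxonomy.

```python
# Illustrative sketch of "semantic drift after chunking" (hypothetical
# example, not from the reference): fixed-width chunking with no
# sentence awareness cuts the text mid-clause.

def chunk_fixed(text: str, size: int) -> list[str]:
    """Naive fixed-width chunking; ignores sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = ("The refund policy does not apply to digital goods. "
       "Physical goods may be returned within 30 days.")

chunks = chunk_fixed(doc, 40)
for c in chunks:
    print(repr(c))

# The split lands inside "digital", severing "does not apply" from
# the goods it covers: a retriever that surfaces the second chunk
# returns text about returnable goods with the exclusion stripped,
# so downstream generation can assert the opposite of the policy.
```

Nothing crashes here, which is exactly the point: the pipeline runs green while relevance and accuracy degrade, and only an eval or an incident report surfaces it.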
Why share it?
Common vocabulary – naming a failure mode makes root‑cause discussions faster and less hand‑wavy.
Earlier detection – several teams now check new features against the list before shipping.
Community feedback – if something is missing or misclassified, I’d rather learn it here than during another 3 a.m. incident.
The reference has already helped a few startups (and my own projects) avoid hours of trial‑and‑error. If you work on LLM infrastructure, you might find a familiar bug—or a new one to watch for. The link to the full table and brief write‑ups is in the “url” field of this Show HN post.
I’m not selling anything; it’s MIT‑licensed text. Comments, critiques, or additional failure patterns are very welcome.
Thanks for taking a look.