It all depends on whose specification you’re assessing the “bugginess” against: the inference code as written, the research paper, the colloquial understanding in technical circles, or how the product is pitched and presented to users.
And this is why I feel it's so important to fix the way we talk about hallucinations. Engineers need to be extremely clear with product owners, salespeople, and other business folks about the inherent limitations of LLMs: that certain things, like factual accuracy, may asymptotically approach 100% but will never reach it, and that even getting asymptotically close to 100% is extremely (most likely prohibitively) expensive. And once a non-zero failure rate has been chosen, they have to be clear about what its consequences are.
Before engineers can communicate that to the business side, they have to have it straight in their own heads. Then they can set expectations with the business and make sure it understands that once you've chosen a failure rate, individual 'hallucinations' can't be treated as bugs to troubleshoot. Instead, you need an industrial-style QC process that measures trends and reacts only when your process produces results outside a set of well-defined tolerances.
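To make the QC idea concrete, here is a minimal sketch of one common statistical process control technique, a p-chart, applied to hallucination rates. It is only an illustration of "react when the trend leaves the tolerance band", not a description of any particular company's process. The function names (`p_chart_limits`, `out_of_tolerance`), the 2% target rate, the batch size of 500, and the 3-sigma band are all assumptions chosen for the example.

```python
import math


def p_chart_limits(target_rate: float, sample_size: int, sigmas: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) control limits for a failure proportion.

    target_rate is the failure rate the business agreed to accept
    (hypothetical here); sample_size is how many responses are
    reviewed per batch.
    """
    if not 0.0 <= target_rate <= 1.0:
        raise ValueError("target_rate must be between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Standard error of a proportion under a binomial model.
    std_err = math.sqrt(target_rate * (1.0 - target_rate) / sample_size)
    lower = max(0.0, target_rate - sigmas * std_err)
    upper = min(1.0, target_rate + sigmas * std_err)
    return lower, upper


def out_of_tolerance(failures_per_batch: list[int], sample_size: int,
                     target_rate: float, sigmas: float = 3.0) -> list[int]:
    """Return indices of batches whose failure rate falls outside the limits."""
    lower, upper = p_chart_limits(target_rate, sample_size, sigmas)
    return [
        i for i, failures in enumerate(failures_per_batch)
        if not lower <= failures / sample_size <= upper
    ]


# Illustrative numbers only: five weekly batches of 500 reviewed responses,
# with an agreed failure rate of 2%.
weekly_failures = [8, 12, 9, 25, 11]
flagged = out_of_tolerance(weekly_failures, sample_size=500, target_rate=0.02)
print(flagged)
```

With these made-up numbers, the weeks with 8 to 12 bad responses sit inside the band and trigger nothing, while the week with 25 gets flagged for investigation. The point is that a single wrong answer is never the trigger; only a batch-level rate leaving the agreed band is.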
(Yes, I'm aware that many organizations are so thoroughly broken that engineering has no influence over what business tells customers. But those businesses are hopeless anyway, and many businesses do listen to their engineers.)
You are wrong here: my company can fix individual responses by adding specific, targeted data to the RAG prompt. So a JIRA ticket for a wrong response can be fixed in 2 days.
At scale, your solution looks like bolting an expert system on top of the LLM, which is something that some researchers and companies are actually working on.
Because it gives you two ends from which to approach the business challenge. You can improve the fitness—the functionality itself. But you can also adjust the purpose—what people expect it to do.
I think a lot of the concerns about LLMs come down to unrealistic expectations: oracles, Google killers, etc.
Google has problems finding and surfacing good info. LLMs are way better at that… but they err in the opposite direction. They are great at surfacing fake info too! So they need to be thought of (marketed) in a different way.
Their promise needs to be better aligned with how the technology actually works. Which is why it’s helpful to emphasize that “hallucinations” are a fundamental attribute, not an easily fixed mistake.
People also blithely trust other humans even against all evidence that they're untrustworthy. Some things just aren't fixable.