I believe the real issue is that LLMs are still so bad at reasoning. In my experience, the worst hallucinations occur where only a handful of sources exist for some set of facts (e.g., laws of small countries or descriptions of niche products).
LLMs know these sources and refer to them, but they interpret them incorrectly. They can't focus on the semantics of one specific page because they get "distracted" by their pattern-matching nature.
Now people will say that this is unavoidable given the way in which transformers work. And this is true.
But shouldn't it be possible to include some measure of data sparsity in training so that models know when they don't know enough? That would let them boost the weight of the context (including sources they find through inference-time search/RAG) relative to their pretraining.
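To make the idea concrete, here's a toy sketch (not any real model's API): the sparsity score, the two logit vectors, and the blending rule are all hypothetical, just illustrating what "boosting the weight of the context" could mean at decoding time.

```python
import numpy as np

def mix_logits(parametric_logits: np.ndarray,
               context_logits: np.ndarray,
               sparsity_score: float) -> np.ndarray:
    """Blend two next-token predictions.

    parametric_logits: what the model predicts from its weights alone.
    context_logits:    what it predicts when conditioned on retrieved sources.
    sparsity_score:    hypothetical 0..1 signal learned during pretraining,
                       where 1 means "I barely saw this topic in training".
    """
    w = np.clip(sparsity_score, 0.0, 1.0)
    # The sparser the pretraining coverage, the more the retrieved context dominates.
    return (1.0 - w) * parametric_logits + w * context_logits

# Example: a niche-law query where the model's own knowledge is thin.
parametric = np.array([2.0, 1.0, 0.5, 0.1, 0.0])   # confident-looking but unreliable
contextual = np.array([0.2, 0.3, 3.0, 0.1, 0.0])   # grounded in the retrieved page
print(mix_logits(parametric, contextual, sparsity_score=0.9))
# output is dominated by the context-conditioned prediction
```

Obviously a real mechanism would have to be learned end to end rather than bolted on like this, but that's the kind of knob I mean.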