The weights, so to speak, come from the knowledge base, so you can't get away from the quality of the knowledge base, and that quality isn't uniform across all domains of knowledge. Then the problem becomes: how do you make the training material uniformly high-quality in every knowledge domain? At best it becomes the meta-problem of measuring the quality of knowledge in a way that lets an LLM calibrate its confidence to a knowledge domain. More likely we're stuck with the dubious quality that comes from human bias and wishful thinking in supposedly authoritative material.
Gemini's user interface, at least the way most people encounter it, seems more closely tied to search results. That leads me to suspect Google's approach to training could be uniquely informed by both current and historical web crawling.
That's not what I said. What I said is that the claim "LLMs aren't intelligent because they stochastically produce characters" doesn't hold, because humans do that too, even when they're intelligent and authoritative.
Fine-tuning and RAG should, in theory, let applications of LLMs perform better in specific knowledge domains, by focusing the annotation of knowledge on the domains specific to the application.
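To make that concrete, here's a toy sketch of the retrieval half of RAG: a bag-of-words similarity search over a small in-memory corpus whose top hits get stuffed into the prompt. The corpus and the prompt template are stand-ins I made up; a real system would use learned embeddings, a vector store, and an actual generate call on the resulting prompt.

    # Toy RAG retrieval sketch: rank documents by bag-of-words cosine
    # similarity to the query, then build a context-grounded prompt.
    import math
    from collections import Counter

    CORPUS = [
        "The 2023 handbook states that refunds are issued within 14 days.",
        "Support tickets are triaged by severity, then by age.",
        "On-call rotations change every Monday at 09:00 UTC.",
    ]

    def bow(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        shared = set(a) & set(b)
        num = sum(a[t] * b[t] for t in shared)
        den = math.sqrt(sum(v * v for v in a.values())) * \
              math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def retrieve(query, k=2):
        q = bow(query)
        ranked = sorted(CORPUS, key=lambda d: cosine(q, bow(d)), reverse=True)
        return ranked[:k]

    def build_prompt(query):
        context = "\n".join(retrieve(query))
        return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

    print(build_prompt("How long do refunds take?"))

The point is just that the domain-specific knowledge travels in the prompt at inference time, so the quality problem shifts from the training corpus to the curated corpus you retrieve from.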
I think you're missing the point. The issue is not the amount of knowledge it possesses. The problem is that there's no way to go from "statistically generate the next word" to "what is your confidence level in the fact you just stated?" Maybe, with an enormous amount of computation, we could layer another AI on top to evaluate or add confidence intervals, but I just don't see how we get there without another quantum leap.
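For what it's worth, the cheapest version of "layer something on top" that people try is just reading the model's own token probabilities back as a confidence score. Rough sketch below; the logprob values are made up for illustration (in practice you'd read them from whatever API or runtime you're using), and this kind of self-reported probability is known to be a weak proxy for whether the stated fact is actually true.

    # Crude confidence proxy: geometric mean of token probabilities,
    # i.e. exp of the mean log-probability of the generated answer.
    import math

    def sequence_confidence(token_logprobs):
        return math.exp(sum(token_logprobs) / len(token_logprobs))

    answer_logprobs = [-0.05, -0.8, -0.02, -1.9, -0.1]  # illustrative values only
    score = sequence_confidence(answer_logprobs)
    print(f"confidence proxy: {score:.2f}")
    if score < 0.5:
        print("Low confidence: route to a human or a verifier model.")

So you can get a number out; the hard part you're pointing at is that the number measures fluency-under-the-model, not truth.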
Of course there is. If its training forces it to develop a theory of mind then it will weight the dice so that it's more likely to output "I don't know". Most likely the culprit is that it's hard to make training data for things that it doesn't know.
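Something like this is what I mean by making it practice "I don't know": pair questions it can't possibly answer with an explicit refusal target and mix them into the fine-tuning set. The field names and file format below are assumptions for illustration, not any particular framework's schema.

    # Toy construction of fine-tuning examples that include refusals.
    import json

    answerable = [
        ("What year did the Apollo 11 mission land on the Moon?", "1969."),
    ]
    unanswerable = [
        "What will the closing price of the S&P 500 be next Tuesday?",
        "What am I thinking about right now?",
    ]

    examples = [{"prompt": q, "completion": a} for q, a in answerable]
    examples += [{"prompt": q, "completion": "I don't know."} for q in unanswerable]

    # Write one JSON object per line (JSONL), a common fine-tuning format.
    with open("idk_sft.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

The hard part is exactly what you'd expect: generating unanswerable questions is easy, but generating questions that this particular model doesn't know the answer to, without also teaching it to refuse things it does know, is not.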