I'm not an expert on the topic, but to me it sounds plausible that a good part of the problem of confabulation comes down to misaligned incentives. These models are trained hard to be a 'helpful assistant', and this might conflict with telling the truth.
Being free of hallucinations is a bit too high a bar to set anyway. Humans are extremely prone to confabulations as well, as can be seen by how unreliable eye witness reports tend to be. We usually get by through efficient tool calling (looking shit up), and some of us through expressing doubt about our own capabilities (critical thinking).
I don't think "wrong memory" is accurate, it's missing information and doesn't know it or is trained not to admit it.
Checkout the Dwarkesh Podcast episode https://www.dwarkesh.com/p/sholto-trenton-2 starting at 1:45:38
Here is the relevant quote by Trenton Bricken from the transcript:
One example I didn't talk about before with how the model retrieves facts: So you say, "What sport did Michael Jordan play?" And not only can you see it hop from like Michael Jordan to basketball and answer basketball. But the model also has an awareness of when it doesn't know the answer to a fact. And so, by default, it will actually say, "I don't know the answer to this question." But if it sees something that it does know the answer to, it will inhibit the "I don't know" circuit and then reply with the circuit that it actually has the answer to. So, for example, if you ask it, "Who is Michael Batkin?" —which is just a made-up fictional person— it will by default just say, "I don't know." It's only with Michael Jordan or someone else that it will then inhibit the "I don't know" circuit.
But what's really interesting here and where you can start making downstream predictions or reasoning about the model, is that the "I don't know" circuit is only on the name of the person. And so, in the paper we also ask it, "What paper did Andrej Karpathy write?" And so it recognizes the name Andrej Karpathy, because he's sufficiently famous, so that turns off the "I don't know" reply. But then when it comes time for the model to say what paper it worked on, it doesn't actually know any of his papers, and so then it needs to make something up. And so you can see different components and different circuits all interacting at the same time to lead to this final answer.
One demo of this that reliably works for me:
Write a draft of something and ask the LLM to find the errors.
Correct the errors, repeat.
It will never stop finding a list of errors!
The first time around and maybe the second it will be helpful, but after you've fixed the obvious things, it will start complaining about things that are perfectly fine, just to satisfy your request of finding errors.
Not my experience. I find after a couple of rounds it tells me it's perfect.
"In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting,[1][2] confabulation,[3] or delusion[4]) is"
"Well humans break their leg too!"
It is just a mindlessly stupid response and a giant category error.
The way an airplane wing and a human limb is not at all the same category.
There is even another layer to this that comparing LLMs to the brain might be wrong because the mereological fallacy is attributing the brain "thinks" vs the person/system as a whole thinks.
People, when tasked with a job, often get it right. I've been blessed by working with many great people who really do an amazing job of generally succeeding to get things right -- or at least, right-enough.
But in any line of work: Sometimes people fuck it up. Sometimes, they forget important steps. Sometimes, they're sure they did it one way when instead they did it some other way and fix it themselves. Sometimes, they even say they did the job and did it as-prescribed and actually believe themselves, when they've done neither -- and they're perplexed when they're shown this. They "hallucinate" and do dumb things for reasons that aren't real.
And sometimes, they just make shit up and lie. They know they're lying and they lie anyway, doubling-down over and over again.
Sometimes they even go all spastic and deliberately throw monkey wrenches into the works, just because they feel something that makes them think that this kind of willfully-destructive action benefits them.
All employees suck some of the time. They each have their own issues. And all employees are expensive to hire, and expensive to fire, and expensive to keep going. But some of their outputs are useful, so we employ people anyway. (And we're human; even the very best of us are going to make mistakes.)
LLMs are not so different in this way, as a general construct. They can get things right. They can also make shit up. They can skip steps. The can lie, and double-down on those lies. They hallucinate.
LLMs suck. All of them. They all fucking suck. They aren't even good at sucking, and they persist at doing it anyway.
(But some of their outputs are useful, and LLMs generally cost a lot less to make use of than people do, so here we are.)
LLMs are being sold as viable replacement of paid employees.
If they were not, they wouldn’t be funded the way they are.
The purpose of mechanisation is to standardise and over the long term reduce errors to zero.
Otoh “The final truth is there is no truth”
It is bad only in case of reporting on facts.
Gemini (the app) has a "mitigation" feature where it tries to to Google searches to support its statements. That doesn't currently work properly in my experience.
It also seems to be doing something where it adds references to statements (With a separate model? With a second pass over the output? Not sure how that works.). That works well where it adds them, but it often doesn't do it.
Reality is perfectly fine with deception and inaccuracy. For language to magically be self constraining enough to only make verified statements is… impossible.
It might be true that a fundamental solution to this issue is not possible without a major breakthrough, but I'm sure you can get pretty far with better tooling that surfaces relevant sources, and that would make a huge difference.
One area that I've found to be a great example of this is sports science.
Depending on how you ask, you can get a response lifted from scientific literature, or the bro science one, even in the course of the same discussion.
It makes sense, both have answers to similar questions and are very commonly repeated online.