What you are missing here is that the "hallucinations" you don't like and the "results" you do like are, in terms of the underlying process, exactly the same thing. They are not an aberration you can remove. Producing these kinds of results without "hallucinations" is going to require fundamentally different techniques. It's not a "last gap".
Humans have a condition called schizophrenia in which we are literally incapable of differentiating hallucination from reality. What that differentiating capability actually is, is something we still need to discover, for both ourselves and for LLMs.
For example: mathematically speaking, it's possible to know how far away an inferenced point is from a cluster of real-world data. That delta, fed back into the neural network, can let the LLM gauge how speculative a response is. From there we can feed the response back into the model for refinement.
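A minimal sketch of that idea, assuming we represent the "cluster of real world data" as points in a feature space and use Mahalanobis distance as the delta (the specific metric and the synthetic data are my assumptions for illustration, not something LLMs ship with):

```python
import numpy as np

# Hypothetical sketch: score how far a newly inferenced point sits from
# a cluster of "real" data points, using Mahalanobis distance.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=1.0, size=(500, 4))  # stand-in for real data

mean = cluster.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(cluster, rowvar=False))

def speculation_score(point: np.ndarray) -> float:
    """Mahalanobis distance of `point` from the cluster; larger = more speculative."""
    delta = point - mean
    return float(np.sqrt(delta @ cov_inv @ delta))

near = speculation_score(mean)         # a point at the cluster centre
far = speculation_score(mean + 10.0)   # a point well outside the cluster
```

Here `near` comes out near zero and `far` much larger; that scalar is the kind of "delta" that could, in principle, be fed back as a speculation signal.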
And even if we were to cure schizophrenia in humans, what makes you think that would apply to LLMs? Having an extremely weak conceptual model of the world and being unable to reason through rather simple problems (the way LLMs struggle) isn't schizophrenia.
This oversimplified explanation which posits that neural networks are just like human brains has truly gone too far now.
> Mathematically speaking it's possible to know how far away an inferenced point is away from a cluster of real world data.
And mathematically speaking, how would you accomplish this? As you probably know, LLMs don't operate on conceptual ideas; they operate on tokens. That's why LLMs tend to fail when asked to do things that aren't well represented in their training data: they don't have a working model of the world, even if they can fake it to a certain degree.
A weak conceptual model of the world is the problem. But realize that humans have a weak conceptual model of the world too, and make plenty of hallucinations based on that weak model. For example, many people are still claiming that LLMs are all stochastic parroting when it's been proven that they're not. That is a hallucination. Or the people betting for (and against) the financial success of crypto or AI: we don't know how either of those will pan out, but people on each team act as if they know definitively. A huge part of human behavior is driven by hallucinations that fill in gaps.
> And mathematically speaking, how would you accomplish this? As you probably know, LLMs don't operate on conceptual ideas; they operate on tokens. That's why LLMs tend to fail when asked to do things that aren't well represented in their training data: they don't have a working model of the world, even if they can fake it to a certain degree.
It's not that only the LLM has an incorrect model of the world; technically both you and the LLM ultimately hold incorrect models, and both of you fake it. The best you can say is that the LLM has a less accurate approximation of the world than you do, but both of you hold incorrect models and both regularly hallucinate off of them. You also make up bullshit about things that aren't well represented in your own model.
But like I said, we are often (though often not) aware of our own bullshit, so providing that awareness to the LLM quantitatively should help it too.
The LLM is not just trained on random tokens; it's trained on highly specific groups of tokens, and those groups represent conceptual ideas. So an LLM is 100 percent trained on concepts, and tokens are only an encoding of those concepts.
If a group of tokens maps to a vector, then we can for sure calculate distances between vectors. We also know that different kinds of vectors are represented at each layer of the feed-forward network, encoding reasoning and not just the syntactic order of the tokens.
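As a toy illustration of "calculating distance between vectors" (the three vectors here are made up for the example, not real LLM embeddings), cosine distance is one common choice:

```python
import numpy as np

# Toy embeddings: semantically related words should sit closer together
# than unrelated ones. Values are invented purely for illustration.
def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; 0 means identical direction."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

king = np.array([0.9, 0.1, 0.4])
queen = np.array([0.85, 0.2, 0.45])
banana = np.array([0.1, 0.9, 0.0])

related = cosine_distance(king, queen)
unrelated = cosine_distance(king, banana)
```

With any reasonable embedding, `related` comes out smaller than `unrelated`, which is the sense in which token groups encode concepts geometrically.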
Like literally, there is not very much training data of a human giving instructions to someone to write code alongside the associated code diff. The fact that an LLM can do this to a usable degree without volumes of similar training data speaks to the fact that it knows concepts. This is the same tired argument that has been proven wrong. We already know LLMs aren't just parroting training data, as the majority of the agentic coding operations we currently use LLMs for don't have associated training data to copy.
Given that we know all of these embeddings from the training data (the model had to calculate them at some point), we can encode proximity and distance into the model via addition and subtraction of vectors, and from this we extract a number that quantifies the distance between embeddings.
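One hedged sketch of "extracting a number" from stored embeddings: score a new embedding by its Euclidean distance to the nearest training-time embedding. The storage scheme, the metric, and the random data are all assumptions for illustration; a real system would need an approximate nearest-neighbor index at this scale:

```python
import numpy as np

# Hypothetical store of embeddings computed during training.
rng = np.random.default_rng(1)
train_embeddings = rng.normal(size=(1000, 8))

def nearest_distance(query: np.ndarray) -> float:
    """Distance from `query` to its closest stored embedding; larger = more novel."""
    dists = np.linalg.norm(train_embeddings - query, axis=1)
    return float(dists.min())

in_distribution = nearest_distance(train_embeddings[0])   # exact match in the store
out_of_distribution = nearest_distance(np.full(8, 50.0))  # far from everything
```

The scalar `out_of_distribution` is large while `in_distribution` is zero, giving exactly the kind of single number the paragraph above describes.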
Imagine a best-fit 2D curve through a scatter plot of data points, but with a gradient color along the curve: red indicates it's very close to existing data points, blue indicates it's far. We can definitely derive an algorithm that calculates this additional "self-awareness" dimension encoded in color, and the idea extends to the higher-dimensional encoding that is the LLM.
If an LLM is aware that its output is red or blue, then it can sort of tell that a blue stretch of the curve is likely to be a hallucination.
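The red/blue curve idea can be sketched directly. This is a toy under stated assumptions: a polynomial best-fit curve, synthetic data covering only part of the x-axis, and an arbitrary distance threshold deciding the color:

```python
import numpy as np

# Data only covers x in [0, 5]; anything the curve says beyond that is extrapolation.
rng = np.random.default_rng(2)
x = rng.uniform(0, 5, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)
data = np.column_stack([x, y])

coeffs = np.polyfit(x, y, deg=5)  # the "best fit" curve

def color_at(x0: float, threshold: float = 0.5) -> str:
    """Tag a point on the curve red (near real data) or blue (far from it)."""
    point = np.array([x0, np.polyval(coeffs, x0)])
    nearest = np.linalg.norm(data - point, axis=1).min()
    return "red" if nearest < threshold else "blue"

near_color = color_at(2.5)  # inside the data's x-range
far_color = color_at(9.0)   # extrapolating well beyond the data
```

Points inside the data's range come back "red" and far extrapolations come back "blue": the same signal, in two dimensions, that the comment proposes feeding back to an LLM in many dimensions.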