Better HN

0 points · mrguyorama · 1y ago · 0 comments

> The meme that LLMs just generate statistically plausible text is wrong and has been from the start

Did you read that paper? It doesn't support discarding this "meme" at all. More importantly, I don't think it adequately supports the claim that LLMs "know facts".

FFS, the actual paper is about training models on the LLM's internal state to predict whether its actual output is correct. The finding they consider interesting is that their models predict about a 75% chance of being correct even before the LLM starts generating text, that the conversational part of the answer has a low predicted chance of being correct, and that the "exact answer", a term they've created, is usually where the predicted chance that the LLM is correct (according to their trained model) peaks.

What they have demonstrated is that you can build a model that looks at in-memory LLM state and has a 75% chance of guessing whether the LLM will produce the correct answer, based on how the LLM reacts to the prompt. Even taking as a given (which you shouldn't in a science paper) that there's no trickery going on in the Probe models, accidental or otherwise, this is perfectly congruent with the statement that LLMs only "generate statistically probable text in the context of their training corpus and the prompt".
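
For anyone who hasn't seen probing before, here is a minimal sketch of the kind of "Probe model" described above: a small logistic-regression classifier trained on hidden-state vectors to predict whether the final answer was correct. Everything in it is an assumption for illustration. The "hidden states" are synthetic Gaussian vectors rather than real LLM activations, and the names `make_synthetic_states`, `train_probe`, `probe_accuracy`, and `run_demo` are made up. It is not the paper's code or setup.

```python
import math
import random


def make_synthetic_states(n, dim, signal, rng):
    """Return (states, labels) standing in for LLM hidden states.

    Assumption: examples where the answer was correct (label 1) are shifted
    by +signal along one fixed hidden direction, the others by -signal.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    direction = [x / norm for x in direction]
    states, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = signal if label == 1 else -signal
        states.append([rng.gauss(0.0, 1.0) + shift * d for d in direction])
        labels.append(label)
    return states, labels


def predict_proba(weights, bias, state):
    """Probe's estimated probability that the answer is correct."""
    z = bias + sum(w * x for w, x in zip(weights, state))
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(states, labels, epochs=100, lr=0.1):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(states[0])
    n = len(states)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for state, label in zip(states, labels):
            err = predict_proba(weights, bias, state) - label
            grad_b += err
            for i, x in enumerate(state):
                grad_w[i] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, states, labels):
    """Fraction of examples where the probe's guess matches the label."""
    hits = sum(
        (predict_proba(weights, bias, s) >= 0.5) == (y == 1)
        for s, y in zip(states, labels)
    )
    return hits / len(states)


def run_demo(seed=0):
    """Train on half of the synthetic data, report accuracy on the other half."""
    rng = random.Random(seed)
    states, labels = make_synthetic_states(800, 16, 0.7, rng)
    weights, bias = train_probe(states[:400], labels[:400])
    return probe_accuracy(weights, bias, states[400:], labels[400:])


print(f"held-out probe accuracy: {run_demo():.2f}")
```

Note what the probe is given: a vector computed after the prompt has been processed. Above-chance accuracy tells you that this state is predictive of correctness. On its own it doesn't tell you why.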

Notably, why don't they demonstrate that you can predict whether a trained but completely unprompted model will "know" the answer? Why does the LLM have to process the conversation before you can predict, with >90% accuracy, whether it will produce the answer? If the LLM stores facts in its weights, you should be able to demonstrate that completely at rest.

IMO, what they've actually done is produce "Probe models" that can correctly predict, 75% of the time, whether an LLM will produce a certain token or set of tokens in its generation. That is consistent with an LLM being, broadly speaking, a model of how tokens relate to each other from the point of view of language. The main quibble in these discussions is that this doesn't constitute "knowing", IMO. LLMs are a model of language, not reality. That's why they are good at producing accurate language and bad at producing accurate reality. That most facts are expressed in language doesn't mean language IS facts.

A question: Why don't LLMs produce garbage grammar when they "hallucinate"?


mike_hearn · 1y ago

> why don't they demonstrate that you can predict whether a trained but completely unprompted model will "know" the answer?

The answer to what? You have to ask a question to test whether the answer will be accurate, and that's the prompt. I don't understand this objection.

> If the LLM stores facts in its weights, you should be able to demonstrate that completely at rest.

Sure, with good enough interpretability systems, and those are being worked on. Anthropic can already locate which parts of the model fire on specific topics or themes and force them on or off by manipulating the activation vectors.
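
As a toy illustration of what "forcing a feature on or off" can mean at the vector level, here is a sketch that rewrites the component of an activation vector along one direction. It rests on assumptions: the `feature` direction is hand-picked instead of being found inside a real model, `set_feature` is a hypothetical helper, and this is not Anthropic's actual method or code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale v to length 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def set_feature(activation, feature_direction, target):
    """Return a copy of `activation` whose component along the feature is `target`.

    Everything orthogonal to the feature direction is left unchanged.
    """
    v = unit(feature_direction)
    current = dot(activation, v)
    return [a + (target - current) * x for a, x in zip(activation, v)]


# Hypothetical 4-dimensional activation and feature direction.
activation = [0.5, -1.0, 2.0, 0.0]
feature = [1.0, 1.0, 0.0, 0.0]

switched_off = set_feature(activation, feature, 0.0)  # feature forced off
forced_on = set_feature(activation, feature, 5.0)     # feature forced on
```

In a real system the edited vector would be fed back into the rest of the forward pass. This sketch stops at the vector arithmetic.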

> A question: Why don't LLMs produce garbage grammar when they "hallucinate"?

Early models did.
