0 points | m348e912 | 2y ago | 0 comments

The weird problem with LLM hallucinations is that the model will usually acknowledge its mistake and correct itself if you call it out. My question is: why can't LLMs include a subroutine to check themselves before answering? It could be as simple as the model asking itself something like "this answer may not be correct, are you sure you're right?"

Shrezzing 2y ago

>The weird problem with LLM hallucinations is that the model will usually acknowledge its mistake and correct itself if you call it out.

From what I've tested, all of the current models will see a prompt like "are you sure that's correct" and respond "no, I was incorrect [here's some other answer]", irrespective of the accuracy of the original statement.
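One way to make that observation measurable is to compare how often the answer changes after the challenge when the first answer was right versus when it was wrong. The sketch below is only an illustration: `always_backs_down` is a hypothetical stand-in for a model that retracts whenever challenged, and the four cases are invented for the example, not test results.

```python
def flip_rate(first_answers, second_answers):
    """Fraction of answers that changed after the challenge."""
    changed = sum(a != b for a, b in zip(first_answers, second_answers))
    return changed / len(first_answers)


def always_backs_down(answer):
    # Hypothetical stand-in for a model that retracts whenever it is challenged.
    return f"not {answer}"


# (first answer, was the first answer actually right?) Invented cases.
cases = [("Paris", True), ("4", True), ("Lyon", False), ("5", False)]

right = [answer for answer, ok in cases if ok]
wrong = [answer for answer, ok in cases if not ok]

rate_when_right = flip_rate(right, [always_backs_down(a) for a in right])
rate_when_wrong = flip_rate(wrong, [always_backs_down(a) for a in wrong])

print("flip rate when first answer was right:", rate_when_right)
print("flip rate when first answer was wrong:", rate_when_wrong)
```

If the two rates come out the same, the retraction tells you nothing about whether the original answer was accurate.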

greenavocado 2y ago

In my experience, the corrections can themselves be further hallucinations, one after another, even after you point out inaccuracies multiple times in a row.

Eisenstein 2y ago

> My question is: why can't LLMs include a subroutine to check themselves before answering?

Because LLMs don't work in a way that makes that possible if you operate them on their own.

Here is the debug output of my local instance of Mistral-Instruct 8x7B. The prompt from me was 'What is poop spelled backwards?'. It answered 'puoP'. Let's see how it got there starting with it processing my prompt into tokens:

   'What (3195)', ' is (349)', ' po (1627)', 'op (410)', ' sp (668)', 'elled (6099)', ' backwards (24324)', '? (28804)', '\n (13)', '### (27332)', ' Response (12107)', ': (28747)', '\n (13)',

It tokenized 'poop' as two tokens: 'po', number 1627, and 'op', number 410.
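To make the split concrete, here is a toy sketch. It is not the model's real tokenizer: it does a simple greedy longest-match over a tiny vocabulary, and only the pieces and IDs are copied from the debug output above. The names `TOY_VOCAB` and `toy_tokenize` are made up for this example.

```python
# Pieces and IDs copied from the debug output; the matching strategy is a
# simplification and not how the real tokenizer is implemented.
TOY_VOCAB = {
    "What": 3195,
    " is": 349,
    " po": 1627,
    "op": 410,
    " sp": 668,
    "elled": 6099,
    " backwards": 24324,
    "?": 28804,
}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into (piece, id) pairs using greedy longest-match."""
    pieces = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                pieces.append((piece, vocab[piece]))
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {pos}")
    return pieces


tokens = toy_tokenize("What is poop spelled backwards?")
for piece, token_id in tokens:
    print(repr(piece), token_id)
```

The model only ever receives the ID sequence, so the four letters of 'poop' are never presented to it as separate items.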

Next it comes up with its response:

   Generating (1 / 512 tokens) [(pu 4.43%) (The 66.62%) (po 11.96%) (p 4.99%)]
   Generating (2 / 512 tokens) [(o 89.90%) (op 10.10%)]
   Generating (3 / 512 tokens) [(P 100.00%)]
   Generating (4 / 512 tokens) [( 100.00%)]

It picked 'pu' even though the model gave that token only a ~4% probability, then instead of picking 'op' it picked 'o'. The last token had a 100% probability of being 'P'.
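A low-probability token can win because the next token is drawn at random in proportion to its probability rather than always taking the top candidate. The sketch below assumes plain weighted sampling; the actual sampler settings of that run are not shown in the log. The four probabilities are the ones listed for step 1, which add up to about 88%, so the sketch renormalizes over the visible candidates.

```python
import random

# Step 1 candidates as listed in the debug output (percentages).
# Tokens that the log did not list are ignored here.
STEP_1 = {"pu": 4.43, "The": 66.62, "po": 11.96, "p": 4.99}


def greedy_pick(candidates):
    """Always take the most probable token."""
    return max(candidates, key=candidates.get)


def sample_pick(candidates, rng):
    """Draw one token at random, weighted by its listed probability."""
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed only so that repeated runs match
draws = [sample_pick(STEP_1, rng) for _ in range(10_000)]
share_pu = draws.count("pu") / len(draws)

print("greedy pick:", greedy_pick(STEP_1))
print(f"'pu' drawn in {share_pu:.1%} of samples")
```

Greedy selection would have started the reply with 'The'; weighted sampling lands on 'pu' in roughly one draw out of twenty.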

   Output: puoP

At no time did it write 'puoP' as a complete word, nor does it know what 'puoP' is. It has no way of evaluating whether that is the right answer or not. You would need a different process to do that.
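As a rough illustration of what a "different process" could look like for this particular question, the sketch below checks the generated text with ordinary code after generation. Everything here is hypothetical: `canned_generator` stands in for the model and simply replays fixed answers, and a checker this simple only exists because string reversal is trivially computable.

```python
def reverse_word(word):
    """Compute the reversal directly, character by character."""
    return word[::-1]


def verify_reversal(word, model_answer):
    """Check a generated answer against the computed reversal."""
    return model_answer == reverse_word(word)


def answer_with_check(word, generate, max_attempts=3):
    """Ask the generator until an answer passes the external check.

    Returns (answer, attempts_used), or (None, max_attempts) if every
    attempt failed verification.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = generate(word)
        if verify_reversal(word, candidate):
            return candidate, attempt
    return None, max_attempts


def canned_generator(answers):
    """Hypothetical stand-in for a model: replays fixed answers in order."""
    remaining = iter(answers)
    return lambda word: next(remaining)


result = answer_with_check("poop", canned_generator(["puoP", "poop"]))
print(result)
```

The check happens outside the token-by-token generation, on the complete output string, which is exactly the view the model itself never gets.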

ZitchDog 2y ago

The problem is that if you call it out, it will frequently change its answer, even if it was correct. LLMs currently lack chutzpah.

samus 2y ago

They definitely stand their ground if they were aligned to do so.

Drakim 2y ago

But then they stand their ground when wrong too.

Jensson 2y ago

That is a common bullshitting strategy: talk a lot of bullshit, and then backtrack and acknowledge you were wrong when people push back. That way they will think you know way more than you do. Many people will see through that, but most will just think you are a humble expert who can acknowledge when you are wrong, rather than someone who always concedes, even when you aren't wrong.

People have a really hard time catching such bullshitting from humans, which is why free-form interviews don't work.

asimovfan 2y ago

It's because there's no entity that is actually acknowledging anything. It's generating an answer to your prompt. You can gaslight it into treating anything as wrong or correct.

samus 2y ago

They simply don't work that way. You are asking it for an answer, and it will give you one, since all it can do is extrapolate from its training data.

Good prompting and certain adjustments to the text generation parameters might help prevent hallucinations, but it's not an exact science, since it depends on how the model was trained. Also, an LLM's training data, frankly, contains a lot of bulls*t.
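Temperature is one commonly exposed generation parameter, and the sketch below assumes that is the kind of adjustment meant here. It shows how dividing the scores by a lower temperature concentrates probability on the top candidate. The logits are hypothetical numbers picked for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

cool = softmax_with_temperature(logits, 0.5)
warm = softmax_with_temperature(logits, 1.5)

print("temperature 0.5:", [round(p, 3) for p in cool])
print("temperature 1.5:", [round(p, 3) for p in warm])
```

A lower temperature makes unlikely tokens rarer, but it cannot make the most likely token correct, which fits the point that this is not an exact science.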
