HarHarVeryFunny 2y ago

Well, not exactly a loop. They get to "extend the thought", but there is zero continuity from one word to the next (the LLM starts from scratch for each token generated).

The effect is like several people playing a game where they take turns adding one word to a sentence: each player sees only the sentence so far and starts from scratch when it is their turn.

whimsicalism 2y ago

> LLM starts from scratch for each token generated

What do you mean? Through attention, they get to access their previous hidden states in the next greedy decode step, so they are not simply starting from scratch. They can access exactly what they were thinking when they put out the previous word, rather than reasoning from the word alone.

HarHarVeryFunny (OP) 2y ago

There's the KV cache kept from one word to the next, but isn't that just an optimization?

whimsicalism 2y ago

Yes, the 'KV cache' (imo an invented novelty; everyone was doing this before someone came up with a term to make it sound cool) is an optimization so that you don't have to recompute what the model was thinking when it generated all the prior words every time you decode a new word.

But that's exactly what I'm saying: the model has access to what it was thinking when it generated the previous words, so it does not start from scratch. If you don't have the KV cache, you still have to regenerate what it was thinking at each of the previous words, so that on the next word generation it can look back at those earlier states. Does that make sense? I'm not great at talking about this stuff in words.
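
To make the point above concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than any real model's code: a toy two-layer, single-head attention stack with a residual connection, no MLP, no layer norm, and random weights; the names `step_with_cache` and `run_from_scratch` are hypothetical. It compares decoding with a persistent KV cache against recomputing every prior position from the raw tokens at each step.

```python
import math
import random

D = 4  # hypothetical toy hidden size


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def attend(q, keys, values):
    """Softmax attention of one query over all keys/values seen so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]


def make_layers(n_layers, seed=0):
    """Random toy weights (assumption: no MLP, no layer norm, one head)."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]

    return [{"q": mat(), "k": mat(), "v": mat()} for _ in range(n_layers)]


def new_cache(layers):
    return [{"k": [], "v": []} for _ in layers]


def step_with_cache(x, layers, cache):
    """Process one new token, reusing keys/values stored for earlier positions."""
    h = x
    for layer, store in zip(layers, cache):
        # Keys/values come from the hidden state h, which in layer 2 already
        # mixes in information from earlier positions via layer 1.
        store["k"].append(matvec(layer["k"], h))
        store["v"].append(matvec(layer["v"], h))
        out = attend(matvec(layer["q"], h), store["k"], store["v"])
        h = [a + b for a, b in zip(h, out)]  # residual connection
    return h


def run_from_scratch(xs, layers):
    """No persistent cache: rebuild every earlier position's keys/values
    from the raw tokens, then return the hidden state at the last position."""
    cache = new_cache(layers)
    h = None
    for x in xs:
        h = step_with_cache(x, layers, cache)
    return h


layers = make_layers(2)
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]

# Decode incrementally, keeping the cache between steps.
cache = new_cache(layers)
cached = [step_with_cache(x, layers, cache) for x in tokens]

# Decode by recomputing the whole prefix at every step.
recomputed = [run_from_scratch(tokens[: t + 1], layers) for t in range(len(tokens))]

max_diff = max(abs(a - b) for c, r in zip(cached, recomputed) for a, b in zip(c, r))
print("largest difference between cached and recomputed:", max_diff)
```

In this toy setup the two paths agree, which matches the "just an optimization" reading: the cache changes how much work is done, not what is computed. Either way, each new position attends to keys and values derived from the earlier positions' hidden states, which is the continuity being described.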
