0 points · miven · 2y ago · 0 comments

Correct me if I'm wrong, but in normal token-by-token inference with a transformer, you usually store the keys and values computed in previous steps in a KV cache so you can reuse them instead of recomputing them for every new token.

But here, since the previous few tokens were produced by another model, the current model has never processed them and so, by definition, doesn't have their keys and values cached. It still needs them to properly compute attention for the new token.
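To make the concern concrete, here is a minimal sketch in plain Python. Everything in it is a toy assumption for illustration: `make_model`, `prefill`, `decode_step`, and `full_recompute` are hypothetical helpers, each "model" is a single attention head with random weights, and both models are assumed to share a tokenizer so they can exchange token IDs. It shows that a KV cache built by one model is of no use to another (different weights give different keys and values), so the model taking over has to run its own pass over the earlier tokens before it can decode. In a real multi-layer model that pass is a full forward pass over the prefix, not just a projection.

```python
import math
import random

D = 4       # toy embedding size (assumption)
VOCAB = 10  # toy vocabulary size (assumption)

def make_model(seed):
    """Hypothetical one-layer, one-head model: random embeddings and W_q, W_k, W_v."""
    rng = random.Random(seed)
    def rand(rows):
        return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(rows)]
    return {"emb": rand(VOCAB), "q": rand(D), "k": rand(D), "v": rand(D)}

def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def attend(q, keys, values):
    """Softmax attention of one query over a list of keys and values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]

def prefill(model, token_ids):
    """Build this model's KV cache for tokens it has not processed yet."""
    cache = {"k": [], "v": []}
    for t in token_ids:
        x = model["emb"][t]
        cache["k"].append(matvec(model["k"], x))
        cache["v"].append(matvec(model["v"], x))
    return cache

def decode_step(model, cache, token_id):
    """Process one new token: append its K/V to the cache, then attend over all of it."""
    x = model["emb"][token_id]
    cache["k"].append(matvec(model["k"], x))
    cache["v"].append(matvec(model["v"], x))
    return attend(matvec(model["q"], x), cache["k"], cache["v"])

def full_recompute(model, token_ids):
    """Reference: recompute every key and value from scratch for the last token."""
    xs = [model["emb"][t] for t in token_ids]
    keys = [matvec(model["k"], x) for x in xs]
    values = [matvec(model["v"], x) for x in xs]
    return attend(matvec(model["q"], xs[-1]), keys, values)

tokens = [3, 1, 4, 1, 5, 9]
model_a, model_b = make_model(1), make_model(2)

# Model A handled the first five tokens and holds a cache for them.
cache_a = prefill(model_a, tokens[:5])

# Model B takes over. A's cache was computed with A's weights, so B
# builds its own cache over the same token IDs before decoding.
cache_b = prefill(model_b, tokens[:5])
out_cached = decode_step(model_b, cache_b, tokens[5])
out_reference = full_recompute(model_b, tokens)
```

In this sketch, `out_cached` matches `out_reference`, while `cache_a` and the pre-decode contents of `cache_b` differ. The cost of switching models is therefore one extra pass over the existing context, which is the same work any model does when it is handed a prompt it has not seen before.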

breckenedge · 2y ago

It doesn’t appear to be token-by-token inference. Each new completion uses a different model, but the new completion is entirely created by that model.
