This is a great visualization; the original paper on transformers is not very clear or easy to understand. I tried to read it first and didn't understand it, so I had to look for other explanations (for example, it was unclear to me how multiple tokens are handled).
Also, speaking of transformers: during generation they usually append each output token to the input and process the whole sequence again. Can this be optimized so that we don't repeat the same calculations for the same input tokens?
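(For what it's worth, the standard answer is the "KV cache": in causal self-attention the key/value vectors of earlier tokens never change, so only the newest token needs fresh computation. Here's a minimal toy sketch of the idea, not from the article; the names, sizes, and random stand-in embeddings are all made up for illustration.)

```python
# Toy sketch of key/value caching in causal self-attention (illustrative only).
import numpy as np

d = 8                                   # hypothetical embedding size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Attention for a single new query over all cached keys/values.
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for step in range(5):
    x = rng.standard_normal(d)          # embedding of the newest token (stand-in)
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    K_cache = np.vstack([K_cache, k])   # old keys/values are reused, only the new ones are appended
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)   # only the new token is processed at each step
```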