0 points
gpm
8mo ago
Every token has to calculate attention against every previous token, so for n tokens attention takes O(sum_{i=0}^{n-1} i) work. Since sum_{i=0}^{n-1} i = n(n-1)/2, that is O(n^2): quadratic, not exponential.
I'm not sure where you're getting an exponential from.
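If it helps, here's a quick sanity check of that count. It's only a sketch: it assumes token i (counting from 0) attends to exactly the i tokens before it, and the function names are made up for illustration. If you also count each token attending to itself you get n(n+1)/2 instead, which is still O(n^2).

```python
def causal_attention_pairs(n: int) -> int:
    """Count (query, key) pairs when token i attends to the i tokens before it."""
    # Assumption: tokens are indexed 0..n-1 and self-attention is not counted.
    return sum(i for i in range(n))


def closed_form(n: int) -> int:
    """Closed form of the same sum: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 2000):
        pairs = causal_attention_pairs(n)
        # The brute-force count and the closed form should always agree.
        assert pairs == closed_form(n)
        print(n, pairs)
```

Doubling n from 1000 to 2000 roughly quadruples the count, which is what you'd expect from quadratic growth; an exponential would blow up far faster than that.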