dr_dshiv 2y ago

What about Meta’s MegaByte? Isn't that also a nice proposal?

visarga 2y ago

Yes. Hundreds of different approaches to solving context length have been tried, and yet most LLMs are almost identical to the original Transformer from 2017.

Just to name a few families of approaches: Sparse Attention, Hierarchical Attention, Global-Local Attention, Sliding Window Attention, Locality-Sensitive Hashing Attention, State Space Models, EMA Gated Attention.

Loquebantur 2y ago

I assume there is a common point of failure?

Notably, human working memory isn't great either, which raises the question (if the comparison is valid) of whether that limitation might be fundamental.

visarga 2y ago

The failure mode is that only long-context tasks benefit; short ones run fast enough with full attention, and give better results. It's amazing that OpenAI never used any of these approaches in a serious LLM even though training costs are huge.
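
To put rough numbers on that point, here is a back-of-the-envelope sketch. It assumes a sliding-window variant with a hypothetical window of 1024 tokens and counts only the query-key pairs that get scored; it says nothing about quality. The function names are made up for illustration.

```python
def causal_pairs(n: int) -> int:
    """Query-key pairs scored by full causal attention over n tokens."""
    return n * (n + 1) // 2


def sliding_window_pairs(n: int, window: int) -> int:
    """Pairs scored when each token attends to itself and at most window - 1 predecessors."""
    return sum(min(i + 1, window) for i in range(n))


def savings(n: int, window: int) -> float:
    """Fraction of attention work the window avoids relative to full causal attention."""
    return 1.0 - sliding_window_pairs(n, window) / causal_pairs(n)


# Hypothetical sequence lengths; the window size is an assumption, not a recommendation.
for n in (512, 4096, 65536):
    print(n, f"{savings(n, window=1024):.1%}")
```

For any sequence no longer than the window, the two counts are identical, so the saving is exactly zero. The cheaper scheme only starts to pay off once the context is several times the window size.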
