You don't know how an LLM works and you are operating on flawed anthropomorphic metaphors.
Ask a frontier LLM what a context window is; it will tell you.
For example, DeepSeek 3.2, which employs sparse attention [1], is not only faster than 3.1 at long context, but also seems to perform better (perhaps thanks to reduced noise?).
[1] It still uses a quadratic router, but the router is small, so it scales well in practice. https://api-docs.deepseek.com/news/news250929
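To make the "small quadratic router" idea concrete, here is a minimal sketch of indexer-style sparse attention: a cheap low-dimensional scorer runs over all query-key pairs (still O(n²), but with a tiny constant), and full attention is computed only over the top-k keys it selects per query. The shapes and constants are illustrative assumptions, not DeepSeek's actual architecture, and causal masking is omitted for brevity.

```python
import numpy as np

def sparse_attention(q, k, v, idx_q, idx_k, top_k):
    # Cheap indexer: low-dim dot products over all pairs.
    # Quadratic in sequence length, but d_idx << d, so it's cheap.
    scores = idx_q @ idx_k.T                        # (n, n)
    keep = np.argsort(scores, axis=-1)[:, -top_k:]  # top-k keys per query
    out = np.empty_like(q)
    for i in range(q.shape[0]):
        # Full attention only over the selected subset of keys/values.
        ks, vs = k[keep[i]], v[keep[i]]
        w = np.exp(q[i] @ ks.T / np.sqrt(q.shape[-1]))
        out[i] = (w / w.sum()) @ vs
    return out

n, d, d_idx = 128, 64, 8      # illustrative sizes
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
idx_q, idx_k = (rng.standard_normal((n, d_idx)) for _ in range(2))
out = sparse_attention(q, k, v, idx_q, idx_k, top_k=16)
print(out.shape)  # (128, 64)
```

The point is the asymmetry: the quadratic part touches only tiny index vectors, while the expensive full-dimension attention touches only top_k keys per query.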
In practice, the context window is chosen when a model is trained so that at inference time the serving stack knows how much GPU memory to allocate per prompt (mostly for the KV cache) and can reject any prompt that exceeds the limit.
Of course performance also degrades as context gets longer, but I suspect the memory limit is the primary reason context windows are capped.
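The memory argument is easy to back-of-the-envelope: each token stores one key and one value vector per layer per KV head, so KV-cache size grows linearly with context length. The dimensions below are illustrative (roughly a 7B dense model without grouped-query attention), not any specific model's real config.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):  # 2 bytes = fp16/bf16
    # 2x for storing both K and V per token, per layer, per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:6.1f} GiB of KV cache")
# →    4096 tokens ->    2.0 GiB of KV cache
# →   32768 tokens ->   16.0 GiB of KV cache
# →  131072 tokens ->   64.0 GiB of KV cache
```

At 0.5 MiB per token, a 128k-token prompt needs ~64 GiB of KV cache for a single sequence; it's clear why the server wants a hard cap it can budget against up front (real deployments shrink this with grouped-query attention, quantized caches, etc.).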