You don't know how an LLM works and you are operating on flawed anthropomorphic metaphors.
Ask a frontier LLM what a context window is; it will tell you.
For example, DeepSeek 3.2, which employs sparse attention [1], is not only faster than 3.1 at long context, but also seems to perform better (perhaps thanks to reduced noise?).
[1] It still uses a quadratic router, but the router is small, so it scales well in practice. https://api-docs.deepseek.com/news/news250929
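To make the "small quadratic router" idea concrete, here is a minimal sketch of indexer-style sparse attention: a cheap low-dimensional scorer runs over all query-key pairs (still O(n²), but with a tiny constant), and full attention is computed only over the top-k keys it selects per query. The shapes and constants are illustrative assumptions, not DeepSeek's actual architecture, and causal masking is omitted for brevity.

```python
import numpy as np

def sparse_attention(q, k, v, idx_q, idx_k, top_k):
    # Cheap indexer: low-dim dot products over all pairs.
    # Quadratic in sequence length, but d_idx << d, so it's cheap.
    scores = idx_q @ idx_k.T                        # (n, n)
    keep = np.argsort(scores, axis=-1)[:, -top_k:]  # top-k keys per query
    out = np.empty_like(q)
    for i in range(q.shape[0]):
        # Full attention only over the selected subset of keys/values.
        ks, vs = k[keep[i]], v[keep[i]]
        w = np.exp(q[i] @ ks.T / np.sqrt(q.shape[-1]))
        out[i] = (w / w.sum()) @ vs
    return out

n, d, d_idx = 128, 64, 8      # illustrative sizes
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
idx_q, idx_k = (rng.standard_normal((n, d_idx)) for _ in range(2))
out = sparse_attention(q, k, v, idx_q, idx_k, top_k=16)
print(out.shape)  # (128, 64)
```

The point is the asymmetry: the quadratic part touches only tiny index vectors, while the expensive full-dimension attention touches only top_k keys per query.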
In practice, the context window is chosen when a model is trained so that at inference time the serving stack knows how much GPU memory to allocate per prompt (mostly for the KV cache) and can reject any prompt that exceeds the limit.
Of course performance also degrades as context gets longer, but I suspect the memory limit is the primary reason context windows are capped.
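The memory argument is easy to back-of-the-envelope: each token stores one key and one value vector per layer per KV head, so KV-cache size grows linearly with context length. The dimensions below are illustrative (roughly a 7B dense model without grouped-query attention), not any specific model's real config.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):  # 2 bytes = fp16/bf16
    # 2x for storing both K and V per token, per layer, per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:6.1f} GiB of KV cache")
# →    4096 tokens ->    2.0 GiB of KV cache
# →   32768 tokens ->   16.0 GiB of KV cache
# →  131072 tokens ->   64.0 GiB of KV cache
```

At 0.5 MiB per token, a 128k-token prompt needs ~64 GiB of KV cache for a single sequence; it's clear why the server wants a hard cap it can budget against up front (real deployments shrink this with grouped-query attention, quantized caches, etc.).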