
lopuhin 5mo ago

A 400k context window is not new; gpt-5, 5.1, 5-mini, etc. have the same. But they do claim they improved long-context performance, which, if true, would be great.


energy123 5mo ago

But 400k was never usable in ChatGPT Plus/Pro subscriptions. It was nerfed down to 60-100k. If you submitted too long a prompt, they deleted the tokens at the end of your prompt before calling the model. Or if the chat got too long (still below 100k, however), they deleted your first messages. This was 3 months ago.

Can someone with an active sub check whether we can submit a full 400k prompt (or at least 200k) with no prompt truncation in the backend? I don't mean attaching a file, which uses RAG.

piskov 5mo ago

Context windows for web:

Fast (GPT‑5.2 Instant)
- Free: 16K
- Plus / Business: 32K
- Pro / Enterprise: 128K

Thinking (GPT‑5.2 Thinking)
- All paid tiers: 196K

https://help.openai.com/en/articles/11909943-gpt-52-in-chatg...

energy123 5mo ago

But can you do that in one message, or is that a best-case scenario in a long multi-turn chat?

dr_dshiv 5mo ago

That’s… too bad

eru 5mo ago

> Or if the chat got too long (still below 100k however) they deleted your first messages. This was 3 months ago.

I can believe that, but it also seems really silly? If your max context window is X and the chat has approached that, instead of outright deleting the first messages, why not have your model summarise the first quarter of tokens and place that summary at the beginning of the log you feed as context? Since the chat history is (mostly) immutable, this only adds minimal overhead: you can cache the summarisation, and don't have to do it over and over again for each new message. (If the partially summarised log gets too long, you summarise again.)
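A minimal sketch of that idea in Python, under loud assumptions: `RollingContext`, `count_tokens` and the `summarise` callback are hypothetical names for illustration, token counting is faked by splitting on whitespace, and `summarise` stands in for a call to the model. The summary is stored back in the log, so it is computed once and reused until the log overflows again.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: crude stand-in for a real tokenizer (one token per word).
    return len(text.split())


@dataclass
class RollingContext:
    max_tokens: int
    summarise: Callable[[str], str]  # hypothetical hook, e.g. a model call
    log: List[str] = field(default_factory=list)

    def total(self) -> int:
        return sum(count_tokens(m) for m in self.log)

    def append(self, message: str) -> None:
        self.log.append(message)
        # Compact until the log fits, or until only one entry is left.
        while self.total() > self.max_tokens and len(self.log) > 1:
            self._compact()

    def _compact(self) -> None:
        budget = self.total() // 4  # roughly the first quarter of tokens
        taken, used = 0, 0
        # Take the oldest entries until the budget is covered; always take
        # at least two so the log shrinks by one entry per compaction.
        while taken < len(self.log) and (used < budget or taken < 2):
            used += count_tokens(self.log[taken])
            taken += 1
        summary = self.summarise("\n".join(self.log[:taken]))
        # The summary replaces the entries it covers and is kept in the log,
        # so later turns reuse it instead of summarising again.
        self.log[:taken] = [summary]
```

An earlier summary is just the oldest entry in the log, so when the log overflows again it gets folded into the next summary along with the messages after it.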

Since I can come up with this technique in half a minute of thinking about the problem, and the OpenAI folks are presumably not stupid, I wonder what downside I'm missing.

Aeolun 5mo ago

Don’t think you are missing anything. I do this with the API, and it works great. I’m not sure why they don’t do it, but I can only guess it’s because it completely breaks the context caching. If you summarize the full buffer, at least you know you are down to a few thousand tokens to cache again, instead of 100k tokens.
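Rough arithmetic for that guess, as a Python sketch. It assumes prefix-style caching, where any change near the start of the prompt invalidates the cached tokens after it. `tokens_to_recache` is a hypothetical helper, and the 100k context and 2k summary sizes are illustrative, not measured.

```python
def tokens_to_recache(context_tokens: int, summarised_tokens: int,
                      summary_tokens: int) -> int:
    # Assumption: replacing the head of the prompt with a summary changes the
    # prefix, so the summary plus everything kept after it must be cached anew.
    kept_tokens = context_tokens - summarised_tokens
    return summary_tokens + kept_tokens


# Summarise only the first quarter of a 100k-token chat into ~2k tokens.
partial = tokens_to_recache(100_000, 25_000, 2_000)

# Summarise the whole buffer into ~2k tokens.
full = tokens_to_recache(100_000, 100_000, 2_000)

print(partial, full)
```

Under these assumptions the partial summary still leaves most of the buffer to be cached again, while the full summary leaves only the summary itself.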


gunalx 5mo ago

API use was not nerfed in this way.
