1M context is now generally available for Opus 4.6 and Sonnet 4.6 (opens in new tab)

(claude.com)

1220 pointsmeetpateltech12d ago520 comments

520 comments

Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request.

About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT-4.5 or Opus for coding, all via API with a custom tool.

Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.

For production coding, I use

- a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)

- Then, auto context, but based on code lensing. Meaning auto context takes some globs that narrow the visibility of what the AI can see, and it uses the code map intersection to ask the AI for the proper files to put in context. (Typically Flash, cheap, relatively fast, and very good)

- Then, use a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k max.

What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.

Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.

For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.

Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.

daemonk11d ago

Yeah this is the simpler and also effective strategy. A lot of people are building sophisticated AST RAG models. But you really just need to ask Claude to generally build a semantic index for each large-ish piece of code and re-use it when getting context.

You have to make sure the semantic summary takes up significantly less tokens than just reading the code or its just a waste of token/time.

Then have a skill that uses git version logs to perform lazy summary cache when needed.

1 more reply

smusamashah11d ago

It seems like a very good use of LLMs. You should write a blog post with detail of your process with examples for people who are not into all AI tools as much. I only use Web UI. Lots of what you are saying is beyond me, but it does sound like clever strategy.

tontinton11d ago

Yeah we all converge to the same workflow, in my ai coding agent I'm working on now, I've added an "index" tool that uses tree-sitter to compress and show the AI a skeleton of a code file.

Here's the implementation for the interested: https://github.com/tontinton/maki/blob/main/maki-code-index%...

jeremychone11d ago

Oh, that's great.

I've always wanted to explore how to fit tree-sitter into this workflow. It's great to know that this works well too.

Thanks for sharing the code.

(Here is the AIPack runtime I built, MIT: https://github.com/aipack-ai/aipack), and here is the code for pro@coder (https://github.com/aipack-ai/packs-pro/tree/main/pro/coder) (AIPack is in Rust, and AI Packs are in md / lua)

firemelt11d ago

whenever I see post like this

i said well yeah, but its too sophiscated to be practical

jeremychone11d ago

Fair point, but because I spent a year building and refining my custom tool, this is now the reality for all of my AI requests.

I prompt, press run, and then I get this flow: dev setup (dev-chat or plan) code-map (incremental 0s 2m for initial) auto-context (~20s to 40s) final AI query (~30s to 2m)

For example, just now, in my Rust code (about 60k LOC), I wanted to change the data model and brainstorm with the AI to find the right design, and here is the auto-context it gave me:

- Reducing 381 context files ( 1.62 MB)

- Now 5 context files ( 27.90 KB)

- Reducing 11 knowledge files ( 30.16 KB)

- Now 3 knowledge files ( 5.62 KB)

The knowledge files are my "rust10x" best practices, and the context files are the source files.

(edited to fix formatting)

1 more reply

adammarples11d ago

It's not sophisticated at all, he just uses a model to make some documentation before asking another model to work using the documentation

lukeundtrug11d ago

I built myself an AST based solution for that during the last 6 months roughly. I always wondered whether grep and agent-based discovery will be the end of it and thought it just has to be better with a more deterministic approach.

In the end it's hard to measure but personally I feel that my agent rarely misses any context for a given task, so I'm pretty happy with it.

I used a different approach than tree-sitter because I thought I found a nice way to get around having to write language-specific code. I basically use VSCode as a language backend and wrote some logic to basically rebuild the AST tree from VSCode's symbol data and other API.

That allows me to just install the correct language extension and thus enable support for that specific language. The extension has to provide symbol information which most do through LSP.

In the end it was way more effort than just using tree-sitter, however, and I'm thinking of doing a slow migration to that approach sooner or later.

Anyways, I created an extension that spins up an mcp server and provides several tools that basically replace the vanilla discovery tools in my workflow.

The approach is similar to yours, I have an overview tool which runs different centrality ranking metrics over the whole codebase to get the most important symbols and presents that as an architectural overview to the LLM.

Then I have a "get-symbol-context" tool which allows the AI to get all the information that the AST holds about a single symbol, including a parameter to include source code which completely replaces grepping and file reading for me.

The tool also specifies which other symbols call the one in question and which others it calls, respectively.

But yeah, sorry for this being already a quite long comment, if you want to give it a try, I published it on the VSCode marketplace a couple of days ago, and it's basically free right now, although I have to admit that I still want to try to earn a little bit of money with it at some point.

Right now, the daily usage limit is 2000 tool calls per day, which should be enough for anybody.

Would love to hear what you think :)

<https://marketplace.visualstudio.com/items?itemName=LuGoSoft...>

jeremychone10d ago

I looked at your solution and extension README, and it's very interesting and well thought out.

The fact that you've been using it for six months and that it performs well says a lot. At the end of the day, that's what counts.

I like your idea of piggybacking on top of the LSP services, and I can imagine that this was quite a bit of work. Doing it as an MCP server makes it usable across different tools.

I also really like the name "Context Master."

In my case, it's much more niche since it's for the tool I built. Though it's open source, the key difference is that the "indexing" is only agentic at this point.

I can see value in mixing the two. LSP integration scares me because of the amount of work involved, and tree-sitter seems like a good path.

In that case, in the code map, for each item, there could be both the LLM response info and some deterministic info, for example, from tree-sitter.

That being said, the current approach works so well that I think I am going to keep using and fine-tuning it for a while, and bring in deterministic context only when or if I need it.

Anyway, what you built looks great. If it works, that's great.

1 more reply

cloverich11d ago

This is really interesting; ive done very high level code maps but the entire project seems wild, it works?

So, small model figures out which files to use based on the code map, and then enriches with snippets, so big model ideally gets preloaded with relevant context / snippets up front?

Where does code map live? Is it one big file?

jeremychone11d ago

So, I have a pro@coder/.cache/code-map/context-code-map.json.

I also have a `.tmpl-code-map.jsonl` in the same folder so all of my tasks can add to it, and then it gets merged into context-code-map.json.

I keep mtime, but I also compute a blake3 hash, so if mtime does not match, but it is just a "git restore," I do not redo the code map for that file. So it is very incremental.

Then the trick is, when sending the code map to AI, I serialize it in a nice, simple markdown format.

- path/to/file.rs - summary: ... - when to use: ... - public types: .., .., .. - public functions: .., .., ..

- ...

So the AI does not have to interpret JSON, just clean, structured markdown.

Funny, I worked on this addition to my tool for a week, planning everything, but even today, I am surprised by how well it works.

I have zero sed/grep in my workflow. Just this.

My prompt is pro@coder/coder-prompt.md, the first part is YAML for the globs, and the second part is my prompt.

There is a TUI, but all input and output are files, and the TUI is just there to run it and see the status.

CuriouslyC11d ago

1M context is super useful with Gemini, not so much for coding, but for data analysis.

jeremychone11d ago

Even there, I use AI to augment rows and build the code to put data into Json or Polars and create a quick UI to query the data.

speakbits11d ago

I think you've kind of hit on the more successful point here, which is that you should be keeping things focused in a sufficiently focused area to have better success and not necessarily needing more context.

exceptione11d ago

  > - a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)

Thanks, but why use any AI to generate this? I would say: you document your functions-in-code, types are provided from the compiler service, so it should all be deterministically available in seconds iso minutes, without burning tokens. Am I missing something?

jeremychone11d ago

Very good point. I had two options:

1) Deterministic

  - Using a tree-sitter/AST-like approach, I could extract types, functions, and perhaps comments, and put them into an index map.

  - Cons:

    - The tricky part of this approach is that what I extract can be pretty large per file, for example, comments.

    - Then, I would probably need an agentic synthesis step for those comments anyway.

2) Agentic

  - Since Flash is dirt cheap, I wanted to experiment and skip #1, and go directly to #2.

  - Because my tool is built for concurrency, when set to 32, it's super fast.

  - The price is relatively low, perhaps $1 or $2 for 50k LOC, and 60 to 90 seconds, about 30 to 45 minutes of AI work.

  - What I get back is relatively consistent by file, size-wise, and it's just one trip per file.

So, this is why I started with #2.

And then, the results in real coding scenarios have been astonishing.

Way above what I expected.

The way those indexes get combined with the user prompt gets the right files 95% of the time, and with surprisingly high quality.

So, I might add deterministic aspects to it, but since I think I will need the agentic step anyway, I have deprioritized it.

rafael-lua11d ago

Well, out of all the workflows I have seen, this one is rather nice, might give it a try.

I imagine if the context were being commited and kept up-to-date with CI would work for others to use as well.

However, I'm a little confused on the autocontext/globs narrowing part. Do you, the developer, provide them? Or you feed the full code map to flash + your prompt so it returns the globs based on your prompt?

Also, in general, is your map of a file relatively smaller than the file itself, even for very small files?

jeremychone11d ago

- The ..-code-map.json files are per "developer folder," which would create too many conflicts if they were kept in Git.

- I have two main globs, which are lists of globs: knowledge_globs and context_globs. Knowledge can be absolute and should be relatively static. context_globs have to be relative to the workspace, since they are the working files.

- As a dev, you provide them in the top YAML section of the coder-prompt.md.

- The auto-context sub-agent calls the code-map sub-agent. Sub-agents can add to or narrow the given globs, and that is the goal of the auto-context agent.

It looks complicated, but it actually works like a charm.

Hopefully, I answered some of your questions.

I need to make a video about it.

But regardless, I really think it's not about the tools, it's about the techniques. This is where the true value is.

2 more replies

LuxBennu11d ago

Your code map compresses signal on the context side. Same principle applies on the prompt side: prompts that front-load specifics (file, error, expected behavior) resolve in 1-2 turns. Vague ones spiral into 5-6. 1M context doesn't change that — it just gives you more room for the spiral.

Myrmornis11d ago

This is interesting but don't you worry that you're competing with entire companies (e.g. Anthropic) and thus it's a losing battle? Since you're re-implementing a bunch of stuff they either do in their harness or have decided it was better not to do?

mh-11d ago

I think it's worth remembering that for any offering like that, it necessarily needs to be ~one-size-fits-all, while what you come up with.. doesn't.

They're solving a different problem than you. So I think it's very plausible that you could come up with something that, for your use case, performs considerably better than their "defaults".

sphilipakis9d ago

Personally I don't see aipack's pro@coder and other approaches (claude code, cursor, copilot, etc...) as competitors anymore. I use both approaches to solve different problems. I keep using the agentic solutions (claude code style) for more operational tasks, a bit like "smart interfaces to terminal", and pro@coder for coding / engineering tasks where I need a much tighter control over long running work sessions.

ra711d ago

This is fascinating. I feel like this is converging into the concept of a traditional "IDE". So much of your setup reminds me of IDEs indexing, doing static analysis, building ASTs, etc. before a developer starts writing code.

jeremychone10d ago

Yes, there is a parallel here. Now, some of those "indexing" steps can be performed by an LLM.

And that does not prevent mixing and matching the two, as some comments in this thread suggest.

Anyway, it's a great time for production coding.

Weryj11d ago

My approach has been using static analysis to produce a Mermaid diagram of all Classes:Methods and their caller/callees.

make_it_sure11d ago

very interested in this approach and many other people are for sure. Please do a blog post.

dimitri-vs12d ago

The big change here is:

> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.

For Claude Code users this is huge - assuming coherence remains strong past 200k tok.

MikeNotThePope12d ago

Is it ever useful to have a context window that full? I try to keep usage under 40%, or about 80k tokens, to avoid what Dex Horthy calls the dumb zone in his research-plan-implement approach. Works well for me so far.

No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

furyofantares12d ago

I'd been on Codex for a while and with Codex 5.2 I:

1) No longer found the dumb zone

2) No longer feared compaction

Switching to Opus for stupid political reasons, I still have not had the dumb zone - but I'm back to disliking compaction events and so the smaller context window it has, has really hurt.

I hope they copy OpenAI's compaction magic soon, but I am also very excited to try the longer context window.

5 more replies

kaizenb12d ago

Thanks for the video.

His fix for "the dumb zone" is the RPI Framework:

● RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.

● PLAN. The agent writes a detailed step-by-step plan. You review and approve the plan, not just the output. Dex calls this avoiding "outsourcing your thinking." The plan is where intent gets compressed before execution starts.

● IMPLEMENT. Execute in a fresh context window. The meta-principle he calls Frequent Intentional Compaction: don't let the chat run long. Ask the agent to summarize state, open a new chat with that summary, keep the model in the smart zone.

5 more replies

alecco11d ago

Offtopic: I find it remarkable the shortened YT url has a tracking cost of 57% extra length. We live in stupid times.

1 more reply

SkyPuncher12d ago

Yes. I've recently become a convert.

For me, it's less about being able to look back -800k tokens. It's about being able to flow a conversation for a lot longer without forcing compaction. Generally, I really only need the most recent ~50k tokens, but having the old context sitting around is helpful.

1 more reply

ogig12d ago

When running long autonomous tasks it is quite frequent to fill the context, even several times. You are out of the loop so it just happens if Claude goes a bit in circles, or it needs to iterate over CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts.

3 more replies

dimitri-vs12d ago

It's kind of like having a 16 gallon gas tank in your car versus a 4 gallon tank. You don't need the bigger one the majority of the time, but the range anxiety that comes with the smaller one and annoyance when you DO need it is very real.

2 more replies

ricksunny12d ago

Since I'm yet to seriously dive into vibe coding or AI-assisted coding, does the IDE experience offer tracking a tally of the context size? (So you know when you're getting close or entering the "dumb zone")?

6 more replies

Barbing12d ago

Looking at this URL, typo or YouTube flip the si tracking parameter?

  youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

1 more reply

dev_l1x_be12d ago

I never use these giant context windows. It is pointless. Agents are great at super focused work that is easy to re-do. Not sure what is the use case for giant context windows.

hrmtst9383712d ago

Maxing out context is only useful if all the information is directly relevant and tightly scoped to the task. The model's performance tends to degrade with too much loosely related data, leading to more hallucinations and slower results. Targeted chunking and making sure context stays focused almost always yields better outcomes unless you're attempting something atypical, like analyzing an entire monorepo in one shot.

wat1000011d ago

I've used it many times for long-running investigations. When I'm deep in the weeds with a ton of disassembly listings and memory dumps and such, I don't really want to interrupt all of that with a compaction or handoff cycle and risk losing important info. It seems to remain very capable with large contexts at least in that scenario.

maskull12d ago

After running a context window up high, probably near 70% on opus 4.6 High and watching it take 20% bites out of my 5hr quota per prompt I've been experimenting with dumping context after completing a task. Seems to be working ok. I wonder if I was running into the long context premium. Would that apply to Pro subs or is just relevant to api pricing?

virtualritz11d ago

I haven't hit the "dumb zone" any more since two months. I think this talk is outdated.

I'm using CC (Opus) thinking and Codex with xhigh on always.

And the models have gotten really good when you let them do stuff where goals are verifiable by the model. I had Codex fix a Rust B-rep CSG classification pipeline successfully over the course of a week, unsupervised. It had a custom STEP viewer that would take screenshots and feed them back into the model so it could verify the progress resp. the triangle soup (non progress) itself.

Codex did all the planning and verification, CC wrote the code.

This would have not been possible six months ago at all from my experience.

Maybe with a lot of handholding; but I doubt it (I tried).

I mean both the problem for starters (requires a lot of spatial reasoning and connected math) and the autonomous implementation. Context compression was never an issue in the entire session, for either model.

saaaaaam12d ago

That video is bizarre. Such a heavy breather.

2 more replies

bushbaba12d ago

Yes. I’ve used it for data analysis

twodave12d ago

I mean, try using copilot on any substantial back-end codebase and watch it eat 90+% just building a plan/checklist. Of course copilot is constrained to 120k I believe? So having 10x that will blow open up some doors that have been closed for me in my work so far.

That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.

Bombthecat11d ago

If it's not coding, even with 200k context it starts to write gibberish, even with the correct information in the context.

I tried to ask questions about path of exile 2. And even with web research on it gave completely wrong information... Not only outdated. Wrong

I think context decay is a bigger problem then we feel like.

AnotherGoodName11d ago

Fwiw put a copy of the game folder in a directory and tell claude to extract game files and dissasemble the game in preparation for questions about the game.

As an example of doing this in a session with jagged alliance 3 (an rpg) https://pastes.io/jagged-all-69136

Claude extracting game archives and dissasembling leads to far more reliable results than random internet posts.

3 more replies

Lord-Jobo11d ago

Context decay is noticeable within 3 messages, nearly every time. Maybe not substantial, but definitely noticeable.

It’s lead to me starting new chats with bigger and bigger starting ‘summary, prompts to catch the model up while refreshing it. Surely there’s a way to automate that technique.

2 more replies

eric_cc11d ago

It could also be a skill problem. It would be more helpful if when people made llm sucks claims they shared their prompt.

The people I work with who complain about this type of thing horribly communicate their ask to the llm and expect it to read their minds.

4 more replies

staticman211d ago

Adding web search doesn't necessarily lead to better information at any context.

In my experience the model will assume the web results are the answer even if the search engine returns irrelevant garbage.

For example you ask it a question about New Jersey law and the web results are about New York or about "many states" it'll assume the New York info or "many states" info is about New Jersey.

blueblisters11d ago

I think ChatGPT has a huge advantage here. They have been collecting realistic multi-turn conversational data at a much larger scale. And generally their models appear to be more coherent with larger contexts for general purpose stuff.

holoduke11d ago

Number one thing you always need to accomplish are feedback loops for Claude so it's able to shotgun program itself to a solution.

gorjusborg11d ago

The question that comes to mind for me after reading your comment is how can a question about a game require that much context?

1 more reply

wouldbecouldbe11d ago

I feel like few weeks ago i suddenly had a week where even after 3 messages it forgot what we did. Seems fixed now.

turbostyler11d ago

We need an MCP for path of building

__MatrixMan__11d ago

Agreed, there's no getting around the "break it into smaller contexts" problem that lies between us and generally useful AI.

It'll remain a human job for quite a while too. Separability is not a property of vector spaces, so modern AIs are not going to be good at it. Maybe we can manage something similar with simplical complexes instead. Ideally you'd consult the large model once and say:

> show me the small contexts to use here, give me prompts re: their interfaces with their neighbors, and show me which distillations are best suited to those tasks

...and then a network of local models could handle it from there. But the providers have no incentive to go in that direction, so progress will likely be slow.

reactordev11d ago

That’s not context decay, that’s training data ambiguity. So much misinformation, nerfs, buffs, changes that an LLM can not keep up given the training time required. Do it for a game that has been stable and it knows its stuff.

1 more reply

jnovek11d ago

What were you asking about PoE 2? So far my _general_ experience with asking LLMs about ARPGs has been meh. Except for Diablo 2 but I think that’s just because Diablo 2 has been heavily discussed for ~25 years.

a_e_k12d ago

I've been using the 1M window at work through our enterprise plan as I'm beginning to adopt AI in my development workflow (via Cline). It seems to have been holding up pretty well until about 700k+. Sometimes it would continue to do okay past that, sometimes it started getting a bit dumb around there.

(Note that I'm using it in more of a hands-on pair-programming mode, and not in a fully-automated vibecoding mode.)

chatmasta12d ago

So a picture is worth 1,666 words?

islewis12d ago

The quality with the 1M window has been very poor for me, specifically for coding tasks. It constantly forgets stuff that has happened in the existing conversation. n=1, ymmv

robwwilliams12d ago

Yes, especially with shifts in focus of a long conversation. But given the high error rates of Opus 4.6 the last few weeks it is possibly due to other factors. Conversational and code prodding has been essential.

hagen812d ago

Well, the question is what is contributing to the usage. Because as the context grows, the amount of input tokens are increasing. A model call with 800K token as input is 8 times more expensive than a model call with 100K tokens as input. Especially if we resume a conversation and caching does not hit, it would be very expensive with API pricing.

j4511d ago

This might burn through usage faster too though.

jFriedensreich11d ago

yeah it totally does not remain coherent past 200k, would have been too nice.

__MatrixMan__11d ago

I bet it depends how homogenous the context is. I bet it works ok near 1M in some cases, but as far as I can tell, those cases are rare.

syntaxing12d ago

It’s interesting because my career went from doing higher level language (Python) to lower language (C++ and C). Opus and the like is amazing at Python, honestly sometimes better than me but it does do some really stupid architectural decisions occasionally. But when it comes to embedded stuff, it’s still like a junior engineer. Unsure if that will ever change but I wonder if it’s just the quality and availability of training data. This is why I find it hard to believe LLMs will replace hardware engineers anytime soon (I was a MechE for a decade).

necovek12d ago

As someone who did Python professionally from a software engineering perspective, I've actually found Python to be pretty crappy really: unaware of _good_ idioms living outside tutorials and likely 90% of Python code out there that was simply hacked together quickly.

I have not tested, but I would expect more niche ecosystems like Rust or Haskell or Erlang to have better overall training set (developer who care about good engineering focus on them), and potentially produce the best output.

For C and C++, I'd expect similar situation with Python: while not as approachable, it is also being pushed on beginning software engineers, and the training data would naturally have plenty of bad code.

jeremyjh11d ago

I think its pretty good at Elixir, so that tracks.

1 more reply

mettamage11d ago

Can you recommend some books that teach these idioms? I know not everything is in books but I suspect a bit of it is

n_u12d ago

I've found it's ok at Rust. I think a lot of existing Rust code is high quality and also the stricter Rust compiler enforces that the output of the LLM is somewhat reasonable.

lemagedurage12d ago

Yes, it's nice to have a strict compiler, so the agent has to keep fixing its bugs until it actually compiles. Rust and TypeScript are great for this.

1 more reply

raincole11d ago

Quite sure it's not about the language but the domain.

1 more reply

ipnon12d ago

I think the combinatorial space is just too much. When I did web dev it was mostly transforming HTML/JSON from well-defined type A to well-defined type B. Everything is in text. There's nothing to reason about besides what is in the prompt itself. But constructing and maintaining a mental model of a chip and all of its instructions and all of the empirical data from profiling is just too much for SOTA to handle reliably.

ex-aws-dude12d ago

I've had a similar experience as a graphics programmer that works in C++ every day

Writing quick python scripts works a lot better than niche domain specific code

nullpoint42012d ago

Unfortunately, I’ve found it’s really good at Wayland and OpenGL. It even knows how to use Clutter and Meta frameworks from the Gnome Mutter stack. Makes me wonder why I learned this all in the first place.

1 more reply

trenchgun12d ago

LLMsdo great with Rust though

ricardobeat11d ago

It is really good at writing C++ for Arduino, can one-shot most programs.

NanoWar11d ago

I'd say the chance of me one shotting C++ is veeeery low. Same for bash scripts etc. This is where the LLM really shines for me.

Skidaddle11d ago

How was the career transition from MechE? Looking to do the same myself.

dzonga11d ago

nor web engineers (backend) that are not doing standard crud work.

I have seen these shine on frontend work

convenwis12d ago

Is there a writeup anywhere on what this means for effective context? I think that many of us have found that even when the context window was 100k tokens the actual usable window was smaller than that. As you got closer to 100k performance degraded substantially. I'm assuming that is still true but what does the curve look like?

esperent12d ago

> As you got closer to 100k performance degraded substantially

In practice, I haven't found this to be the case at all with Claude Code using Opus 4.6. So maybe it's another one of those things that used to be true, and now we all expect it to be true.

And of course when we expect something, we'll find it, so any mistakes at 150k context use get attributed to the context, while the same mistake at 50k gets attributed to the model.

peacebeard12d ago

My personal experience is that Opus 4.6 degrades after a while but the degradation is more subtle and less catastrophic than in the past. I still aggressively clear sessions to keep it sharp though.

dcre12d ago

Personally, even though performance up to 200k has improved a lot with 4.5 and 4.6, I still try to avoid getting up there — like I said in another comment, when I see context getting up to even 100k, I start making sure I have enough written to disk to type /new, pipe it the diff so far, and just say “keep going.” I feel like the dropoff starts around maybe 150k, but I could be completely wrong. I thought it was funny that the graph in the post starts at 256k, which convenient avoids showing the dropoff I'm talking about (if it's real).

tyleo12d ago

I mentioned this at work but context still rots at the same rate. 90k tokens consumed has just as bad results in 100k context window or 1M.

Personally, I’m on a 6M+ line codebase and had no problems with the old window. I’m not sending it blindly into the codebase though like I do for small projects. Good prompts are necessary at scale.

minimaxir12d ago

The benchmark charts provided are the writeup. Everything else is just anecdata.

FartyMcFarter12d ago

Isn't transformer attention quadratic in complexity in terms of context size? In order to achieve 1M token context I think these models have to be employing a lot of shortcuts.

I'm not an expert but maybe this explains context rot.

vlovich12312d ago

Nope, there’s no tricks unless there’s been major architectural shifts I missed. The rot doesn’t come from inference tricks to try to bring down quadratic complexity of the KV cache. Task performance problems are generally a training problem - the longer and larger the data set, the fewer examples you have to train on it. So how do you train the model to behave well - that’s where the tricks are. I believe most of it relies on synthetically generated data if I’m not mistaken, which explains the rot.

1 more reply

wewewedxfgdf12d ago

The weirdest thing about Claude pricing is their 5X pricing plan is 5 times the cost of the previous plan.

Normally buying the bigger plan gives some sort of discount.

At Claude, it's just "5 times more usage 5 times more cost, there you go".

apetresc12d ago

Those sorts of volume discounts are what you do when you're trying to incentivize more consumption. Anthropic already has more demand then they're logistically able to serve, at the moment (look at their uptime chart, it's barely even 1 9 of reliability). For them, 1 user consuming 5 units of compute is less attractive than 5 users consuming 1 unit.

They would probably implement _diminishing_-value pricing if pure pricing efficiency was their only concern.

auggierose12d ago

It is not the plan they want you to buy. It is a pricing strategy to get you to buy the 20x plan.

radley12d ago

5x Max is the plan I use because the Pro plan limits out so quickly. I don't use Claude full-time, but I do need Claude Code, and I do prefer to use Opus for everything because it's focused and less chatty.

1 more reply

operatingthetan12d ago

I think they are both subsidized so either is a great deal.

cush11d ago

Yeah the free lunch on tokens is almost over. Get them while they’re still cheap

merrvk12d ago

5 times the already subsidised rate is still a discount.

tclancy12d ago

We’ll make it up on volume.

Zambyte12d ago

5 for 5

minimaxir12d ago

Claude Code 2.1.75 now no longer delineates between base Opus and 1M Opus: it's the same model. Oddly, I have Pro where the change supposedly only for Max+ but am still seeing this to be case.

EDIT: Don't think Pro has access to it, a typical prompt just hit the context limit.

The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4's 1M window and extra pricing for that.

auggierose12d ago

No change for Pro, just checked it, the 1M context is still extra usage.

zaptrem12d ago

I have Max 20x and they're still separate on 2.1.75.

hackyon110d ago

Mine took restarting. Is it still separate for you? Also might not work yet if you have CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 on

Frannky12d ago

Opus 4.6 is nuts. Everything I throw at it works. Frontend, backend, algorithms—it does not matter.

I start with a PRD, ask for a step-by-step plan, and just execute on each step at a time. Sometimes ideas are dumb, but checking and guiding step by step helps it ship working things in hours.

It was also the first AI I felt, "Damn, this thing is smarter than me."

The other crazy thing is that with today's tech, these things can be made to work at 1k tokens/sec with multiple agents working at the same time, each at that speed.

koreth112d ago

I wish I had this kind of experience. I threw a tedious but straightforward task at Claude Code using Opus 4.6 late last week: find the places in a React code base where we were using useState and useEffect to calculate a value that was purely dependent on the inputs to useEffect, and replace them with useMemo. I told it to be careful to only replace cases where the change did not introduce any behavior changes, and I put it in plan mode first.

It gave me an impressive plan of attack, including a reasonable way to determine which code it could safely modify. I told it to start with just a few files and let me review; its changes looked good. So I told it to proceed with the rest of the code.

It made hundreds of changes, as expected (big code base). And most of them were correct! Except the places where it decided to do things like put its "const x = useMemo(...)" call after some piece of code that used the value of "x", meaning I now had a bunch of undefined variable references. There were some other missteps too.

I tried to convince it to fix the places where it had messed up, but it quickly started wanting to make larger structural changes (extracting code into helper functions, etc.) rather than just moving the offending code a few lines higher in the source file. Eventually I gave up trying to steer it and, with the help of another dev on my team, fixed up all the broken code by hand.

It probably still saved time compared to making all the changes myself. But it was way more frustrating.

dcre12d ago

One tip I have is that once you have the diff you want to fix, start a new session and have it work on the diff fresh. They’ve improved this, but it’s still the case that the farther you get into context window, the dumber and less focused the model gets. I learned this from the Claude Code team themselves, who have long advised starting over rather than trying to steer a conversation that has started down a wrong path.

I have heard from people who regularly push a session through multiple compactions. I don’t think this is a good idea. I virtually never do this — when I see context getting up to even 100k, I start making sure I have enough written to disk to type /new, pipe it the diff so far, and just say “keep going.” I learned recently that even essentials like the CLAUDE.md part of the prompt get diluted through compactions. You can write a hook to re-insert it but it's not done by default.

This fresh context thing is a big reason subagents might work where a single agent fails. It’s not just about parallelism: each subagent starts with a fresh context, and the parent agent only sees the result of whatever the subagent does — its own context also remains clean.

4 more replies

olalonde12d ago

Same here. I don't understand how people leave it running on an "autopilot" for long periods of time. I still use it interactively as an assistant, going back and forth and stepping in when it makes mistakes or questionable architectural decisions. Maybe that workflow makes more sense if you're not a developer and don't have a good way to judge code quality in the first place.

There's probably a parallel with the CMSes and frameworks of the 2000s (e.g. WordPress or Ruby on Rails). They massively improved productivity, but as a junior developer you could get pretty stuck if something broke or you needed to implement an unconventional feature. I guess it must feel a bit similar for non-developers using tools like Claude Code today.

1 more reply

conception12d ago

Branch first so you can just undo. I think this would have worked with sub agents and /loop maybe? Write all items to change to a todo.md. Have it split up the work with haiku sub agents doing 5-10 changes at a time, marking the todos done, and /loop until all are done. You’ll succeed I suspect. If the main claude instance compacts its context - stop and start from where you left off.

1 more reply

a13n12d ago

If you use eslint and tell it how to run lint in CLAUDE.md it will run lint itself and find and fix most issues like this.

Definitely not ideal, but sure helps.

jdkoeck12d ago

Undefined variable references? Did you not instruct it to run typescript after changes?

stpedgwdgfhgdd11d ago

Start over, create a new plan with the lessons learned.

You need to converge on the requirements.

dyauspitr12d ago

You’re using it wrong. As soon as it starts going off the rails once you’ve repeated yourself, you drop the whole session and start over.

1 more reply

sarchertech12d ago

What kinds of things are you building? This is not my experience at all.

Just today I asked Claude using opus 4.6 to build out a test harness for a new dynamic database diff tool. Everything seemed to be fine but it built a test suite for an existing diff tool. It set everything up in the new directory, but it was actually testing code and logic from a preexisting directory despite the plan being correct before I told it to execute.

I started over and wrote out a few skeleton functions myself then asked it write tests for those to test for some new functionality. Then my plan was to the ask it to add that functionality using the tests as guardrails.

Well the tests didn’t actually call any of the functions under test. They just directly implemented the logic I asked for in the tests.

After $50 and 2 hours I finally got something working only to realize that instead of creating a new pg database to test against, it found a dev database I had lying around and started adding tables to it.

When I managed to fix that, it decided that it needed to rebuild multiple docker components before each test and test them down after each one.

After about 4 hours and $75, I managed to get something working that was probably more code than I would have written in 4 hours, but I think it was probably worse than what I would have come up with on my own. And I really have no idea if it works because the day was over and I didn’t have the energy left to review it all.

We’ve recently been tasked at work with spending more money on Claude (not being more productive the metric is literally spending more money) and everyone is struggling to do anything like what the posts on HN say they are doing. So far no one in my org in a very large tech company has managed to do anything very impressive with Claude other than bringing down prod 2 days ago.

Yes I’m using planning mode and clearing context and being specific with requirements and starting new sessions, and every other piece of advice I’ve read.

I’ve had much more luck using opus 4.6 in vs studio to make more targeted changes, explain things, debug etc… Claude seems too hard to wrangle and it isn’t good enough for you to be operating that far removed from the code.

jhatemyjob12d ago

Similar experience. I use these AI tools on a daily basis. I have tons of examples like yours. In one recent instance I explicitly told it in the prompt to not use memcpy, and it used memcpy anyway, and generated a 30-line diff after thinking for 20 minutes. In that amount of time I created a 10-line diff that didn't use memcpy.

I think it's the big investors' extremely powerful incentives manifesting in the form of internet comments. The pace of improvement peaked at GPT-4. There is value in autocomplete-as-a-service, and the "harnesses" like Codex take it a lot farther. But the people who are blown away by these new releases either don't spend a lot of time writing code, or are being paid to be blown away. This is not a hockey stick curve. It's a log curve.

Bigger context windows are a welcome addition. And stuff like JSON inputs is nice too. But these things aren't gonna like, take your SWE job, if you're any good. It's just like, a nice substitute for the Google -> Stack Overflow -> Copy/Paste workflow.

2 more replies

dcre12d ago

Curious what language and stack. And have people at your company had marginally more success with greenfield projects like prototypes? I guess that’s what you’re describing, though it sounds like it’s a directory in a monorepo maybe?

1 more reply

JoeMerchant11d ago

Try https://github.com/gsd-build/get-shit-done. It's been a game changer for me.

godd211d ago

> After about 4 hours and $75

Huh? The max plan is $200/month. How are you spending $75 in 4 hrs?

1 more reply

extr12d ago

You probably just don't have the hang of it yet. It's very good but it's not a mind reader and if you have something specific you want, it's best to just articulate that exactly as best you can ("I want a test harness for <specific_tool>, which you can find <here>"). You need to explain that you want tests that assert on observable outcomes and state, not internal structure, use real objects not mocks, property based testing for invariants, etc. It's a feedback loop between yourself and the agent that you must develop a bit before you start seeing "magic" results. A typical session for me looks like:

- I ask for something highly general and claude explores a bit and responds.

- We go back and forth a bit on precisely what I'm asking for. Maybe I correct it a few times and maybe it has a few ideas I didn't know about/think of.

- It writes some kind of plan to a markdown file. In a fresh session I tell a new instance to execute the plan.

- After it's done, I skim the broad strokes of the code and point out any code/architectural smells.

- I ask it to review it's own work and then critique that review, etc. We write tests.

Perhaps that sounds like a lot but typically this process takes around 30-45 minutes of intermittent focus and the result will be several thousand lines of pretty good, working code.

5 more replies

eknkc12d ago

I find that Opus misses a lot of details in the code base when I want it to design a feature or something. It jumps to a basic solution which is actually good but might affect something elsewhere.

GPT 5.4 on codex cli has been much more reliable for me lately. I used to have opus write and codex review, I now to the opposite (I actually have codex write and both review in parallel).

So on the latest models for my use case gpt > opus but these change all the time.

Edit: also the harness is shit. Claude code has been slow, weird and a resource hog. Refuses to read now standardized .agents dirs so I need symlink gymnastics. Hides as much info as it can… Codex cli is working much better lately.

toraway12d ago

Codex CLI is so much more pleasant to use than CC. I cancelled my CC subscription after the OpenCode thing, but somewhat ironically have recently found myself naturally trying the native Codex CLI client first more often over OpenCode.

Kinda funny how you don't actually need to use coercion if you put in the engineering work to build a product that's competitive on its own technical merits...

ai_fry_ur_brain12d ago

Im convinced everyone saying this is building the simplest web apps, and doing magic tricks on themselves.

hparadiz12d ago

I've been building a new task manager in C for Linux.

If you're not using AI you are cooked. You just don't realize it yet.

https://i.imgur.com/YXLZvy3.png

2 more replies

raldi12d ago

What evidence would convince you otherwise?

1 more reply

marginalia_nu11d ago

My experience is that it gets you 80-90% of the way at 20x the speed, but coaxing it into fixing the remaining 10-20% happens at a staggeringly slow speed.

All programming is like this to some extent, but Claude's 80/20 behavior is so much more extreme. It can almost build anything in 15-30 minutes, but after those 15-30 minutes are up, it's only "almost built". Then you need to spend hours, days, maybe even weeks getting past the "almost".

Big part of why everyone seems to be vibe coding apps, but almost nobody seems to be shipping anything.

interpol_p12d ago

I had Opus 4.6 running on a backend bug for hours. It got nowhere. Turned out the problem was in AWS X-ray swizzling the fetch method and not handling the same argument types as the original, which led to cryptic errors.

I had Opus 4.6 tell me I was "seeing things wrong" when I tried to have it correct some graphical issues. It got stuck in a loop of re-introducing the same bug every hour or so in an attempt to fix the issue.

I'm not disagreeing with your experience, but in my experience it is largely the same as what I had with Opus 4.5 / Codex / etc.

toraway12d ago

Haha, reminds me of an unbelievably aggravating exchange with Codex (GPT 5.4 / High) where it was unflinchingly gaslighting me about undesired behavior still occurring after a change it made that it was adamant simply could not be happening.

It started by insisting I was repeatedly making a typo and still would not budge even after I started copy/pasting the full terminal history of what I was entering and the unabridged output, and eventually pivoted to darkly insinuating I was tampering with my shell environment as if I was trying to mislead it or something.

Ultimately it turned out that it forgot it was supposed to be applying the fixes to the actual server instead of the local dev environment, and had earlier in the conversation switched from editing directly over SSH to pushing/pulling the local repo to the remote due to diffs getting mangled.

fbrncci12d ago

I am starting to believe it’s not OPUS but developers getting better at using LLMs across the board. And not realizing they are just getting much better at using these tools.

I also thought it was OPUS 4.5 (also tested a lot with 4.6) and then in February switched to only using auto mode in the coding IDEs. They do not use OPUS (most of the times), and I’m ending up with a similar result after a very rough learning curve.

Now switching back to OPUS I notice that I get more out of it, but it’s no longer a huge difference. In a lot of cases OPUS is actually in the way after learning to prompt more effectively with cheaper models.

The big difference now is that I’m just paying 60-90$ month for 40-50hrs of weekly usage… while I was inching towards 1000$ with OPUS. I chose these auto modes because they don’t dig into usage based pricing or throttling which is a pretty sweet deal.

danielbln12d ago

Opus is not an acronym.

2 more replies

copperx12d ago

I had similar thoughts regarding "we are simply getting better at using them", but the man I tried Gemini again and reconsidered.

olalonde12d ago

> PRD

Is it Baader-Meinhof or is everyone on HN suddenly using obscure acronyms?

shujito12d ago

It stands for Product Requirements Document, it is something commonly used in project planning and management.

1 more reply

nvarsj11d ago

Seems commonly used in Big Tech - first time I heard it was in my current job. Now it's seared into my brain since it's used so much. Among many other acronyms which I won't bore you with.

Aperocky12d ago

I had been able to get it into the classic AI loop once.

It was about a problem with calculation around filling a topographical water basin with sedimentation where calculation is discrete (e.g. turn based) and that edge case where both water and sediments would overflow the basin; To make the matter simple, fact was A, B, C, and it oscillated between explanation 1 which refuted C, explanation 2 which refuted A and explanation 3 that refuted B.

I'll give it to opus training stability that my 3 tries using it all consistently got into this loop, so I decided to directly order it to do a brute force solution that avoided (but didn't solve) this problem.

I did feel like with a human, there's no way that those 3 loop would happen by the second time. Or at least the majority of us. But there is just no way to get through to opus 4.6

schainks12d ago

> It was also the first AI I felt, "Damn, this thing is smarter than me."

1000% agree. It's also easy to talk to it about something you're not sure it said and derive a better, more elegant solution with simple questioning.

Gemini 3.1 also gives me these vibes.

eru12d ago

> [...] with multiple agents working at the same time, each at that speed.

Horizontal parallelising of tasks doesn't really require any modern tech.

But I agree that Opus 4.6 with 1M context window is really good at lots of routine programming tasks.

1 more reply

rafaelmn12d ago

I've seen a few instances of where Claude showed me a better way to do something and many many more instances of where it fails miserably.

Super simple problem :

I had a ZMK keyboard layout definition I wanted it to convert it to QMK for a different keyboard that had one key less so it just had to trim one outer key. It took like 45 minutes of back and forth to get it right - I could have done it in 30 min manually tops with looking up docs for everything.

Capability isn't the impressive part it's the tenacity/endurance.

hrishikesh-s12d ago

Opus-4.6 is so far ahead of the rest that I think Anthropic is the winner in winner-take-all

1 more reply

raincole12d ago

It's so far the best model that answers my questions about Wolfram language.

That being said it's the only use case for me. I won't subscribe to something that I can't use with third party harness.

1 more reply

fooker12d ago

I have a PhD in a niche field and this can do my job ;)

Not sure if this means I should get a more interesting job or if we are all going to be at the mercy of UBI eventually.

2 more replies

ed_elliott_asc12d ago

It’s still pretty poor writing powershell

slopinthebag11d ago

> It was also the first AI I felt, "Damn, this thing is smarter than me."

Sounds like it is.

vessenes12d ago

I’ll put out a suggestion you pair with codex or deepthink for audit and review - opus is still prone to … enthusiastic architectural decisions. I promise you will be at least thankful and at most like ‘wtf?’ at some audit outputs.

Also shout out to beads - I highly recommend you pair it with beads from yegge: opus can lay out a large project with beads, and keep track of what to do next and churn through the list beautifully with a little help.

1 more reply

scroogey12d ago

Just yesterday I asked it to repeat a very simple task 10 times. It ended up doing it 15 times. It wasn't a problem per se, just a bit jarring that it was unable to follow such simple instructions (it even repeated my desire for 10 repetitions at the start!).

dzink12d ago

Opus 4.6 is AGI in my book. They won’t admit it, but it’s absolutely true. It shows initiative in not only getting things right but also adding improvements that the original prompt didn't request that match the goals of the job.

4 more replies

devld11d ago

But does it still generate slop?

I'm late to the party and I'm just getting started with Antrophic models. I have been finding Sonnet decent enough, but it seems to have trouble naming variables correctly (it's not just that most names are poor and undescriptive, sometimes it names it wrong, confusing) or sometimes unnecessarily declaring, re-declaring variables, encoding, decoding, rather than using the value that's already there etc. Is Opus better at this?

1 more reply

vips7L12d ago

Bullshit.

phendrenad212d ago

The replies to this really make me think that some people are getting left behind the AI age. Colleges are likely already teaching how to prompt, but a lot of existing software devs just don't get it. I encourage people who aren't having success with AI to watch some youtube videos on best practices.

2 more replies

vessenes12d ago

This is super exciting. I've been poking at it today, and it definitely changes my workflow -- I feel like a full three or four hour parallel coding session with subagents is now generally fitting into a single master session.

The stats claim Opus at 1M is about like 5.4 at 256k -- these needle long context tests don't always go with quality reasoning ability sadly -- but this is still a significant improvement, and I haven't seen dramatic falloff in my tests, unlike q4 '25 models.

p.s. what's up with sonnet 4.5 getting comparatively better as context got longer?

steve-atx-760012d ago

Did it get better? I used sonnet 4.5 1m frequently and my impression was that it was around the same performance but a hell of a lot faster since the 1m model was willing to spends more tokens at each step vs preferring more token-cautious tool calls.

1 more reply

mattfrommars12d ago

Random: are you personally paying for Claude Code or is it paid by you employer?

My employer only pays for GitHub copilot extension

2 more replies

johnwheeler12d ago

This is incredible. I just blew through $200 last night in a few hours on 1M context. This is like the best news I've heard all year in regards to my business.

What is OpenAIs response to this? Do they even have 1M context window or is it still opaque and "depends on the time of day"

dominotw12d ago

rarely go over 25 percent in codex but i hit 80 on claude code in just a short time.

hagen812d ago

Did u use the API or subscription?

1 more reply

iandanforth11d ago

I'm very happy about this change. For long sessions with Claude it was always like a punch to the gut when a compaction came along. Codex/GPT-5.4 is better with compactions so I switched to that to avoid the pain of the model suddenly forgetting key aspects of the work and making the same dumb errors all over again. I'm excited to return to Claude as my daily driver!

tariky12d ago

This is amazing. I have to test it with my reverse engineering workflow. I don't know how many people use CC for RE but it is really good at it.

Also it is really good for writing SketchUp plugins in ruby. It one shots plugins that are in some versions better then commercial one you can buy online.

CC will change development landscape so much in next year. It is exciting and terrifying in same time.

aragonite12d ago

Do long sessions also burn through token budgets much faster?

If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?

dathery12d ago

That's correct. Input caching helps, but even then at e.g. 800k tokens with all of them cached, the API price is $0.50 * 0.8 = $0.40 per request, which adds up really fast. A "request" can be e.g. a single tool call response, so you can easily end up making many $0.40 requests per minute.

1 more reply

jasondclinton12d ago

If you use context cacheing, it saves quite a lot on the costs/budgets. You can cache 900k tokens if you want.

geminiboy10d ago

My companies brand guidelines document was 600 ish pages long and claude desktop couldnt handle it.

As soon as I saw the announcement , tried again and created a working design skill that can create design artifacts following the brand guidelines.

While these improvements seem incremental, they have a compounding effect on usefulness.

My AI doomsday calculator just got decremented by anothet 6 months.

Slav_fixflex11d ago

I've been using Claude Code directly on my production servers to debug complex I/O bottlenecks and database locks. The ability of the latest models to hold the entire project context while suggesting real-time fixes is a game changer for solo founders. It helped me stabilize a security tool I’m building when other agents kept hallucinating.

anshumankmr11d ago

All while their usage limits are so excessively shitty that I paid them 50$ just two days back cause I ran out of usage and they still blocked from using it during a critical work week (and did not refund my 50$ despite my emails and requests and route me to s*ty AI bot.). Anyway, I am using Copilot and OpenCode a lot more these days which is much better.

praddlebus11d ago

What model(s) do you use with OpenCode? Can you use opus4.6 1m? Is it better in terms of usage if you use the same model?

1 more reply

pixelpoet12d ago

Compared to yesterday my Claude Max subscription burns usage like absolutely crazy (13% of weekly usage from fresh reset today with just a handful prompts on two new C++ projects, no deps) and has become unbearably slow (as in 1hr for a prompt response). GGWP Anthropic, it was great while it lasted but this isn't worth the hundreds of dollars.

Spooky2312d ago

Yeah, morning eastern time Claude is brutal.

LoganDark12d ago

Finally, I don't have to constantly reload my Extra Usage balance when I already pay $200/mo for their most expensive plan. I can't believe they even did that. I couldn't use 1M context at all because I already pay $200/mo and it was going to ask me for even more.

Next step should be to allow fast mode to draw from the $200/mo usage balance. Again, I pay $200/mo, I should at least be able to send a single message without being asked to cough up more. (One message in fast mode costs a few dollars each) One would think $200/mo would give me any measure of ability to use their more expensive capabilities but it seems it's bucketed to only the capabilities that are offered to even free users.

aenis12d ago

I find it hard to understand that people consider $200 p/m a lot for what they are getting. Expensive compared to what? A netflix sub?

A 1hr of a senior dev is at least $100, depending where one lives. Since Claude saves me hours every day, it pays for itself almost instantly. I think the economic value of the Claude subscription is on the order of $20-40k a month for a pro.

2 more replies

bob102912d ago

I've been avoiding context beyond 100k tokens in general. The performance is simply terrible. There's no training data for a megabyte of your very particular context.

If you are really interested in deep NIAH tasks, external symbolic recursion and self-similar prompts+tools are a much bigger unlock than more context window. Recursion and (most) tools tend to be fairly deterministic processes.

I generally prohibit tool calling in the first stack frame of complex agents in order to preserve context window for the overall task and human interaction. Most of the nasty token consumption happens in brief, nested conversations that pass summaries back up the call stack.

drcongo12d ago

Could be pure coincidence, but my Claude Code session last night was an absolute nightmare. It kept forgetting things it had done earlier in the session and why it had done them, messed up a git merge so badly that it lost the CLAUDE.md file along with a lot of other stuff, and then started running commands on the host machine instead of inside the container because it no longer had a CLAUDE.md to tell it not to. Last night was the first time I've ever sworn at it.

xvector12d ago

I think this is just the nature of a nondeterministic system; occasionally you're gonna be unlucky enough to encounter the leftmost segment of the bell curve.

In my experience dumping a summary + starting a fresh session helps in these cases.

margorczynski12d ago

What about response coherence with longer context? Usually in other models with such big windows I see the quality to rapidly drop as it gets past a certain point.

sporkland11d ago

Can someone help me with insights about large context models? Are there relationships that pop up at the beginning and end of long context windows that don't transitively follow from intermediate points? Is there value in the training over these longer windows vs using the more basic/closer weight distributions over different sliding windows?

PeterStuer11d ago

The thing that would get me more excited is how far they could push context coherence before the model loses track. I'm hoping 250k.

k__12d ago

I heard, the middle of the context is often ignored.

Do long context windows make much sense then or is this just a way of getting people to use more tokens?

jwilliams11d ago

I'm fairly sure that your best throughput is single-prompt single-shot runs with Claude (and that means no plan, no swarms, etc) -- just with a high degree of work in parallel.

So for me this is a pretty huge change as the ceiling on a single prompt just jumped considerably. I'm replaying some of my less effective prompts today to see the impact.

chaboud12d ago

Awesome.... With Sonnet 4.5, I had Cline soft trigger compaction at 400k (it wandered off into the weeds at 500k). But the stability of the 4.6 models is notable. I still think it pays to structure systems to be comprehensible in smaller contexts (smaller files, concise plans), but this is great.

(And, yeah, I'm all Claude Code these days...)

LarsDu8812d ago

The stuff I built with Opus 4.6 in the past 2.5 weeks:

Full clone of Panel de Pon/Tetris attack with full P2P rollback online multiplayer: https://panel-panic.com

An emulator of the MOS 6502 CPU with visual display of the voltage going into the DIP package of the physical CPU: https://larsdu.github.io/Dippy6502/

I'm impressed as fuck, but a part of me deep down knows that I know fuck all about the 6502 or its assembly language and architecture, and now I'll probably never be motivated to do this project in a way that I would've learned all the tings I wanted to learn.

adamm25511d ago

That game is AWESOME! The fact that was vibe coded is insane.

1 more reply

cubefox11d ago

> Standard pricing now applies across the full 1M window for both models, with no long-context premium.

Does that mean it's likely not a Transformer with quadratic attention, but some other kind of architecture, with linear time complexity in sequence length? That would be pretty interesting.

bob102911d ago

It's almost certainly not quadratic at 1M. This would be wildly infeasible at scale. 10^6^2 = 10^12. That's a trillion things.

They are probably doing something like putting the original user prompt into the model's environment and providing special tools to the model, along with iterative execution, to fully process the entire context over multiple invokes.

I think the Recursive Language Model paper has a very good take on how this might go. I've seen really good outcomes in my local experimentation around this concept:

https://arxiv.org/abs/2512.24601

You can get exponential scaling with proper symbolic stack frames. Handling a gigabyte of context is feasible, assuming it fits the depth first search pattern.

1 more reply

FartyMcFarter11d ago

They're probably taking shortcuts such as taking advantage of sparsity. There are various tricks like that mentioned in some papers, although the big companies are getting more and more secretive about how their models work so you won't necessarily find proof.

1 more reply

holoduke12d ago

I am currently mass translating millions of records with short descriptions. Somehow tokens are consumed extremely fast. I have 3 max memberships. And all 3 of them are hitting the 5 hour limit in about 5 to 10 minutes. Still don't understand why this is happening.

cbg012d ago

Unless you're clearing up the context for each description or processing them in parallel with subagents your context window will grow for each short description added to it making you hit those hour limits.

aenis12d ago

Sample of one and all that, but it's way, way more sloppy than it used to be for me.

To the extent, that I have started making manual fixes in the code - I haven't had to stoop to this in 2 months.

Max subscription, 100k LOC codebases more or less (frontend and backend - same observations).

suheilaaita12d ago

This blew my mind the first i saw this. Another leap in AI that just swooshes by. In a couple of months, every model will be the same. Can't wait for IDEs like cursor and vs code to update their tooling to adap for this massive change in claude models.

dominotw12d ago

can someone tell me how to make this instruction work in claude code

"put high level description of the change you are making in log.md after every change"

works perfectly in codex but i just cant get calude to do it automatically. I always have to ask "did you update the log".

sergiotapia12d ago

use claude hooks - in .claude/settings.json you can have it run on different claude events like "PreToolUse" or "Stop" and in those events you pass in commands you want it to run.

You can have stuff like for the "stop" event, run foobar.sh and in foobar.sh do cool stuff like format your code, run tests, etc.

prettyblocks12d ago

I imagine you can do this with a hook that fires every time claude stops responding:

https://code.claude.com/docs/en/hooks-guide

8note12d ago

whats the need? you have the session in a file as a dag. you can summarize to a log whenever you want. doesnt need to be as it goes.

earlier today i actually spent a bit of time asking claude to make an mcp to introspect that - break the session down into summarized topics, so i could try dropping some out or replacing the detailed messages with a summary - the idea being to compact out a small chunk to save on context window, rather than getting it back to empty.

the file is just there though, you can run jq against it to get a list of writes, and get an agent to summarize

1 more reply

steve-atx-760012d ago

Backup your config and ask Claude. I’ve done this for all kinds of things like mcp and agent config.

yubainu12d ago

1M is truly amazing. However, what is the incidence of hallucination? I haven't found a benchmark, but I feel that maintaining context at 1M would likely increase hallucination. Is there some kind of mechanism to suppress hallucination?

vicchenai12d ago

The no-degradation-at-scale claim is the interesting part. Context rot has been the main thing limiting how useful long context actually is in practice — curious to see what independent evals show on retrieval consistency across the full 1M window.

apetresc12d ago

I don't think they're claiming "no degradation at scale", are they? They still report a 91.9->78.3 drop. That's just a better drop than everyone else (is the claim).

miohtama11d ago

I just tested this with Jupytwr Notebooks for a day. LLMs have struggled with them because notebooks contain a lot of token as the data of rendered cells.

With Opus 1M, LLM edit was very robust and finally useable

jFriedensreich11d ago

My testing was extremely disappointing, this is not a context window that magically extends your breathing room for a conversation. I can tell blindly at this point when 150 - 200 k tokens are reached because the coding quality and coherence just drops by one or two generations. Its great for the case you really need a giant context for specific task but it changes nothing for needing to compact or handover at 200k.

heraldgeezer11d ago

I feel like I'm the only one here using AI as just a chatbot for research, shopping, advice etc and for one off regex or bash/ps scripts... then again not a programmer so.

sailfast11d ago

This is great news. The 1M context is much easier to work with than compacting all the time and seems to perform and remember quite well despite the insane amount of data.

thebigspacefuck11d ago

I used this for a bit and I felt like it was slower and generally worse than using 200K with context compaction. Context compaction does lose some things though.

gaigalas12d ago

I'm getting close to my goal of fitting an entire bootstrappable-from-source system source code as context and just telling Claude "go ahead, make it better".

glimshe9d ago

Is "generally available" the proper wording when Claude is generally unavailable so often these days?

arjie12d ago

This is fantastic. I keep having to save to memory with instructions and then tell it to restore to get anywhere on long running tasks.

aliljet12d ago

Are there evals showing how this improves outputs?

apetresc12d ago

Improves outputs relative to what? Compared to previous contexts of 1M, it improves outputs by allowing them to exist (because previously you couldn't exceed 200K). Compared to contexts of <200K, it degrades outputs rather than improves them, but that's what you'd expect from longer contexts. It's still better than compaction, which was previously the alternative.

mvrckhckr12d ago

I never get to more than 20% of the 1M context window, and it’s working great. (Have the same experience in Codex with 5.4.)

AbstractH2411d ago

Am I crazy or wasn’t this announced like 2 weeks ago?

Or was that a different company or not GA. It’s all becoming a blur.

aarmenante12d ago

Hot take... the 1MM context degrades performance drastically.

aenis12d ago

Same. First time in 2 months that I found it easier to fix the bugs it created manually, rather than get it to fix. Its google-code-CLI-on-gemini-2.5 level bad for me today. Meaning, almost comically bad.

jmkozko11d ago

Do subscription users still need to tap into "extra usage" spending to go above 200K tokens?

8cvor6j844qw_d612d ago

Oh nice, does it mean less game of /compact, /clear, and updating CLAUDE.md with Claude Code?

fnordpiglet12d ago

I’ve been using 1M for a while and it defers it and makes it worse almost when it happens. Compacting a context that big loses a ton of fidelity. But I’ve taken to just editing the context instead (double esc). I also am planning to build an agent to slice the session logs up into contextually useful and useless discarding the useless and keeping things high fidelity that way. (I.e., carve up with a script the jsonl and have subagent haiku return the relevant parts and reconstructing the jsonl)

1 more reply

thunkle12d ago

Just have to ask. Will I be spending way more money since my context window is getting so much bigger?

isbvhodnvemrwvn12d ago

Yes, full context is used to generate each new token.

8note12d ago

im guessing this is why the compacts have started sucking? i just finished getting me some nicer tools for manipulating the graph so i could compact less frequently, and fish out context from the prior session.

maybe itll still be useful, though i only have opus at 1M, not sonnet yet

sergiotapia12d ago

maybe i'm thinking too small, or maybe it's because i've been using these ai systems since they were first launched, but it feels wrong to just saturate the hell out of the context, even if it can take 1 million tokens.

maybe i need to unlearn this habit?

gskm12d ago

I think your instinct is right. More context isn't free, even when the window supports it, and the model still has to attend to everything in there, and noise dilutes the signal. A cleaner, smaller context consistently gives better outputs than a bloated one, regardless of window size. For sure, the 1M window is great for not having to compact mid-task. But "I can fit more" and "I should put more in" are very different things. At least in my mind.

swader99912d ago

I notice Claude steadily consuming less tokens, especially with tool calling every week too

TZubiri11d ago

Remember folks, just because you can use 1m tokens doesn't mean you should

fittingopposite12d ago

I don't get the announcement. Is this included in the standard 5 or 20x Max plans?

throw0317201912d ago

Pentagon may switch to Claude knowing OpenAI has the premium rates for 1M context.

zmmmmm12d ago

Noticed this just now - all of a sudden i have 1M context window (!!!) without changing anything. It's actually slightly disturbing because this IS a behavior change. Don't get me wrong, I like having longer context but we really need to pin down behaviour for how things are deployed.

steve-atx-760012d ago

You can pin to specific models with —-model. Check out their doc. See https://support.claude.com/en/articles/11940350-claude-code-.... You can also pin to a less specific tag like sonnet-4.5[1m] (that’s from memory might be a little off).

1 more reply

phist_mcgee12d ago

Anthropic is famous for changing things under your feet. Claude code is basically alpha software with a global footprint.

dkpk12d ago

Is this also applicable for usage in Claude web / mobile apps for chat?

alienchow12d ago

If this is a skill issue, feel free to let me know. In general Claude Code is decent for tooling. Onduty fullstack tooling features that used to sit ignored in the on-caller ticket queue for months can now be easily built in 20 minutes with unit tests and integration tests. The code quality isn't always the best (although what's good code for humans may not be good code for agents) but that's another specific and directed prompt away to refactor.

However, I can't seem to get Opus 4.6 to wire up proper infrastructure. This is especially so if OSS forks are used. It trips up on arguments from the fork source, invents args that don't exist in either, and has a habit of tearing down entire clusters just to fix a Helm chart for "testing purposes". I've tried modifying the CLAUDE.md and SPEC.md with specific instructions on how to do things but it just goes off on a tangent and starts to negotiate on the specs. "I know you asked for help with figuring out the CNI configurations across 2 clusters but it's too complex. Can we just do single cluster?" The entire repository gets littered with random MD files everywhere for directory specific memories, context, action plans, deprecated action plans, pre-compaction memories etc. I don't quite know which to prune either. It has taken most of the fun out of software engineering and I'm now just an Obsidian janitor for what I can best describe as a "clueless junior engineer that never learns". When the auto compaction kicks in it's like an episode of 50 first dates.

Right now this is where I assume is the limitation because the literature for real-world infrastructure requiring large contexts and integration is very limited. If anyone has any idea if Claude Opus is suitable for such tasks, do give some suggestions.

vips7L12d ago

Friends, just write the code. It’s not that hard.

AussieWog9312d ago

I hear what you're saying, but for a lot of people coding isn't something we can throw 40+ hours per week at.

My main job is running a small eComm business, and I have to both develop software automations for the office (to improve productivity long-term) while also doing non-coding day to day tasks. On top of this, I maintain an open source project after hours. I've also got a young family with 3 kids.

I'm not saying Claude is the damn singularity or anything, but stuff is getting done now that simply wasn't being addressed before.

1 more reply

righthand12d ago

You're witnessing the rise of the Developer Technician or Software Technician. They can get a machine to print out an application but you will still need an engineer to know how it works or to get it working. This used to be juniors learning to be senior devs/engineers. Now it is a split between technicians and engineers. The market will be up shit creek when all their technicians can't vibe code their way out of not understanding the code.

mrgaro12d ago

Not hard, but time consuming. In the past two weeks I've had Claude Code write me around 35k lines of code across 350 commits. It's a project which is giving positive impact to the company, but we would never have started it without CC as the effort would have been too big compared to the impact.

nkzd12d ago

It's not that interesting.

andrewstuart12d ago

Only someone not using Claude could equate human coding.

1 more reply

efeecllk11d ago

finally. before 1m, i must speak 60k context for just telling the past chat and project

hnipps11d ago

Why would anyone need this much context? Genuine question. It's not worth the drop in quality IMO.

tommek407711d ago

You sound like: "Why would anyone need more than 640KB RAM?!".

1 more reply

ionwake11d ago

Have we reached the point where its "normal" to mostly use AI to code? Im just wondering because Im sure it was less than a month ago when I said I havent coded manually for over 6 months and I had several comments about how my code must be terrible.

Im not butt hurt Im just wondering if the overton window has shifted yet.

ofisboy11d ago

i think it's buggy. i keep getting "compacting conversation" even though i restarted the cli. and i'm for sure not using 5 times more.

nemo44x12d ago

Has anyone started a project to replace Linux yet?

dude25071111d ago

No, because it's not a hello-world Electron/React "app".

shanjai_raj712d ago

are the costs the same as the 200k context opus 4.6?

compaction has been really good in claude we don't even recognize the switch

alienbaby12d ago

is this the market played in front of our eyes slice by slice: ok, maybe not, but watching these entities duke it out is kinda amusing? There will be consequences but may as well sit it out for the ride, who knows where we are going?

jf___12d ago

there is a parallel between managing context windows and hard real-time system engineering.

A context window is a fixed-size memory region. It is allocated once, at conversation start, and cannot grow. Every token consumed — prompt, response, digression — advances a pointer through this region. There is no garbage collector. There is no virtual memory. When the space is exhausted, the system does not degrade gracefully: it faults.

This is not metaphor by loose resemblance. The structural constraints are isomorphic:

No dynamic allocation. In a hard realtime system, malloc() at runtime is forbidden — it fragments the heap and destroys predictability. In a conversation, raising an orthogonal topic mid-task is dynamic allocation. It fragments the semantic space. The transformer's attention mechanism must now maintain coherence across non-contiguous blocks of meaning, precisely analogous to cache misses over scattered memory.

No recursion. Recursion risks stack overflow and makes WCET analysis intractable. In a conversation, recursion is re-derivation: returning to re-explain, re-justify, or re-negotiate decisions already made. Each re-entry consumes tokens to reconstruct state that was already resolved. In realtime systems, loops are unrolled at compile time. In LLM work, dependencies should be resolved before the main execution phase.

Linear allocation only. The correct strategy in both domains is the bump allocator: advance monotonically through the available region. Never backtrack. Never interleave. The "brainstorm" pattern — a focused, single-pass traversal of a problem space — works precisely because it is a linear allocation discipline imposed on a conversation.

rhubarbtree12d ago

There is compaction, which is analogous to gc

j / k navigate · click thread line to collapse

520 comments

jeremychone11d ago

Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request.

Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.

For production coding, I use

- Then, use a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k max.

What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.

Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.

For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.

Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.

daemonk11d ago

You have to make sure the semantic summary takes up significantly less tokens than just reading the code or its just a waste of token/time.

Then have a skill that uses git version logs to perform lazy summary cache when needed.

1 more reply

smusamashah11d ago

tontinton11d ago

Yeah we all converge to the same workflow, in my ai coding agent I'm working on now, I've added an "index" tool that uses tree-sitter to compress and show the AI a skeleton of a code file.

Here's the implementation for the interested: https://github.com/tontinton/maki/blob/main/maki-code-index%...

jeremychone11d ago

Oh, that's great.

I've always wanted to explore how to fit tree-sitter into this workflow. It's great to know that this works well too.

Thanks for sharing the code.

firemelt11d ago

whenever I see post like this

i said well yeah, but its too sophiscated to be practical

jeremychone11d ago

Fair point, but because I spent a year building and refining my custom tool, this is now the reality for all of my AI requests.

I prompt, press run, and then I get this flow: dev setup (dev-chat or plan) code-map (incremental 0s 2m for initial) auto-context (~20s to 40s) final AI query (~30s to 2m)

For example, just now, in my Rust code (about 60k LOC), I wanted to change the data model and brainstorm with the AI to find the right design, and here is the auto-context it gave me:

- Reducing 381 context files ( 1.62 MB)

- Now 5 context files ( 27.90 KB)

- Reducing 11 knowledge files ( 30.16 KB)

- Now 3 knowledge files ( 5.62 KB)

The knowledge files are my "rust10x" best practices, and the context files are the source files.

(edited to fix formatting)

1 more reply

adammarples11d ago

It's not sophisticated at all, he just uses a model to make some documentation before asking another model to work using the documentation

lukeundtrug11d ago

In the end it's hard to measure but personally I feel that my agent rarely misses any context for a given task, so I'm pretty happy with it.

That allows me to just install the correct language extension and thus enable support for that specific language. The extension has to provide symbol information which most do through LSP.

In the end it was way more effort than just using tree-sitter, however, and I'm thinking of doing a slow migration to that approach sooner or later.

Anyways, I created an extension that spins up an mcp server and provides several tools that basically replace the vanilla discovery tools in my workflow.

The tool also specifies which other symbols call the one in question and which others it calls, respectively.

Right now, the daily usage limit is 2000 tool calls per day, which should be enough for anybody.

Would love to hear what you think :)

<https://marketplace.visualstudio.com/items?itemName=LuGoSoft...>

jeremychone10d ago

I looked at your solution and extension README, and it's very interesting and well thought out.

The fact that you've been using it for six months and that it performs well says a lot. At the end of the day, that's what counts.

I like your idea of piggybacking on top of the LSP services, and I can imagine that this was quite a bit of work. Doing it as an MCP server makes it usable across different tools.

I also really like the name "Context Master."

In my case, it's much more niche since it's for the tool I built. Though it's open source, the key difference is that the "indexing" is only agentic at this point.

I can see value in mixing the two. LSP integration scares me because of the amount of work involved, and tree-sitter seems like a good path.

In that case, in the code map, for each item, there could be both the LLM response info and some deterministic info, for example, from tree-sitter.

That being said, the current approach works so well that I think I am going to keep using and fine-tuning it for a while, and bring in deterministic context only when or if I need it.

Anyway, what you built looks great. If it works, that's great.

1 more reply

cloverich11d ago

This is really interesting; ive done very high level code maps but the entire project seems wild, it works?

So, small model figures out which files to use based on the code map, and then enriches with snippets, so big model ideally gets preloaded with relevant context / snippets up front?

Where does code map live? Is it one big file?

jeremychone11d ago

So, I have a pro@coder/.cache/code-map/context-code-map.json.

I also have a `.tmpl-code-map.jsonl` in the same folder so all of my tasks can add to it, and then it gets merged into context-code-map.json.

I keep mtime, but I also compute a blake3 hash, so if mtime does not match, but it is just a "git restore," I do not redo the code map for that file. So it is very incremental.

Then the trick is, when sending the code map to AI, I serialize it in a nice, simple markdown format.

- path/to/file.rs - summary: ... - when to use: ... - public types: .., .., .. - public functions: .., .., ..

- ...

So the AI does not have to interpret JSON, just clean, structured markdown.

Funny, I worked on this addition to my tool for a week, planning everything, but even today, I am surprised by how well it works.

I have zero sed/grep in my workflow. Just this.

My prompt is pro@coder/coder-prompt.md, the first part is YAML for the globs, and the second part is my prompt.

There is a TUI, but all input and output are files, and the TUI is just there to run it and see the status.

CuriouslyC11d ago

1M context is super useful with Gemini, not so much for coding, but for data analysis.

jeremychone11d ago

Even there, I use AI to augment rows and build the code to put data into Json or Polars and create a quick UI to query the data.

speakbits11d ago

exceptione11d ago

  > - a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)

jeremychone11d ago

Very good point. I had two options:

1) Deterministic

  - Using a tree-sitter/AST-like approach, I could extract types, functions, and perhaps comments, and put them into an index map.

  - Cons:

    - The tricky part of this approach is that what I extract can be pretty large per file, for example, comments.

    - Then, I would probably need an agentic synthesis step for those comments anyway.

2) Agentic

  - Since Flash is dirt cheap, I wanted to experiment and skip #1, and go directly to #2.

  - Because my tool is built for concurrency, when set to 32, it's super fast.

  - The price is relatively low, perhaps $1 or $2 for 50k LOC, and 60 to 90 seconds, about 30 to 45 minutes of AI work.

  - What I get back is relatively consistent by file, size-wise, and it's just one trip per file.

So, this is why I started with #2.

And then, the results in real coding scenarios have been astonishing.

Way above what I expected.

The way those indexes get combined with the user prompt gets the right files 95% of the time, and with surprisingly high quality.

So, I might add deterministic aspects to it, but since I think I will need the agentic step anyway, I have deprioritized it.

rafael-lua11d ago

Well, out of all the workflows I have seen, this one is rather nice, might give it a try.

I imagine if the context were being commited and kept up-to-date with CI would work for others to use as well.

Also, in general, is your map of a file relatively smaller than the file itself, even for very small files?

jeremychone11d ago

- The ..-code-map.json files are per "developer folder," which would create too many conflicts if they were kept in Git.

- As a dev, you provide them in the top YAML section of the coder-prompt.md.

- The auto-context sub-agent calls the code-map sub-agent. Sub-agents can add to or narrow the given globs, and that is the goal of the auto-context agent.

It looks complicated, but it actually works like a charm.

Hopefully, I answered some of your questions.

I need to make a video about it.

But regardless, I really think it's not about the tools, it's about the techniques. This is where the true value is.

2 more replies

LuxBennu11d ago

Myrmornis11d ago

mh-11d ago

I think it's worth remembering that for any offering like that, it necessarily needs to be ~one-size-fits-all, while what you come up with.. doesn't.

They're solving a different problem than you. So I think it's very plausible that you could come up with something that, for your use case, performs considerably better than their "defaults".

sphilipakis9d ago

ra711d ago

jeremychone10d ago

Yes, there is a parallel here. Now, some of those "indexing" steps can be performed by an LLM.

And that does not prevent mixing and matching the two, as some comments in this thread suggest.

Anyway, it's a great time for production coding.

Weryj11d ago

My approach has been using static analysis to produce a Mermaid diagram of all Classes:Methods and their caller/callees.

make_it_sure11d ago

very interested in this approach and many other people are for sure. Please do a blog post.

dimitri-vs12d ago

The big change here is:

> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.

For Claude Code users this is huge - assuming coherence remains strong past 200k tok.

MikeNotThePope12d ago

No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

furyofantares12d ago

I'd been on Codex for a while and with Codex 5.2 I:

1) No longer found the dumb zone

2) No longer feared compaction

Switching to Opus for stupid political reasons, I still have not had the dumb zone - but I'm back to disliking compaction events and so the smaller context window it has, has really hurt.

I hope they copy OpenAI's compaction magic soon, but I am also very excited to try the longer context window.

5 more replies

kaizenb12d ago

Thanks for the video.

His fix for "the dumb zone" is the RPI Framework:

● RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.

5 more replies

alecco11d ago

Offtopic: I find it remarkable the shortened YT url has a tracking cost of 57% extra length. We live in stupid times.

1 more reply

SkyPuncher12d ago

Yes. I've recently become a convert.

ogig12d ago

dimitri-vs12d ago

ricksunny12d ago

Barbing12d ago

Looking at this URL, typo or YouTube flip the si tracking parameter?

  youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

1 more reply

dev_l1x_be12d ago

I never use these giant context windows. It is pointless. Agents are great at super focused work that is easy to re-do. Not sure what is the use case for giant context windows.

hrmtst9383712d ago

wat1000011d ago

maskull12d ago

virtualritz11d ago

I haven't hit the "dumb zone" any more since two months. I think this talk is outdated.

I'm using CC (Opus) thinking and Codex with xhigh on always.

Codex did all the planning and verification, CC wrote the code.

This would have not been possible six months ago at all from my experience.

Maybe with a lot of handholding; but I doubt it (I tried).

saaaaaam12d ago

That video is bizarre. Such a heavy breather.

2 more replies

bushbaba12d ago

Yes. I’ve used it for data analysis

twodave12d ago

That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.

Bombthecat11d ago

If it's not coding, even with 200k context it starts to write gibberish, even with the correct information in the context.

I tried to ask questions about path of exile 2. And even with web research on it gave completely wrong information... Not only outdated. Wrong

I think context decay is a bigger problem then we feel like.

AnotherGoodName11d ago

Fwiw put a copy of the game folder in a directory and tell claude to extract game files and dissasemble the game in preparation for questions about the game.

As an example of doing this in a session with jagged alliance 3 (an rpg) https://pastes.io/jagged-all-69136

Claude extracting game archives and dissasembling leads to far more reliable results than random internet posts.

3 more replies

Lord-Jobo11d ago

Context decay is noticeable within 3 messages, nearly every time. Maybe not substantial, but definitely noticeable.

It’s lead to me starting new chats with bigger and bigger starting ‘summary, prompts to catch the model up while refreshing it. Surely there’s a way to automate that technique.

2 more replies

eric_cc11d ago

It could also be a skill problem. It would be more helpful if when people made llm sucks claims they shared their prompt.

The people I work with who complain about this type of thing horribly communicate their ask to the llm and expect it to read their minds.

4 more replies

staticman211d ago

Adding web search doesn't necessarily lead to better information at any context.

In my experience the model will assume the web results are the answer even if the search engine returns irrelevant garbage.

For example you ask it a question about New Jersey law and the web results are about New York or about "many states" it'll assume the New York info or "many states" info is about New Jersey.

blueblisters11d ago

holoduke11d ago

Number one thing you always need to accomplish are feedback loops for Claude so it's able to shotgun program itself to a solution.

gorjusborg11d ago

The question that comes to mind for me after reading your comment is how can a question about a game require that much context?

1 more reply

wouldbecouldbe11d ago

I feel like few weeks ago i suddenly had a week where even after 3 messages it forgot what we did. Seems fixed now.

turbostyler11d ago

We need an MCP for path of building

__MatrixMan__11d ago

Agreed, there's no getting around the "break it into smaller contexts" problem that lies between us and generally useful AI.

> show me the small contexts to use here, give me prompts re: their interfaces with their neighbors, and show me which distillations are best suited to those tasks

...and then a network of local models could handle it from there. But the providers have no incentive to go in that direction, so progress will likely be slow.

reactordev11d ago

1 more reply

jnovek11d ago

a_e_k12d ago

(Note that I'm using it in more of a hands-on pair-programming mode, and not in a fully-automated vibecoding mode.)

chatmasta12d ago

So a picture is worth 1,666 words?

islewis12d ago

The quality with the 1M window has been very poor for me, specifically for coding tasks. It constantly forgets stuff that has happened in the existing conversation. n=1, ymmv

robwwilliams12d ago

hagen812d ago

j4511d ago

This might burn through usage faster too though.

jFriedensreich11d ago

yeah it totally does not remain coherent past 200k, would have been too nice.

__MatrixMan__11d ago

I bet it depends how homogenous the context is. I bet it works ok near 1M in some cases, but as far as I can tell, those cases are rare.

syntaxing12d ago

necovek12d ago

jeremyjh11d ago

I think its pretty good at Elixir, so that tracks.

1 more reply

mettamage11d ago

Can you recommend some books that teach these idioms? I know not everything is in books but I suspect a bit of it is

n_u12d ago

I've found it's ok at Rust. I think a lot of existing Rust code is high quality and also the stricter Rust compiler enforces that the output of the LLM is somewhat reasonable.

lemagedurage12d ago

Yes, it's nice to have a strict compiler, so the agent has to keep fixing its bugs until it actually compiles. Rust and TypeScript are great for this.

1 more reply

raincole11d ago

Quite sure it's not about the language but the domain.

1 more reply

ipnon12d ago

ex-aws-dude12d ago

I've had a similar experience as a graphics programmer that works in C++ every day

Writing quick python scripts works a lot better than niche domain specific code

nullpoint42012d ago

1 more reply

trenchgun12d ago

LLMsdo great with Rust though

ricardobeat11d ago

It is really good at writing C++ for Arduino, can one-shot most programs.

NanoWar11d ago

I'd say the chance of me one shotting C++ is veeeery low. Same for bash scripts etc. This is where the LLM really shines for me.

Skidaddle11d ago

How was the career transition from MechE? Looking to do the same myself.

dzonga11d ago

nor web engineers (backend) that are not doing standard crud work.

I have seen these shine on frontend work

convenwis12d ago

esperent12d ago

> As you got closer to 100k performance degraded substantially

In practice, I haven't found this to be the case at all with Claude Code using Opus 4.6. So maybe it's another one of those things that used to be true, and now we all expect it to be true.

And of course when we expect something, we'll find it, so any mistakes at 150k context use get attributed to the context, while the same mistake at 50k gets attributed to the model.

peacebeard12d ago

My personal experience is that Opus 4.6 degrades after a while but the degradation is more subtle and less catastrophic than in the past. I still aggressively clear sessions to keep it sharp though.

dcre12d ago

tyleo12d ago

I mentioned this at work but context still rots at the same rate. 90k tokens consumed has just as bad results in 100k context window or 1M.

minimaxir12d ago

The benchmark charts provided are the writeup. Everything else is just anecdata.

FartyMcFarter12d ago

Isn't transformer attention quadratic in complexity in terms of context size? In order to achieve 1M token context I think these models have to be employing a lot of shortcuts.

I'm not an expert but maybe this explains context rot.

vlovich12312d ago

1 more reply

wewewedxfgdf12d ago

The weirdest thing about Claude pricing is their 5X pricing plan is 5 times the cost of the previous plan.

Normally buying the bigger plan gives some sort of discount.

At Claude, it's just "5 times more usage 5 times more cost, there you go".

apetresc12d ago

They would probably implement _diminishing_-value pricing if pure pricing efficiency was their only concern.

auggierose12d ago

It is not the plan they want you to buy. It is a pricing strategy to get you to buy the 20x plan.

radley12d ago

1 more reply

operatingthetan12d ago

I think they are both subsidized so either is a great deal.

cush11d ago

Yeah the free lunch on tokens is almost over. Get them while they’re still cheap

merrvk12d ago

5 times the already subsidised rate is still a discount.

tclancy12d ago

We’ll make it up on volume.

Zambyte12d ago

5 for 5

minimaxir12d ago

Claude Code 2.1.75 now no longer delineates between base Opus and 1M Opus: it's the same model. Oddly, I have Pro where the change supposedly only for Max+ but am still seeing this to be case.

EDIT: Don't think Pro has access to it, a typical prompt just hit the context limit.

The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4's 1M window and extra pricing for that.

auggierose12d ago

No change for Pro, just checked it, the 1M context is still extra usage.

zaptrem12d ago

I have Max 20x and they're still separate on 2.1.75.

hackyon110d ago

Mine took restarting. Is it still separate for you? Also might not work yet if you have CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 on

Frannky12d ago

Opus 4.6 is nuts. Everything I throw at it works. Frontend, backend, algorithms—it does not matter.

I start with a PRD, ask for a step-by-step plan, and just execute on each step at a time. Sometimes ideas are dumb, but checking and guiding step by step helps it ship working things in hours.

It was also the first AI I felt, "Damn, this thing is smarter than me."

The other crazy thing is that with today's tech, these things can be made to work at 1k tokens/sec with multiple agents working at the same time, each at that speed.

koreth112d ago

It probably still saved time compared to making all the changes myself. But it was way more frustrating.

dcre12d ago

4 more replies

olalonde12d ago

1 more reply

conception12d ago

1 more reply

a13n12d ago

If you use eslint and tell it how to run lint in CLAUDE.md it will run lint itself and find and fix most issues like this.

Definitely not ideal, but sure helps.

jdkoeck12d ago

Undefined variable references? Did you not instruct it to run typescript after changes?

stpedgwdgfhgdd11d ago

Start over, create a new plan with the lessons learned.

You need to converge on the requirements.

dyauspitr12d ago

You’re using it wrong. As soon as it starts going off the rails once you’ve repeated yourself, you drop the whole session and start over.

1 more reply

sarchertech12d ago

What kinds of things are you building? This is not my experience at all.

Well the tests didn’t actually call any of the functions under test. They just directly implemented the logic I asked for in the tests.

When I managed to fix that, it decided that it needed to rebuild multiple docker components before each test and test them down after each one.

Yes I’m using planning mode and clearing context and being specific with requirements and starting new sessions, and every other piece of advice I’ve read.

jhatemyjob12d ago

2 more replies

dcre12d ago

1 more reply

JoeMerchant11d ago

Try https://github.com/gsd-build/get-shit-done. It's been a game changer for me.

godd211d ago

> After about 4 hours and $75

Huh? The max plan is $200/month. How are you spending $75 in 4 hrs?

1 more reply

extr12d ago

- I ask for something highly general and claude explores a bit and responds.

- We go back and forth a bit on precisely what I'm asking for. Maybe I correct it a few times and maybe it has a few ideas I didn't know about/think of.

- It writes some kind of plan to a markdown file. In a fresh session I tell a new instance to execute the plan.

- After it's done, I skim the broad strokes of the code and point out any code/architectural smells.

- I ask it to review it's own work and then critique that review, etc. We write tests.

Perhaps that sounds like a lot but typically this process takes around 30-45 minutes of intermittent focus and the result will be several thousand lines of pretty good, working code.

5 more replies

eknkc12d ago

I find that Opus misses a lot of details in the code base when I want it to design a feature or something. It jumps to a basic solution which is actually good but might affect something elsewhere.

GPT 5.4 on codex cli has been much more reliable for me lately. I used to have opus write and codex review, I now to the opposite (I actually have codex write and both review in parallel).

So on the latest models for my use case gpt > opus but these change all the time.

toraway12d ago

Kinda funny how you don't actually need to use coercion if you put in the engineering work to build a product that's competitive on its own technical merits...

ai_fry_ur_brain12d ago

Im convinced everyone saying this is building the simplest web apps, and doing magic tricks on themselves.

hparadiz12d ago

I've been building a new task manager in C for Linux.

If you're not using AI you are cooked. You just don't realize it yet.

https://i.imgur.com/YXLZvy3.png

2 more replies

raldi12d ago

What evidence would convince you otherwise?

1 more reply

marginalia_nu11d ago

My experience is that it gets you 80-90% of the way at 20x the speed, but coaxing it into fixing the remaining 10-20% happens at a staggeringly slow speed.

Big part of why everyone seems to be vibe coding apps, but almost nobody seems to be shipping anything.

interpol_p12d ago

I'm not disagreeing with your experience, but in my experience it is largely the same as what I had with Opus 4.5 / Codex / etc.

toraway12d ago

fbrncci12d ago

I am starting to believe it’s not OPUS but developers getting better at using LLMs across the board. And not realizing they are just getting much better at using these tools.

danielbln12d ago

Opus is not an acronym.

2 more replies

copperx12d ago

I had similar thoughts regarding "we are simply getting better at using them", but the man I tried Gemini again and reconsidered.

olalonde12d ago

> PRD

Is it Baader-Meinhof or is everyone on HN suddenly using obscure acronyms?

shujito12d ago

It stands for Product Requirements Document, it is something commonly used in project planning and management.

1 more reply

nvarsj11d ago

Seems commonly used in Big Tech - first time I heard it was in my current job. Now it's seared into my brain since it's used so much. Among many other acronyms which I won't bore you with.

Aperocky12d ago

I had been able to get it into the classic AI loop once.

I did feel like with a human, there's no way that those 3 loop would happen by the second time. Or at least the majority of us. But there is just no way to get through to opus 4.6

schainks12d ago

> It was also the first AI I felt, "Damn, this thing is smarter than me."

1000% agree. It's also easy to talk to it about something you're not sure it said and derive a better, more elegant solution with simple questioning.

Gemini 3.1 also gives me these vibes.

eru12d ago

> [...] with multiple agents working at the same time, each at that speed.

Horizontal parallelising of tasks doesn't really require any modern tech.

But I agree that Opus 4.6 with 1M context window is really good at lots of routine programming tasks.

1 more reply

rafaelmn12d ago

I've seen a few instances of where Claude showed me a better way to do something and many many more instances of where it fails miserably.

Super simple problem :

Capability isn't the impressive part it's the tenacity/endurance.

hrishikesh-s12d ago

Opus-4.6 is so far ahead of the rest that I think Anthropic is the winner in winner-take-all

1 more reply

raincole12d ago

It's so far the best model that answers my questions about Wolfram language.

That being said it's the only use case for me. I won't subscribe to something that I can't use with third party harness.

1 more reply

fooker12d ago

I have a PhD in a niche field and this can do my job ;)

Not sure if this means I should get a more interesting job or if we are all going to be at the mercy of UBI eventually.

2 more replies

ed_elliott_asc12d ago

It’s still pretty poor writing powershell

slopinthebag11d ago

> It was also the first AI I felt, "Damn, this thing is smarter than me."

Sounds like it is.

vessenes12d ago

1 more reply

scroogey12d ago

dzink12d ago

4 more replies

devld11d ago

But does it still generate slop?

1 more reply

vips7L12d ago

Bullshit.

phendrenad212d ago

2 more replies

vessenes12d ago

p.s. what's up with sonnet 4.5 getting comparatively better as context got longer?

steve-atx-760012d ago

1 more reply

mattfrommars12d ago

Random: are you personally paying for Claude Code or is it paid by you employer?

My employer only pays for GitHub copilot extension

2 more replies

johnwheeler12d ago

This is incredible. I just blew through $200 last night in a few hours on 1M context. This is like the best news I've heard all year in regards to my business.

What is OpenAIs response to this? Do they even have 1M context window or is it still opaque and "depends on the time of day"

dominotw12d ago

rarely go over 25 percent in codex but i hit 80 on claude code in just a short time.

hagen812d ago

Did u use the API or subscription?

1 more reply

iandanforth11d ago

tariky12d ago

This is amazing. I have to test it with my reverse engineering workflow. I don't know how many people use CC for RE but it is really good at it.

Also it is really good for writing SketchUp plugins in ruby. It one shots plugins that are in some versions better then commercial one you can buy online.

CC will change development landscape so much in next year. It is exciting and terrifying in same time.

aragonite12d ago

Do long sessions also burn through token budgets much faster?

dathery12d ago

1 more reply

jasondclinton12d ago

If you use context cacheing, it saves quite a lot on the costs/budgets. You can cache 900k tokens if you want.

geminiboy10d ago

My companies brand guidelines document was 600 ish pages long and claude desktop couldnt handle it.

As soon as I saw the announcement , tried again and created a working design skill that can create design artifacts following the brand guidelines.

While these improvements seem incremental, they have a compounding effect on usefulness.

My AI doomsday calculator just got decremented by anothet 6 months.

Slav_fixflex11d ago

anshumankmr11d ago

praddlebus11d ago

What model(s) do you use with OpenCode? Can you use opus4.6 1m? Is it better in terms of usage if you use the same model?

1 more reply

pixelpoet12d ago

Spooky2312d ago

Yeah, morning eastern time Claude is brutal.

LoganDark12d ago

aenis12d ago

I find it hard to understand that people consider $200 p/m a lot for what they are getting. Expensive compared to what? A netflix sub?

2 more replies

bob102912d ago

I've been avoiding context beyond 100k tokens in general. The performance is simply terrible. There's no training data for a megabyte of your very particular context.

drcongo12d ago

xvector12d ago

I think this is just the nature of a nondeterministic system; occasionally you're gonna be unlucky enough to encounter the leftmost segment of the bell curve.

In my experience dumping a summary + starting a fresh session helps in these cases.

margorczynski12d ago

What about response coherence with longer context? Usually in other models with such big windows I see the quality to rapidly drop as it gets past a certain point.

sporkland11d ago

PeterStuer11d ago

The thing that would get me more excited is how far they could push context coherence before the model loses track. I'm hoping 250k.

k__12d ago

I heard, the middle of the context is often ignored.

Do long context windows make much sense then or is this just a way of getting people to use more tokens?

jwilliams11d ago

I'm fairly sure that your best throughput is single-prompt single-shot runs with Claude (and that means no plan, no swarms, etc) -- just with a high degree of work in parallel.

So for me this is a pretty huge change as the ceiling on a single prompt just jumped considerably. I'm replaying some of my less effective prompts today to see the impact.

chaboud12d ago

(And, yeah, I'm all Claude Code these days...)

LarsDu8812d ago

The stuff I built with Opus 4.6 in the past 2.5 weeks:

Full clone of Panel de Pon/Tetris attack with full P2P rollback online multiplayer: https://panel-panic.com

An emulator of the MOS 6502 CPU with visual display of the voltage going into the DIP package of the physical CPU: https://larsdu.github.io/Dippy6502/

adamm25511d ago

That game is AWESOME! The fact that was vibe coded is insane.

1 more reply

cubefox11d ago

> Standard pricing now applies across the full 1M window for both models, with no long-context premium.

Does that mean it's likely not a Transformer with quadratic attention, but some other kind of architecture, with linear time complexity in sequence length? That would be pretty interesting.

bob102911d ago

It's almost certainly not quadratic at 1M. This would be wildly infeasible at scale. 10^6^2 = 10^12. That's a trillion things.

I think the Recursive Language Model paper has a very good take on how this might go. I've seen really good outcomes in my local experimentation around this concept:

https://arxiv.org/abs/2512.24601

You can get exponential scaling with proper symbolic stack frames. Handling a gigabyte of context is feasible, assuming it fits the depth first search pattern.

1 more reply

FartyMcFarter11d ago

1 more reply

holoduke12d ago

cbg012d ago

aenis12d ago

Sample of one and all that, but it's way, way more sloppy than it used to be for me.

To the extent, that I have started making manual fixes in the code - I haven't had to stoop to this in 2 months.

Max subscription, 100k LOC codebases more or less (frontend and backend - same observations).

suheilaaita12d ago

dominotw12d ago

can someone tell me how to make this instruction work in claude code

"put high level description of the change you are making in log.md after every change"

works perfectly in codex but i just cant get calude to do it automatically. I always have to ask "did you update the log".

sergiotapia12d ago

use claude hooks - in .claude/settings.json you can have it run on different claude events like "PreToolUse" or "Stop" and in those events you pass in commands you want it to run.

You can have stuff like for the "stop" event, run foobar.sh and in foobar.sh do cool stuff like format your code, run tests, etc.

prettyblocks12d ago

I imagine you can do this with a hook that fires every time claude stops responding:

https://code.claude.com/docs/en/hooks-guide

8note12d ago

whats the need? you have the session in a file as a dag. you can summarize to a log whenever you want. doesnt need to be as it goes.

the file is just there though, you can run jq against it to get a list of writes, and get an agent to summarize

1 more reply

steve-atx-760012d ago

Backup your config and ask Claude. I’ve done this for all kinds of things like mcp and agent config.

yubainu12d ago

vicchenai12d ago

apetresc12d ago

I don't think they're claiming "no degradation at scale", are they? They still report a 91.9->78.3 drop. That's just a better drop than everyone else (is the claim).

miohtama11d ago

I just tested this with Jupytwr Notebooks for a day. LLMs have struggled with them because notebooks contain a lot of token as the data of rendered cells.

With Opus 1M, LLM edit was very robust and finally useable

jFriedensreich11d ago

heraldgeezer11d ago

I feel like I'm the only one here using AI as just a chatbot for research, shopping, advice etc and for one off regex or bash/ps scripts... then again not a programmer so.

sailfast11d ago

This is great news. The 1M context is much easier to work with than compacting all the time and seems to perform and remember quite well despite the insane amount of data.

thebigspacefuck11d ago

I used this for a bit and I felt like it was slower and generally worse than using 200K with context compaction. Context compaction does lose some things though.

gaigalas12d ago

I'm getting close to my goal of fitting an entire bootstrappable-from-source system source code as context and just telling Claude "go ahead, make it better".

glimshe9d ago

Is "generally available" the proper wording when Claude is generally unavailable so often these days?

arjie12d ago

This is fantastic. I keep having to save to memory with instructions and then tell it to restore to get anywhere on long running tasks.

aliljet12d ago

Are there evals showing how this improves outputs?

apetresc12d ago

mvrckhckr12d ago

I never get to more than 20% of the 1M context window, and it’s working great. (Have the same experience in Codex with 5.4.)

AbstractH2411d ago

Am I crazy or wasn’t this announced like 2 weeks ago?

Or was that a different company or not GA. It’s all becoming a blur.

aarmenante12d ago

Hot take... the 1MM context degrades performance drastically.

aenis12d ago

jmkozko11d ago

Do subscription users still need to tap into "extra usage" spending to go above 200K tokens?

8cvor6j844qw_d612d ago

Oh nice, does it mean less game of /compact, /clear, and updating CLAUDE.md with Claude Code?

fnordpiglet12d ago

1 more reply

thunkle12d ago

Just have to ask. Will I be spending way more money since my context window is getting so much bigger?

isbvhodnvemrwvn12d ago

Yes, full context is used to generate each new token.

8note12d ago

maybe itll still be useful, though i only have opus at 1M, not sonnet yet

sergiotapia12d ago

maybe i need to unlearn this habit?

gskm12d ago

swader99912d ago

I notice Claude steadily consuming less tokens, especially with tool calling every week too

TZubiri11d ago

Remember folks, just because you can use 1m tokens doesn't mean you should

fittingopposite12d ago

I don't get the announcement. Is this included in the standard 5 or 20x Max plans?

throw0317201912d ago

Pentagon may switch to Claude knowing OpenAI has the premium rates for 1M context.

zmmmmm12d ago

steve-atx-760012d ago

1 more reply

phist_mcgee12d ago

Anthropic is famous for changing things under your feet. Claude code is basically alpha software with a global footprint.

dkpk12d ago

Is this also applicable for usage in Claude web / mobile apps for chat?

alienchow12d ago

vips7L12d ago

Friends, just write the code. It’s not that hard.

AussieWog9312d ago

I hear what you're saying, but for a lot of people coding isn't something we can throw 40+ hours per week at.

I'm not saying Claude is the damn singularity or anything, but stuff is getting done now that simply wasn't being addressed before.

1 more reply

righthand12d ago

mrgaro12d ago

nkzd12d ago

It's not that interesting.

andrewstuart12d ago

Only someone not using Claude could equate human coding.

1 more reply

efeecllk11d ago

finally. before 1m, i must speak 60k context for just telling the past chat and project

hnipps11d ago

Why would anyone need this much context? Genuine question. It's not worth the drop in quality IMO.

tommek407711d ago

You sound like: "Why would anyone need more than 640KB RAM?!".

1 more reply

ionwake11d ago

Im not butt hurt Im just wondering if the overton window has shifted yet.

ofisboy11d ago

i think it's buggy. i keep getting "compacting conversation" even though i restarted the cli. and i'm for sure not using 5 times more.

nemo44x12d ago

Has anyone started a project to replace Linux yet?

dude25071111d ago

No, because it's not a hello-world Electron/React "app".

shanjai_raj712d ago

are the costs the same as the 200k context opus 4.6?

compaction has been really good in claude we don't even recognize the switch

alienbaby12d ago

jf___12d ago

there is a parallel between managing context windows and hard real-time system engineering.

This is not metaphor by loose resemblance. The structural constraints are isomorphic:

rhubarbtree12d ago

There is compaction, which is analogous to gc

j / k navigate · click thread line to collapse