About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT-4.5 or Opus for coding, all via API with a custom tool.
Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.
For production coding, I use
- a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)
- Then, auto context, but based on code lensing. Meaning auto context takes some globs that narrow the visibility of what the AI can see, and it uses the code map intersection to ask the AI for the proper files to put in context. (Typically Flash, cheap, relatively fast, and very good)
- Then, use a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k max.
What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.
Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.
For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.
Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.
You have to make sure the semantic summary takes up significantly fewer tokens than just reading the code, or it's just a waste of tokens and time.
Then have a skill that uses the git log to lazily refresh the summary cache when needed.
Here's the implementation for the interested: https://github.com/tontinton/maki/blob/main/maki-code-index%...
I've always wanted to explore how to fit tree-sitter into this workflow. It's great to know that this works well too.
Thanks for sharing the code.
(Here is the AIPack runtime I built, MIT: https://github.com/aipack-ai/aipack), and here is the code for pro@coder (https://github.com/aipack-ai/packs-pro/tree/main/pro/coder) (AIPack is in Rust, and AI Packs are in md / lua)
I said, well yeah, but it's too sophisticated to be practical.
I prompt, press run, and then I get this flow:
- dev setup (dev-chat or plan)
- code-map (incremental: 0s; ~2m for the initial run)
- auto-context (~20s to 40s)
- final AI query (~30s to 2m)
For example, just now, in my Rust code (about 60k LOC), I wanted to change the data model and brainstorm with the AI to find the right design, and here is the auto-context it gave me:
- Reducing 381 context files (1.62 MB)
- Now 5 context files (27.90 KB)
- Reducing 11 knowledge files (30.16 KB)
- Now 3 knowledge files (5.62 KB)
The knowledge files are my "rust10x" best practices, and the context files are the source files.
(edited to fix formatting)
In the end it's hard to measure but personally I feel that my agent rarely misses any context for a given task, so I'm pretty happy with it.
I used a different approach than tree-sitter because I thought I'd found a nice way to avoid writing language-specific code: I use VSCode as a language backend and wrote logic to rebuild the AST from VSCode's symbol data and other APIs.
That allows me to just install the correct language extension and thus enable support for that specific language. The extension has to provide symbol information which most do through LSP.
In the end it was way more effort than just using tree-sitter, however, and I'm thinking of doing a slow migration to that approach sooner or later.
Anyways, I created an extension that spins up an MCP server and provides several tools that basically replace the vanilla discovery tools in my workflow.
The approach is similar to yours, I have an overview tool which runs different centrality ranking metrics over the whole codebase to get the most important symbols and presents that as an architectural overview to the LLM.
Then I have a "get-symbol-context" tool which allows the AI to get all the information that the AST holds about a single symbol, including a parameter to include source code which completely replaces grepping and file reading for me.
The tool also specifies which other symbols call the one in question and which others it calls, respectively.
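A toy version of that kind of symbol ranking, using in/out-degree over a hypothetical call graph as the centrality metric (the real tool presumably uses richer metrics than plain degree):

```python
from collections import defaultdict

# Hypothetical call graph, e.g. extracted from LSP/AST symbol data:
# caller -> list of callees.
CALLS = {
    "main": ["parse_args", "run"],
    "run": ["load_config", "process"],
    "process": ["load_config", "emit"],
}

def rank_symbols(calls: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank symbols by total degree (in + out) as a crude centrality score."""
    degree: dict[str, int] = defaultdict(int)
    for caller, callees in calls.items():
        degree[caller] += len(callees)      # out-degree: symbols it calls
        for callee in callees:
            degree[callee] += 1             # in-degree: symbols calling it
    return sorted(degree.items(), key=lambda kv: -kv[1])
```

The top of this ranking is what would be presented to the LLM as the architectural overview.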
But yeah, sorry for this already being quite a long comment. If you want to give it a try, I published it on the VSCode marketplace a couple of days ago. It's basically free right now, although I have to admit that I still want to try to earn a little bit of money with it at some point.
Right now, the daily usage limit is 2000 tool calls per day, which should be enough for anybody.
Would love to hear what you think :)
<https://marketplace.visualstudio.com/items?itemName=LuGoSoft...>
The fact that you've been using it for six months and that it performs well says a lot. At the end of the day, that's what counts.
I like your idea of piggybacking on top of the LSP services, and I can imagine that this was quite a bit of work. Doing it as an MCP server makes it usable across different tools.
I also really like the name "Context Master."
In my case, it's much more niche since it's for the tool I built. Though it's open source, the key difference is that the "indexing" is only agentic at this point.
I can see value in mixing the two. LSP integration scares me because of the amount of work involved, and tree-sitter seems like a good path.
In that case, in the code map, for each item, there could be both the LLM response info and some deterministic info, for example, from tree-sitter.
That being said, the current approach works so well that I think I am going to keep using and fine-tuning it for a while, and bring in deterministic context only when or if I need it.
Anyway, what you built looks great, and if it works, that's what matters.
So, small model figures out which files to use based on the code map, and then enriches with snippets, so big model ideally gets preloaded with relevant context / snippets up front?
Where does code map live? Is it one big file?
I also have a `.tmpl-code-map.jsonl` in the same folder so all of my tasks can add to it, and then it gets merged into context-code-map.json.
I keep mtime, but I also compute a blake3 hash, so if mtime does not match, but it is just a "git restore," I do not redo the code map for that file. So it is very incremental.
Then the trick is, when sending the code map to AI, I serialize it in a nice, simple markdown format.
- path/to/file.rs
  - summary: ...
  - when to use: ...
  - public types: .., .., ..
  - public functions: .., .., ..
- ...
So the AI does not have to interpret JSON, just clean, structured markdown.
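A minimal serializer for that shape might look like this (the field names are assumed from the format sketched above):

```python
def code_map_to_markdown(entries: list[dict]) -> str:
    """Serialize code-map entries into simple structured markdown,
    so the model never has to parse JSON."""
    lines: list[str] = []
    for e in entries:
        lines.append(f"- {e['path']}")
        lines.append(f"  - summary: {e['summary']}")
        lines.append(f"  - when to use: {e['when_to_use']}")
        lines.append(f"  - public types: {', '.join(e['public_types'])}")
        lines.append(f"  - public functions: {', '.join(e['public_functions'])}")
    return "\n".join(lines)
```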
Funny, I worked on this addition to my tool for a week, planning everything, but even today, I am surprised by how well it works.
I have zero sed/grep in my workflow. Just this.
My prompt is pro@coder/coder-prompt.md, the first part is YAML for the globs, and the second part is my prompt.
There is a TUI, but all input and output are files, and the TUI is just there to run it and see the status.
> - a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)
Thanks, but why use any AI to generate this? I would say: you document your functions in code, and types are provided by the compiler service, so it should all be deterministically available in seconds instead of minutes, without burning tokens. Am I missing something?

1) Deterministic
- Using a tree-sitter/AST-like approach, I could extract types, functions, and perhaps comments, and put them into an index map.
- Cons:
- The tricky part of this approach is that what I extract can be pretty large per file, for example, comments.
- Then, I would probably need an agentic synthesis step for those comments anyway.
2) Agentic
- Since Flash is dirt cheap, I wanted to experiment, skip #1, and go directly to #2.
- Because my tool is built for concurrency, when set to 32, it's super fast.
- The price is relatively low, perhaps $1 or $2 for 50k LOC, and 60 to 90 seconds of wall time, which at a concurrency of 32 is about 30 to 45 minutes of serialized AI work.
- What I get back is relatively consistent by file, size-wise, and it's just one trip per file.
So, this is why I started with #2.

And then, the results in real coding scenarios have been astonishing.
Way above what I expected.
The way those indexes get combined with the user prompt gets the right files 95% of the time, and with surprisingly high quality.
So, I might add deterministic aspects to it, but since I think I will need the agentic step anyway, I have deprioritized it.
I imagine that if the context were committed and kept up to date with CI, it would work for others to use as well.
However, I'm a little confused about the auto-context/globs narrowing part. Do you, the developer, provide them? Or do you feed the full code map to Flash along with your prompt, so it returns the globs based on your prompt?
Also, in general, is your map of a file relatively smaller than the file itself, even for very small files?
- I have two main globs, which are lists of globs: knowledge_globs and context_globs. Knowledge globs can be absolute and should be relatively static; context_globs have to be relative to the workspace, since they point at the working files.
- As a dev, you provide them in the top YAML section of the coder-prompt.md.
- The auto-context sub-agent calls the code-map sub-agent. Sub-agents can add to or narrow the given globs, and that is the goal of the auto-context agent.
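The glob-narrowing part can be sketched as a simple filter (illustrative only; the real sub-agents also widen globs and consult the code map before picking files):

```python
from fnmatch import fnmatch

def narrow(files: list[str], globs: list[str]) -> list[str]:
    """Keep only files matching at least one glob. In the real flow,
    a sub-agent proposes the globs based on the code map and the prompt."""
    return [f for f in files if any(fnmatch(f, g) for g in globs)]
```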
It looks complicated, but it actually works like a charm.
Hopefully, I answered some of your questions.
I need to make a video about it.
But regardless, I really think it's not about the tools, it's about the techniques. This is where the true value is.
They're solving a different problem than you. So I think it's very plausible that you could come up with something that, for your use case, performs considerably better than their "defaults".
And that does not prevent mixing and matching the two, as some comments in this thread suggest.
Anyway, it's a great time for production coding.
> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.
For Claude Code users this is huge, assuming coherence remains strong past 200k tokens.
No vibes allowed: https://youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ
1) No longer found the dumb zone
2) No longer feared compaction
Switching to Opus for stupid political reasons, I still have not hit the dumb zone, but I'm back to disliking compaction events, and the smaller context window it has has really hurt.
I hope they copy OpenAI's compaction magic soon, but I am also very excited to try the longer context window.
His fix for "the dumb zone" is the RPI Framework:
● RESEARCH. Don't code yet. Let the agent scan the files first. Docs lie. Code doesn't.
● PLAN. The agent writes a detailed step-by-step plan. You review and approve the plan, not just the output. Dex calls this avoiding "outsourcing your thinking." The plan is where intent gets compressed before execution starts.
● IMPLEMENT. Execute in a fresh context window. The meta-principle he calls Frequent Intentional Compaction: don't let the chat run long. Ask the agent to summarize state, open a new chat with that summary, keep the model in the smart zone.
For me, it's less about being able to look back -800k tokens. It's about being able to flow a conversation for a lot longer without forcing compaction. Generally, I really only need the most recent ~50k tokens, but having the old context sitting around is helpful.
youtu.be/rmvDxxNubIg?is=adMmmKdVxraYO2yQ

I'm using CC (Opus) with thinking, and Codex with xhigh always on.
And the models have gotten really good when you let them do stuff where the goals are verifiable by the model. I had Codex fix a Rust B-rep CSG classification pipeline successfully over the course of a week, unsupervised. It had a custom STEP viewer that would take screenshots and feed them back into the model, so it could verify the progress, or the lack of it (triangle soup), itself.
Codex did all the planning and verification, CC wrote the code.
This would not have been possible at all six months ago, in my experience.
Maybe with a lot of handholding; but I doubt it (I tried).
I mean both the problem for starters (requires a lot of spatial reasoning and connected math) and the autonomous implementation. Context compression was never an issue in the entire session, for either model.
That said, 120k is pleeenty if you’re just building front-end components and have your API spec on hand already.
I tried asking questions about Path of Exile 2, and even with web research on, it gave completely wrong information... not just outdated. Wrong.
I think context decay is a bigger problem than we realize.
As an example of doing this in a session with Jagged Alliance 3 (an RPG): https://pastes.io/jagged-all-69136
Claude extracting game archives and disassembling them leads to far more reliable results than random internet posts.
It's led to me starting new chats with bigger and bigger 'summary' prompts to catch the model up while refreshing it. Surely there's a way to automate that technique.
The people I work with who complain about this type of thing horribly communicate their ask to the llm and expect it to read their minds.
In my experience the model will assume the web results are the answer even if the search engine returns irrelevant garbage.
For example, if you ask it a question about New Jersey law and the web results are about New York, or about "many states", it'll assume the New York or "many states" info applies to New Jersey.
It'll remain a human job for quite a while, too. Separability is not a property of vector spaces, so modern AIs are not going to be good at it. Maybe we can manage something similar with simplicial complexes instead. Ideally you'd consult the large model once and say:
> show me the small contexts to use here, give me prompts re: their interfaces with their neighbors, and show me which distillations are best suited to those tasks
...and then a network of local models could handle it from there. But the providers have no incentive to go in that direction, so progress will likely be slow.
(Note that I'm using it in more of a hands-on pair-programming mode, and not in a fully-automated vibecoding mode.)
I have not tested this, but I would expect more niche ecosystems like Rust, Haskell, or Erlang to have a better overall training set (developers who care about good engineering focus on them), and potentially to produce the best output.
For C and C++, I'd expect a situation similar to Python's: while not as approachable, they are also pushed on beginning software engineers, and the training data naturally has plenty of bad code.
Writing quick python scripts works a lot better than niche domain specific code
I have seen these shine on frontend work
In practice, I haven't found this to be the case at all with Claude Code using Opus 4.6. So maybe it's another one of those things that used to be true, and now we all expect it to be true.
And of course when we expect something, we'll find it, so any mistakes at 150k context use get attributed to the context, while the same mistake at 50k gets attributed to the model.
Personally, I’m on a 6M+ line codebase and had no problems with the old window. I’m not sending it blindly into the codebase though like I do for small projects. Good prompts are necessary at scale.
I'm not an expert but maybe this explains context rot.
Normally buying the bigger plan gives some sort of discount.
With Claude, it's just "5 times more usage, 5 times more cost, there you go".
They would probably implement _diminishing_-value pricing if pure pricing efficiency was their only concern.
EDIT: Don't think Pro has access to it, a typical prompt just hit the context limit.
The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4's 1M window and extra pricing for that.
I start with a PRD, ask for a step-by-step plan, and just execute one step at a time. Sometimes its ideas are dumb, but checking and guiding step by step helps it ship working things in hours.
It was also the first AI I felt, "Damn, this thing is smarter than me."
The other crazy thing is that with today's tech, these things can be made to work at 1k tokens/sec with multiple agents working at the same time, each at that speed.
It gave me an impressive plan of attack, including a reasonable way to determine which code it could safely modify. I told it to start with just a few files and let me review; its changes looked good. So I told it to proceed with the rest of the code.
It made hundreds of changes, as expected (big code base). And most of them were correct! Except the places where it decided to do things like put its "const x = useMemo(...)" call after some piece of code that used the value of "x", meaning I now had a bunch of undefined variable references. There were some other missteps too.
I tried to convince it to fix the places where it had messed up, but it quickly started wanting to make larger structural changes (extracting code into helper functions, etc.) rather than just moving the offending code a few lines higher in the source file. Eventually I gave up trying to steer it and, with the help of another dev on my team, fixed up all the broken code by hand.
It probably still saved time compared to making all the changes myself. But it was way more frustrating.
I have heard from people who regularly push a session through multiple compactions. I don’t think this is a good idea. I virtually never do this — when I see context getting up to even 100k, I start making sure I have enough written to disk to type /new, pipe it the diff so far, and just say “keep going.” I learned recently that even essentials like the CLAUDE.md part of the prompt get diluted through compactions. You can write a hook to re-insert it but it's not done by default.
This fresh context thing is a big reason subagents might work where a single agent fails. It’s not just about parallelism: each subagent starts with a fresh context, and the parent agent only sees the result of whatever the subagent does — its own context also remains clean.
There's probably a parallel with the CMSes and frameworks of the 2000s (e.g. WordPress or Ruby on Rails). They massively improved productivity, but as a junior developer you could get pretty stuck if something broke or you needed to implement an unconventional feature. I guess it must feel a bit similar for non-developers using tools like Claude Code today.
Definitely not ideal, but sure helps.
You need to converge on the requirements.
Just today I asked Claude using opus 4.6 to build out a test harness for a new dynamic database diff tool. Everything seemed to be fine but it built a test suite for an existing diff tool. It set everything up in the new directory, but it was actually testing code and logic from a preexisting directory despite the plan being correct before I told it to execute.
I started over and wrote out a few skeleton functions myself, then asked it to write tests for those to cover some new functionality. My plan was to then ask it to add that functionality, using the tests as guardrails.
Well the tests didn’t actually call any of the functions under test. They just directly implemented the logic I asked for in the tests.
After $50 and 2 hours I finally got something working only to realize that instead of creating a new pg database to test against, it found a dev database I had lying around and started adding tables to it.
When I managed to fix that, it decided that it needed to rebuild multiple Docker components before each test and tear them down after each one.
After about 4 hours and $75, I managed to get something working that was probably more code than I would have written in 4 hours, but I think it was probably worse than what I would have come up with on my own. And I really have no idea if it works because the day was over and I didn’t have the energy left to review it all.
We've recently been tasked at work with spending more money on Claude (not with being more productive; the metric is literally spending more money), and everyone is struggling to do anything like what the posts on HN say they are doing. So far, no one in my org at a very large tech company has managed to do anything very impressive with Claude, other than bringing down prod 2 days ago.
Yes I’m using planning mode and clearing context and being specific with requirements and starting new sessions, and every other piece of advice I’ve read.
I’ve had much more luck using opus 4.6 in vs studio to make more targeted changes, explain things, debug etc… Claude seems too hard to wrangle and it isn’t good enough for you to be operating that far removed from the code.
I think it's the big investors' extremely powerful incentives manifesting in the form of internet comments. The pace of improvement peaked at GPT-4. There is value in autocomplete-as-a-service, and the "harnesses" like Codex take it a lot farther. But the people who are blown away by these new releases either don't spend a lot of time writing code, or are being paid to be blown away. This is not a hockey stick curve. It's a log curve.
Bigger context windows are a welcome addition. And stuff like JSON inputs is nice too. But these things aren't gonna like, take your SWE job, if you're any good. It's just like, a nice substitute for the Google -> Stack Overflow -> Copy/Paste workflow.
Huh? The max plan is $200/month. How are you spending $75 in 4 hrs?
- I ask for something highly general and claude explores a bit and responds.
- We go back and forth a bit on precisely what I'm asking for. Maybe I correct it a few times and maybe it has a few ideas I didn't know about/think of.
- It writes some kind of plan to a markdown file. In a fresh session I tell a new instance to execute the plan.
- After it's done, I skim the broad strokes of the code and point out any code/architectural smells.
- I ask it to review its own work and then critique that review, etc. We write tests.
Perhaps that sounds like a lot but typically this process takes around 30-45 minutes of intermittent focus and the result will be several thousand lines of pretty good, working code.
GPT 5.4 on Codex CLI has been much more reliable for me lately. I used to have Opus write and Codex review; I now do the opposite (actually, I have Codex write and both review in parallel).
So, on the latest models, for my use case, GPT > Opus, but these things change all the time.
Edit: also, the harness is shit. Claude Code has been slow, weird, and a resource hog. It refuses to read the now-standardized .agents dirs, so I need symlink gymnastics, and it hides as much info as it can… Codex CLI is working much better lately.
Kinda funny how you don't actually need to use coercion if you put in the engineering work to build a product that's competitive on its own technical merits...
If you're not using AI you are cooked. You just don't realize it yet.
All programming is like this to some extent, but Claude's 80/20 behavior is so much more extreme. It can almost build anything in 15-30 minutes, but after those 15-30 minutes are up, it's only "almost built". Then you need to spend hours, days, maybe even weeks getting past the "almost".
Big part of why everyone seems to be vibe coding apps, but almost nobody seems to be shipping anything.
I had Opus 4.6 tell me I was "seeing things wrong" when I tried to have it correct some graphical issues. It got stuck in a loop of re-introducing the same bug every hour or so in an attempt to fix the issue.
I'm not disagreeing with your experience, but in my experience it is largely the same as what I had with Opus 4.5 / Codex / etc.
It started by insisting I was repeatedly making a typo and still would not budge even after I started copy/pasting the full terminal history of what I was entering and the unabridged output, and eventually pivoted to darkly insinuating I was tampering with my shell environment as if I was trying to mislead it or something.
Ultimately it turned out that it forgot it was supposed to be applying the fixes to the actual server instead of the local dev environment, and had earlier in the conversation switched from editing directly over SSH to pushing/pulling the local repo to the remote due to diffs getting mangled.
I also thought this about Opus 4.5 (I also tested a lot with 4.6), and then in February I switched to only using auto mode in the coding IDEs. These don't use Opus (most of the time), and I'm ending up with a similar result after a very rough learning curve.
Now, switching back to Opus, I notice that I get more out of it, but it's no longer a huge difference. In a lot of cases Opus is actually in the way, after I learned to prompt more effectively with cheaper models.
The big difference now is that I'm paying just $60-90/month for 40-50 hours of weekly usage… while I was inching towards $1,000 with Opus. I chose these auto modes because they don't dig into usage-based pricing or throttling, which is a pretty sweet deal.
Is it Baader-Meinhof or is everyone on HN suddenly using obscure acronyms?
It was about a problem with calculating the filling of a topographical water basin with sedimentation, where the calculation is discrete (i.e. turn-based), and the edge case where both water and sediment would overflow the basin. To keep it simple: the facts were A, B, and C, and it oscillated between explanation 1, which refuted C; explanation 2, which refuted A; and explanation 3, which refuted B.
I'll give it to Opus's training stability that all 3 of my tries consistently got into this loop, so I decided to directly order it to do a brute-force solution that avoided (but didn't solve) the problem.
I do feel that with a human, there's no way those 3 loops would still be happening by the second time, at least for most of us. But there is just no way to get through to Opus 4.6.
1000% agree. It's also easy to talk to it about something you're not sure it said and derive a better, more elegant solution with simple questioning.
Gemini 3.1 also gives me these vibes.
Horizontal parallelising of tasks doesn't really require any modern tech.
But I agree that Opus 4.6 with 1M context window is really good at lots of routine programming tasks.
Super simple problem:
I had a ZMK keyboard layout definition I wanted to convert to QMK for a different keyboard that had one key less, so it just had to trim one outer key. It took like 45 minutes of back and forth to get it right; I could have done it manually in 30 minutes tops, including looking up the docs for everything.
Capability isn't the impressive part; it's the tenacity/endurance.
That being said, it's the only use case for me. I won't subscribe to something I can't use with a third-party harness.
Not sure if this means I should get a more interesting job or if we are all going to be at the mercy of UBI eventually.
Sounds like it is.
Also, shout out to beads. I highly recommend you pair it with beads from Yegge: Opus can lay out a large project with beads, then keep track of what to do next and churn through the list beautifully with a little help.
I'm late to the party and just getting started with Anthropic models. I've been finding Sonnet decent enough, but it seems to have trouble naming variables correctly (it's not just that most names are poor and undescriptive; sometimes it names things outright wrong, which is confusing), or it sometimes unnecessarily declares and re-declares variables, or encodes and decodes rather than using the value that's already there. Is Opus better at this?
The stats claim Opus at 1M is about on par with 5.4 at 256k. These needle-in-a-haystack long-context tests don't always track quality of reasoning, sadly, but this is still a significant improvement, and I haven't seen a dramatic falloff in my tests, unlike with Q4 '25 models.
P.S. What's up with Sonnet 4.5 getting comparatively better as context got longer?
My employer only pays for GitHub copilot extension
What is OpenAI's response to this? Do they even have a 1M context window, or is it still opaque and "depends on the time of day"?
Also, it's really good for writing SketchUp plugins in Ruby. It one-shots plugins that are in some ways better than commercial ones you can buy online.
CC will change the development landscape so much in the next year. It's exciting and terrifying at the same time.
If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?
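Yes. A quick sketch of the arithmetic, assuming every request resends the full conversation so far:

```python
def cumulative_input_tokens(turn_sizes: list[int]) -> int:
    """Total input tokens billed across a chat where each request
    resends the entire conversation history."""
    total = history = 0
    for size in turn_sizes:
        history += size    # the conversation now includes this turn
        total += history   # the request carries the whole history
    return total
```

So three 10k-token turns cost 10k + 20k + 30k = 60k input tokens, which is why late-session messages are much "heavier" (prompt caching on the provider side can soften the billed cost, but the context still has to be processed).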
As soon as I saw the announcement, I tried again and created a working design skill that can create design artifacts following the brand guidelines.
While these improvements seem incremental, they have a compounding effect on usefulness.
My AI doomsday calculator just got decremented by another 6 months.
Next step should be to allow fast mode to draw from the $200/mo usage balance. Again, I pay $200/mo; I should at least be able to send a single message without being asked to cough up more (one message in fast mode costs a few dollars). One would think $200/mo would give me some ability to use their more expensive capabilities, but it seems it's bucketed to only the capabilities offered even to free users.
An hour of a senior dev costs at least $100, depending on where one lives. Since Claude saves me hours every day, it pays for itself almost instantly. I think the economic value of the Claude subscription is on the order of $20-40k a month for a pro.
If you are really interested in deep NIAH tasks, external symbolic recursion and self-similar prompts+tools are a much bigger unlock than more context window. Recursion and (most) tools tend to be fairly deterministic processes.
I generally prohibit tool calling in the first stack frame of complex agents in order to preserve context window for the overall task and human interaction. Most of the nasty token consumption happens in brief, nested conversations that pass summaries back up the call stack.
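A rough sketch of that pattern: each nested "stack frame" does its verbose work locally and hands only a summary back up, keeping the parent frame's context small. The summarizer here is a trivial stand-in for a model call:

```python
def summarize(text: str, limit: int = 200) -> str:
    """Trivial stand-in for a model-generated summary."""
    return text[:limit]

def nested_frame(task: str, depth: int = 0, max_depth: int = 2) -> str:
    """Run the verbose work (tool calls, long transcripts) in a child
    'conversation' and return only a summary to the parent frame."""
    transcript = f"[depth {depth}] {task}: " + "tool output... " * 100
    if depth < max_depth:
        transcript += nested_frame(task, depth + 1, max_depth)  # child frame
    return summarize(transcript)  # only the summary goes back up the stack
```

However deep the recursion, the parent only ever sees a bounded summary, which is what preserves the first frame's context window for the overall task.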
In my experience dumping a summary + starting a fresh session helps in these cases.
Do long context windows make much sense then or is this just a way of getting people to use more tokens?
So for me this is a pretty huge change as the ceiling on a single prompt just jumped considerably. I'm replaying some of my less effective prompts today to see the impact.
(And, yeah, I'm all Claude Code these days...)
Full clone of Panel de Pon/Tetris attack with full P2P rollback online multiplayer: https://panel-panic.com
An emulator of the MOS 6502 CPU with visual display of the voltage going into the DIP package of the physical CPU: https://larsdu.github.io/Dippy6502/
I'm impressed as fuck, but a part of me deep down knows that I know fuck all about the 6502 or its assembly language and architecture, and now I'll probably never be motivated to do this project in a way where I would've learned all the things I wanted to learn.
Does that mean it's likely not a Transformer with quadratic attention, but some other kind of architecture, with linear time complexity in sequence length? That would be pretty interesting.
They are probably doing something like putting the original user prompt into the model's environment and providing special tools to the model, along with iterative execution, to fully process the entire context over multiple invokes.
I think the Recursive Language Model paper has a very good take on how this might go. I've seen really good outcomes in my local experimentation around this concept:
https://arxiv.org/abs/2512.24601
You can get exponential scaling with proper symbolic stack frames. Handling a gigabyte of context is feasible, assuming it fits the depth-first search pattern.
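Not the paper's implementation, just the shape of the idea: a depth-first recursive summary where each stack frame sees at most one chunk, so total input size is bounded by depth times fan-out rather than by the window. `summarize` here is a stub standing in for a model call.

```python
def summarize(text: str) -> str:
    """Placeholder for an LLM summarization call;
    the stub just truncates instead of calling a model."""
    return text[:64]

CHUNK = 4096  # max characters any single frame is allowed to see

def recursive_summary(text: str) -> str:
    """Depth-first traversal: leaves summarize raw chunks,
    interior frames summarize pairs of child summaries."""
    if len(text) <= CHUNK:
        return summarize(text)
    mid = len(text) // 2
    left = recursive_summary(text[:mid])
    right = recursive_summary(text[mid:])
    return summarize(left + "\n" + right)
```

With binary splitting the depth is log2(N / CHUNK), so even a gigabyte of input is only ~18 levels of recursion, each operating well under the context limit.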
As for stooping to making manual fixes in the code: I haven't had to do that in 2 months.
Max subscription, 100k LOC codebases more or less (frontend and backend - same observations).
"put high level description of the change you are making in log.md after every change"
works perfectly in codex but i just can't get claude to do it automatically. I always have to ask "did you update the log?"
You can have stuff like: for the "Stop" event, run foobar.sh, and in foobar.sh do cool stuff like formatting your code, running tests, etc.
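For reference, a hook like that lives in `.claude/settings.json`. This is a sketch from memory, so double-check the schema against the Claude Code hooks docs; `foobar.sh` is the hypothetical script from the comment above:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "./foobar.sh" }
        ]
      }
    ]
  }
}
```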
earlier today i actually spent a bit of time asking claude to make an mcp to introspect that: break the session down into summarized topics, so i could try dropping some out or replacing the detailed messages with a summary. the idea being to compact out a small chunk to save on context window, rather than getting it back to empty.
the file is just there though, you can run jq against it to get a list of writes, and get an agent to summarize
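Something like the following works against a JSONL transcript. Big caveat: the field names here (`message`, `content`, `tool_use`, `Write`, `input.file_path`) are guesses at the session file's shape, not a documented schema, so inspect your own file first and adjust.

```python
import json

def file_writes(session_path: str) -> list[str]:
    """Collect the paths of files written during a session,
    assuming tool calls appear as `tool_use` content blocks.
    Field names are assumptions -- verify against your file."""
    paths = []
    with open(session_path) as f:
        for line in f:
            msg = json.loads(line)
            content = msg.get("message", {}).get("content") or []
            for block in content:
                if (isinstance(block, dict)
                        and block.get("type") == "tool_use"
                        and block.get("name") == "Write"):
                    paths.append(block["input"]["file_path"])
    return paths
```

Feed the resulting list (or the raw blocks) to an agent to summarize, as the comment suggests.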
With Opus 1M, LLM edit was very robust and finally useable
Or was that a different company or not GA. It’s all becoming a blur.
maybe it'll still be useful, though i only have opus at 1M, not sonnet yet
maybe i need to unlearn this habit?
However, I can't seem to get Opus 4.6 to wire up proper infrastructure, especially when OSS forks are involved. It trips up on arguments from the fork source, invents args that don't exist in either, and has a habit of tearing down entire clusters just to fix a Helm chart for "testing purposes". I've tried modifying the CLAUDE.md and SPEC.md with specific instructions on how to do things, but it just goes off on a tangent and starts to negotiate on the specs: "I know you asked for help figuring out the CNI configurations across 2 clusters, but it's too complex. Can we just do a single cluster?"

The entire repository gets littered with random MD files for directory-specific memories, context, action plans, deprecated action plans, pre-compaction memories, etc. I don't quite know which of those to prune either. It has taken most of the fun out of software engineering, and I'm now just an Obsidian janitor for what I can best describe as a "clueless junior engineer that never learns". When the auto-compaction kicks in, it's like an episode of 50 First Dates.
Right now I assume this is the limitation, because the literature on real-world infrastructure requiring large contexts and integration is very limited. If anyone knows whether Claude Opus is suitable for such tasks, do give some suggestions.
My main job is running a small eComm business, and I have to both develop software automations for the office (to improve productivity long-term) while also doing non-coding day to day tasks. On top of this, I maintain an open source project after hours. I've also got a young family with 3 kids.
I'm not saying Claude is the damn singularity or anything, but stuff is getting done now that simply wasn't being addressed before.
I'm not butthurt, I'm just wondering if the Overton window has shifted yet.
Compaction has been really good in Claude; we don't even notice the switch.
A context window is a fixed-size memory region. It is allocated once, at conversation start, and cannot grow. Every token consumed — prompt, response, digression — advances a pointer through this region. There is no garbage collector. There is no virtual memory. When the space is exhausted, the system does not degrade gracefully: it faults.
This is not metaphor by loose resemblance. The structural constraints are isomorphic:
No dynamic allocation. In a hard realtime system, malloc() at runtime is forbidden — it fragments the heap and destroys predictability. In a conversation, raising an orthogonal topic mid-task is dynamic allocation. It fragments the semantic space. The transformer's attention mechanism must now maintain coherence across non-contiguous blocks of meaning, precisely analogous to cache misses over scattered memory.
No recursion. Recursion risks stack overflow and makes WCET analysis intractable. In a conversation, recursion is re-derivation: returning to re-explain, re-justify, or re-negotiate decisions already made. Each re-entry consumes tokens to reconstruct state that was already resolved. In realtime systems, loops are unrolled at compile time. In LLM work, dependencies should be resolved before the main execution phase.
Linear allocation only. The correct strategy in both domains is the bump allocator: advance monotonically through the available region. Never backtrack. Never interleave. The "brainstorm" pattern — a focused, single-pass traversal of a problem space — works precisely because it is a linear allocation discipline imposed on a conversation.
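The analogy can be made literal with a toy bump allocator. The same arithmetic models a context window: a fixed capacity in tokens, a pointer that only advances, and a hard fault on exhaustion. (The class and the token numbers are illustrative, not any real API.)

```python
class BumpAllocator:
    """Fixed region, monotonic pointer, no free() and no GC.
    Exhaustion is a hard fault -- the same failure mode as a
    full context window."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.offset = 0  # the single pointer; it never moves backward

    def alloc(self, n: int) -> int:
        """Reserve n units; return the start offset of the block."""
        if self.offset + n > self.capacity:
            raise MemoryError("region exhausted; no backtracking")
        start = self.offset
        self.offset += n
        return start

# A 200k-token "window": every prompt, response, and digression
# advances the pointer, and nothing ever returns space.
ctx = BumpAllocator(200_000)
ctx.alloc(3_000)    # system prompt
ctx.alloc(45_000)   # code map + auto-context
ctx.alloc(12_000)   # first exchange
```

A digression is just another `alloc` you can never reclaim, which is exactly why the single-pass "brainstorm" discipline works.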