> We recommend using the word "think" to trigger extended thinking mode, which gives Claude additional computation time to evaluate alternatives more thoroughly. These specific phrases are mapped directly to increasing levels of thinking budget in the system: "think" < "think hard" < "think harder" < "ultrathink." Each level allocates progressively more thinking budget for Claude to use.
I had a poke around and it's not a feature of the Claude model, it's specific to Claude Code. There's a "megathink" option too - it uses code that looks like this:
let B = W.message.content.toLowerCase();
if (
  B.includes("think harder") ||
  B.includes("think intensely") ||
  B.includes("think longer") ||
  B.includes("think really hard") ||
  B.includes("think super hard") ||
  B.includes("think very hard") ||
  B.includes("ultrathink")
)
  return (
    l1("tengu_thinking", { tokenCount: 31999, messageId: Z, provider: G }),
    31999
  );
if (
  B.includes("think about it") ||
  B.includes("think a lot") ||
  B.includes("think deeply") ||
  B.includes("think hard") ||
  B.includes("think more") ||
  B.includes("megathink")
)
  return (
    l1("tengu_thinking", { tokenCount: 1e4, messageId: Z, provider: G }),
    1e4
  );
Notes on how I found that here: https://simonwillison.net/2025/Apr/19/claude-code-best-pract.....
It's as if we're operating this machine like an analog synth.
Surely Anthropic could do a better job implementing dynamic thinking token budgets.
/thinking-tokens 32k
Or, shorthand: /think 32k
Similarly, I do miss an --add command-line flag to manually specify the context (files) during the session. Right now I pretty much end up copy-pasting the relative paths from VSCode and supplying them to Claude. Aider has much better semantics for this kind of thing.
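If explicit budgets were exposed, the slash command suggested above would only need a tiny parser. A hypothetical sketch (command names and the "k" shorthand are assumptions, not anything Claude Code actually supports):

```javascript
// Parse an explicit token budget like "/think 32k" or
// "/thinking-tokens 8000" out of a message, instead of
// matching magic phrases.
function parseThinkingBudget(message) {
  const match = message.match(/\/(?:think|thinking-tokens)\s+(\d+)(k?)/i);
  if (!match) return null; // no explicit budget requested
  const [, digits, kSuffix] = match;
  return Number(digits) * (kSuffix ? 1000 : 1);
}
```

The returned number could then be clamped to whatever maximum the model allows, with the phrase-matching tiers kept as a fallback.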
You can use English, or --add if you want to tell Claude to reference them.
https://www.paritybits.me/think-toggles-are-dumb/
https://nilock.github.io/autothink/
LLMs with broad contextual capabilities shouldn't need to be guided in this manor. Claude can tell a trivial task from a complex one just as easily as I can, and should self-adjust, up to thresholds of compute spending, etc.
I mean finding your way around a manor can be hard, it's easier in an apartment.
Also 14 string scans seems a little inefficient!
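For what it's worth, those substring scans could collapse into two regex tests. A rough sketch, using the phrases from the decompiled snippet above (the function name and the zero fall-through are assumptions):

```javascript
// One regex pass per tier instead of a dozen substring scans.
// Phrases taken from the decompiled snippet; checking the high
// tier first preserves the original precedence ("think harder"
// must win over the "think hard" substring).
const HIGH_BUDGET = /think (?:harder|intensely|longer|really hard|super hard|very hard)|ultrathink/;
const MID_BUDGET = /think (?:about it|a lot|deeply|hard|more)|megathink/;

function thinkingBudget(message) {
  const text = message.toLowerCase();
  if (HIGH_BUDGET.test(text)) return 31999;
  if (MID_BUDGET.test(text)) return 10000;
  return 0;
}
```

In practice the difference is negligible at these string lengths; the regex version is mostly just easier to extend.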
---
If you get a hang of controlling costs, it's much cheaper. If you're exhausting the context window, I would not be surprised if you're seeing high cost.
Be aware of the "cache".
Tell it to read specific files (and only those!), if you don't, it'll read unnecessary files, or repeatedly read sections of files or even search through files.
Avoid letting it search - even halt it. find / rg can produce thousands of tokens of output depending on the search.
Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT.
The cache also goes away after 5-15 minutes or so (not sure) - so avoid leaving sessions open and coming back later.
Never use /compact (that'll bust cache, if you need to, you're going back and forth too much or using too many files at once).
Don't let files get too big (it's good hygiene too) to keep the context window sizes smaller.
Have a clear goal in mind and keep sessions to as few messages as possible.
Write / generate markdown files with needed documentation using claude.ai, and save those as files in the repo and tell it to read that file as part of a question. I'm at about ~$0.5-0.75 for most "tasks" I give it. I'm not a super heavy user, but it definitely helps me (it's like having a super focused smart intern that makes dumb mistakes).
If i need to feed it a ton of docs etc. for some task, it'll be more in the few $, rather than < $1. But I really only do this to try some prototype with a library claude doesn't know about (or is outdated). For hobby stuff, it adds up - totally.
For a company, massively worth it. Insanely cheap productivity boost (if developers are responsible / don't get lazy / don't misuse it).
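The cache advice above can be quantified with a back-of-envelope model. Assuming Anthropic's published prompt-caching multipliers (cache reads at roughly 0.1x the base input price, cache writes at roughly 1.25x) and a $3/M base input price for Claude 3.7 Sonnet:

```javascript
// Rough cost model for a cached vs. cache-busted prompt prefix.
const BASE_INPUT_PER_M = 3.0;   // $ per 1M input tokens (assumed)
const CACHE_WRITE_MULT = 1.25;  // cache write premium
const CACHE_READ_MULT = 0.10;   // cache read discount

function promptCost(tokens, { cached }) {
  const mult = cached ? CACHE_READ_MULT : CACHE_WRITE_MULT;
  return (tokens / 1e6) * BASE_INPUT_PER_M * mult;
}

// A 50k-token prefix re-sent on every turn of a 10-turn session:
const hit = promptCost(50_000, { cached: true }) * 10;   // every turn hits cache
const miss = promptCost(50_000, { cached: false }) * 10; // cache busted every turn
```

With those numbers, the cached session's prefix costs about $0.15 versus roughly $1.88 when the cache is busted each turn, which is why editing files mid-session or letting the cache expire hurts so much.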
I use Aider. It's awesome. You explicitly specify the files. You don't have to do work to limit context.
I do find Claude Code to be really good at exploration though - like checking out a repository I'm unfamiliar with and then asking questions about it.
The output of /cost looks like this:
> /cost
⎿ Total cost: $0.1331
Total duration (API): 1m 13.1s
Total duration (wall): 1m 21.3s

What do you use for the model? Claude? Gemini? o3?
Developers tend to seriously underestimate the opportunity cost of their own time.
Hint - it’s many multiples of your total compensation broken down to 40 hour work weeks.
Substitute “cost” with “time” in the above post and all of the same tips are still valuable.
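A rough break-even calculation makes the opportunity-cost point concrete. All numbers here are illustrative; the 1.5x load factor approximates overhead on top of raw compensation:

```javascript
// Hours per month a tool must save to pay for itself, given a
// fully-loaded hourly cost (all inputs are assumed examples).
function breakEvenHours(toolCostPerMonth, totalCompPerYear, loadFactor = 1.5) {
  const hourlyCost = (totalCompPerYear * loadFactor) / (52 * 40);
  return toolCostPerMonth / hourlyCost;
}

// e.g. a $100/month tool against $150k/year total compensation
const hours = breakEvenHours(100, 150_000);
```

At those example numbers, the tool pays for itself if it saves under an hour per month.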
I don’t do much agentic LLM coding but the speed (or lack thereof) was one of my least favorite parts. Using any tricks that narrow scope, prevent reprocessing files over and over again, or searching through the codebase are all helpful even if you don’t care about the dollar amount.
Mostly I just use the 4o model instead of the newer better models because it is faster. It's good enough mostly and I prefer getting a good enough answer quickly than the perfect answer after a few minutes. Mostly what I ask is not rocket science so perfect is the enemy of good here. I rarely have to escalate to better models. The reasoning models are annoyingly slow. Especially when they go down the wrong track, which happens a lot.
And my cost is a predictable $20/month. The downside is that the scope of what I can ask is more limited. I'd like it to be able to "see" my whole code base instead of just one file, and for me to not have to micromanage what the model looks at. Claude can do that if you don't care about money. But if you do, you are basically micromanaging context. That sounds like monkey work that somebody should automate. And it shouldn't require an Einstein-sized artificial brain to do it.
There must be people that are experimenting with using locally running more limited AI models to do all the micromanaging that then escalate to remote models as needed. That's more or less what Apple pitched for Apple AI at some point. Sounds like a good path forward. I'd be curious to learn about coding tools that do something like that.
In terms of cost, I don't actually think it's unreasonable to spend a few hundred dollars per month on this stuff. But I question the added value over the $20 I'm spending. I don't think the improvement is 20x; more like 1.5x. And I don't like the unpredictability of this and having to think about how expensive a question is going to be.
I think a lot of the short term improvement is going to be a mix of UX and predictable cost. Currently the tools are still very clunky and a bit dumb. The competition is going to be about predictable speed, cost and quality. There's a lot of room for improvement here.
If I spend $2 instead of $0.50 on a session but I had to spend 6 minutes thinking about context, I haven't gained any money.
Correlation or causation aside, the same people I see complain about cost, complain about quality.
It might indicate more tightly controlled sessions may also produce better results.
Or maybe it's just people that tend to complain about one thing, complain about another.
So instead of Write, Hit, Hit, Hit
it's Write, Write, Hit, Hit, Hit.
I definitely get value out of it- more than any other tool like it that I've tried.
"please fix the authorization bug in /api/users/:id".
You'd start by grepping the code base and trying to understand it.
Compare that to, "fix the permission in src/controllers/users.ts in the function `getById`. We need to check the user in the JWT is the same user that is being requested"
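The specific version of that prompt maps to roughly a one-line guard. A hypothetical Express-style sketch of the fix being described (the file and function names come from the comment; the middleware, claim names, and data access are all assumptions):

```javascript
// src/controllers/users.ts (sketch): ensure the JWT subject matches
// the :id being requested before returning the user.
const findUser = (id) => ({ id }); // hypothetical data access, stubbed here

function getById(req, res) {
  const requestedId = req.params.id;
  const jwtUserId = req.user && req.user.sub; // set by auth middleware (assumed)
  if (jwtUserId !== requestedId) {
    return res.status(403).json({ error: "forbidden" });
  }
  return res.json(findUser(requestedId));
}
```

The point of the comparison stands either way: the second prompt hands the model the file, the function, and the invariant, so it never has to grep for any of them.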
VIM, Emacs and Excel are obvious power tools which may require you to think but often produce unrivalled productivity for power users
So I don't think the verdict that the product has a bad UI is fair. Natural language interfaces is such a step up from old school APIs with countless flags and parameters
I guess "context length" in AIs is what I intuitively tracked with people already. It can be a struggle to connect the Zabbix alert, the ticket, and the situation on the system, even if you don't track down all the Zabbix code and scripts. And then we throw in Ansible configuring the thing, and then the business requirements from more, or less, controlled dev teams. And then you realize dev is controlled by impossible sales terms.
These are scope -- or I guess context -- expansions that cause people to struggle.
And most of all, Claude Code is overeager to start messing with your code and run up unnecessary $$ instead of making a sensible plan.
This isn't a problem with Claude Sonnet - it is a fundamental problem with Claude Code.
Yesterday I gave up and disabled my format-on-save config within VSCode. It was burning way too many tokens with unnecessary file reads after failed diffs. The LLMs still have a decent number of failed diffs, but it helps a lot.
Just to make sure we're on the same page. There are two things in play. First, a language model's ability to know what file you are referring to. Second, an assistant's ability to make sure the right file is in the context window. In your experience, how does Claude Code compare to Copilot w.r.t (1) and (2)?
In most cases, it is because I am asking the model to do too much at once. Which is fine, I am learning the right level of abstraction/instruction where the model is effective consistently.
But when I read these best practices, I can't help but think of the cost. The multiple CLAUDE.md files, the files of context, the urls to documentation, the planning steps, the tests. And then the iteration on the code until it passes the test, then fixing up linter errors, then running an adversarial model as a code review, then generating the PR.
It makes me want to find a way to work at Anthropic so I can learn to do all of that without spending $100 per PR. Each of the steps in that last paragraph is an expensive API call for us ISV and each requires experimentation to get the right level of abstraction/instruction.
I want to advocate to Anthropic for a scholarship program for devs (I'd volunteer, lol) where they give credits to Claude in exchange for public usage. This would be structured similar to creator programs for image/audio/video gen-ai companies (e.g. runway, kling, midjourney) where they bring on heavy users that also post to social media (e.g. X, TikTok, Twitch) and they get heavily discounted (or even free) usage in exchange for promoting the product.
There are ways to use LLMs cheaply, but it will always be expensive to get the most out of them. In fact, the top end will only get more and more costly as the lengths of tasks AIs can successfully complete grows.
It would be no different than me saying "it sucks university is so expensive, I wish I could afford to go to an expensive college but I don't have a scholarship" and someone then answers: why should it be cheap.
So, allow me the space to express my feelings and propose alternatives, of which scholarships are one example and creative programs are another. Another one I didn't mention would be the same route as universities force now: I could take out a loan. And I could consider it an investment loan with the idea it will pay back either in employment prospects or through the development of an application that earns me money. Other alternatives would be finding employment at a company willing to invest that $100/day through me, the limit of that alternative being working at an actual foundational model company for presumably unlimited usage.
And of course, I could focus my personal education on squeezing the most value for the least cost. But I believe the balance point between slightly useful and completely transformative usages levels is probably at a higher cost level than I can reasonably afford as an independent.
There's an ocean of B2B SaaS services that would save customers money compared to building poor imitations in-house. Despite the Joel Test (almost 25 years old! crazy...) asking whether you buy your developers the best tools that money can buy, because they're almost invariably cheaper than developer salaries, the fact remains that most companies treat salaries as a fixed cost and everything else threatens the limited budget they have.
Anybody who has ever tried to sell developer tooling knows you're competing with free/open-source solutions, and it ain't a fair fight.
Not when you need an SWE in order for it to work successfully.
I don't find this particularly problematic because I can quickly see the unnecessary changes in git and revert them.
Like, I guess it would be nice if I didn't have to do that, but compared to the value I'm getting it's not a big deal.
I have become obsessive about doing git commits in the way I used to obsess over Ctrl-S before the days of source control. As soon as I get to a point I am happy, I get the LLM to do a check-point check in so I can minimize the cost of doing a full directory revert.
But from a time and cost perspective, I could be doing much better. I've internalized the idea that when the LLM goes off the rails, it was my fault: I should have prompted it better. So now I'm considering: how do I get better faster? And the answer is I do it as much as I can to learn.
I don't just want to whine about the process. I want to use that frustration to help me improve, while avoiding going bankrupt.
You can protect your files in a non-AI way: by simply not giving write access to Aider.
Also, apparently Aider is a bit more economic with tokens than other tools.
I am hesitant because I am paying for Cursor now and I get a lot of model usage included within that monthly cost. I'm cheap, perhaps to a fault even when I could afford it, and I hate the idea of spending twice when spending once is usually enough. So while Aider is potentially cheaper than Claude Code, it is still more than what I am already paying.
I would appreciate any comments on people who have made the switch from Cursor to Aider. Are you paying more/less? If you are paying more, do you feel the added value is worth the additional cost? If you are paying less, do you feel you are getting less, the same or even more?
Tweaking claude.md files until the desired result is achieved is similar to a back and forth email chain with the contractor. The difference being that the contractor can be held accountable in our human legal system and can be made to follow their "prompt" very strictly. The LLM has its own advantages, but they seem to be a subset since the human contractor can also utilize an LLM.
Those who get a lot of uplift out of the models are almost certainly using them in a cybernetic manner wherein the model is an integral part of an expert's thinking loop regarding the program/problem. Defining a pile of policies and having the LLM apply them to a codebase automatically is a significantly less impactful use of the technology than having a skilled human developer leverage it for immediate questions and code snippets as part of their normal iterative development flow.
If you've got so much code that you need to automate eyeballs over it, you are probably in a death spiral already. The LLM doesn't care about the terrain warnings. It can't "pull up".
Faced with us as a client, the LLM has infinite patience at linear but marginal cost (relative to your thinking/design time cost, and the value of instant iteration as you realize what you meant to picture and say).
With offshoring, telling them they're getting it wrong is not just horrifically slow thanks to comms and comprehension latency, it makes you a problem client, until soon you'll find the do-over cost becomes neither linear nor marginal.
Don't sleep on the power of small fast iterations (not vibes, concrete iterations), with an LLM tool that commits as you go and can roll back both code and mental model when you're down a garden path.
> We humans undervisualize until we see concrete results.
But in my experience, the user has to be better than an Infosys employee to know how to convey the task to the LLM and then verify iteratively.
So it's more like an experienced engineer outsourcing work to a service-company engineer.
`aider --model gemini --architect --editor-model claude-3.7` and aider will take care of all the fiddly bits including git commits for you.
right now `aider --model o3 --architect` has the highest rating on the Aider leaderboards, but it costs wayyy more than just --model gemini.
Reading the thread, somehow people are paying. It is mind-blowing how, instead of getting cheaper, development just got more expensive for businesses.
I find LLM-based tools helpful and use them quite regularly, but not at $20+, let alone the $100+ per month that Claude Code would require to be used effectively.
I find this argument very bizarre. $100 is pay for 1-2 hours of developer time. Doesn't it save at least that much time in a whole month?
On a serious note, there is no clear evidence that any of the LLM-based code assistants will contribute to saving developer time. Depends on the phase of the project you are in and on a multitude of factors.
(I'm not judging a specific person here, this is more of a broad commentary regarding our relationship/sense of responsibility/entitlement/lack of empathy when it comes to supporting other people's work when it helps us)
After 2 years of GPT4 release, we can safely say that LLMs don't make finding PMF that much easier nor improve general quality/UX of products, as we still see a general enshittification trend.
If this spending was really game-changing, ChatGPT frontend/apps wouldn't be so bad after so long.
https://news.ycombinator.com/item?id=43683012
Developer time "saved" indeed ;-)
> Have multiple checkouts of your repo
I don’t know why this never occurred to me, probably because it feels wrong to have multiple checkouts, but it makes sense, so that you can keep each AI instance running at full speed. While LLMs are fast, this is one of the annoying parts of just waiting for an instance of Aider or Claude Code to finish something.
Also, I had never heard of git worktrees, that’s pretty interesting as well and seems like a good way to accomplish effectively having multiple checkouts.
Disclaimer, I haven’t tried it personally - if you do, let us know how you go!
How do you keep tabs on multiple agents doing multiple things in a codebase? Is the end deliverable there a bunch of MRs to review later? Or is it a more YOLO approach of trusting the agents to write the code and deploy with no human in the loop?
I like to start by describing the problem and having it do research into what it should do, writing to a markdown file, then get it to implement the changes. You can keep tabs on a few different tasks at a time, and you don't need YOLO mode for writes; approving each one keeps the cost down and keeps the model from going wild.
- Aider - https://aider.chat/ | https://github.com/Aider-AI/aider
- Plandex - https://plandex.ai/ | https://github.com/plandex-ai/plandex
- Goose - https://block.github.io/goose/ | https://github.com/block/goose
Unfortunately I can only give an anecdotal answer here, but I get better results from Cursor than the alternatives. The semantic index is the main difference, so I assume that's what's giving it the edge.
Wasn't it clearly bad when Facebook would get real close to buying another company... then decide, "naw, we got developers out the ass, let's just steal the idea and put them out of business"?
After spending a couple of hours trying to get aider and plandex to run (and then with Google Gemini 2.5 pro), my conclusion is that these tools have a long way to go until they are usable. The breakage is all over the place. Sure, there is promise, but today I simply can't get them to work reasonably. And my time is expensive.
Claude Code just works. I run it (even in a slightly unsupported way, in a Docker container on my mac) and it works. It does stuff.
PS: what is it with all "modern" tools asking you to "curl somewhere.com/somescript.sh | bash". Seriously? Ship it in a docker container if you can't manage your dependencies.
My 2 cents on value for money and effectiveness of Claude vs Gemini for coding:
I've been using Windsurf, VS Code and the new Firebase Studio. The Windsurf subscription allowance for $15 per month seems adequate for reasonable every day use. I find Claude Sonnet 3.7 performs better for me than Gemini 2.5 pro experimental.
I still like VS Code and its way of doing things, you can do a lot with the standard free plan.
With Firebase Studio, my take is that it should be good for building and deploying simple things that don't require much developer handholding.
https://kylekukshtel.com/vibecoding-claude-code-cline-sonnet...
Would definitely recommend people reading it for some insight into hands on experience with the tool.
Yep. Here are the API pricing numbers for Gemini vs Claude. All per 1M tokens.
1. Gemini 2.5: in: $0.15; out: $0.60 non-thinking or $3.50 thinking
2. Claude 3.7: in: $3.00; out: $15
[1] https://ai.google.dev/gemini-api/docs/pricing [2] https://www.anthropic.com/pricing#api
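Plugging those list prices into a quick cost function shows the gap; the token counts below are illustrative, not measured:

```javascript
// $ per 1M tokens, from the pricing quoted above
// (Gemini figures use the thinking-output rate).
const PRICES = {
  gemini25: { in: 0.15, out: 3.5 },
  claude37: { in: 3.0, out: 15.0 },
};

function sessionCost(model, inTokens, outTokens) {
  const p = PRICES[model];
  return (inTokens / 1e6) * p.in + (outTokens / 1e6) * p.out;
}

// e.g. a session consuming 500k input / 50k output tokens:
const gemini = sessionCost("gemini25", 500_000, 50_000); // $0.25
const claude = sessionCost("claude37", 500_000, 50_000); // $2.25
```

At that mix, Claude 3.7 comes out roughly 9x more expensive per session, which matches the intuition in the thread that input tokens dominate for agentic coding.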
Has anyone had comparable results with the "unofficial" fork "Anon Kode"? Or with Roo Code with Gemini Pro 2.5?
Are they saying Claude needs to do the git interaction in order to work and/or will generate better code if it does?
The only problem is that this loss is permanent! As far as I can tell, there's no way to go back to the old conversation after a `/clear`.
I had one session last week where Claude Code seemed to have become amazingly capable and was implementing entire new features and fixing bugs in one-shot, and then I ran `/clear` (by accident no less) and it suddenly became very dumb.
Then, you can edit the file at your leisure if you want to.
And when you want to load that context back in, ask it to read the file.
Works better than `/compact`, and is a lot cheaper.
Edit: It so happens I had a Claude Code session open in my Terminal, so I asked it:
Save your current context to a file.
Claude produced a 91-line md file... surely that's not the whole of its context? This was a reasonably lengthy conversation in which the AI implemented a new feature.

And there's CLAUDE.md. It's like cursorrules. You can also have it modify its own CLAUDE.md.
With Claude Code, all bets are off there. You get a better understanding of your code in each prompt, and the bill can rack up, what, 50x faster?
If I got really stuck on a problem involving many lines of code, I could see myself spinning up Claude Code for that one issue, then quickly going back to Windsurf.
Why doesn’t Claude Code usage count against the same plan that usage of Claude.ai and Claude Desktop are billed against?
I upgraded to the $200/month plan because I really like Claude Code but then was so annoyed to find that this upgrade didn’t even apply to my usage of Claude Code.
So now I’m not using Claude Code so much.
Some base usage included in the plan might be a good balance
It would definitely get me to use it more.
It’s less autonomous, since it’s based on the Claude chat interface, and you need to write “continue” every so often, but it’s nice to save the $$
But every Claude Code user is a 1000 requests per day user, so the economics don't work anymore.
Anthropic make it seem like Claude Code is a product categorized like Claude Desktop (usage of which gets billed against your Claude.ai plan). This is how it signs off all its commits:
Generated with [Claude Code](https://claude.ai/code)
At the very least, this is misleading. It misled me.

Once I had purchased the $200/month plan, I did some reading and quickly realized that I had been too quick to jump to conclusions. It still left me feeling like they had pulled a fast one on me.
I am not saying Claude should stop making money, I'm just advocating for giving users the value of getting some Code coverage when you migrate from the basic plan to the pro or max.
Does that make sense?
I'm surprised at the complexity and correctness at which it infers from very simple, almost inadequate, prompts.
Claude Code uses the API interface and API pricing, and writes and edits code directly on your machine; this is a level past simply interacting with a separate chat bot. It seems a little disingenuous to say it's "hostile" to users, when the reality is that, yeah, you do pay a bit more for a more reliable usage tier, for a task that requires it. It also shows you exactly how much it's spent at any point.
Genuinely interested: how's so?
A 2.5 hour session with Claude Code costs me somewhere between $15 and $20. Taking $20/2.5 hours as the estimate, $100 would buy me 12.5 hours of programming.
1. My company cannot justify this cost at all.
2. My company can justify this cost but I don't find it useful.
3. My company can justify this cost, and I find it useful.
4. I find it useful, and I can justify the cost for personal use.
5. I find it useful, and I cannot justify the cost for personal use.
That aside -- $200/day/dev for a "nice-to-have service that sometimes makes my work slightly faster" is a lot of money in the majority of the world.