While you can see them as a productivity-enhancing tool, in times of tight budgets they can also be used to justify laying off more programmers, because a single one is now far more productive than pre-LLM.
I feel that LLMs will raise the barrier to entry for newcomers while also making it easier for companies to lay off devs, since you don't need as many. All in all, I expect salaries for non-FAANG devs to decrease and salaries for FAANG devs to increase slightly (given the increased value they can now produce).
Any thoughts on this?
Developers (often juniors) use LLM code without taking the time to verify it. This leads to bugs they can't fix because they don't understand the code. Some senior developers also trust the tool to generate a function, and don't take the time to review it and catch the edge cases the tool missed.
They rely on ChatGPT to answer their questions instead of taking the time to read the documentation, or doing a simple web search to find discussions on Stack Overflow or blog posts about the subject. This may give results in the short term, but they don't actually learn to solve problems themselves. I am afraid this will have huge negative effects on their careers if the tools improve significantly.
Learning how to solve problems is an important skill. They also lose access to the deeper knowledge that enables you to see connections, complexities, and flaws that the current generation of tools cannot. By reading documentation, blogs, or discussions you are often exposed to a wider view of the subject than the laser-focused answer from ChatGPT.
There will be less room for "vibe coders" in the future, as these tools increasingly solve the simple things without requiring as much management. Until we reach AGI (I doubt it will happen within the next 10 years) the tools will require experienced developers to guide them for the more complex issues. Older experienced developers, and younger developers who have learned how to solve problems and have deep knowledge, will be in demand.
Documentation is not written with answers in mind. Every little project wants me to be an expert in their solution. They want to share with me the theory behind their decisions. I need an answer now.
Web search no longer provides useful information within the first few results. Instead, I get content farms who are worse than recipe pages - explaining why someone would want this information, but never providing it.
A junior isn’t going to learn from information that starts from the beginning (“if you want to make an apple pie from scratch, you must first invent the universe.”) 99.999% of them need a solution they can tweak as needed so they can begin to understand the thing.
LLMs are good at processing and restructuring information so I can ask for things the way I prefer to receive them.
Ultimately, the problem is actually all about verification.
When you see 30-50 years of change you realise this was inevitable: in every generation there are new engineers entering with limited understanding of the layers beneath, even of the code produced. Do I understand the lexers and the compilers that turn my code into machine code or instruction sets? Heck no. That doesn't mean I shouldn't use the tools available to me now.
Here's an overview of what we've done:
1. *Created Assembly Code for Apple Silicon*: We wrote ARM64 assembly code specifically for your Apple M1 Max processor running macOS, rather than x86 assembly which wouldn't work on your architecture.
2. *Explained the Compilation Process*: We covered how to compile and link the code using the `as` assembler and `ld` linker with the proper flags for macOS on ARM64.
3. *Addressed Development Environment*: We confirmed that you don't need to install a separate assembler since it comes with Apple's Command Line Tools, and provided instructions on how to verify or install these tools.
4. *Optimized the Code*: We refined the code with better alignment for potential performance improvements, though noted that for a "Hello World" program, system call overhead is the main performance factor.
5. *Used macOS-Specific Syscalls*: The assembly code uses the appropriate syscall numbers and conventions specific to macOS on ARM64 architecture (syscalls 4 for write and 1 for exit).
This gives you a basic introduction to writing assembly directly for Apple Silicon, which is quite different from traditional x86 assembly programming.
And as with any automation, there will be a select few who understand its inner workings, and a vast majority who will enjoy/suffer the benefits.
Well... is this something new? Previously the trend was to copy and paste Stack Overflow answers without understanding what they did. Perhaps with LLM code it's an incremental change, but the concept is fairly familiar.
1. Developers are building these tools/applications because it's far faster and easier for them to build and iterate on something that they can use and provide feedback on directly without putting a marketer, designer, process engineer in the loop.
2. The level of 'finish' required to ship these kinds of tools to devs is lower. If you're shipping an early beta of something like 'Cursor for SEO Managers' the product would need to be much more user friendly. Look at all the hacking people are doing to make MCP servers and get them to work with Cursor. Non-technical folks aren't going to make that work.
So then, once there is a convergence on 'how' to build this kind of stuff for devs, there will be a huge amount of work to go and smooth out the UX and spread equivalents across other industries. Claude releasing remote MCPs as 'integrations' in their web UI is the first step of this IMO.
When this wave crashes across the broader SaaS/FAANG world I could imagine more demand for devs again, but you're unlikely to ever see anything like the early 2020s again.
What worries me isn't layoffs but that entry-level roles become rare, and juniors stop building real intuition because the LLM handles all the hard thinking.
You get surface-level productivity but long-term skill rot.
This was a real problem pre-LLM anyway. A popular article from 2012, How Developers Stop Learning[0], coined the term "expert beginner" for developers who displayed moderate competency at typical workflows, e.g. getting a feature to work, without a deeper understanding of lower levels, or a wider high-level view.
Ultimately most developers don't care, they want to collect a paycheck and go home. LLMs don't change this; the dev who randomly adds StackOverflow snippets to "fix" a crash without understanding the root cause was never going to gain a deeper understanding, the same way the dev who blindly copy&pastes from an LLM won't either.
[0] https://daedtech.com/how-developers-stop-learning-rise-of-th...
That's not to say that you're wrong. Most people who use those things don't have a very good idea of what's going on in the next layer down. But it's not new.
I find it interesting how these sort of things are often viewed as a function of technological advancement. I would think that AI development tools would have a marginal effect on wages as opposed to things like interest rates or the ability to raise capital.
Back to the topic at hand, however: assuming these tools do get better, competition would seemingly increase greatly. A highly skilled team with such tools could prove formidable competition to longstanding companies. This would require all companies to up the ante to avoid being outcompeted, requiring even more software to be written.
A company could rest on their laurels, laying off a good portion of their employees, and leaving the rest to maintain the same work, but they run the risk of being disrupted themselves.
Alas, at the job I'm at now my team can't seem to release a rather basic feature, despite everyone being enhanced with AI: nobody seems to understand the code, all the changes seem to break something else, the code's a mess... maybe next year AI will be able to fix this.
The first problem they have gained traction on is programming autocomplete, and it is useful.
Generating summaries: pretty marginal benefit (personally I find it useless). Writing emails: quicker just to type "FYI" and press send than to instruct the AI. More problems that need solving will emerge, but it will take time.
At my non-tech job, I can show you three programs written entirely by LLMs that have allowed us to forgo paid software solutions. There is still a moat, since IDEs are not consumer-friendly, but that is pretty solvable. It will not be long before one of the big AI houses offers a direct code-to-offline-desktop-app IDE that your grandma could use.
It's been valuable to engage with the suggestions and understand how they work—much like using a search engine, but more efficient and interactive.
LLMs have also been helpful in deepening my understanding of math topics. For example, I've been wanting to build intuition around linear algebra, which for me is a slow process. By asking an LLM questions, I find the explanations make the underlying concepts more accessible.
For me it's about using these tools to learn more effectively.
So many people benefit from basic things like sorting tables, searching and filtering data etc.
Things where I might just use Excel or a small script, they can now use an LLM for.
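The kind of one-off table work described above is small enough to sketch. A minimal example (hypothetical file and column names) of the filter-and-sort script an LLM typically produces, using only the standard library:

```python
import csv

def filter_rows(path, column, predicate):
    """Return the rows of a CSV file where predicate(value) holds for one column."""
    with open(path, newline="") as f:
        return [row for row in csv.DictReader(f) if predicate(row[column])]

def sort_rows(rows, column, numeric=False, reverse=False):
    """Sort a list of row dicts by a single column."""
    key = (lambda r: float(r[column])) if numeric else (lambda r: r[column])
    return sorted(rows, key=key, reverse=reverse)
```

Usage would look like `sort_rows(filter_rows("sales.csv", "region", lambda v: v == "EU"), "amount", numeric=True)` — exactly the sort of thing people used to build a pivot table for.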
And for now, we are still in dire need of more developers, not fewer. But yes, I can imagine that after a golden phase of 5-15 years it will start to decline once automation and AI get too good / better than the average joe.
Nonetheless, the good news is that coding LLMs enable researchers too — people who often struggle with learning to code.
What happens when most companies do this?
During the 10s, every dev out there was screaming "everyone should learn to code and get a job coding". During the 20s, many devs are being laid off.
For a field full of self-professed smart and logical people, devs do seem to be making tons of irrational choices.
Are we in need of more devs, or of more skilled devs? Do we necessarily need more software written? Look at npm: the world is flooded with poorly written software that is one null reference exception away from crashing.
Some of us will always maintain code, but most will move higher in the stack to focus on products and their real world application.
In regard to jobs and job losses, I have no idea how this is going to impact individual salaries over time in different positions, but I honestly doubt it's going to do much. Language models are still pretty bad at working with large projects in a clean and effective way. Maybe that will get better, but I think this generational breakthrough of technology is slowing down a lot.
Even if they do get better, they still need direction and validation. Both of which still require some understanding of what is going on (even vibe coding works better with a skilled engineer).
I suspect there are going to be more "programmers" in the world as a result, but most of them will be producing small boutique single-webpage tools and designs that are higher quality than the "made by my cousin's kid" sites a lot of small businesses have now. Companies above ~30 people with software engineers on staff seem to be using it as a performance enhancer rather than a work-replacement tool.
There will always be shitty managers and short-sighted executives that are looking to replace their human staff with some tool, and there will be layoffs but I don't think the overall pool of jobs is going to reduce. For the same reason I don't think there is going to be significant pay adjustments but a dramatic increase in the long-tail of cheap projects that don't make much money on their own.
You could argue that it makes the bar lower to be productive so the candidate pool is much greater, but you're arguing the opposite, increasing the barrier to entry.
I'm open to arguments either way and I'm undecided, but you have to have a coherent economic model.
You need fewer engineers to do the same work; demand gets lower while supply remains as high.
No, you're in a tech bubble. I'm in healthcare, and you'd think AI note-takers and summary generators were the reason LLMs were invented and represent the lion's share of use. I get a new pitch every day: "this product will save your providers hours every day!" They're great products, and our providers love ours, but it's not saving hours.
There's also a huge push for LLMs to work in search and data-retrieval chatbots. The push there is huge, and Mistral just released Le Chat Enterprise for that exact market.
LLMs for code are so common because they're really easy to create. It's Notepad plus ChatGPT. Sure, it's actually VS Code and Copilot, but you get the idea; it's really not more complicated than regular chatbots.
The fact is you could be one of the most insanely valuable and productive engineers on the planet and only write a few lines of code most days, but you'll be writing them in a programming language, OS, or kernel. Value is created by understanding direction and by theory-building, and LLMs do neither.
I built a genuinely new product by working hard as a single human while all my competitors tried to be really productive with LLMs. I'm sure their metrics are great, but at the end of the day I, a human working with my hands and brain and sharpening my OWN intelligence, have created what productivity metrics cannot buy: real innovation.
You decide on a second opinion, and find an old wizened guide who says they always walk not run, never picks a path more quickly than 5 minutes, and promises you that no matter what sales pitch the other guide gives they can get you across the desert in half the time and half the risk to your life.
Both can't be true. Who do you believe and why?
It does everything for him and it gives him results.
So no, I don’t think it’s most useful for programmers. In fact I feel people who are not very techy and not good at Googling for solutions benefit the most, as ChatGPT (and LLMs in general) will hand-hold them through every problem they have in life, and is always patient and understanding.
Right now there seem to be two extremely valuable LLM use cases:
1. sidekick/assistant for software developers
2. a tool to let people rapidly explore new knowledge and new ideas; unlike an encyclopedia, being able to ask questions, suggest references and get summaries, etc.
I suspect that the next $$$ valuable use case will be scientific research assistants.
EDIT: I would add that AI in k-12 education will be huge, freeing human teachers to spend more 1 on 1 time with kids while AIs will be patient teaching kids, providing extra time and material as needed.
They might not be aware of this, and they don't know how to use an IDE, but the hardest part — the code-writing part — is solved.
Every week Rachel in [small company] accounting is manually scanning the same column in the same structured excel documents for amounts that don't align with the master amount for that week. She then creates another excel document to properly structure these findings. Then she fills out a report to submit it.
Rachel is a paragraph prompt away from not ever having to do that again, she just hasn't been given the right nudge yet.
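All the names here are hypothetical, but the whole weekly routine reduces to a few lines. A sketch, assuming the amounts live in one CSV column and the week's master amount is known:

```python
import csv

def find_mismatches(path, amount_col, master_amount, tolerance=0.01):
    """Collect rows whose amount doesn't align with the week's master amount."""
    with open(path, newline="") as f:
        return [row for row in csv.DictReader(f)
                if abs(float(row[amount_col]) - master_amount) > tolerance]

def write_report(rows, out_path):
    """Write the mismatched rows out as a structured CSV report."""
    if not rows:
        return
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

The point isn't that Rachel writes this herself — it's that a one-paragraph prompt gets an LLM to produce something of exactly this shape.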
Stable odourless on-demand light was in short supply, so it helped to jump-start a new industry and network.
The real range of possible uses is near endless, for tech available today. It is just a coincidence that coding is in short supply today.
There is some mental overhead in switching projects. Meaning even if a developer is more efficient per project, he won't get more money (usually less, actually) while carrying an increased mental load (more projects, more managers, more requirements, etc.).
Will be interesting to watch
Are you implying that non-FAANG devs aren't able to do more with LLMs?
Don’t discount scamming and spreading misinformation. There’s a lot of money to be made there, especially in mass manipulation to destroy trust in governments and journalists. LLMs and image generators are a treasure trove. Even if they’re imperfect, the overwhelming majority of people can’t distinguish a real image from a blatantly false one, let alone biased text.
Programmers aren't paid for coding; they're paid for following a formal spec in a particular problem domain. (Something that LLMs can't do at all.)
Improving coding speed is a red herring and a scam.
It's also irrelevant whether LLMs can follow them. The way I use Claude Code is to have it get things roughly working, supply test cases showing where it fails, then review and clean up the code or go additional rounds with more test cases.
That's not much different to how I work with more junior engineers, who are slower and not all that much less error-prone, though the errors are different in character.
If you can't improve coding speed with LLMs, maybe your style of working just isn't amenable to it, or maybe you don't know the tooling well enough. For me it's sped things up significantly.
Very easy to do, sure, but the LLM did this in one minute, recognized the context, and correctly converted binary values, whereas this would have taken me maybe 30 minutes of looking up standards and docs and typing in friendly key names.
I also told it to create five color themes and apply them to the CSS. It worked on the first attempt and looks good, much better than what I could have produced by thinking up themes, picking colors, and copying RGB codes back and forth. Also, I'm not fluent in CSS.
I wasn't paid for this, though; it's a hobby project, one I wouldn't have started in the first place without an LLM performing the boring, tedious tasks.
For me it's mainly adding quick-and-dirty hooks to WordPress websites at the behest of berating marketing C-suites, for websites that are going to disappear or stop being visited in less than a few months.
For that, whatever Claude spits out is more than enough. I'm reasonably confident I'm not going to write much better code in the less-than-30-minutes I'm allowed to spend to fix whatever issue comes up.
And in all 3 cases, AI has increased my productivity, and I could ship things even when really sleepy. If I have very little time between things, I can send a prompt to an agent and review the result, and then, when I have more time, clean up some of the mess.
Now my stance is really at "Whoever doesn't take advantage of it is NGMI"
You're specifically very wrong that "LLMs cannot follow a formal spec in a particular problem domain". It does take skill to ensure that they will, though, for sure.
TLDR: Skill issue
I'd love to have an all-you-can-eat plan, but $100 p/m isn't compelling enough compared to copy/paste for $20 p/m via chat.
That's not to say the value doesn't exceed $100, I just don't want to pay it.
Yes, and that's why phone contracts migrated from "$0.0X per minute" to "$X for up to 500 minutes", and finally "$X for unlimited calls".
When the service you provide has near zero marginal cost, you'd prefer the customer use it as much as possible, because then it'll provide more value to them and they'll be prepared to pay more.
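The break-even arithmetic behind that migration is simple; a toy comparison (rates made up for illustration):

```python
def cheaper_plan(minutes_used, per_minute_rate, flat_price):
    """Pick the cheaper plan for a given usage level."""
    metered_cost = minutes_used * per_minute_rate
    return "metered" if metered_cost < flat_price else "flat"

# The crossover sits at flat_price / per_minute_rate minutes of use;
# past it the flat plan wins, and heavy use stops being stressful.
```

At $0.05/minute against a $20 flat fee, anything past 400 minutes makes the flat plan the rational choice — which is exactly why near-zero marginal cost pushes providers toward unlimited tiers.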
When I switched to DSL the stress went away, and I found myself using internet in different ways than before, because I could explore freely without time pressure.
I think this applies to Claude as well. I will probably feel more free to experiment if I don't have to worry about costs. I might do things I would never think of if I'm only focused on using it as little as possible to save money.
100% with you that how you access something can add constraints and stress. In my case, while we paid per minute, the big factor was the time windows: to maximise utility you wanted to include something useful in as many of the exchanges as possible.
With Claude Code as it is now, I often clear context more often than ideal because it will drive up cost. I could probably add a lot more details to CLAUDE.md in my repos, but it'll drive up tokens as well.
Some of it I'll still do because it affects speed as well, but it'll be nice not to have to pay attention to it.
However, as long as Microsoft is offering copilot at (presumably subsidized) $10/mo, I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful, and I doubt that.
Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
I started using Claude code once it became a fixed price with my Claude max subscription. And it’s taken a little getting used to vs Cline, but I think it’s closer to Cline in performance rather than cursor (Cline being my personal gold standard). $100 is something most people on this forum could make back in 1 day of work.
$100 per month for the value is nothing and for what it’s worth I have tried to hit the usage limit and the only thing that got me close was using their deep research feature. I’ve maxed out Claude code without hitting limits.
I might be missing something, but you can use Claude 3.7 in Copilot Chat:
https://docs.github.com/en/copilot/using-github-copilot/ai-m...
VS Code with your favorite model in Copilot is rapidly catching up with Cursor, etc. It's not there yet, but the trajectory is good.
(Maybe you meant code completion? But even smaller, local models do pretty well in code completion.)
I expect so. The question is "How many days does the limit last for?"
Maybe they have a per-day limit, maybe it's per-month (I'm not sure), but paying $100/m and hitting the limit in the first day is not economical.
> Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.

Out of date, I think, in this fast-moving space.
Sonnet has long been the gold-standard, but that position is looking very shaky at the moment; Gemini in particular has been working wonders for me and others when Sonnet has stumbled.
VS Code/Copilot has improved massively in Cursor's wake, but yes, still some way to go to catch up.
Absolutely though - the value we are getting is incredible.
From the internet, we got used to getting everything for nothing, so people beg for a lower price even when it doesn't make sense.
Basically anything that isn't GPT-4o is premium, and I find GPT-4o near useless compared to Claude and Gemini in Copilot.
It's hit and miss IMO.
I like it for C#/dotnet, but it's completely useless for the rest of the stuff I do (mostly web frontend).
I'm not sure about my usage but if I hit those premium limits I'm probably going to cancel Copilot.
In contrast - I’m not interested in using cheaper, less-than, services for my livelihood.
I'm curious, what was the return? What did you do with the 1k?
This might mean the $10/month is the best. Depends entirely on how it works for you.
(Caps obviously impact the total benefit so I agree there.)
Just to give you one example: the last BigCo I worked for had a schematic for new projects which resulted in... 2k EUR per month in cloud costs for serving a single static HTML file.
At one point someone up top decided that Kubernetes was the way to go and scrambled an impromptu schematic for new projects which could be simply described as a continental-class dreadnought of a Kubernetes cluster on AWS.
And it was signed off, and later followed like a scripture.
A couple of stories lower we're having a hard time arguing for a 50 EUR weekly beer budget for the team, but the company is A-OK with paying 2K EUR for a landing page.
But it doesn't really matter, because the C-level has been consumed by the hype like nothing I've ever seen. It could cost an arm and a leg and they'd still be pushing for it because the bubble is all-consuming, and anyone not touting AI use doesn't get funding from other similarly clueless and sucked-in VCs.
They don't. They toss a coin.
And it seems like the community realizes it and is inventing different solutions. RooCode has task orchestration built in already; there is a Claude task-manager that allows splitting and remembering tasks so an AI agent can pick them up quicker; there are file-based solutions like memory banks. Windsurf and Cursor upgraded their .windsurf/rules functionality to allow more solutions like that for instructing AI agents about the codebase/tasks. Some people even write their own scripts that feed every file to an LLM and store the summary description in a separate file that the AI agent tool can use instead of searching the codebase.
I'm eager to see how some of these solutions become embedded into every AI agent product. It's one of the missing stones needed to make AI agents an order of magnitude more efficient and productive.
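The "feed every file to an LLM and store a summary" idea above fits in a few lines. A sketch where `summarize` is a stand-in for the actual LLM call (here it just records the first line and a line count), and the file/extension names are arbitrary:

```python
from pathlib import Path

def summarize(text):
    """Placeholder for an LLM call: first line plus a line count."""
    lines = text.splitlines()
    return f"{lines[0] if lines else ''} ({len(lines)} lines)"

def build_memory_bank(repo_dir, out_file, exts=(".py", ".js", ".ts")):
    """Write one summary line per source file, so an agent can consult
    this file instead of re-reading the codebase every session."""
    entries = [f"{p.relative_to(repo_dir)}: {summarize(p.read_text())}"
               for p in sorted(Path(repo_dir).rglob("*"))
               if p.is_file() and p.suffix in exts]
    Path(out_file).write_text("\n".join(entries) + "\n")
    return entries
```

Re-run it on every commit and the agent starts each session with a cheap, current map of the repo instead of burning context on file searches.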
You can try it for cheap with the normal pay-as-you-go way.
Limits are a given on any plan. It would be too easy for a vibe coder to hammer away 8 hours a day, 20 days a month, if there were nothing stopping them.
The real question is whether this is a better value than pay as you go for some people.
Your vibe coders are on a different dimension than mine.
The only reason I can see is that you're lacking aggregate capacity and are unwilling or unable to build out faster. Is that the case?
I don't think this is the right way to look at it. If Copilot helps you earn an extra $100 a month (or saves you $100 worth of time), and this one is ~2x better, it still justifies the $100 price tag.
Additionally, when you’re in a compact distribution, being 5% better might be 100x more valuable to you.
Basically, this assumes that the marginal value is associated with cost. I don’t think most things, economically, seem to match that pattern. I will sometimes pay 10x the cost for a good meal that has fewer calories (nutritional value)
I am glad people like you exist, but I don’t think the proposition you suggest makes sense.
You have to puppeteer it and build a meta context/tasking management system. I spend a lot of time setting Claude code up for success. I usually start with Gemini for creating context, development plans, and project tasking outlines (I can feed large portions of codebase to Gemini and rely on its strategy). I’ve even put entire library docsites in my repos for Claude code to use - but today they announced web search.
They also have todos built in which make the above even more powerful.
The end result is insane productivity - I think the only metric I have is something like 15-20k lines of code for a recent distributed processing system from scratch over 5 days.
> I spend a lot of time setting Claude code up for success.
Normally I wouldn't post this because it's not constructive, but this piece stuck out to me and had me wondering if it's worth the trade-off. Not to mention programmers have spent decades fighting against LoC as a metric, so let's not start using it now!
A lot of people seem to have these magic incantations that somehow make LLMs work really well, at the level marketing and investor hype says they do. However, I rarely see that in the real world. I'm not saying this is true for you, but absent vaguely replicable examples that aren't just basic webshit, I find it super hard to believe they're actually this capable.
The interactions and results are roughly in line with what I'd expect from a junior intern. E.g. don't expect miracles, the answers will sometimes be wrong, the solutions will be naive, and you have to describe what you need done in detail.
The great thing about Claude code is that (as opposed to most other tools) you can start it in a large code base and it will be able to find its way, without me manually "attaching files to context". This is very important, and overlooked in competing solutions.
I tried using aider and plandex, and none of them worked as well. After lots of fiddling I could get mediocre results. Claude Code just works, I can start it up and start DOING THINGS.
It does best with simple repetitive tasks: add another command line option similar to others, add an API interface to functions similar to other examples, etc.
In other words, I'd give it a serious thumbs up: I'd rather work with this than a junior intern, and I have hope for improvement in models in the future.
https://gist.github.com/rachtsingh/e3d2e2b495d631b736d24b56e...
Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
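The gist's contents aren't shown here, but the distrust of duration numbers is easy to motivate: repeated timings of the same statement spread noticeably from run to run, while a size measurement is stable. A generic sketch of both measurements:

```python
import sys
import timeit

def time_spread(fn, repeats=5, number=1000):
    """Run fn `number` times, `repeats` times over; the min/max spread
    is why a single duration figure is hard to trust."""
    times = timeit.repeat(fn, repeat=repeats, number=number)
    return min(times), max(times)

def shallow_size(obj):
    """Shallow in-memory size in bytes; stable across runs, unlike timings."""
    return sys.getsizeof(obj)
```

The usual advice applies: report the minimum of several repeats for timing, and remember `sys.getsizeof` is shallow — it doesn't follow references.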
I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
Where is the breakpoint here? What number of lines of code or tokens in a codebase when it becomes not worth it?
Claude code, too?
I found that it is the only one that does a good job in a large codebase. It seems to be very different from others I've tested (aider, plandex).
If you don't like what it suggests, undo the changes, tweak your prompt and start over. Don't chat with it to fix problems. It gets confused.
Example:
I'm wrapping up, right now, an updated fork of the PHP extension `phpredis`. Redis 8 was recently released with support for a new data type, Vector Set, but the phpredis extension (which is far more performant than non-extension Redis libraries for PHP) doesn't support the new vector-related commands. I forked the extension repo, which is in C (I'm a PHP developer; I had to install CLion for the first time just to work alongside CC), and fired up Claude Code with the initial prompt/task of analyzing the extension's code and documenting, in a CLAUDE.md file, its purpose, conventions, and anything that it (Claude) felt would benefit the bootstrapping of future sessions, so that whole files wouldn't need to be read.
This initially, depending on the size of the codebase, could be "expensive". Being that this is merely a PHP extension and isn't a huge codebase, I was fine letting it just rip through the whole thing however it saw fit - were this a larger codebase I'd take a more measured approach to this initial "indexing" of the codebase.
This results in a file that Claude uses the way we use a README.
Next I end this session, start a new one, and tell it to review that CLAUDE.md file (I specifically tell it to do this at every single new session start going forward) and then generate a general overview/plan of what needs to be done to implement the new Vector Set commands so that I can use this custom phpredis extension in my PHP environments. I indicated that I wanted a suite of tests focused on ensuring each command works with all of its various required and optional parameters, and that I wanted to use Docker containers for the testing rather than mess up my local dev environment.
$22 in API costs and ~6 hours spent, and I have the extension working in my local environment with support for all of the commands I want/need to use. (There are still 5 commands I don't intend to use that I haven't implemented.)
Not only would I have certainly never embarked upon trying to extend a C PHP extension, I wouldn't have done so over the course of an evening and morning.
Another example:
Before this redis vector sets thing I used CC to build a python image and text embedding pipeline backed by Redis streams and Celery that consumes tasks pushed to the stream by my Laravel application that currently manages ~120 million unique strings and ~65 million unique images that I've been generating embeddings for. Prior to this I'd spent very little time with Python and zero with anything related to ML. Now I have a performant python service that's portable that I run from my Macbook (M2 Pro) or various GPU-having Windows machines in my home that generate the embeddings on an 'as available' basis, pushing the results back to a redis stream that my Laravel app then consumes and processes.
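That pipeline shape — tasks in on one stream, vectors out on another — reduces to a small worker loop. A sketch (not the author's actual code), assuming a redis-py style client with `xread`/`xadd` and some `embed(text)` function supplied by the caller:

```python
def run_embedding_worker(client, in_stream, out_stream, embed,
                         last_id="0", batch=16):
    """Drain embedding tasks from one Redis stream, push results to another.

    `client` needs redis-py style xread/xadd; `embed` maps text -> vector.
    A long-running worker would keep blocking instead of returning when idle.
    """
    while True:
        resp = client.xread({in_stream: last_id}, count=batch, block=1000)
        if not resp:
            return last_id  # nothing pending
        for _stream, entries in resp:
            for entry_id, fields in entries:
                vector = embed(fields["text"])
                client.xadd(out_stream, {
                    "task_id": entry_id,
                    "vector": ",".join(str(x) for x in vector),
                })
                last_id = entry_id
```

Passing `last_id` forward is what lets the "as available" machines pick up where they left off; the producing side just `xadd`s tasks and consumes the result stream at its own pace.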
The results of these embeddings and the similarity-related features that they've brought to the Laravel application are honestly staggering. And while I'm sure I could have spent months stumbling through all of this on my own - I wouldn't have, I don't have that much time for side project curiosities.
Somewhat related - these similarity features have directly resulted in this side project becoming a service people now pay me to use.
On a day-to-day basis, the effectiveness is a learned skill. You really need to learn how to work with it, the same way you, as a layperson, wouldn't stroll up to a highly specialized piece of aviation technology and just infer how to use it optimally. I hate to keep parroting "skill issue", but it's just wild to me how effective these tools are and how many people don't seem to be able to find any use for them.
If it's burning through cash, you're not being focused enough with it. If it's writing code that's always slightly wrong, stop it and make adjustments. Those adjustments likely need to be documented in something like what I described above: a long-running document used similarly to a prompt.
From my own experience: I watch the "/settings/logs" route on Anthropic's website while CC is working once I know we're getting rather heavy on context. Once it gets into the 50-60k token range, I either aim to wrap up the current task or accept that things are going to start getting a little wonky past 80k. It'll keep working up to 120-140k tokens or more, but you're likely to end up with lots of "dumb" stuff happening. You really don't want to be there unless you're _sooooo close_ to finishing what you're trying to do. When the context gets too high and you need/want to reset but you're mid-task, run /compact [add notes here about next steps] and it'll generate a summary that bootstraps the next session. (Don't do this more than once, really, as it starts losing a lot of context; just reset the session fully after the first /compact.)
If you're constantly running into huge contexts, you're not being focused enough. If you can't work on anything without reading files with thousands of lines, either break up those files somehow or be _really_ specific with the initial prompt and context, which I've done lots of. Say I have a model belonging to a 10+ year old project that is 6000 lines long and I want to work on a specific method in that model: I'll just tell Claude in the initial message/prompt which line that method starts on, which line it ends on, and how many lines from the top of the file it should read (so it can get the namespace, class name, properties, etc.), and then let it do its thing. I'll tell it specifically not to read more than 50 lines of that file at a time when looking for or reviewing something, or even to stop and ask me to locate a method or its usages rather than reading whole files into context.
So, again, if it's burning through money - focus your efforts. If you think you can just fire it up and give it a generic task - you're going to burn money and get either complete junk, or something that might technically work but is hideous, at least to you. But, if you're disciplined and try to set or create boundaries and systems that it can adhere to - it does, for the most part.
Does anyone have advice for maintaining this feeling but also going with the flow and using LLMs to be more productive (since it feels like it'll be required in the next few years at many jobs)? Do I just have to accept that work will become work and I'll have to get my fix through hobby projects?
I think LLMs are really good for the "drudge work" of coding. I always say they're excellent for tasks where the work itself is easy but the bottleneck is how fast you can type.
As an example, I had a project where I had previously extracted all the UI strings into an object. For a number of reasons I wanted to move away from this, but the codebase is well over 50k LOC with probably 5k lines of strings. Doing this manually would have been very tedious and time-consuming, so I leveraged AI and managed to refactor all the strings in my app in a little over an hour.
Last night I used it to look through some project in an open source code base in a language I’m not familiar with to get a report on how that project works. I wanted to know what are its capabilities and integrations with these other specialized tools, because the documentation is so limited. It saved me time and didn’t help me write code. Beyond that it’s good for asking really stupid questions about complex topics that you’d get roasted on for stack overflow.
Coding with LLMs has brought me so much more joy. Not always, and it's getting better; sometimes it's quite frustrating. But when you have a good idea, explain it well, and get the model to generate the code the way you would have written it, or even better, and you can use it to build new things faster, that's magical. Many devs are having this experience: some earlier, some now, some later. But I certainly wouldn't say that using LLMs to code has made it less enjoyable.
You don't have to go with the flow. I took a step back from AI tech because a lot of startups in that field come with extra cultural baggage that doesn't sit well with me.
Then you already use levers to build code.
LLMs are a new kind of tool. They’re weird and probabilistic and only sometimes useful. We don’t yet know quite how and when to use them. But they seem like one more lever for us to wield.
Do people really get that much value from these tools?
I use Github's Copilot for $10 and I'm somewhat happy for what I get... but paying 10x or 20x that just seems insane.
In the end, I was able to rescue the code part, rebuilding a 3-month, 10-person project in 2 weeks, with another 2 weeks to implement a follow-up series of requirements. The sheer amount of discussion and code creation would have been impossible without AI, and I used the full limits I was afforded.
So to answer your question, I got my money's worth in that specific use case. That said, the previous failing effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like pulling teeth, so I'm not sure we would have been better off if we'd started out with everyone having an OpenAI Pro account.
* Those who work in enterprise know intuitively what happened next.
The hardest part about enterprise backend development is understanding the requirements. "Understanding" is not about reading comprehension, and "requirements" are not the written requirements somebody gives you. It's about finding out which requirements are undocumented and which parts of the requirements document are misinformation. LLMs will just dutifully try to implement the written requirements, misinformation and missing edge cases included, not the actual requirements.
If you cost $20K a month at a 5% average margin, the required break-even for a $200 cost increase is 20% increased productivity, not 1%.
And it gets worse: that assumes the increased "productivity" is 100% converted back into extra margin, which is not obvious at all.
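The arithmetic behind that claim, using the numbers above:

```python
# Break-even math for the tool cost, using the comment's numbers.
monthly_cost = 20_000   # what the developer costs per month
margin_rate = 0.05      # 5% average margin
tool_cost = 200         # extra monthly spend on the tool

margin = monthly_cost * margin_rate     # $1,000 of margin per month
required_lift = tool_cost / margin      # 0.20 -> need 20% more margin to break even
naive_lift = tool_cost / monthly_cost   # 0.01 -> the misleading "1%" figure
```

In other words, the $200 has to be recovered out of the $1,000 of margin, not out of the $20,000 of cost, which is where the 20% vs. 1% gap comes from.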
Whether it turns out to be cheaper depends on your usage.
I thought Claude Code was absurdly expensive and not at all more capable than something like chatgpt combined with copilot.
What worked for me was coming up with an extremely opinionated way to develop an application and then generating instructions (mini milestones) by combining it with the requirements.
These instructions end up being very explicit in the sequence of things it should do (write the tests first), how the code should be written and where to place it etc. So the output ended up being very similar regardless of the coding agent being used.
In the codebase I've tried modularity via monorepo, or faux microservices with local apis, monoliths filled with hooks and all the other centralized tricks in the book. Down to the very very simple. Whatever I could do to bring down the context window needed.
Eventually... your returns diminish, and any time you saved is gone.
And by the time you've burned up a context window and you're ready to get out, you're expecting it to output a concise artifact to carry you to the next chat so you don't have to spend more context getting that thread up to speed.
Inevitably, the context window and the LLM's eagerness to touch things it's not supposed to (the likelihood of which increases with context) always get in the way.
Anything with any kind of complexity ends up in a game of too much bloat, or of the LLM removing pieces that break other pieces it wasn't aware of.
/VENT
Using Gemini 2.5 for generating instructions
This is the guide I use
https://github.com/bluedevilx/ai-driven-development/blob/mai...
When coding with Claude I cherry pick code context, examples etc to provide for tasks so I'm curious to hear what other's workflows are like and what benefits you feel you get using Claude Code or the more expensive plans?
I also haven't run into limits for quite some time now.
Then one day I got nagged to upgrade or wait a few hours. I was pretty annoyed; I didn't regard my usage as high, and it felt like a squeeze.
I cancelled my pro plan and am now happily using Gemini, which costs nothing. These AI companies are still finding their feet commercially!
…and you think this is going to last? :-)
Google will probably put 2.5 Pro behind a Google One account once it is out of preview, but I don't see a compelling reason they wouldn't keep Gemini incredibly price competitive with Claude or ChatGPT.
Also, the 'reputation grind' some of these systems set up, where you have to climb 'usage tiers' before being 'allowed' to use more? Just let me pay and use. I can't compare your system to my current provider without weeks of being throttled at unusable rates? That makes switching to you way harder than it should be for serious users. Is that really the outcome you want? And no, I am not willing to 'talk to sales' to run a quick feasibility eval.
[1] https://www.youtube.com/live/khr-cIc7zjc?si=oI9Fj33JBeDlQEYG
It would be cheaper for your company to literally pay your salary while you do nothing.
It's flat if you graph your spend over multiple months :)
It still doubles down on non-working solutions.
If so, just get yourself an Israeli virtual mobile number (one that can receive SMS).