A couple of times I hit the daily limits and decided to try Gemini CLI with the 2.5 pro model as a replacement. That's not even comparable to Claude Code. The frustration with Gemini is just not worth it.
I couldn't imagine paying >$100/month for a dev tool in the past, but I'm seriously considering upgrading to the Max plans.
For me the sweet spot is boilerplate (give me a blueprint of a class based on a description), or translating JSON into a class or some other format. Questions like "what's wrong with this code? How would a Staff-level engineer write it?" are also useful. I've found bugs before even hitting debug by asking what's wrong with the code I just pounded out on my keyboard by hand.
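The JSON-to-class case is the kind of boilerplate it handles reliably; a minimal sketch of what I mean (the field names here are invented for illustration):

```python
import json
from dataclasses import dataclass

# The kind of boilerplate I ask for: turn a JSON payload into a typed class.
@dataclass
class User:
    id: int
    name: str
    email: str

    @classmethod
    def from_json(cls, raw: str) -> "User":
        data = json.loads(raw)
        return cls(id=data["id"], name=data["name"], email=data["email"])

user = User.from_json('{"id": 1, "name": "Ada", "email": "ada@example.com"}')
print(user.name)  # Ada
```

Tedious to type by hand for a class with 30 fields; trivial to describe in one sentence.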
I also have concerns about said junior developers wielding such tools, because yes, without being able to supply the right kind of context and being able to understand the difference between a good solution and a bad solution, they will produce tons of awful, but technically working code.
If this is the case then we'd better have fully AI-generated code within the next 10 years, since those "juniors" will remain atrophied juniors forever and the old-timers will be checking in with the big clock in the sky. If we, as a field, believe that this cannot possibly happen, then we are making a huge mistake leaning on a tool that requires "deep [orthogonal] experience" to operate properly.
The worst is getting, even smallish, PRs with a bunch of changes that look extraneous or otherwise off. After asking questions the code changes without the questions being answered and likely with a new set of problems. I swear I've been prompting an LLM through an engineer/PR middleman :(
A couple of weeks ago, I had a little downtime and thought about a new algorithm I wanted to implement. In my head it seemed simple enough that 1) I thought the solution was already known, and 2) it would be fairly easy to write. So I asked Claude to "write me a python function that does Foo". I spent a whole morning going back and forth, getting crap and nothing at all like what I wanted.
I don't know what inspired me, but I just started to pretend that I was talking to one of my junior engineers. I first asked for a much simpler function that was on the way to what I wanted (well, technically, it was the mathematical inverse of what I wanted), then I asked it to modify it to add one different transform, and then another, and then another. And then finally, once the function was doing what I wanted, I asked it to write me the inverse function. And it got it right.
What was cool about it is that it turned out to involve more complex linear algebra and edge cases than I originally thought, and it would have taken me weeks to figure all of that out. But using it as a research tool and junior engineer in one was the key.
I think if we go down the "vibe coding" route, we will end up with hordes of juniors who don't understand anything, and the stuff they produce with AI will be garbage and brittle. But using AI as a tool is starting to feel more compelling to me.
It still writes insane things all the time but I find it really helpful to spit out single use stuff and to brainstorm with. I try to get it to perform tasks I don't know how to accomplish (eg. computer vision experiments) and it never really works out in the end but I often learn something and I'm still very happy with my subscription.
"Review the top-most commit. Did I make any mistakes? Did I leave anything out of the commit message?"
Sometimes I let it write the message for me:
"Write a new commit message for the current commit."
I've had to tell it how to write commit messages though. It likes to offer subjective opinions, use superlatives and guess at why something was done. I've had to tell it to cut that out: "Summarize what has changed. Be concise but thorough. Avoid adjectives and superlatives. Use imperative mood."
The interesting thing about all of this vibe coding skepticism, cynicism, and backlash is that many people have their expectations set extremely low. They’re convinced everything the tools produce will be junk or that the worst case examples people provide are representative of the average.
Then they finally go out and use the tools and realize that they exceed their (extremely low) expectations, and are amazed.
Yeah we all know Claude Code isn’t going to generate a $10 billion SaaS with a team of 10 people or whatever the social media engagement bait VCs are pushing this week. However, the tools are more powerful than a lot of people give them credit for.
I have recently found something that’s needed but very niche and the sort of problem that Claude can only give tips on how to go about it.
There are the social media types you mention and their polar opposites, the "LLMs have no possible use" crowd. These people are mostly delusional. At the grown-ups table, there is a spectrum of opinions about the relative usefulness.
It's not contradictory to believe that the average programmer right now has his head buried in the sand and should at least take time to explore what value LLMs can provide, while at the same time taking a more conservative approach when using them to do actual work.
I tried to use it for Python, Rust and Bash. I also tried to use it for crawling and organizing information. I also tried to use it as a debugging buddy. All of the attempts failed.
I simply don't understand how people are using it in a way that improves productivity. For me, all of this is so far a huge timesink with essentially nothing to show for it.
The single positive result was when I asked it to optimize a specific SQL query, and it managed to do it.
Anyway I will keep trying to use it, maybe something needs to click first and it just hasn't yet.
Not trying to argue, since I don’t have counter evidence, but how can you be so sure?
Or they have actually used all these tools, know how they work, and don't buy into hype and marketing.
No, Claude can create logs all across my codebase with much better formatting far faster than I can, so I can focus on actual problem solving. It's frustrating, but par for the course for this forum.
Edit: Dishonest isn't correct, I should have said I just disagree with their statements. I do apologize.
I know I’m being pedantic, but people mean very different things when they talk about this stuff, and I don’t think any credence should be given to vibe coding.
When I try the more complex things I will do multiple passes with AI, have 2-3 LLMs review it and delete deprecated code, refactor, interrogate it and ask it to fix bad patterns, etc. In an evening I can refactor a large code base this way. For example Gemini is meh compared to Claude Opus at new code, but somewhat decent for reviewing code that's already there, since the 1M context window allows it to tie things together Claude wouldn't be able to fit in 256k. I might then bounce a suggestion back from Gemini -> Claude -> Grok to fix something. It's kind of like managing a team of interns with different specialties and personalities.
Maybe you're thinking of slop coding?
Similarly amazed as an experienced dev with 20 YoE (and a fellow Slovak, although US based). The other tools, while helpful, were just not "there"; they were often more trouble than they were worth, producing a lot of useless garbage. Claude Code is clearly on another level. Yes, it needs A LOT of handholding; my MO is to do Plan Mode until I'm 100% sure it understands the reqs and the planned code changes are reasonable, then let it work, and finally code review what it did (after it auto-fixes things like compiler errors, unit test failures and linting issues). It's kind of like a junior engineer that is a little bit daft but very knowledgeable, works super, super fast, and doesn't talk back :)
It is definitely the future, what can I say? This is a clear direction where software development is heading.
The first commit towards the fix was plausible, though still not fully correct, but in the end not only was it unable to fix it, each commit also became more and more baroque. I cut it off when it wrote almost 100 lines of code to compare version numbers (code which already existed in the source). The problem with discussing the plan is that, while debugging, you don't yourself have a full idea of the plan.
I don't call it a total failure, because I asked the AI to improve some error messages to help it debug, and I will keep that code. It's pretty good at writing new code, very good at reviewing it, but for me it was completely incapable of performing maintenance.
I wonder, are most devs with a job paying for it themselves, rather than the company they work for?
It is very useful for simpler tasks like writing tests, converting code bases etc where the hard part is already done.
When it comes to actually doing something hard - it is not very useful at least in my experience.
And if you do something even a bit niche, it is mostly useless, and it's faster to dig into the topic on your own than to try to have Claude implement it.
And, yet, when I asked it to correct a CMake error in a fully open source codebase (broken dependency declaration), it couldn't work it out. It even started hallucinating version numbers and dependencies that were so obviously broken that at least it was obvious to me that it wasn't helping.
This has been, and continues to be, my experience with AI coding. Every time I hit something that I really, really want the AI to do and get right (like correcting my build system errors), it fails and fails miserably.
It seems like everyone who sings the praises of AI coding has one thing in common: Javascript. Make of that what you will.
As I've improved planning and context management, the results have been fairly consistent. As long as I can keep a task within the context window, it does a decent job almost every time. And occasionally I have to have it brute-force its way to green lint/typecheck/tests. That's been one of the biggest speed bumps.
I've found that Gemini is great at the occasional detailed code review to help find glaring issues or things that were missed, but having it implement anything has been severely lacking. I have to literally tell it not to do anything, because it will gladly just start writing files on a whim. I generally use the Opus model to write detailed plans, Sonnet to implement, and then Opus and Gemini to review and plan refactors.
I'm impressed. The progress is SLOW. I'd have gotten to the stage I'm at in 1/3 to 1/2 the time, likely with fewer tests and significantly less process documentation. But the results are otherwise fairly great. And the learning process has kept me motivated to keep this old side-project moving.
I was switching between two accounts for a week while testing, but in the end upgraded to the $100/month plan and I think I've been rate-limited once since. I don't know if I'll be using this for every-day professional work, but I think it's a great tool for a few categories of work.
I have not tried it, for a variety of reasons, but my (quite limited, anecdotal, and gratis) experience with other such tools is, that I can get them to write something I could perhaps get as an answer on StackOverflow: Limited scope, limited length, address at most one significant issue; and perhaps that has to do with what they are trained on. But that once things get complicated, it's hopeless.
You said Claude Code was significantly better than some alternatives, so better than what I describe, but - we need to know _on what_.
Contrast this with the peg parser VM it basically one-shotted but needed a bunch of debug work. A fuzzy spec (basically just the lpeg paper) and a few iterations and it produced a fully tested VM. After that the AST -> Opcode compiler was super easy as it just had to do some simple (fully defined by this point) transforms and Bob's your uncle. Not the best code ever but a working and tested system.
Then my predilection for yak shaving took over as the AST needed to be rewritten to make integration as a python C extension module viable (and generated). And why have separate AST and opcode optimization passes when they can be integrated? Oh, and why even have opcodes in the first place when you can rewrite the VM to use Continuation Passing Style and make the entire machine AST-> CPS Transform -> Optimizer -> Execute with a minimum of fuss?
So, yeah, I think it's fair to say the daffy robots are a little more than a StackOverflow chatbot. Plus, what I'm really working on is a lot more complicated than this, needing to redo the AST was just the gateway drug.
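For anyone curious about the CPS rewrite mentioned above, here's a toy sketch of the AST -> CPS execution idea (nothing like the actual VM, just the shape of it, with an invented three-node AST):

```python
# Minimal AST evaluated in Continuation Passing Style: instead of
# compiling to opcodes, each node computes its value and hands it to
# a continuation, so evaluation order is explicit in the code.
def eval_cps(node, k):
    kind = node[0]
    if kind == "lit":                     # ("lit", value)
        return k(node[1])
    if kind == "add":                     # ("add", left, right)
        return eval_cps(node[1], lambda a:
               eval_cps(node[2], lambda b: k(a + b)))
    if kind == "mul":                     # ("mul", left, right)
        return eval_cps(node[1], lambda a:
               eval_cps(node[2], lambda b: k(a * b)))
    raise ValueError(f"unknown node: {kind}")

# (2 + 3) * 4, executed AST -> CPS with the identity continuation.
ast = ("mul", ("add", ("lit", 2), ("lit", 3)), ("lit", 4))
print(eval_cps(ast, lambda v: v))  # 20
```

The appeal is exactly what's described above: with CPS there's no separate opcode layer, so the pipeline collapses to AST -> CPS transform -> optimize -> execute.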
I could see me paying for higher tiers given the productivity gains.
The only issue I can see is that we might end up with a society where those that can afford the best subscriptions have more free time, get more done, make more money and are more successful in general. Even current base level subscriptions are too expensive for huge percentage of the global population.
Sadly, my experience with the Max plan has been extremely poor. It’s not even comparable: I’ve been experimenting heavily with Claude Code over the last few weeks, spending more than $80 per day, and it’s amazing. The problem is that on the Max plan you’re not the one managing the context length, and this ruins the model’s ability to keep things in memory. Of course this is expected (the longer the context, the more expensive it is to run), but it’s so frustrating to fail at a coding task when it’s so obvious the model lost a crucial part of the context.
Agreed that context and chunking are the key to making it productive. The times when I've tried to tell it (in a single prompt) everything I want it to do, were not successful. The code was garbage, and a lot of it just didn't do what I wanted it to do. And when there are a lot of things that need to be fixed, CC has trouble making targeted changes to fix issues one by one. Much better is to build each small chunk, and verify that it fully works, before moving on to the next.
You also have to call its bullshit: sometimes it will try to solve a problem in a way you know is wrong, so you have to stop it and tell it to do it in another way. I suppose I shouldn't call it "bullshit"; if we're going to use the analogy of CC being like an inexperienced junior engineer, then that's just the kind of thing that happens when you pair with a junior.
I still often do find that I give it a task, and when it's done, realize that I could have finished it much faster. But sometimes the task is tedious, and I'm fine with it taking a little longer if I don't have to do it myself. And sometimes it truly does take care of it faster than I would have been able to. In the case of tech that I'm learning myself (React, Tailwindcss, the former of which I dabbled with 10 years ago, but my knowledge is completely out of date), CC has been incredibly useful when I don't really know how to do something. I'm fine letting CC do it, and then I read the code and learn something myself, instead of having to pore over various tutorials of varying quality in order to figure it out on my own.
So I think I'm convinced, and I'll continue to make CC more and more a part of my workflow. I'm currently on the Pro plan, and have hit the usage limits a couple times. I'm still a little shy about upgrading to Max and spending $100/mo on a dev tool... not sure if I'll get over that or not.
Has anyone here used both Claude Code and Amp and can compare the two's effectiveness? I know one is a CLI and the other an editor extension. I'm looking for comparisons beyond that. Thanks!
I had a similar thought about the Turing Test…
It was science fiction for decades… then it passed silently in the night and we barely noticed.
The best results come from working iteratively with it. I reject about 1/3 of edits to request some changes or a change of direction.
If you just try to have it jam on code until the end result appears to work then you will be disappointed. But that’s operator error.
Police it, and give it explicit instructions.
Then after it's done its work, prompt it with something like "You're the staff engineer or team lead on this project, and I want you to go over your own git diff like it's a contribution from a junior team member. Think critically and apply judgement based on the architecture of the project described in @HERE.md and @THERE.md."
...and if something is genuinely complex, it will (imo) generally do a bad job. It will produce something that looks like it works superficially, but as you examine it will either not work in a non-obvious way or be poorly designed.
Still very useful but to really improve your productivity you have to understand when not to use it.
For example, I've heard many say that doing big refactorings causes problems. I found a way that works for SwiftUI projects. I did a refactoring: moving files, restructuring large files into smaller components, and standardizing component setup across different views.
The pattern that works for me: 1) ask it to document the architecture and coding standards, 2) ask it to create a plan for refactoring, 3) ask it to do a low-risk refactoring first, 4) ask it to update the refactoring plan, and then 5) go through all the remaining refactorings.
The refactoring plan comes with timeline estimates in days, but those are complete rubbish with Claude Code. Instead I asked it to estimate in 1) number of chat messages, 2) number of tokens, 3) cost based on number of tokens, 4) number of files impacted.
Another approach that works well is to first generate a throw away application. Then ask it to create documentation how to do it right, incorporate all the learning and where it got stuck. Finally, redo the application with these guidelines and rules.
Another tip: sometimes when it gets stuck, I open the project in Windsurf and ask another LLM (e.g., Gemini 2.5 Pro, or Qwen Coder) to review the project and the problem, and then I ask Windsurf to provide me with a prompt to instruct Claude Code to fix it. Works well in some cases.
Also, biggest insight so far: don't expect it to be perfect first time. It needs a feedback loop: generate code, test the code, inspect the results and then improve the code.
Works well for SQL, especially if it can access real data: inspect the database, try some queries, try to understand the schema from your data and then work towards a SQL query that works. And then often as a final step it will simplify the working query.
I use an MCP tool with full access to a test database, so you can tell it to run an explain plan and look at the statistics (pg_stat_statements). It will draw a mermaid diagram of your query, with performance numbers included (number of records retrieved, cache hits, etc.), and come back with an optimized query and index suggestions.
Tried it also on CSV and parquet files with duckdb: it will run the explain plan, compare both queries, explain why parquet is better, see that the query is doing predicate pushdown, etc.
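A rough sketch of that generate/inspect/improve loop, using stdlib sqlite's EXPLAIN QUERY PLAN as a stand-in for the Postgres/duckdb setups described above (table and index names invented):

```python
import sqlite3

# The feedback loop: run the query plan, read it, add an index, and
# confirm the plan changes from a full scan to an index lookup.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i % 100) for i in range(1000)])

query = "SELECT COUNT(*) FROM orders WHERE customer_id = 42"
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

con.execute("CREATE INDEX idx_customer ON orders(customer_id)")
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before)  # plan detail shows a table scan
print(after)   # plan detail now mentions idx_customer
```

Giving the model the ability to run exactly this loop (plan, change, re-plan) is what makes the SQL case work so well.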
Also, when it gets things wrong, instead of inspecting the code I ask it to create a design document with mermaid diagrams describing what it has built. Quite often that quickly shows some design mistake that you can ask it to fix.
Also, with multiple tools on the same project, you have the problem of each using its own way of keeping track of the plan. I asked Claude Code to come up with rules for itself and Windsurf to collaborate on a project. It came back with a set of rules for CLAUDE.md and .windsurfrules on which files to have and how to use them (PLAN.md, TODO.md, ARCHITECTURE.md, DECISION.md, COLLABORATION.md).
I’m not going to tell you it’s not useful; it is. But the shine wears off pretty fast, and when it does, you’re basically left with a faster way to type. At least in my experience.
It’s amazing, but it’s dumb.
The benefit of Claude Code is that you can pay a fixed monthly fee and get a lot more than you would with API requests alone.
A lot of people who have used both say that Cursor is much worse than Claude Code.
I agree with many things that the author is doing:
1. Monorepos can save time
2. Start with a good spec. Spend enough time on the spec. You can get AI to write most of the spec for you, if you provide a good outline.
3. Make sure you have tests from the beginning. This is the most important part. Tests (along with good specs) are how an AI agent can recurse into a good solution. TDD is back.
4. Types help (a lot!). Linters help as well. These are guard rails.
5. Put external documentation inside project docs, for example in docs/external-deps.
6. And finally, like every tool it takes time to figure out a technique that works best for you. It's arguably easier than it was (especially with Claude Code), but there's still stuff to learn. Everyone I know has a slightly different workflow - so it's a bit like coding.
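To make point 3 concrete, here's a minimal sketch of the tests-first shape (the slugify spec is invented for illustration): the tests come from the spec, and the agent iterates on the implementation until they pass.

```python
import re

# Tests written first, straight from the spec. These are the fixed
# target the agent recurses into; the implementation below is just
# one candidate that satisfies them.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces   everywhere ") == "spaces-everywhere"
    assert slugify("already-a-slug") == "already-a-slug"

def slugify(text: str) -> str:
    # Lowercase, collapse runs of non-alphanumerics into single
    # hyphens, and trim hyphens from the ends.
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")

test_slugify()
print("all tests pass")
```

If the agent's first attempt fails a case, the failing assertion is exactly the feedback it needs to converge.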
I vibe coded quite a lot this week. Among them, Permiso [1] - a super simple GraphQL RBAC server. It's nowhere close to best tested and reviewed, but can be quite useful already if you want something simple (and can wait until it's reviewed.)
Curious how you outline the spec, concretely. A sister markdown document? How detailed is it? etc.
> 3. Make sure you have tests from the beginning. This is the most important part. Tests (along with good specs) are how an AI agent can recurse into a good solution. TDD is back.
Ironically I've been struggling with this. For best results I've found Claude does best with a test hook, but then it loses the ability to write tests before the code works to validate bugs/assumptions; it just starts auto-fixing things and can get a bit wonky.
It helps immensely to ensure it doesn't forget anything or abandon anything, but it's equally harmful at certain design/prototype stages. I've taken to having a flag where I can enable/disable the test behavior lol.
Yes. I write the outline in markdown, and then get AI to flesh it out. Then I generate a project structure, with stubbed API signatures. Then I keep refining until I've achieved a good level of detail, including full API signatures and database schemas.
> Ironically I've been struggling with this. For best results I've found Claude does best with a test hook, but then it loses the ability to write tests before the code works to validate bugs/assumptions; it just starts auto-fixing things and can get a bit wonky.
I generate a somewhat basic prototype first. At which point I have a good spec, and a good project structure, API and db schemas. Then continuously refine the tests and code. Like I was saying, types and linting are also very helpful.
https://github.com/jerpint/context-llemur
CLI for the humans, MCP for the LLMS. Whatever is in the context repository should be used by the LLM for its next steps and you are both responsible for maintaining it as tasks start getting done and the project evolves
I’ve been having good success with it so far
(or -- you can write a spec that is still more fleshed out for humans, if you need to present this to managers. Then ask the LLM to write a separate spec document that is tailored for LLMs)
Yes they can save you some time, but at the cost of Claude's time and lots of tokens making tool calls attempting to find what it needs to find. Aider is much nicer, from the standpoint that you can add the files you need it to know about, and send it off to do its thing.
I still don't understand why Claude is more popular than Aider, which is by nearly every measure a better tool, and can use whatever LLM is more appropriate for the task at hand.
As a user, I don't want to sit there specifying about 15-30 files, then realize that I've missed some and that it ruins everything. I want to just point the tool at the codebase and tell it: "Go do X. Look at the current implementation and patterns, as well as the tests, alongside the docs. Update everything as needed along the way, here's how you run the tests..."
Indexing the whole codebase into Qdrant might also help a little.
Honestly, it's just this. "Claude the bar button on foo modal is broken with a failed splork". And CC hunts down foo.ts, traces that it's an API call to query.ts, pulls in the associated linked model, traces the api/slork.go and will as often as not end up with "I've found the issue!" and fix it. On a one sentence prompt. I think it's called an "Oh fuck" moment the first time you see this work. And it works remarkably reliably. [handwave caveats, stupid llms, etc]
Use /add-dir in Claude
I’ve been working on a Django project with good tests, types and documentation. CC mostly does great, even if it needs guidance from time to time
Recently also started a side project to try to run CC offline with local models. Got a decent first version running with the help of ChatGPT, then decided to switch to CC. CC has been constantly trying to avoid solving the most important issues, sidestepping errors and for almost everything just creating a new file/script with a different approach (instead of fixing or refactoring the current code)
For unit testing, I actually pre-write some tests so it can learn what structure I'm looking for. I go as far as to write mocks and test classes that *constrain* what it can do.
With constraints, it does a much better job than if it were just starting from scratch and improvising.
There's a numerical optimization analogy to this: if you just ask a solver to optimize a complicated nonlinear (nonconvex) function, you will likely get stuck or hit a local optimum. But if you carefully constrain its search space, and guide it, you increase your chances of getting to the optimum.
LLMs are essentially large function evaluators with a huge search space. The more you can herd it (like herding a flock into the right pen), the better it will converge.
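A minimal sketch of what I mean by pre-written constraining mocks (all the names here are invented): the fake only exposes the calls the generated code is allowed to make, so anything improvised outside them fails loudly.

```python
# A hand-written fake that constrains the search space: the generated
# implementation can only interact with the system through charge().
class FakePaymentGateway:
    def __init__(self):
        self.charges = []

    def charge(self, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        self.charges.append(amount_cents)
        return f"txn-{len(self.charges)}"

# The function under test (the part I'd ask the model to fill in)
# must work through the constrained interface above.
def pay_invoice(gateway, amount_cents):
    return gateway.charge(amount_cents)

gw = FakePaymentGateway()
assert pay_invoice(gw, 500) == "txn-1"
assert gw.charges == [500]
print("constrained test passed")
```

Because the fake records every call and rejects invalid ones, the model can't quietly drift outside the pen I've built for it.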
Most projects have their documentation on their website. Do you spend time formatting it into a clean Markdown file?
It can, in fact, control your entire computer. If there's a CLI tool, Claude can run it. If there's not a CLI tool... ask Claude anyway, you might be surprised.
E.g. I've used Claude to crop and resize images, rip MP3s from YouTube videos, trim silence from audio files, the list goes on. It saves me incredible amounts of time.
I don't remember life before it. Never going back.
We have Linux instances running an IDE in cloud VMs that we can access through the browser at https://brilliant.mplode.dev. Personally I think this is closer to the ideal UX for operating an agent (our environment doesn't install agents by default yet, but you should be able to just install them manually). You don't have to do anything to set up terminal access or SSH except sign in and wait for your initial instance to start, and once you have an instance provisioned it automatically pauses and resumes based on whether your browser has it open. It's literally Claude + a personal Linux instance + an IDE that you can open from a link.
Pretty soon I should be able run as many of these at a time as I can afford, and control all of their permissions/filesystems/whatever with JWTs and containers. If it gets messed up or needs my attention I open it with the IDE as my UI and can just dive in and fix it. I don't need a regular Linux desktop environment or UI or anything. Just render things in panes of the IDE or launch a container serving a webapp doing what I want and open it instead of the IDE. Haven't ever felt this excited about tech progress
>> I thought I would see a pretty drastic change in terms of Pull Requests, Commits and Line of Code merged in the last 6 weeks. I don’t think that holds water though
The chart basically shows the same output with Claude as before. Which kinda matches what I felt when using LLMs.
You "feel" more productive and you definitely feel "better" because you don't do the work now, you babysit the model and feel productive.
But at the end of the day the output is the same, because all the advantages of LLMs are negated by the time you have to spend reviewing it all, fixing it, re-prompting, etc.
And because you offload the "hard" part - and don't flex that thinking muscle - your skills decline pretty fast.
Try using Claude or another LLM for a month and then try doing a tiny little app without it. It's not only the code part that will seem hard, but the general architecture/structuring too.
And in the end the whole code base slowly (but not that slowly) degrades and in longer term results net negative. At least with current LLMs.
You don't have to try to hold your code in your head as a conceptual whole, or keep track of what your implementation for the next hour of coding was going to look like, all while a stubborn bug is taunting you.
You just ask Mr Smartybots and it delivers anything between proofreading and documentation and whatnot, with some minor fuckups occasionally.
I handed it the reins just out of morbid curiosity, and because I couldn't be bothered continuing for the night, but to my surprise (and with my step-by-step guidance) it did figure it all out. It found unused kernels, and after uninstalling them didn't remove them, it deleted them with rm. It then helped resolve the broken package state, and eventually I was back in a clean working state.
Importantly though, it did not know it hadn't actually cleaned up the boot partition initially. I had to insist that it had not in fact just freed up space, and that it would need to remove them.
https://github.com/pchalasani/claude-code-tools
For example this lets CC spawn another CC instance and give it a task (way better than the built-in spawn-and-let-go black box), or interact with CLI scripts that expect user input, or use debuggers like Pdb for token-efficient debugging and code-understanding, etc.
Automation is now trivially easy. I think of another new way to speed up my workflow — e.g. a shell script for some annoying repetitive task — and Claude oneshots it. Productivity gains built from productivity gains.
https://www.wheresyoured.at/the-haters-gui/
... while also exposing the contents of your computer to surveillance.
No Claude Code needed for that! Just hang around r/unixporn and you'll collect enough scripts and tips to realize that mainstream OSes have pushed computers from being a useful tool to a consumerist toy.
At least that's how I read the comment.
Honestly, that sounds like malware.
This is for a personal project, I haven’t written a ton of C# or done this amount of refactoring before, so this could be educational in multiple ways.
If I were to use Claude for this I'd feel like I was robbing myself of something that could teach me a lot (and maybe motivate me to start out structuring my code better in the future). If I don't use Claude, I feel like I'm wasting my (very sparse) free time on a pretty uninspiring task that may very well be automated away in most future jobs, mostly out of some (misplaced? masochistic?) belief about programming craft.
This sort of back and forth happens a lot in my head now with projects.
Example: Yesterday I was working with an Open API 3.0 schema. I know I could "fix" the schema to conform to a sample input, I just didn't feel like it because it's dull, I've done it before, and I'd learn nothing. So I asked Claude to do it, and it was fine. Then the "Example" section no longer matched the schema, so Claude wrote me a fitting example.
But the key here is I would have learned nothing by doing this.
There are, however, times where I WOULD have learned something. So whenever I find the LLM has shown me something new, I put that knowledge in my "knowledge bank". I use the Anki SRS flashcard app for that, but there are other ways, like adding to your "TIL blog" (which I also do), or taking that new thing and writing it out from scratch, without looking at the solution, a few times and compiling/running it. Then trying to come up with ways this knowledge can be used in different ways; changing the requirements and writing that.
Basically getting my brain to interact with this new thing in at least 2 ways so it can synthesize with other things in my brain. This is important.
Learning a new (spoken) language uses this a lot. Learn a new word? Put it in 3 different sentences. Learn a new phrase? Create at least 2-3 new phrases based on that.
I'm hoping this will keep my grey matter exercised enough to keep going.
Errors compound with LLM coding, and, unless you correct them, you end up with a codebase too brittle to actually be worth anything.
Friends of mine apparently don't have that problem, and they say they have the LLM write enough tests that they catch the brittleness early on, but I haven't tried that approach. Unfortunately, my code tends to not be very algorithmic, so it's hard to test.
My two cents are:
If your goal is learning fully, I would prioritize the slow & patient route (no matter how fast “things” are moving.)
If your goal is to learn quickly, Claude Code and other AI tooling can be helpful in that regard. I have found using “ask” modes more than “agent” modes (where available) can go a long way with that. I like to generate analogies, scenarios, and mnemonic devices to help grasp new concepts.
If you’re just interested in getting stuff done, get good at writing specs and letting the agents run with it, ensuring to add many tests along the way, of course.
I perceive there’s at least some value in all approaches, as long as we are building stuff.
Boring, uninspiring, commodity - and most of all - easily reversible and not critical - to the LLM it goes!
When learning things, intrinsic motivation makes one unreasonably effective. So if there is a field you like, just focus on it. This will let you proceed much faster at _valuable_ things, which all in all is the best use of one's time in any case.
Software crafting when you are not at a job should be fun. If it’s not fun, just do the least effort that suits your purpose. And be super diligent only about the parts _you_ care about.
IMHO people who think everyone should do everything from first principles with the diligence of a Swiss clockmaker are just being difficult. It's _one_ way of doing it but it's not the _only right way_.
Care about important things. If a thing is not important and not interesting, just deal with it the least painful way and focus on something value-adding.
Really if your goal is to learn something, then no matter what you do there has to be some kind of struggle. I’ve noticed whenever something feels easy, I’m usually not really learning much.
That being said, you have to protect your time as a developer. There are a million things to learn, and if making games is your goal as a junior, porting GDscript code doesn't sound like an amazing use of your time. Even though you will definitely learn from it.
My problem with it is that it produces _good looking_ code that, at a glance, looks 'correct', and occasionally even works. But then i look at it closely, and it's actually bad code, or has written unnecessary additional code that isn't doing anything, or has broken some other section of the app, etc.
So if you don't know enough C# to tell whether the C# it's spitting out is good or not, you're going to have a bad time
Things are moving fast at the moment, but I think it feels even faster because of how slowly things have been moving for the last decade. I was getting into web development in the mid-to-late 90s, and I think the landscape felt similar then. Plugged-in people kinda knew the web was going to be huge, but on some level we also knew that things were going to change fast. Whatever we learnt would soon fall by the wayside and become compost for the next new thing we had to learn.
It certainly feels to me like things have really been much more stable for the last 10-15 years (YMMV).
So I guess what I'm saying is: yeah, this is actually kinda getting back to normal. At least that is how I see it, if I'm in an excitable optimistic mood.
I'd say pick something and do it. It may become brain-compost, but I think a good deep layer of compost is what will turn you into a senior developer. Hopefully that metaphor isn't too stretched!
This has largely been true outside of some outlier fundamentals, like TCP.
I have tried Claude code extensively and I feel it’s largely the same. To GP’s point, my suggestion would be to dive into the project using Claude Code and also work to learn how to structure the code better. Do both. Don’t do nothing.
Often I'll use Claude Code to write something that I know how to write, but don't feel like writing, either because it's tedious, or because it's a little bit fiddly (which I know from past experience), and I don't feel like dealing with the details until CC gives me some output that I can test and review and modify.
But sometimes, I'll admit, I just don't really care to learn that deeply. I started a project that is using Rust for the backend, but I need a frontend too. I did some React around 10 years ago, but my knowledge there (what I remember, anyway) is out of date. So sometimes I'll just ask Claude to build an entire section of a page. I'll have Claude do it incrementally, and read the code after each step so I understand what's going on. And sometimes I do tell Claude I'm not happy with the approach, and to do something differently. But in a way I kinda do not care so much about this code, aside from it being functional and maintainable-looking.
And I think that's fine! We don't have to learn everything, even if it's something we need to accomplish whatever it is we've set out to accomplish. I think the problem that you'll run into is that you might be too junior to recognize what are the things you really need to learn, and what are the things you can let something else "learn" for you.
One of the things I really worry about this current time we're in is that companies will start firing their junior engineers, with a belief (however misguided) that their senior engineers, armed with coding assistants, can be just as productive. So junior engineers will lose their normal path to gaining experience, and young adults entering college will shy away from programming, since it's hard to get a job as a junior engineer. Then when those senior engineers start to retire, there will be no one to take their places. Of course, the current crop of company management won't care; they'll have made their millions already and be retired. So... push to get as much experience as you can, and get over the hump into senior engineer territory.
But I do think it could help, for example by showing you a better pattern or language or library feature after you get stuck or finish a first draft. That's not cheating that's asking a friend.
The best way to skill up over the course of one's career is to expose yourself to as broad an array of languages, techniques, paradigms, concepts, etc. as possible. So sure, you may never touch C# again. But by spending time to dig in a bit you'll pick up some new ideas that you can bring forward with you to other things you *do* care about later.
You're on the right track in noticing you'll be missing valuable lessons, and this might rob you of better outcomes even with AI in the future. As it is a side project though keeping motivation is important too.
As well, you'll eventually learn those lessons through future work if you keep coding yourself. But if instead you lean more toward assistance, it is hard to say whether you would become as skilled in the raw skill of coding, and that might affect your ability to wield AI to full effect.
Having done a lot of work across many languages, including gdscript and C# for various games, I do think you'll learn a huge amount from doing the work yourself and such an opportunity is a bit more rare to come by in paid work.
Then I'd ask it to create a plan to recreate it in C#.
Next I'd ask Claude Code to generate a new project in C#, following the small steps it defined in the planning document.
Then I'd ask Claude Code to review its experience building the app and update the original plan document with those insights.
Then throw away the first C# project, and have another go at it. Make sure the plan includes starting with tests.
One day I was fighting Claude on some core Ruby method and it was not agreeing with me about it, so I went to check the actual docs. It was right. I have been using Ruby since 2009.
Programming takes experience to acquire taste for what’s right, what’s not, and what smells bad and will bite you but you can temporarily (yeah) not care. If you let the tool do everything for you you won’t ever acquire that skill, and it’s critical to judge and review your work and work of others, including LLM slop.
I agree it’s hard and I feel lucky for never having to make the LLM vs manual labor choice. Nowadays it’s yet another step in learning the craft, but the timing is wrong for juniors - you are now expected to do senior level work (code reviews) from day 1. Tough!
What bottlenecks are you experiencing?
I'm a developer experienced with Python (GDScript-like) and C#, but am new to Godot and started with GDScript.
1. Immediately change to sonnet (the cli defaults to opus for max users). I tested coding with opus extensively and it never matches the quality of sonnet.
2. Compacting often ends progress - it's difficult to get back to the same quality of code after compacting.
3. First prompt is very important and sets the vibe. If your instance of Claude seems hesitant, doubtful, sometimes even rude, it's always better to end the session and start again.
4. There are phrases that make it more effective. Try, "I'm so sorry if this is a bad suggestion, but I want to implement x and y." For whatever reason it makes Claude more eager to help.
5. Monolithic with docker orchestration: I essentially 10x'd when I started letting Claude itself manage docker containers, check their logs for errors, rm them, rebuild them, etc. Now I can get an entirely new service online in a docker container, from zero to operational, in one Claude prompt.
6. Start in plan mode and iterate on the plan until you're happy.
7. Use slash commands; they are mini prompts you can keep refining over time, including providing starting context and reminding it that it can use tools like gh to interact with GitHub.
Not sure I agree on 1.
2. Compact when you are at a good stopping point, not when you are forced to because you are at 0%.
This is very interesting. What's your setup, and what kind of prompt might you use to get Claude to work well with Docker? Do you do anything to try and isolate the Claude instance from the rest of your machine (i.e. run these Docker instances inside of a VM) or just YOLO?
It once did a container exec that piped the target file into the project's CLI command runner, which did nothing, but that gives you an example of the string of wacky ways it will insist on running code instead of just reading it.
I set compacting to manual, which makes it easy to find a stopping point and write all context out to an .md file before compacting.
First prompt isn't very important to me.
I haven't found i need special phrases. What matters is how context heavy I can make my subagents.
I’m working my way through building a guide to my future self for packaging up existing products in case I forget in 6 months.
At the same time frontier models may improve it, make it worse, or it stays the same, and what I’m after is consistency.
1. It takes away the pain of starting. I have no barrier to writing text, but there is a barrier to writing the first line of code, to a large extent coming from just remembering the context, where to import what from, setting up boilerplate, etc.
2. While it works I can use my brain capacity to think about what I'm doing.
3. I can now do multiple things in parallel.
4. It makes it so much easier to "go the extra mile" (I don't add "TODOs" anymore in the code I just spin up a new Claude for it)
5. I can do much more analysis (like spinning up detailed plotting / analysis scripts).
6. It fixes most simple linting/typing/simple test bugs for me automatically.
Overall I feel like this kind of coding allows me to focus on the essence: What should I be doing? Is the output correct? What can we do to make it better?
Now literally between prompts, I had a silly idea to write a NYT Connections game in the terminal and three prompts later it was done: https://github.com/jleclanche/connections-tui
This especially. I've never worked at a place that didn't skimp on tests or tech debt due to limited resources. Now you can get a decent test suite just from saying you want it.
Will it satisfy purists? No, but lots of mid-hanging fruit long left unpicked can now be picked automatically.
For example, last week I decided to play with nushell. I have a somewhat simple .zshrc, so I just gave it to Claude and asked it to convert it to nushell. The Nu it generated was for the most part not even valid; I spent 30 minutes with it and it never worked. It took me about 10 minutes in the docs to convert it myself.
So it's miserable experiences like that that make me want to never touch it, because I might get burned again. There are certainly things I have found value in, but it's so hit-or-miss that I just find myself not wanting to bother.
For extremely simple stuff, it can be useful. I'll have it parse a command's output into JSON or CSV when I'm too lazy to do it myself, or scaffold an empty new project (but how often am I doing that?). I've also found it good at porting simple code from, say, Python to JavaScript or TypeScript to Go.
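That "parse this command output" chore is about this much work by hand; a toy sketch, with the sample `df -h`-style text hardcoded (in practice it would come from `subprocess.run(...).stdout`):

```python
import json

# Hardcoded sample output so the snippet is self-contained.
SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted
/dev/sda1       234G  120G  102G  55% /
tmpfs            16G  1.2G   15G   8% /run
"""

def parse_table(text: str) -> list[dict]:
    # First line is the header row; the rest are whitespace-separated records.
    lines = text.strip().splitlines()
    headers = lines[0].split()
    return [dict(zip(headers, row.split())) for row in lines[1:]]

records = parse_table(SAMPLE)
print(json.dumps(records, indent=2))
```

This naive split breaks on columns containing spaces, which is exactly the kind of edge case worth checking in whatever the LLM hands back.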
But the negative experiences really far outweigh the good, for me.
> I believe with Claude Code, we are at the “introduction of photography” period of programming. Painting by hand just doesn’t have the same appeal anymore when a single concept can just appear and you shape it into the thing you want with your code review and editing skills.
The comparison seems apt, and yet: still people paint, still people pay for paintings, still people paint for fun. I like coding by hand. I dislike reviewing code (although I do it, of course). Given the choice, I'll opt for the former (and perhaps that's why I'm still an IC).
When people talk about coding agents as very enthusiastic but very junior engineering interns, it fills me with dread rather than joy.
But in what environment? It seems to me that most of the crafts that have been replaced by the assembly line are practiced not so much for the product itself, but for an experience both the creator and the consumer can participate in, at least in their imagination.
You don't just order such artifacts on Amazon anonymously; you establish some sort of relationship with the artisan and his creative process. You become part of a narrative. Coding is going to need something similar if it wants to live in that niche.
Also, if we could get AI tooling to do the reviews for us reliably, I'd be a much happier developer.
So yeah if you like coding as an art form, you can still keep doing that. It's probably just a bit harder to make lots of money with it. But most people code to make a product (which in itself could be a form of art). And yeah if it's faster to reach your goals of making a product with the help of AI, then the choice is simple of course.
But yeah in a way I'm also sad that the code monkey will disappear, and we all become more like the lead developer who doesn't really program anymore but only guides the project, reviews code and makes technical decisions. I liked being the code monkey, not having to deal a lot with all the business stuff. But yeah, things change you know.
The painting/photography metaphor stretches way too far imo - photography was fundamentally a new output format, a new medium, an entirely new process. Agentic coding isn't that.
Especially with very precise language. I've heard of people driving it with speech-to-text, which opens up all sorts of accessibility windows.
I’ve experimented with several dictation apps, including super whisper, etc., and I’ve settled on Wispr Flow. I’m very picky about having good keyboard shortcuts for hands-free dictation mode (meaning having a good keyboard shortcut to toggle recording on and off), and of course, accuracy and speed. Wispr Flow seems to fit all my needs for now but I’d love to switch to a local-only app and ditch the $15/mo sub :)
Letting Claude rest was a great point in the article, too. I easily get manifold value compared to what I pay, so I haven't got it grinding on its own on a bunch of things in parallel and offline. I think it could quickly be an accelerator for burnout and cruft if you aren't careful, so I keep to a supervised-by-human mode.
Wrote up some more thoughts a few weeks ago at https://www.modulecollective.com/posts/agent-assisted-coding....
I remember when JetBrains made programming so much easier with their refactoring tools in IntelliJ IDEA. To me (with very limited AI experience) this seems to be a similar step, but bigger.
Not saying this is more useful per se, just saying that different approaches have their pros and cons.
"Build me a page and the CRUD endpoints to manage it" -> done in 2 minutes.
1. I am vain and having people link to my stuff fills the void in my broken soul
2. He REALLY put in the legwork to document, in a concrete way, what it looks like for these tools to enable someone to move up a level of abstraction. The iron triangle has always been Quality, Scope, Time. This innovation is such an accelerant that ambitious programmers can now imagine game-changing increases in scope without sacrificing quality and in the same amount of time.
For this particular moment we're in, I think this post will serve as a great artifact of what it felt like.
I'm a heavy user of Claude Code and I use it like a coding assistant.
How well you can manage a development team in real life correlates strongly with how much value you get out of an LLM-based coding assistant.
If you can't describe what success looks like, expect people to read your mind, and get angry at validating questions, then you will have problems both with coding assistants and leading teams of developers.
If you're just letting the coding assistant do its thing, uncritically, and committing whatever results, then you're vibe coding.
It sounds like you're not vibe coding. That's good. No need to throw away a useful term (even if it's a weird, gen-Z sounding term) that describes a particular (poor) way to use a LLM.
As in the former is hyped, but the latter - stopping to ask questions, reflect, what should we do - is really powerful. I find I'm more thoughtful, doing deeper research, and asking deeper questions than if I was just hacking something together on the weekend that I regretted later.
I use this with legacy code too. “Lines n to n+10 smell wrong to me, but I don’t know why and I don’t know what to do to fix it.” Gemini has done well for me at guessing what my gut was upset about and coming up with the solution. And then it just presses all the buttons. Job done.
What it excels at is translation. This is what LLMs were originally designed for after all.
It could be between programming languages, like "translate this helm chart into a controller in Go". It will happily spit out all the structs and basic reconciliation logic. Gets some wrong but even after correcting those bits still saves so much time.
And of course writing precise specs in English, it will translate them to code. Whether this really saves time I'm not so convinced. I still have to type those specs in English, but now what I'm typing is lost and what I get is not my own words.
Of course it's good at generating boilerplate, but I never wrote much boilerplate by hand anyway.
I've found it's quite over eager to generate swathes of code when you wanted to go step by step and write tests for each new bit. It doesn't really "get" test-driven development and just wants to write untested code.
Overall I think it's without doubt amazing. But then so is a clown at a children's birthday party. Have you seen those balloon animals?! I think it's useful to remain sceptical and not be amazed by something just because you can't do it. Amazing doesn't mean useful.
I worry a lot about what's happening in our industry. Already developers get away with incredibly shoddy practices. In other industries such practices would get you struck off, licences stripped, or even sent to prison. Now we have to contend with juniors and people who don't even understand programming generating software that runs.
I can really see LLMs becoming outlawed in software development for software that matters, like medical equipment or anything that puts the public in danger. But maybe I'm being overly optimistic. I think generally people understand the dangers of an electrician mislabelling a fusebox or something, but don't understand the dangers of shoddy software.
> people understand the dangers of an electrician mislabelling a fusebox or something, but don't understand the dangers of shoddy software
I mean, most software is covered by those big all-caps NO WARRANTY ASSUMED OR IMPLIED / USE AT YOUR OWN RISK disclaimers. If there were legal recourse for most software issues, you can bet the current frenzy around AI agentic coding would be much more carefully done.
I think, like many things, laws won't appear until a major disaster happens (or you get a president on your side *).
* https://www.onlinesafetytrainer.com/president-theodore-roose...
They know a lot about a lot of things but the details get all jumbled up in their stupid robot brains so you have to help them out a bunch.
I don’t understand how people have the patience to do an entire application just vibe coding the whole time. As the article suggests, it doesn’t even save that much time.
If it can’t be done in one shot with simple context I don’t want it.
Today I (not a programmer, although programming for 20+ years, but mostly statistics) started building with Claude Code via Pro. Burned through my credits in about 3 hours. Got to MVP (happy tear in my eye). Actually one of the best looks I've ever gotten from my son. A look like, wow, dad, that's more than I'd ever think you could manage.
Tips:
- Plan ahead! I've had Claude tell me that a request would fit better way back on the roadmap. My roadmap manages me.
- Force Claude to build a test suite and give debugging info everywhere (backend, frontend).
- Claude and I work together on a clear TODO. It needs guidance as well as I do. It forgot a very central feature of my MVP; I do not yet know why. I asked kindly and it was built.
Questions (not specifically to you kind HN-folks, although tips are welcome):
- Why did I burn through my credits in 3 hours?
- How can I force Claude to keep committed to my plans, my CLAUDE.md, etc.
- Is there a way to ask Claude to check the entire project for consistency? And/or should I accept that vibing will leave cruft spread around?
You can just ask Claude to review your code, write down standards, and verify that the code is produced according to those standards and guidelines. And if it finds that the project is not consistent, ask it to make a plan and execute on the plan.
Ask, ask, ask.
Recently they had to lower token allowances because they're haemorrhaging money.
You can run "ccusage" in the background to keep tabs, so you're less surprised, is all I can say.
Enjoy the cheap inference while you can, unless someone cracks the efficiency puzzle the frontier models might get a lot more expensive at one point.
I mostly spend my days administering SaaS tools, and one of my largest frustrations has always been that I didn’t know enough to really build a good plugin or add-on for whatever tool I was struggling with, and I’d find a limited set of good documentation or open source examples to help me out. With my limited time (full time job) and attendant challenges (ADHD & autism + all the fun trauma that comes from that along with being Black, fat and queer), I struggled to ever start anything out of fear of failure or I’d begin a course and get bored because I wasn’t doing anything that captured my imagination & motivation.
Tools like Claude Code, Cursor, and even the Claude app have absolutely changed the game for me. I’m learning more than ever, because even the shitty code that these tools can write is an opportunity for debugging and exploration, but I have something tangible to iterate on. Additionally, I’ve found that Claude is really good at giving me lessons and learning based on an idea I have, and then I have targeted learning I can go do using source docs and tutorials that are immediately relevant to what I’m doing instead of being faced with choice paralysis. Being able to build broken stuff in seconds that I want to get working (a present problem is so much more satisfying than a future one) and having a tool that knows more than I do about code most of the time but never gets bored of my silly questions or weird metaphors has been so helpful in helping me build my own tools. Now I think about building my own stuff first before I think about buying something!
Being able to override some hard coded vendor crap has been useful.
We have tonnes of code that's been built over a decade with all kinds of idioms and stylistic conventions that are enforced primarily through manual review. This relates in part to working in a regulated environment where we know certain types of things need radical transparency and auditability, so writing code the "normal" way a developer would is problematic.
So I am curious how well it can see the existing code style and then implicitly emulate that? My current testing of other tools seems to suggest they don't handle it very well; typically I am getting code that looks very foreign to the existing code. It exhibits the true "regression to the mean" spirit of LLMs where it's providing me with "how would the average competent engineer write this", which is not at all how we need the code written.
Currently, this is the main barrier to us using these tools in our codebase.
I work on Chromium and my experience improved immensely by using a detailed context document (~3,000 words) with all sorts of relevant information, from the software architecture and folder organisation to the C++ coding style.
(The first draft of that document was created by Claude itself from the project documentation.)
I created some tutorial files which contain ways to do a lot of standard things. Turns out humans found these useful too. With the examples, I've found Opus generally does a good job following existing idioms, while Sonnet struggles.
Nice collapsible HTML logs of agent conversations (inspired by Mario Zechner’s Claude-trace), which took a couple hours of iterations, involving HTML/js/CSS:
https://langroid.github.io/langroid/notes/html-logger/
A migration from Pydantic-v1 to v2, which took around 7 hours of iterations (would have taken a week at least if I even tried it manually and still probably wouldn’t have been as bullet-proof):
By loop I mean you tell it, "No, don't implement this service; look at this file instead and mimic that," and instead it does what it did before.
Among other things I work on database optimizers, and there Claude fails spectacularly. It produces wrong code, fails to find the right places to hook up an abstraction, overlooks effects on other parts of the code, and generally confidently proposes changes that simply do not work at all (to put it mildly).
Your mileage may vary... It seems to depend heavily on the amount of existing (open) code around.
If your test structure is a pain to interact with, that usually means some bad decisions somewhere in the architecture of your project.
It feels like ChatGPT on cocaine. I mean, I asked for a small change and it came back with 5 solutions changing my whole codebase.
I seem to remember the 'oh no I suck' one comes out of Microsoft's programmer world? It seems like that must be a tough environment for coders if such feelings run so close to the surface that the LLMs default to it.
YMMV, though, maybe it's the way I was prompting it. Try using Plan Mode and having it only make small changes.
Cursor is a nice balance for me still. I am automating a lot of the writing but it’s still bite size pieces that feel easier to review.
Despite this, I think agents are a very welcome new weapon.
In the meanwhile, one of the most anticipated games in the industry, a second chapter of an already acclaimed product, has its art totally hand-painted.
When I hear this, I think of this recently-released study, which showed that LLMs both make coders less productive and convinced that they're more productive:
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
Until there's more research published that shows the opposite effect, I don't think we can really take peoples' squishy feelings like this as faithful to reality.
The problem-space I was exploring was libusb and Python, and I used ChatGPT and also Claude.ai to help debug some issues and flesh out some skeleton code. Claude's output was almost universally wrong. ChatGPT got a few things wrong, but was in general a lot closer to the truth.
AI might be coming for our jobs eventually, but it won't be Claude.ai.
The reason that claude code is “good” is because it can run tests, compile the code, run a linter, etc. If you actually pay attention to what it’s doing, at least in my experience, it constantly fucks up, but can sort of correct itself by taking feedback from outside tools. Eventually it proclaims “Perfect!” (which annoys me to no end), and spits out code that at least looks like it satisfies what you asked for. Then if you just ignore the tests that mock all the useful behaviors out, the amateur hour mistakes in data access patterns, and the security vulnerabilities, it’s amazing!
I have this sense this works best in small teams right now, because Claude wants to produce code changes and PRs. Puzzmo, where OP works, is <5 engineers.
In larger codebases, PRs don't feel like the right medium in every case for provocative AI explorations. If you're going to kick something off before a meeting and see what it might look like to solve it, it might be better to get back a plan, or a pile of regexps, or a list of teams that will care.
Having an AI produce a detailed plan for larger efforts, based on an idea, seems amazing.
At 100 dev shop size you're likely to have plenty of junior and middling devs, for whom tools like CC will act as a net negative in the short-mid term (mostly by slowing down your top devs who have to shovel the shit that CC pushes out at pace and that junior/mids can't or don't catch). Your top devs (likely somewhere around 1/5 of your workforce) will deliver 80% of the benefit of something like CC.
We're not hiring junior or even early-mid devs since around Mar/Apr. These days they cost $200/mo + $X in API spend. There's a shift in the mind-work of how "dev" is being approached. It's.. alarming, but it's happening.
When more ideas make it further you get more shots on goal. You are almost certain to have more hits!
Has anyone who’s gotten decent at Clauding had matching success with other tools?
It’s like using ChatGPT in high school: it can be a phenomenal tutor, or it can do everything for you and leave you worse off.
The general lesson from this is that Results ARE NOT everything.
I get it. No one is really making any money yet, including OpenAI.
As the VC money dries up this is only going to get worse. Like ads in responses worse.
this_variable_name_is_sponsored_by_coinbase bad. Which these vibe chuckleheads will claim is no big deal because only losers read code.
Well - compared to a real developer (even junior one), it's peanuts.
We have to be careful not to anthropomorphize them but LLMs absolutely respond to nuanced word choice and definition of behavior that align with psychology (humanities). How to judge that in an interview? Maybe a “Write instructions for a robot to make a peanut butter and jelly sandwich” exercise. Make them type it. Prospects who did robotics club have an edge?
Can they touch type? I’ve seen experienced devs who chicken-peck; it’s painful. What happens when they have to write a stream of prompts, abort, and rephrase rapidly? Schools aren’t mandating typing, and I see an increase (in my own home! I tried…) of feral child-invented systems like caps lock on/off instead of shift, with weird cross-keyboard overhand reaches.
senior developers already know how to use AI tools effectively, and are often just as fast as the AI themselves, so scaffolding is where they get the most benefit.
really everything comes down to planning, and your success isn't going to come down to people using AI tools, it will come down to the people guiding the process, namely project managers, designers, and the architects and senior developers that will help realize the vision.
juniors that can push tasks to completion can only be valuable if they have proper guidance, otherwise you'll just be making spaghetti.
- Ability to clearly define requirements up front (the equivalent mistake in coding interviews is to start by coding, rather than asking questions and understanding the problem + solution 100% before writing a single line of code). This might be the majority of the interview.
- Ability to anticipate where the LLM will make mistakes. See if they use perplexity/context7 for example. Relying solely on the LLM's training data is a mistake.
- A familiarity with how to parallelize work and when that's useful vs not. Do they understand how to use something like worktrees, multiple repos, or docker to split up the work?
- Uses tests (including end-to-end and visual testing)
- Can they actually deliver a working feature/product within a reasonable amount of time?
- Does the final result look like AI slop, or is it actually performant, maintainable (by both humans and new context windows), well-designed, and in line with best practices?
- Are they able to work effectively within a large codebase? (this depends on what stage you're in; if you're a larger company, this is important, but if you're a startup, you probably want the 0->1 type of interview)
- What sort of tools are they using? I'd give more weight if someone was using Claude Code, because that's just the best tool for the job. And if they're just doing the trendy thing like using Claude Agents, I'd subtract points.
- How efficiently did they use the AI? Did they just churn through tokens? Did they pick the right model for the task's complexity?
I also appreciate the common best practice to write a requirements document in Markdown before letting the agent start. AWS' kiro.dev is really nice in separating the planning stage from the execution stage but you can use almost any "chatbot" even ChatGPT for that stage. If you suffer from ADHD or lose focus easily, this is key. Even if you decide to finish some steps manually.
It doesn't really matter whether you use Claude Code (with the Claude LLM), Rovo Dev CLI, Kiro, Opencode, gemini-cli, whatever. Pick the ones that offer daily free tokens and try them out. And no, they will almost never complete without any error. But just copy+paste the error into the prompt, or ask some nasty questions ("Did you really implement deduplication and caching?"), and usually the agent magically sees the issues and starts fixing them.
They don't write like the kind of person you can dismiss out of hand and there's no obvious red flags.
Other than "I don't like AI" - what is so insufferable here?
AI, but refactor
They would be like "but a robot will never ever clean a house as well as I would", well, no shit, but they can still do the overwhelming majority of the work very well (or at least as good as you instruct them to) and leave you with details and orchestration.
I use autocomplete and chat with LLMs as a substitute for stack overflow. They’re great for that. Beyond that, myself and colleagues have found AI agents are not yet ready for serious work. I want them to be, I really do. But more than that I want our software to be reliable, our code to be robust and understandable, and I don’t want to worry about whether we are painting ourselves into a corner.
We build serious software infrastructure that supports other companies’ software and our biggest headache is supporting code that we built earlier this year using AI. It’s poorly written, full of bugs including problems very hard to spot from the code, and is just incomprehensible for the most part.
Other companies I know are vibe coding, and making great progress, but their software is CRUD SaaS and worst case they could start over. We do not have that luxury.
It's not helpful or justified within those conversations.
The bad experience was asking it to produce a relatively non-trivial feature in an existing Python module.
I have a bunch of classes for writing PDF files. Each class corresponds to a page template in a document (TitlePage, StatisticsPage, etc). Under the hood these classes use functions like `draw_title(x, y, title)` or `draw_table(x, y, data)`. One of these tables needed to be split across multiple pages if the number of rows exceeded the page space. So I needed Claude Code to do some sort of recursive top-level driver that would add new pages to a document until it exhausted the input data.
I spent about an hour coaching Claude through the feature, and in the end it produced something that looked superficially correct but didn't run. After spending some time debugging, I moved on and wrote the thing by hand. This feature was not trivial even for me to implement, and it took about two days. It broke the existing pattern in the module: the module was designed around the idea that `one data container = one page`, so splitting data across multiple pages was a new pattern the rest of the module needed to be adapted to. I think that's why Claude did not do well.
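For the curious, the recursive top-level driver described above might look something like this. This is a minimal sketch, not the author's actual code: the names `Document`, `paginate_rows`, `add_statistics`, and the `("StatisticsPage", chunk)` page representation are all assumptions for illustration.

```python
# Hypothetical sketch of a multi-page table driver: one data container
# may now produce several pages, which is what broke the old pattern.

def paginate_rows(rows, rows_per_page):
    """Split a list of table rows into page-sized chunks."""
    return [rows[i:i + rows_per_page]
            for i in range(0, len(rows), rows_per_page)]

class Document:
    def __init__(self):
        self.pages = []

    def add_statistics(self, data, rows_per_page=40):
        # Append one StatisticsPage per chunk until the data is exhausted.
        for chunk in paginate_rows(data, rows_per_page):
            self.pages.append(("StatisticsPage", chunk))
```

The awkward part, as noted, is not this driver itself but adapting the rest of a module built around the one-container-one-page assumption.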
+++
The obviously good experience with Claude was getting it to add new tests to a well-structured suite of integration tests. Adding tests to this module is a boring chore, because most of the effort goes into setting up the input data. The pattern in the test suite is something like this: IntegrationTestParent class that contains all the test logic, and a bunch of IntegrationTestA/B/C/D that do data set up, and then call the parent's test method.
Claude knocked this one out of the park. There was a clear pattern to follow, and it produced code that was perfect. It saved me 1 or 2 hours, but the cool part was that it was doing this in its own terminal window, while I worked on something else. This is a type of simple task I'd give to new engineers to expose them to existing patterns.
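The parent/child pattern described above is roughly this shape (a sketch with made-up names and a stand-in for the real test logic, which in practice would exercise the actual system):

```python
import unittest

class IntegrationTestParent(unittest.TestCase):
    """Holds all the shared test logic; subclasses only set up data."""
    data = None

    def run_integration(self):
        # Stand-in for the real integration logic under test.
        return sorted(self.data)

class IntegrationTestA(IntegrationTestParent):
    def setUp(self):
        # Most of the effort per test case goes here: input data setup.
        self.data = [3, 1, 2]

    def test_integration(self):
        self.assertEqual(self.run_integration(), [1, 2, 3])
```

With a template this rigid, adding `IntegrationTestE` is almost pure data entry, which is exactly why an LLM does well at it.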
+++
The last experience was asking it to write a small CLI tool from scratch in a language I don't know. The tool worked like this: you point it at a directory, and it then checks that there are 5 or 6 files in that directory, and that the files are named a certain way, and are formatted a certain way. If the files are missing or not formatted correctly, throw an error.
The tool was for another team to use, so they could check these files, before they tried forwarding these files to me. So I needed an executable binary that I could throw up onto Dropbox or something, that the other team could just download and use. I primarily code in Python/JavaScript, and making a shareable tool like that with an interpreted language is a pain.
So I had Claude whip something up in Golang. It took about 2 hours, and the tool worked as advertised. Claude was very helpful.
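The validation logic itself is simple enough to sketch. The author's actual tool was in Go; this is the same idea in Python for illustration, with a hypothetical count requirement and naming rule (the real file names and format checks are not given):

```python
import os
import re

# Hypothetical naming rule, e.g. report_2024.csv
EXPECTED_NAME = re.compile(r"^report_\d{4}\.csv$")

def validate_dir(path):
    """Check that the directory holds 5 or 6 correctly named files."""
    names = sorted(os.listdir(path))
    if not 5 <= len(names) <= 6:
        raise ValueError(f"expected 5 or 6 files, found {len(names)}")
    bad = [n for n in names if not EXPECTED_NAME.match(n)]
    if bad:
        raise ValueError(f"badly named files: {bad}")
    return names
```

The advantage of Go here is distribution, not logic: `go build` produces a single static binary the other team can download and run, with no interpreter to install.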
On the one hand, this was a clear win for Claude. On the other hand, I didn't learn anything. I want to learn Go, and I can't say that I learned any Go from the experience. Next time I have to code a tool like that, I think I'll just write it from scratch myself, so I learn something.
+++
Eh. I've been using "AI" tools since they came out. I was the first at my company to get the pre-LLM Copilot autocomplete, and when ChatGPT became available I became a heavy user overnight. I have tried out Cursor (hate the VSCode nature of it), and I tried out the re-branded Copilot. Now I have tried Claude Code.
I am not an "AI" skeptic, but I still don't get the foaming hype. I feel like these tools at best make me 1.5X -- which is a lot, so I will always stay on top of new tooling -- but I don't feel like I am about to be replaced.
LLMs are great at translation. Turn this English into code, essentially. But ask it to solve a novel problem like that without a description of the solution, how will it approach it? If there’s an example in its training set maybe it can recall it. Otherwise it has no capability to derive a solution.