A staff engineer's journey with Claude Code (opens in new tab)

(sanity.io)

550 pointskmelve8mo ago396 comments

396 comments

spicyusername8mo ago

I guess we're just going to be in the age of this conversation topic until everyone gets tired of talking about it.

Every one of these discussions boils down to the following:

- LLMs are not good at writing code on their own unless it's extremely simple or boilerplate

- LLMs can be good at helping you debug existing code

- LLMs can be good at brainstorming solutions to new problems

- The code that is written by LLMs always needs to be heavily monitored for correctness, style, and design, and then typically edited down, often to at least half its original size

- LLMs utility is high enough that it is now going to be a standard tool in the toolbox of every software engineer, but it is definitely not replacing anyone at current capability.

- New software engineers are going to suffer the most because they know how to edit the responses the least, but this was true when they wrote their own code with stack overflow.

- At senior level, sometimes using LLMs is going to save you a ton of time and sometimes it's going to waste your time. Net-net, it's probably positive, but there are definitely some horrible days where you spend too long going back and forth, when you should have just tried to solve the problem yourself.

rafaelmn8mo ago

> but this was true when they wrote their own code with stack overflow.

Searching for solutions and integrating examples found requires effort that develops into a skill. You would rarely get solutions that would just fit into the codebase from SO. If I give a task to you and you produce a correct solution on the initial review I now know I can trust you to deal with this kind of problem in the future. Especially after a few reviews.

If you just vibed through the problem the LLM might have given you the correct solution - but there is no guarantee that it will do it again in the future. Just because you spent less effort on search/official docs/integration into the codebase you learned less about everything surrounding it.

So using LLMs as a junior you are just breaking my trust, and we both know you are not a competent reviewer of LLM code - why am I even dealing with you when I'll get LLM outputs faster myself ? This was my experience so far.

fhd28mo ago

Like with any kind of learning, without a feedback loop (as tight as possible IMHO), it's not gonna happen. And there is always some kind of feedback loop.

Ultra short cycle: Pairing with a senior, solid manual and automated testing during development.

Reasonably short cycle: Code review by a senior within hours and for small subsets of the work ideally, QA testing by a seperate person within hours.

Borderline too long cycle: Code review of larger chunks of code by a senior with days of delay, QA testing by a seperate person days or weeks after implementation.

Terminally long feedback cycle: Critical bug in production, data loss, negative career consequences.

I'm confident that juniors will still learn, eventually. Seniors can help them learn a whole lot faster though, if both sides want that, and if the organisation lets them. And yeah, that's even more the case than in the pre LLM world.

1 more reply

OvbiousError8mo ago

> So using LLMs as a junior you are just breaking my trust, and we both know you are not a competent reviewer of LLM code - why am I even dealing with you when I'll get LLM outputs faster myself ? This was my experience so far.

So much this. I see a 1000 lines super complicated PR that was whipped up in less than a day and I know they didn't read all of it, let alone understand.

1 more reply

dchftcs8mo ago

People obviously turning out LLM code uncritically should be investigated and depending on the findings made redundant. It's a good thing that it allows teams to filter out these people earlier. In my career I have found that a big predictor of the quality of code is the amount of thought put into it and the brains put behind it - procedures like code review can only catch so many things, and a senior's time is saved only when a junior has actually put their brain to work. If someone is shown to never put in thought in their work, they need to make way for people who actually do.

godelski8mo ago

  > you learned less about everything surrounding it.

I think one of the big acceleration points in my skills as a developer was when I moved from searching SO and other similar sources to reading the docs and reading the code. At first, this was much slower. I was usually looking for a more specific thing and didn't usually need the surrounding context. But then as I continued, that surrounding context became important. That stuff I was reading compounded and helped me see much more. These gains were completely invisible and sometimes even looked like losses. In reality, that context was always important, I just wasn't skilled enough to understand why. Those "losses" are more akin to a loss you have when you make an investment. You lost money, but gained a stock.

I mean I still use SO, medium articles, LLMs, and tons of sources. But I find myself just turning to the docs as my first choice now. At worst I get better questions to pay attention to with the other sources.

I think there's this terrible belief that's developed in CS and the LLM crowd targets. The idea that everything is simple. There's truth to this, but there's a lot of complexity to simplicity. The defining characteristic between an expert and a novice is their knowledge of nuance. The expert knows what nuances matter and what don't. Sometimes a small issue compounds and turns into a large one, sometimes it disappears. The junior can't tell the difference, but the expert can. Unfortunately, this can sound like bikeshedding and quibbling over nothings (sometimes it is). But only experts can tell the difference ¯\_(ツ)_/¯

1 more reply

chamomeal8mo ago

Yeah every time I see one of these articles posted on HN I know I'll see a bunch of comments like "well here's how I use claude code: I keep it on a tight leash and have short feedback loops, so that I'm still the driver, and have markdown files that explain the style I'm going for...". Which is fine lol but I'm tired of seeing the exact same conversations.

It's exhausting to hear about AI all the time but it's fun to watch history happen. In a decade we'll look back at all these convos and remember how wild of a time it was to be a programmer.

coldpie8mo ago

I'm thiiiis close to writing a Firefox extension that auto-hides any HN headline with an LLM/AI-related keyword in the title, just so I can find something interesting on here again.

4 more replies

red-iron-pine8mo ago

> Which is fine lol but I'm tired of seeing the exact same conversations.

makes me think the bots are providing these conversations

1 more reply

theshrike798mo ago

It's fun to see Vibe Coders discover the basics of project management in real time without ever reading a book :)

That's what programming with LLMs is, it's just project management: You split the tasks into manageable chunks (ones that can be completed in a single context window), you need to have good onboarding documentation (CLAUDE.md or the equivalent) and good easy to access documentation (docs/ with markdown files).

Exactly what you use to manage a team of actual human programmers.

automatic61318mo ago

>- LLMs utility is high enough that it is now going to be a standard tool in the toolbox of every software engineer, but it is definitely not replacing anyone at current capability.

Right! Problem, billions of dollars have been poured into this wrt to infrastructure, datacenters, compute and salaries. LLMs need to be at the level of replacing vast swathes of us to be worth it. LLMs are not going to be doing that.

This is a collosal malinvestment.

utyop228mo ago

Yeah eventually reality and fantasy have to converge.

Nobody knows when. But it will. TBH the biggest danger is that all the hopes and dreams aren't materialised and the appetite for high-risk investments dissipates.

We've had this period in which you can be money losing and its OK. But I believe we have passed the peak on that - and this is destined to blow up.

dawnerd8mo ago

On your last point I’ve found it about a wash in terms of time savings for me. For boiler plate / throw away code it’s decent enough - especially if I don’t care about code quality and only want a result.

It’s wasted so much time trying to make it write actual production quality code. The consistency and over-verbose nature kill it for me.

sunir8mo ago

All true if you one shot the code.

If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Based on my set up as of today, I’d imagine by sometime next year that will be normal and then the conversation will be very different; mostly around cost control. I wouldn’t be surprised if there is a break out popular agent control flow language by next year as well.

The net is that unsupervised AI engineering isn’t really cheaper better or faster than human engineering right now. Does that mean in two years it will be? Possibly.

There will be a lot of optimizations in the message traffic, token uses, foundational models, and also just the Moore’s law of the hardware and energy costs.

But really it’s the sophistication of the agent systems that control quality more than anything. Simply following waterfall (I know, right? Yuck… but it worked) increased code quality tremoundously.

I also gave it the SelfDocumentingCode pattern language that I wrote (on WikiWikiWeb) as a code review agent and quality improved tremendously again.

theshrike798mo ago

> Based on my set up as of today, I’d imagine by sometime next year that will be normal and then the conversation will be very different; mostly around cost control. I wouldn’t be surprised if there is a break out popular agent control flow language by next year as well.

Currently it's just VC funded. The $20 packages they're selling are in no way cost-effective (for them).

That's why I'm driving all available models like I stole them, building every tool I can think of before they start charging actual money again.

By then local models will most likely be at a "good enough" level especially when combined with MCPs and tool use so I don't need to pay per token for APIs except for special cases.

1 more reply

zarzavat8mo ago

> If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Just an hour ago I asked Claude to find bugs in a function and it found 1 real bug and 6 hallucinated bugs.

One of the "bugs" it wanted to "fix" was to revert a change that I had made previously to fix a bug in code it had written.

I just don't understand how people burning tokens on sophisticated multi-agent systems are getting any value from that. These LLMs don't know when they are doing something wrong, and throwing more money at the problem won't make them any smarter. It's like trying to build Einstein by hiring more and more schoolkids.

Don't get me wrong, Claude is a fantastic productivity boost but letting it run around unsupervised would slow me down rather than speed me up.

oblio8mo ago

> and also just the Moore’s law of the hardware and energy costs.

What Moore's law?

MontyCarloHall8mo ago

It's almost as if you could recapitulate each of these discussions using an LLM!

theshrike798mo ago

> - The code that is written by LLMs always needs to be heavily monitored for correctness, style, and design, and then typically edited down, often to at least half its original size

For this language matters a lot, if whatever you're using has robust tools for linting and style checks, it makes the LLMs job a lot easier. Give it a rule (or a forced hook) to always run tests and linters before claiming a job is done and it'll iterate until what it produces matches the rules.

But LLM code has a habit of being very verbose and covers every situation no matter how minuscule.

This is especially grating when you're doing a simple project for local use and it's bootstrapping something that's enterprise-ready :D

WorldMaker8mo ago

If you force the LLM to solve every test failure this also can lead to the same breakdown models as very junior developers coding to the tests rather than the problems, I've seen all of:

1) I broke the tests, guess I should delete them.

2) I broke the tests, guess the code I wrote was wrong, guess I should delete all of that code I wrote.

3) I broke the tests, guess I should keep adding more code and scaffolding. Another abstraction layer might work? What if I just add skeleton code randomly, does this add random code whack-a-mole work?

That last one can be particularly "fun" because already verbose LLM code skyrockets into baroque million line PRs when left truly unsupervised, and that PR still won't build or pass tests.

There's no true understanding by an LLM. Forcing it to lint/build can be important/useful, but still not a cure-all, and leads to such fun even more degenerate cases than hand-holding it.

lordgrenville8mo ago

Yes! Reminds me of one of my all-time favourite HN comments https://news.ycombinator.com/item?id=23003595

godelski8mo ago

FWIW, I think this summary is pretty in line with most "anti-LLM" crowd. Being in that "side" myself it is not that I don't use LLMs it is that I do not think LLMs are close to being able to replace me.

I also think there's some big variance in each of the "sides" (I think it is more a bimodal spectrum really) with a lot to you last point. Sometimes they save you lots of time, sometimes they waste a lot of time. I expect more senior people are going to get less benefits from them because they've already spent lots of time developing time saving strategies. Plus, writing lines is only a small part of the job. The planning and debugging stages are much more time intensive and can be much more difficult to wrangle an LLM with. Honestly I think it is a lot about trust. Forgetting "speed", do I trust myself to be more likely to catch errors in code that I write or code that I review?

Personally, I find that most of the time I end up arguing with the LLM over some critical detail and I've found Claude code will sometimes revert things that I asked it to change (these can be time consuming errors because they are often invisible). It gives the appearance of being productive (even feeling that way) but I think it is a lot more like if you spent time in a meeting vs time coding. Meetings can help and are very time consuming, but can also be a big waste of time when over used. Sometimes it is better to have two engineers go try out their methods independently and see what works out within the larger scope. Something is always learned too.

specialist8mo ago

> ...until everyone gets tired of talking about [LLMs]

Small price to pay for shuffling Agile Manifesto off the stage.

dboreham8mo ago

My experience with the latest Claude Code has been: it's not nearly as bad as you say.

swframe28mo ago

Preventing garbage just requires that you take into account the cognitive limits of the agent. For example ...

1) Don't ask for large / complex change. Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.

2) For really complex steps, ask the model to write code to visualize the problem and solution.

3) If the model fails on a given step, ask it to add logging to the code, save the logs, run the tests and the review the logs to determine what went wrong. Do this repeatedly until the step works well.

4) Ask the model to look at your existing code and determine how it was designed to implement a task. Some times the model will put all of the changes in one file but your code has a cleaner design the model doesn't take into account.

I've seen other people blog about their tricks and tips. I do still see garbage results but not as high as 95%.

rco87868mo ago

I feel like I do all of this stuff and still end up with unusable code in most cases, and the cases where I don't I still usually have to hand massage it into something usable. Sometimes it gets it right and it's really cool when it does, but anecdotally for me it doesn't seem to be making me any more efficient.

enobrev8mo ago

> it doesn't seem to be making me any more efficient

That's been my experience.

I've been working on a 100% vibe-coded app for a few weeks. API, React-Native frontend, marketing website, CMS, CI/CD - all of it without changing a single line of code myself. Overall, the resulting codebase has been better than I expected before I started. But I would have accomplished everything it has (except for the detailed specs, detailed commit log, and thousands of tests), in about 1/3 of the time.

2 more replies

jaggederest8mo ago

The key is prompting. Prompt to within an inch of your life. Treat prompts as source code - edit them in files, use @ notation to bring them into the console. Use Claude to generate its own prompts - https://github.com/wshobson/commands/ and https://github.com/wshobson/agents/ are very handy, they include a prompt-engineer persona.

I'm at the point now where I have to yell at the AI once in a while, but I touch essentially zero code manually, and it's acceptable quality. Once I stopped and tried to fully refactor a commit that CC had created, but I was only able to make marginal improvements in return for an enormous time commitment. If I had spent that time improving my prompts and running refactoring/cleanup passes in CC, I suspect I would have come out ahead. So I'm deliberately trying not to do that.

I expect at some point on a Friday (last Friday was close) I will get frustrated and go build things manually. But for now it's a cognitive and effort reduction for similar quality. It helps to use the most standard libraries and languages possible, and great tests are a must.

Edit: Also, use the "thinking" commands. think / think hard / think harder / ultrathink are your best friend when attempting complicated changes (of course, if you're attempting complicated changes, don't.)

4 more replies

nostrademons8mo ago

I've found that an effective tactic for larger, more complex tasks is to tell it "Don't write any code now. I'm going to describe each of the steps of the problem in more detail. The rough outline is going to be 1) Read this input 2) Generate these candidates 3) apply heuristics to score candidates 4) prioritize and rank candidates 5) come up with this data structure reflecting the output 6) write the output back to the DB in this schema". Claude will then go and write a TODO list in the code (and possibly claude.md if you've run /init), and prompt you for the details of each stage. I've even done this for an hour, told Claude "I have to stop now. Generate code for the finished stages and write out comments so you can pick up where you left off next time" and then been able to pick up next time with minimal fuss.

hex4def68mo ago

FYI: You can force "Plan mode" by pressing shift-tab. That will prevent it from eagerly implementing stuff.

1 more reply

yahoozoo8mo ago

How does a token predictor “apply heuristics to score candidates”? Is it running a tool, such as a Python script it writes for scoring candidates? If not, isn’t it just pulling some statistically-likely “score” out of its weights rather than actually calculating one?

2 more replies

plaguuuuuu8mo ago

I've been using a few LLMs/agents for a while and I still struggle with getting useful output from it.

In order for it not to do useless stuff I need to expend more energy on prompting than writing stuff myself. I find myself getting paranoid about minutia in the prompt, turns of phrase, unintended associations in case it gives shit-tier code because my prompt looked too much like something off experts-exchange or whatever.

What I really want is something like a front-end framework but for LLM prompting, that takes away a lot of the fucking about with generalised stuff like prompt structure, default to best practices for finding something in code, or designing a new feature, or writing tests..

Mars0088mo ago

> What I really want is something like a front-end framework but for LLM prompting

It's not simple to even imagine ideal solution. The more you think about it the more complicated your solution becomes. Simple solution will be restricted to your use cases. Generic is either visual or a programming language. I's like to have visual constructor, graph of actions, but it's complicated. The language is more powerful.

dontlaugh8mo ago

At that point, why not just write the code yourself?

lucasyvas8mo ago

I reached this conclusion pretty quickly. With all the hand holding I can write it faster - and it’s not bragging, almost anyone experienced here could do the same.

Writing the code is the fast and easy part once you know what you want to do. I use AI as a rubber duck to shorten that cycle, then write it myself.

3 more replies

kyleee8mo ago

Partly it seems to be less taxing for the human delivering the same amount of work. I find I can chat with Claude, etc and work more. Which is a double edged sword obviously when it comes to work/life balance etc. But also I am less mentally exhausted from day job and able to enjoy programming and side projects again.

1 more reply

harrall8mo ago

I don’t do much of the deep prompting stuff but I find AI can write some code faster than I can and accurately most of the time. You just need to learn what those things are.

But I can’t tell you any useful tips or tricks to be honest. It’s like trying to teach a new driver the intuition of knowing when to brake or go when a traffic light turns yellow. There’s like nothing you can really say that will be that helpful.

1 more reply

utyop228mo ago

I'm finding what's happening right now kinda bizarre.

The funny thing is - we need less. Less of everything. But an up-tick in quality.

This seems to happen with humans with everything - the gates get opened, enabling a flood of producers to come in. But this causes a mountain of slop to form, and overtime the tastes of folks get eroded away.

Engineers don't need to write more lines of code / faster - they need to get better at interfacing with other folks in the business organisation and get better at project selection and making better choices over how to allocate their time. Writing lines of code is a tiny part of what it takes to get great products to market and to grow/sustain market share etc.

But hey, good luck with that - ones thinking power is diminished overtime by interacing with LLMs etc.

1 more reply

MangoCoffee8mo ago

I've been vibe coding a couple of personal projects. I've found that test-driven development fits very well with vibe coding, and it's just as you said break up the problem into small, testable chunks, get the AI to write unit tests first, and then implement the actual code

yodsanklai8mo ago

Actually, all good engineering principles which reduce cognitive load for humans work for AI as well.

2 more replies

alexsmirnov8mo ago

TDD is exactly that I unable to get from AI tools. Probably, because training sets always have both code and tests. I tried multiply models from all major providers, and all failed to create tests without seen the code. One workflow that helps is to create dirty implementation and generate tests for it. Then throw away the first code and use different model for final implementation.

The best way is to create tests yourself, and block any attempts to modify them

MarkMarine8mo ago

Works great until it’s stuck and it starts just refactoring the tests to say true == true and calling it a day. I want the inverse of black box testing, like the inside of the box has the model in it with the code and it’s not allowed to reach outside the box and change the grades. Then I can just do the Ralph Wiggum as a software engineer loop to get over the reward hacking tendencies

1 more reply

jason_zig8mo ago

I've seen people post this same advice and I agree with you that it works but you would think they would absorb this common strategy and integrate it as part of the underlying product at this point...

noosphr8mo ago

The people who build the models don't understand how to use the models. It's like asking people who design CPUs to build data-centers.

I've interviewed with three tier one AI labs and _no-one_ I talked to had any idea where the business value of their models came in.

Meanwhile Chinese labs are releasing open source models that do what you need. At this point I've build local agentic tools that are better than anything Claude and OAI have as paid offerings, including the $2,000 tier.

Of course they cost between a few dollars to a few hundred dollars per query so until hardware gets better they will stay happily behind corporate moats and be used by the people blessed to burn money like paper.

2 more replies

nostrademons8mo ago

A lot of it is integrated into the product at this point. If you have a particularly tricky bug, you can just tell Claude "I have this bug. I expected output 'foo' and got output 'bar'. What went wrong?" It will inspect the code and sometimes suggest a fix. If you run it and it still doesn't work, you can say "Nope, still not working", and Claude will add debug output to the whole program, tell you to run it again, and paste the debug output back into the console. Then it will use your example to write tests, and run against them.

tombot8mo ago

Claude Code at least now lets you use its best model for planning mode and its cheapest model for coding mode.

1 more reply

MikeTheGreat8mo ago

Genuine question: What do you mean by " ask it to implement the plan in small steps"?

One option is to write "Please implement this change in small steps?" more-or-less exactly

Another option is to figure out the steps and then ask it "Please figure this out in small steps. The first step is to add code to the parser so that it handles the first new XML element I'm interested in, please do this by making the change X, we'll get to Y and Z later"

I'm sure there's other options, too.

Benjammer8mo ago

My method is that I work together with the LLM to figure out the step-by-step plan.

I give an outline of what I want to do, and give some breadcrumbs for any relevant existing files that are related in some way, ask it to figure out context for my change and to write up a summary of the full scope of the change we're making, including an index of file paths to all relevant files with a very concise blurb about what each file does/contains, and then also to produce a step-by-step plan at the end. I generally always have to tell it to NOT think about this like a traditional engineering team plan, this is a senior engineer and LLM code agent working together, think only about technical architecture, otherwise you get "phase 1 (1-2 weeks), phase 2 (2-4 weeks), step a (4-8 hours)" sort of nonsense timelines in your plan. Then I review the steps myself to make sure they are coherent and make sense, and I poke and prod the LLM to fix anything that seems weird, either fixing context or directions or whatever. Then I feed the entire document to another clean context window (or two or three) and ask it to "evaluate this plan for cohesiveness and coherency, tell me if it's ready for engineering or if there's anything underspecified or unclear" and iterate on that like 1-3 times until I run a fresh context window and it says "This plan looks great, it's well crafted, organized, etc...." and doesn't give feedback. Then I go to a fresh context window and tell it "Review the document @MY_PLAN.md thoroughly and begin implementation of step 1, stop after step 1 before doing step 2" and I start working through the steps with it.

1 more reply

conception8mo ago

I tell it to generate a todo.md file with hyper atomic todos each requiring 20 loc or less. Then have it go through that. If the change is too big, generate phases (5-25) and then do the todos for each phase. That plus some sort of reference docs/high level plan keeps it going along all right.

ants_everywhere8mo ago

What I do is a step is roughly a reviewable commit.

So I'll say something like "evaluate the URL fetcher library for best practices, security, performance, and test coverage. Write this up in a markdown file. Add a design for single flighting and retry policy. Break this down into steps so simple even the dumbest LLM won't get confused.

Then I clear the context window and spawn workers to do the implementation.

com2kid8mo ago

> 1) Don't ask for large / complex change. Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.

I asked Claude Code to read a variable from a .env file.

It proceeded to write a .env parser from scratch.

I then asked it to just use Node's built in .env file parsing....

This was the 2nd time in the same session that it wrote a .env file parser from scratch. :/

Claude Code is amazing, but it'll goes off and does stupid even for simple requests.

NitpickLawyer8mo ago

Check your settings, they might be unable to read .env files as a guardrail.

1 more reply

theshrike798mo ago

It doesn't say no.

For me it built a full-ass YAML parser when it couldn't use Viper to parse the configuration correctly :)

It was a fully vibe-coded project (I like playing stupid and seeing what the LLM does), but it got caught when the config got a bit more complex and its shitty regex-yaml-parser didn't work anymore. :)

ants_everywhere8mo ago

IMO by far the best improvement would be to make it easier for the agent to force the agent to use a success criterion.

Right now it's not easy prompting claude code (for example) to keep fixing until a test suite passes. It always does some fixed amount of work until it feels it's most of the way there and stops. So I have to babysit to keep telling it that yes I really mean for it to make the tests pass.

adastra228mo ago

This is why the jobs market for new grads and early career folks has dried up. A seasoned developer knows that this is how you manage work in general, and just treats the AI like they would a junior developer—and gets good results.

CuriouslyC8mo ago

Why bother handing stuff to a junior when an agent will do it faster while asking fewer questions, and even if the first draft code isn't amazing, you can just quality gate with an LLM reviewer that has been instructed to be brutal and do a manual pass when the code gets by the LLM reviewer.

1 more reply

paulcole8mo ago

> Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.

Tried this on a developer I worked with once and he just scoffed at me and pushed to prod on a Friday.

NitpickLawyer8mo ago

> scoffed at me and pushed to prod on a Friday.

that's the --yolo flag in cc :D

rvnx8mo ago

Your tips are perfect.

Most users will just give a vague tasks like: "write a clone of Steam" or "create a rocket" and then they blame Claude Code.

If you want AI to code for you, you have to decompose your problem like a product owner would do. You can get helped by AI as well, but you should have a plan and specifications.

Once your plan is ready, you have to decompose the problem into different modules, then make sure each modules are tested.

The issue is often with the user, not the tool, as they have to learn how to use the tool first.

wordofx8mo ago

> Most users will just give a vague tasks like: "write a clone of Steam" or "create a rocket" and then they blame Claude Code.

This seems like half of HN with how much HN hates AI. Those who hate it or say it’s not useful to them seem to be fighting against it and not wanting to learn how to use it. I still haven’t seen good examples of it not working even with obscure languages or proprietary stuff.

4 more replies

ccorcos8mo ago

Seems like this logic could all be represented in Claude.md and some agents. Has anyone done this? I’d love to just import that into my project because I’m using some of these tactics but it’s fairly manual and tedious.

biggc8mo ago

Thin sounds a lot like making a change yourself.

therein8mo ago

It appeals to some people because they'd rather manage a bot and get it to do something they told it to do rather than do it themselves.

rmonvfer8mo ago

I’d like to add: keep some kind of development documentation where you describe in detail the patterns and architecture of your application and it’s components.

I’ve seen incredible improvements just by doing this and using precise prompting to get Claude to implement full services by itself, tests included. Of course it requires manual correction later but just telling Claude to check the development documentation before starting work on a feature prevents most hallucinations (that and telling it to use the Context7 MCP for external documentation), at least in my experience.

The downside to this is that 30% of your context window will be filled with documentation but hey, at least it won’t hallucinate API methods or completely forget that it shouldn’t reimplement something.

Just my 2 cents.

salty_frog8mo ago

This is my algorithm for wetware llms.

whateveracct8mo ago

that sounds like just coding it yourself with extra steps

baq8mo ago

Exactly, then you launch ten copies of yourself and write code to manage that yourself, maybe.

renegat0x08mo ago

Huh, I thought that AI was made to be magic. Click and it generates code. Turns out it is like magic, but you are an apprentice, and still have to learn how to wield it.

dotancohen8mo ago

All sufficiently advanced technology...

rhubarbtree8mo ago

Does anyone have a link to a video that uses Claude Code to produce clean robust code that solves a non trivial problem (ie not tic tac toe or a landing page) more quickly than a human programmer can write? I don’t want a “demo”, I want a livestream from an independent programmer unaffiliated with any AI company and thus not incentivised to hype.

I want the code to have subsequently been deployed in production and demonstrably robust, without additional work outside of the livestream.

The livestream should include code review, test creation, testing, PR creation.

It should not be on a greenfield project, because nearly all coding is not.

I want to use Claude and I want to be more productive, but my experience to date is that for writing code beyond autocomplete AI is not good enough and leads to low quality code that can’t be maintained, or else requires so much hand holding that it is actually less efficient than a good programmer.

There are lots of incentives for marketing at the grassroots level. I am totally open to changing my mind but I need evidence.

M4v3R8mo ago

I've live streamed how I've built a tower defense game over a span of a week entirely using AI. I've also written down all the prompts were used to create this game, you can read about it here: https://news.ycombinator.com/item?id=44463967

Mind you I've never wrote a non-trivial game before in my life. It would take me weeks to do this on my own without any AI assistance.

Right now I'm working on a 3d world map editor for Final Fantasy VII that was also almost exclusively vibe-coded. It's almost finished and I plan a write up and a video about it when I'm done.

Now of course you've made so many qualifiers in your post that you'll probably dismiss this as "not production", "not robust enough", "not clean" etc. But this doesn't matter to me. What matters is I manage to finish projects that I would not otherwise if not for the AI coding tools, so having them is a huge win for me.

hvb28mo ago

> What matters is I manage to finish projects that I would not otherwise if not for the AI coding tools, so having them is a huge win for me.

I think the problem is in your definition of finishing a project.

Can you support said code, can you extend it, are you able to figure out where bugs are when they show up? In a professional setting, the answer to all of those should likely be yes. That's what production code is.

1 more reply

sksrbWgbfK8mo ago

Unless you write tower defense games all day long for a living, I don't know how it's interesting.

rhubarbtree8mo ago

> Now of course you've made so many qualifiers in your post that you'll probably dismiss this as "not production", "not robust enough", "not clean" etc

Sure, my interest is whether it’s suitable for production use on an existing codebase, ie for what constitutes most of software engineering.

But - thanks for sharing, I will take a look and watch some of the stream.

infamousclyde8mo ago

Jon Gjengset (of MIT Missing Semester, Rust for Rustsceans, etc) shared a stream doing complex changes of increasing complexity to a geospatial math library in Rust. He’s an excellent engineer, and was able to pick apart AI-suggested changes liberally. The caveat is that the video is a bit long, but segmented nicely.

I think he had a positive experience overall, but it was clear throughout the stream that he was not yielding control to a pure-agent workflow soon.

https://youtu.be/eZ7DVHAK8hw?si=vWW4kz2qiRRceNMQ

toth8mo ago

I think you shared the wrong link. Based on a quick youtube search I think you meant this one

https://youtu.be/EL7Au1tzNxE

rhubarbtree8mo ago

Thanks! This looks great (correct link in the child comment).

ochronus8mo ago

I agree. Based on my very subjective and limited experience (plus friends/colleagues), when it comes to producing solutions, what you get from AI is what you get from your 2-day hackathon—then you spend months making it production-ready.

And your starry-eyed CEO is asking the same old question: How come everything takes so long when a 2-person team over two days was able to produce a shiny new thing?!. sigh

Could be used for early prototyping, though, before you hire your first engineers just to fire them 6 months later.

jf228mo ago

Yeah but you get the two days of hacking in 15 minutes.

And I highly doubt you spend months, as in 5+ weeks at the least making it production ready.

What even is "production readiness?" 100% fully unit tested and ready for planetary hyper scale or something? 95% of the human generated software I work on is awful but somehow makes people money.

1 more reply

coffeeri8mo ago

This video [0] is relevant, though it actually supports your point - it shows Claude Code struggling with non-trivial tasks and needing significant hand-holding.

I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

[0] https://www.youtube.com/watch?v=EL7Au1tzNxE

thecupisblue8mo ago

Great video! Even more, shows a few things - how good it is with such a niche language but also exposes some direct flaws.

First off, Rust represents quite a small part of the training dataset (last I checked it was under 1% of code dataset) in most public sets, so it's got waaay less training then other languages like TS or Java. You added 2 solid features, backed with tests and documentation and nice commit messages. 80% of devs would not deliver this in 2.5 hours.

Second, there was a lot of time/token waste messing around with git and git messages. Few tips I noticed that could help you in the workflow:

#1: Add a subagent for git that knows your style, so you don't poison direct claude context and spend less tokens/time fighting it.

#2: Claude has hooks, if your favorite language has a formatter like rust fmt, just use hooks to run rust fmt and similar.

#3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens.

#5: Saying "max 50 characters title" doesn't really mean anything to the LLM. They have no inherent ability to count, so you are relying on probability, which is quite low since your context is quite filled at this point. If they want to count the line length, they also have to use external tools. This is an inherent LLM design issue and discussing it with an LLM doesn't get you anywhere really.

2 more replies

Aeolun8mo ago

> I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

Or we’re just having too much fun making stuff to make videos to convince people that are never going to be convinced.

1 more reply

simonw8mo ago

Armin Ronacher (long-time Python and Rust open source community figure, creator of Flask and Jinja among others) has several YouTube videos that partially fit the bill. https://www.youtube.com/watch?v=sQYXZCUvpIc and https://www.youtube.com/watch?v=Y4_YYrIKLac and https://www.youtube.com/watch?v=tg61cevJthc

ku1ik8mo ago

I watched one of those videos and it was very underwhelming, imho not really selling Claude Code to anyone who isn’t convinced.

1 more reply

rhubarbtree8mo ago

I’d say they’re a good match, thanks for sharing! Will watch in detail.

MontyCarloHall8mo ago

Forget a livestream, I want to hear from maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite). Highly capable coding LLMs have been out for long enough that if they do indeed have meaningful impact on writing non-trivial, non-greenfield/boilerplate code, it ought to be clearly apparent in an uptick of positive contributions to projects like these.

stitched2gethr8mo ago

This contains some specific data with pretty graphs: https://youtu.be/tbDDYKRFjhk?t=623

But if you do professional development and use something like Claude Code (the current standard, IMO) you'll quickly get a handle on what it's good at and what it isn't. I think it took me about 3-4 weeks of working with it at an overall 0x gain to realize what it's going to help me with and what it will make take longer.

2 more replies

rhubarbtree8mo ago

Yes, this would be really useful.

AI coding should be transforming OSS, and we should be able to get a rough idea of the scale of the speed up in development. It’s an ideal application area.

brookst8mo ago

So what percentage of human programmers, in the entire world, do you think contribute to meaningful projects like those?

1 more reply

stared8mo ago

I wouldn't dive for these. Vibe coding is a slot machine - sometimes you get wonderful results on the first prompt, more than often - not. So, a cherry-picked example is not a proof it works.

If you want me to show an example of vibe coding, I bet I can migrate someone's blog to Astro with Claude Code faster than a frontend engineer.

> It should not be on a greenfield project, because nearly all coding is not.

Well, Claude Code does not work the best for existing projects. (With some exceptions.)

sirpalee8mo ago

I had success with it on a large, established project when using it for refactoring, moving around functions, implementing simple things and writing documentation. It failed when implementing complex new features and horribly went off the rails when trying to debug issues. Almost all its recommendations were wrong, and it kept trying to change things that certainly weren't the problem.

apercu8mo ago

This matches my experience as well. One unexpected benefit is that I learned a couple pieces of hardware inside and out because LLMs make so many mistakes. If I wouldn’t have used an LLM I wouldn’t have gone down all these rabbit holes based on incorrect info - I would have just read the docs and solved my use case but missed out on deeper understanding.

Just reinforces my biases that LLMs are currently garbage for anything new and complicated. But they are a great interactive note taker and brainstorming tool.

vincent_builds8mo ago

Author here. I think that's a great idea.

I've considered live-streaming my work a few times, but all my work is on closed-source backend applications with sensitive code and data. If I ever get to work on an open-source product, I'll ask about live-streaming it. I think it would be a fun experience.

Although I cannot show the live stream or the code, I am writing and deploying production code for a brownfield project.

Two recent production features:

1. Quota crossing detection system for billable metrics - Complex business logic for billing infrastructure - Detects when usage crosses configurable thresholds across multiple metric types - Time: 4 days while working on other smaller tasks in parallel work vs probably 10 days focused without AI

2. Sentry monitoring wrapper for metering cron jobs - Reusable component wrapping all cron jobs with Sentry monitoring capabilities - Time: 1 day parallelled with other tasks vs 2 days focused

As you can probably tell, my work is not glamorous :D. It's all the head-scratching backend work, extending the existing system with more capabilities or to make it more robust.

I agree there is a lot of hand-holding required, but I'm betting on the systems getting better as time goes on. We are only two years into this AI journey, and the capabilities will most likely improve over the next few years.

mathieuh8mo ago

I actually don't think I've ever had AI solve a non-trivial problem by itself. I do find it useful but I always have to give it the breakthrough which it can then implement.

dewey8mo ago

One of these things where you just have to put in the work yourself for a while and see how it works for your workflow and project.

rhubarbtree8mo ago

That’s unusual though? I think programming languages, idioms, features - for example - are adopted by consensus, not by every programmer starting out from scratch and evaluating each one.

1 more reply

boesboes8mo ago

I've been using it to do all my work for the last month or two and have decided it's not worth it. I haven't made any recordings or anything, so this is purely my subjective experience: it's ok at greenfield stuff with some hand-holding to do things properly all the time. It knows the framework well, but won't try to use it correctly and go off on weird detours to 'debug' things that fail because of it. But on a bigger refactor of legacy code, that is well tested and the 'migration' process to the new architecture documented it just was very infuriating. One moment it seems to be doing alright and then suddenly I'm going backwards for days because it just makes things look like they work. It gets stuck on bad idea's and keeps trying them. Keeps making the same mistakes over and over, despite clear instruction on how to do it correctly..

I think it misses a feedback loop. Something that evaluates what went wrong, what works, what wont, and remembers that and then can use that to make better plans. From making sure it runs the tests correctly (instead of trying 5 different methods each time) to how to do TDD and what comments to add.

sunnyam8mo ago

I have the same opinion, but my worry with this attitude is that it's going to hold me back in the long run.

A common thread in articles about developers using AI is that they're not impressed at first but then write more precise instructions and provide context in a more intuitive manner for the AI to read and that's the point at which they start to see results.

Would these principles not apply to regular developers as well? I suspect that most of my disappointment with these tools is that I haven't spend enough time learning how to use them correctly.

With Claude Code you can tell it what it did wrong. It's a bit hit-or-miss as to whether it will take your comments on board (or take them too literally) but I do think it's too powerful a tool to just ignore.

I don't want someone to just come and eat my cake because they've figured out how to make themselves productive with it.

1 more reply

adastral8mo ago

PostgresTV livestreams "vibe coding" 1h sessions implementing small PostgreSQL features with Cursor (mostly claude-4-sonnet model) every week, by experienced PostgreSQL contributors. [0] is their latest stream.

I personally have not watched much, but it sounds just like what you are looking for!

[0] https://www.youtube.com/watch?v=3MleDtXZUlM

rhubarbtree8mo ago

Yes, good enough for me, thanks. I look forward to watching it. This is particularly interesting as it’s a group stream.

sunir8mo ago

I’ve built an agent system to quality control the output following my engineering know how.

The quality is much better but it is much slower than a human engineer. However that’s irrelevant to me. If I can build two projects a day I am more productive than if I can build one. And more importantly I can build projects that increase my velocity and capability.

The difference is I run my own business so that matters to me more than my value or aptitude as an engineer.

sdeframond8mo ago

Does any experienced dev have experience outsourcing to another dev that produces clean robust code that solves a non trivial problem (ie not tic tac toe or a landing page) more quickly than she would by herself?

I think not.

The reason is about missing context. Such non-trivial problems have a lot of specific unwritten context. It takes a lot of effort to share that context. Often more than doing anything one self.

lysecret8mo ago

https://news.ycombinator.com/item?id=44159166

Kiro8mo ago

Very few people want to record themselves doing stuff or have an incentive to convince anyone except for winning internet arguments.

nosianu8mo ago

> Very few people .... have an incentive to convince anyone

We are already only talking about the subset the writes AI blog posts, not about all of humanity.

rhubarbtree8mo ago

Streaming coding is a popular and widespread activity, and it is usually nothing to do with “convincing” folks.

1 more reply

benterix8mo ago

I guess someone could make such a video, the question is, would anyone have the patience to watch it.

ochronus8mo ago

<fun> Just found the video you're looking for! https://www.youtube.com/watch?v=JeNS1ZNHQs8 </fun>

sneak8mo ago

Most of the code I write is greenfield projects. I’m pretty spoiled, I guess. Claude Code has helped me ship a lot of things I always wanted to build but didn’t have time to do.

wooque8mo ago

You got it wrong, the purpose of this blog post is not marketing Claude Code, but marketing their company. Writing about AI just happens to get more eyeballs.

coverj8mo ago

Are there even non-AI development streams/videos that meet this criteria?

brookst8mo ago

You’re coming at this from a highly biased and even angry position, which means I don’t think you’ll be satisfied with anything people can show you.

Which isn’t entirely unreasonable; AI is not really there yet. If you took this moment and said AI will never get better, and tools and processes will never improve to better accommodate AI, and the only fair comparison is a top-tier developer, and the only legitimate scenario is high quality human-maintainable code at scale… then yes, AI coding is a lot of hype with little value.

But that’s not what’s going on, is it? The trajectory here is breathtaking. A year ago you could have set a much lower bar and AI still would have failed. And the tooling to automate PRs and documentation was rough.

AI is already providing massive leverage to both amateur and professional developers. They use the tools differently (in my world the serious developers mostly use it for boilerplate and tests).

I don’t think you’ll be convinced if the value until the revolution is in the past. Which is fine! For many of us (me being in the amateur but lifelong programmer camp) it’s already delivering value that makes its imperfections worthwhile.

Is the code I’m generating world class, ready to be handed over to humans at enterprise sclae? No, definitely not. But it exists, and the scale of my amateur projects has gone through the roof, while quality is also up because tests take near zero effort.

I know it won’t convince you, and you have every right to be skeptical and dismiss the whole thing as marketing. But IMO rejecting this new tech in the short term means you’re in for a pretty rough time when the evidence is so insurmountable. Which might be a year or two. Or even three!

ale8mo ago

It’s about time these types of articles actually include the types of tasks being “orchestrated” (as the author writes) that aren’t just plain refactoring chores or React boilerplate. Sanity has quite a backlog of long-requested features and the message here is that these agents are supposedly parallelizing a lot of the work. What kind of staff engineer has “80% of their code” written by a “junior developer who doesn't learn“?

mindwok8mo ago

IMO “junior developer who doesn't learn“ is not quite right. Claude is more like an senior, highly academic engineer who has read all the literature but hasn't ever written any code. Amazing encyclopaedic knowledge, zero taste.

I've been building commercial codebases with Claude for the last few months and almost all of my input is on taste and what defines success. The code itself is basically disposable.

all28mo ago

> The code itself is basically disposable.

I'm finding this is the case for my work as well. The spec is the secret sauce, the code (and its many drafts) are disposable. Eventually I land on something serviceable, but until I do, I will easily drop a draft and start on a new one with a spec that is a little more refined.

2 more replies

globular-toast8mo ago

If the code is disposable then where do all the rules and reasoning etc live apart from in your head?

1 more reply

baq8mo ago

> The code itself is basically disposable.

This is key. We’re in mass production of software era. It’s easier and cheaper to replace a broken thing/part than to fix it, things being some units of code.

sanitycheck8mo ago

Eh, Claude is like a magical spaniel that can read and write very quickly, with early-stage alzheimers, on amphetamines.

Yes it knows a lot and can regurgitate things and create plausible code (if I have it run builds and fix errors every time it changes a file - which of course eats tokens) but having absolutely no understanding of how time or space works leads to 90% of its great ideas being nonsensical for UI tasks. Everything is needing very careful guidance and supervision otherwise it decides to do something different instead. For back end stuff, maybe it's better.

I'm on the fence regarding overall utility but $20/month could almost be worth it for a tool that can add a ton of debug logging in seconds, some months.

vincent_builds8mo ago

Hi Ale, author here. Skepticism is understandable, but trust me, I'm not just writing React boilerplate or refactoring.

I find it difficult to include examples because a lot of my work is boring backend work on existing closed-source applications. It's hard to share, but I'll give it a go with a few examples :)

----

First example: Our quota detection system (shipped last month) handles configurable threshold detection across billing metrics. The business logic is non-trivial, distinguishing counter vs gauge metrics, handling multiple consumers, and efficient SQL queries across time windows.

Claude's evolution: - First pass: Completely wrong approach (DB triggers) - Second pass: Right direction, wrong abstraction - Third pass: Working implementation, we could iterate on

---- Second example: Sentry monitoring wrapper for cron jobs, a reusable component to help us observe our cronjob usage

Claude's evolution: - First pass: Hard-coded the integration into each cron job, a maintainability nightmare. - Second pass: Using a wrapper, but the config is all wrong - Third pass: Again, OK implementation, we can iterate on it

----

The "80%" isn't about line count; it's about Claude handling the exploration space while I focus on architectural decisions. I still own every line that ships, but I'm reviewing and directing rather than typing.

This isn't writing boilerplate, it's core billing infrastructure. The difference is that Claude is treated like a very fast junior who needs clear boundaries rather than expecting senior-level architecture decisions.

bsder8mo ago

We have all these superpowered AI vibe coders, and yet open source projects still have vast backlogs of open issues.

Things that make you go "Hmmmmmm."

baq8mo ago

You have to pay a recurring subscription to access the worthwhile tools in a meaningful capacity. This goes directly against why retail users of open source software, some of whom are also developers of it, actually use it - and you can tell a lot of developers do it because they find coding fun.

It’s a very different discussion when you’re building a product to sell.

TiredOfLife8mo ago

The projects that have those backlogs dont allow ai made code

bakugo8mo ago

Actually providing examples of real tasks given to the AI and the subsequent results would break the illusion and give people opportunities to question the hype. Can't have that.

We'll just keep getting submission after submission talking about how amazing Claude Code is with zero real world examples.

vincent_builds8mo ago

Author here. It's fair enough. I didn't give real-world examples; that's partially down to what I typically work on. I usually work in brownfield backend logic in closed-source applications that don't showcase well.

Two recent production features:

1. *Quota crossing detection system* - Complex business logic for billing infrastructure - Detects when usage crosses configurable thresholds across multiple metric types - Time: 4 days parallel work vs ~10 days focused without AI

   The 3-attempt pattern was clear here:
   - Attempt 1: DB trigger approach - wouldn't scale for our requirements
   - Attempt 2: SQL detection but wrong interfaces, misunderstood counter vs gauge metrics
   - Attempt 3: Correct abstraction after explaining how values are stored and consumed

2. *Sentry monitoring wrapper for cron jobs* - Reusable component wrapping all cron jobs with monitoring - Time: 1 day parallel vs 2 days focused

Nothing glamorous, but they are real-world examples of changes I've deployed to production quicker because of Claude.

johnfn8mo ago

Really, zero real world examples? What about this?

https://news.ycombinator.com/item?id=44159166

willtemperley8mo ago

Yes exactly. Show us the code and we can evaluate the advice. Otherwise it’s just an advertisement.

dingnuts8mo ago

the kind of engineer who has been Salesified to the point that they write such drivel as "these learnings" instead of "lessons" in an article that allegedly has a technical audience.

it's funny because as I have gotten better as a dev I've gone backwards through his progression. when I was less experienced I relied on Google; now, just read the docs

juped8mo ago

Yeah, the trusty manual becomes #1 at around the same time as one starts actually engineering. You've entered the target audience!

1 more reply

asdev8mo ago

Guy said a whole lot of nothing. Said he's improved productivity, but also said AI falls short in all the common ways people have noticed. Also guarantee no one is building core functionality delegating to Claude Code.

aronowb148mo ago

Agreed. I think this Anthropic article is a realistic take on what’s possible (focus on prototyping)

https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135a...

muzani8mo ago

This whole article is a really odd take. Maybe it's upvoted so much because it's from a "staff engineer". Most people are getting much better rates than 95% failure and almost nobody is spending over $1000 a month. If it was anyone else saying the same thing, they'd be laughed out of the room.

jpollock8mo ago

Avoiding the boilerplate is part of the job as a software developer.

Abstracting the boilerplate is how you make things easier for future you.

Giving it to an AI to generate just makes the boilerplate more of a problem when there's a change that needs to be made to _all_ the instances of it. Even worse if the boilerplate isn't consistent between copies in the codebase.

globular-toast8mo ago

Yeah. I'm increasingly starting to think this LLM stuff is simply the first time many programmers have been able to not write boilerplate. They didn't learn to build abstractions so essentially live on whatever platform someone else has built for them. AI is simply that new platform.

I'm lazy af. I have not been manually typing up boilerplate for the past 15 years. I use computers to do repetitive tasks. LLMs are good at some of them, but it's just another tool in the box for me. For some it seems like their first and only one.

What I can't understand is how people are ok with all that typing that you still have to do just going into /dev/null while only some translation of what you wrote ends up in the codebase. That one makes me even less likely to want to type. At least if I'm writing source code I know it's going into the repository directly.

skydhash8mo ago

The one thing I’m always suspicious about is the actual mastery (programming language and computer usage) involved. You never see anyon describe the context of what they’ve been doing pre-llm.

conradfr8mo ago

What's weird for me is that most frameworks and tools usually include generators for boilerplate code anyway so not sure why wasting tokens/money on that is valuable.

resonious8mo ago

Interesting that this guy uses AI for the initial implementation. I do the opposite. I always build the foundation. That way I know how things work fundamentally. Then I ask agents to do boilerplate tasks. They're really good at following suit, but very bad at architecture.

f311a8mo ago

Yeah, LLMs are pretty bad at planning maintainable architecture. They don’t refactor it when code is evolving and probably can’t do it due to context limitations.

meerab8mo ago

I have barely written any code since my switch to Claude Code! It's the best thing since sliced bread!

Here's what works for me:

- Detailed claude.md containing overall information about the project.

- Anytime Claude chooses a different route that's not my preferred route - ask my preference to be saved in global memory.

- Detailed planning documentation for each feature - Describe high-level functionality.

- As I develop the feature, add documentation with database schema, sample records, sample JSON responses, API endpoints used, test scripts.

- MCP, MCP, MCP! Playwright is a game changer

The more context you give upfront, the less back-and-forth you need. It's been absolutely transformative for my productivity.

Thank you Claude Code team!

f311a8mo ago

What you’re working on? In my industry it fails half of the time and comes up with absolute nonsense. The data just don’t exist for our problems, it can only work when you guide it and ask for a few functions at max.

ryukoposting8mo ago

This sounds like my experiences with it. I'm writing embedded firmware in C and Rust. I'd describe further, but Claude seems incompetent at all aspects of this space.

1 more reply

meerab8mo ago

I am working on VideoToBe.com - and my stack is NextJS, Postgresql and FastAPI.

Claude code is amazing at producing code for this stack. It does excellent job at outputting ffmpeg, curl commands, linux shell script etc.

I have written detailed project plan and feature plan in MarkDown - and Claude has no trouble understanding the instructions.

I am curious - what is your usecase?

1 more reply

ethanwillis8mo ago

Personally, I give Claude a fully specified program as my prompt so that it gives me back a working program 100% of the time.

Really simple workflow!

Zee28mo ago

Ah, I’ve tried that one, but I must be doing something wrong. I give it a fully specified working program, and often times it gives me back one that only works 50% of the time!

jazzyjackson8mo ago

Does Claude Code provide some kind of "global memory" the llm refers to, or is this just a request you make within the the llm's context window? Just curious hadn't heard the use of the term

EDIT: I see, you're asking Claude to modify claude.md to track your preference there, right?

https://docs.anthropic.com/en/docs/claude-code/memory

meerab8mo ago

Yes. /init will initialize the project and save initial project information and preference.

Ask Claude to update the preference and document the moment you realize that claude has deviated away from the path.

bobbylarrybobby8mo ago

What does the playwright MCP accomplish for you? Is it basically a way for Claude to play with your app in the browser without having to write playwright tests?

mierz008mo ago

How have you been using Playwright MCP?

albingroen8mo ago

So we’re supposed to start paying $1k-$1,5k on top of already crazy salaries just to maybe get a productivity boost on trivial to semi trivial issues? I know my boss would not be keen on that at least.

Jcampuzano28mo ago

If devs salaries are so crazy its quite the opposite. NOT investing 1-1.5k/mo to improve their productivity by a measurable amount would quite literally be just plain stupid and I would question your boss ability to think critically.

Not to mention - while I know many don't like it, they may be able to achieve enough of a productivity boost to not require hiring as many of those crazy salaried devs.

Its literally a no-brainer. Thinking about it from just the individual cost factor is too simplified a view.

151558mo ago

Hardware companies routinely license individual EDA tool seats that cost more than numerous developer salaries - $1k/year is nothing if it improves productivity by any measurable amount.

saulpw8mo ago

The OP was saying it's $1k/mo. That's a 5-10% raise, which is a bit more than nothing.

3 more replies

AnotherGoodName8mo ago

I can't use $20 of credit (gpt-5 thinking via intellij's pro AI subscription) a month right now with plenty of usage so I'm surprised at the $1k figure. Is Claude that much more expensive? (a quick Google suggests yes actually).

Having said the above some level of AI spending is the new reality. Your workplace pays for internet right? Probably a really expensive fast corporate grade connection? Well they now also need to pay for an AI subscription. That's just the current reality.

everforward8mo ago

I don't know what Intellij's AI integration is like, but my brief Claude Code experience is that it really chews through tokens. I think it's a combination of putting a lot of background info into the context, along with a lot of "planning" sort of queries that are fairly invisible to the end user but help with building that background for the ultimate query.

Aider felt similar when I tried it in architect mode; my prompt would be very short and then I'd chew through thousands of tokens while it planned and thought and found relevant code snippets and etc.

1 more reply

billllll8mo ago

Paying for Internet is not a great analogy imo. If you don't pay $1k/mo for Internet, you literally can't work.

What happens if you don't pay $1k/mo for Claude? Do you get an appreciable drop in productivity and output?

Genuinely asking.

EE84M3i8mo ago

Anthropic and OpenAI both have a high SSO/enterprise tier tax.

oblio8mo ago

The fast corporate internet connection is probably 1000$ for 100 developers or more...

albingroen8mo ago

And remember. This is on subsadised prices.

dajonker8mo ago

Exactly, makes it feel almost like an advertorial for Anthropic, who likely need most customers to pay 1000 bucks a month to break even.

sdesol8mo ago

It will certainly be interesting to see how businesses evolve in the upcoming years. What is written in stone is, you (employee) will be measured and I am curious to see what developers will be measured by in the future. Will you be at a greater risk of layoffs/lack of promotions/etc. if you spend more on AI? How do you as a developer prove that it is you and not the LLM that should be praised?

astrange8mo ago

The high salaries make productivity improvements even more important.

beefnugs8mo ago

If the world wasnt a garbage hole of mis-alignment and planning : The people seeing positivity out of this stuff would be demanding raises immediately, both AI experts and seniors should be demanding the company pay and train juniors as part of their loyalty commitment to the company

willtemperley8mo ago

Maybe I’m contrarian but I design and write most of my code and let LLMs do the reviews. Why?

First I know my problem space better than the LLM.

Second, the best way to express coding intention is with code. The models often have excellent suggestions on improvements I wouldn’t have thought of. I suspect the probability of providing a good answer has been increased significantly by narrowing the scope.

Another technique is to say “do this like <some good project> does it” but I suspect that might be close to copyright theft.

tkgally8mo ago

Anthropic just posted an interview with Boris Cherny, the creator of Claude Code. He also offers some ideas on how to use it.

“The future of agentic coding with Claude Code”

https://youtu.be/iF9iV4xponk

nikcub8mo ago

> budget for $1000-1500/month for a senior engineer going all-in on AI development.

Is this another case of someone using API keys and not knowing about the claude MAX plans? It's $100 or $200 a month, if you're not pure yolo brute-force vibe coding $100 plan works.

https://www.anthropic.com/max

vincent_builds8mo ago

Author here, quick clarification on pricing: the $1000-1500/month is for Teams/Enterprise with higher rate limits, not the consumer MAX plans. Consumer MAX ($200/month) works for lighter usage but hits limits quickly with parallel agents and large codebases.

For context: that's 1-2% of a senior engineer's fully loaded cost. The ROI is clear if it delivers even 10% productivity gain (we're seeing 2-3x on specific tasks).

You're right that many devs can start with MAX plans. The higher tier becomes necessary when running multiple parallel contexts and doing systematic exploration (the "3-attempt pattern" burns tokens fast).

I wouldn't be doing it if I didn't think it was value for money. I've always been a cost-conscious engineer who weighs cost/value, and with Claude, I am seeing the return.

imron8mo ago

> The ROI is clear if it delivers even 10% productivity gain

What if what feels like a productivity gain is actually a productivity loss?

https://mikelovesrobots.substack.com/p/wheres-the-shovelware...

(see link in the article to a study showing developers thought AI gave them a 20% gain in productivity, but measuring this showed they instead had a 20% loss)

reissbaker8mo ago

Yeah $1k-1.5k seems absurdly high. The $200/month 20x variant of the Max plan covers an insane amount of usage, and the rate limits reset every five hours. Hard to imagine needing it so badly that you're blowing through that rate limit multiple times a day, every day... And if you are, I think switching to per-token payment would probably cost a lot more than $1k.

rolls-reus8mo ago

The MAX plan is a consumer plan, it’s not available with Teams or Enterprise. They introduced a premium team plan ($150) with Claude code access but not sure how much usage that bundles.

kbuchanan8mo ago

For me, working mostly in Planning Mode skips much of the initial misfires, and often leads to correct outcomes for the first edit.

mierz008mo ago

Recently I’ve been taking a step back and getting ChatGPT 5 to ask me questions to create a spec.

I refine that spec and then give that to planning mode and then go from there.

I’ve found if I jump straight into planning mode I miss some critical aspects of what ever it is I am building.

RomanPushkin8mo ago

There is one thing I would highly recommend to anyone using Claude or any other agents: logging. I can't emphasize it more, if you have logging you can take the whole log file, dump it into AI, outline the problem and likely you're getting solution or would advance to the next step. Logging is everything.

makk8mo ago

I don’t understand the use of MCP described in the post.

Claude code can access pretty much all those third party services in the shell, using curl or gh and so on. And in at least one case using MCP can cause trouble: the linear MCP server truncates long issues, in my experience, whereas curling the API does not.

What am I missing?

musbemus8mo ago

You're exactly right. To be honest, in pretty much every case I've seen, indicating usage of a read-only resource directly in the prompt always outperforms using the MCP for it. Should really only be using MCP if you need MCP-specific functionality imo (elicitation, sampling)

jbs7898mo ago

I often find that Claude introduces a level of complexity that is not necessary in my cases. I suspect this is a function of the training data (large repos or novel solutions). That said, I do sometimes find inspiration for new techniques in its answers.

I just haven't heard others express the same over-engineering problem and wonder if this is a general observation or only shows up b/c my requests are quite simple.

(I have found that prompting it for the simplest or most efficient solution seems to help - sometimes taking 20+ lines down to 2-3, often more understandable.)

P.S. I tend to work with data and a web app for processes related to a small business, while not a formally trained developer.

chamomeal8mo ago

Seems like LLMs really suffer from the "eh I'll just write it myself" mindset. Yesterday on a react app using react-query (library to manage caching and re-fetching of data) claude code wanted to update the cache manually, instead of just using a bit of state that was already in scope in the exact same component!

For me, stuff like that is the same weird uncanny valley that you used to see in AI text, and see now in AI video. It just does such inhuman things. A senior developer would NEVER think to manually mutate the cache, because it's such desperate hack. A junior dev wouldn't even realize it's an option.

namesbc8mo ago

Spending $1500 per-month is a crazy wasteful amount of money

the_hoffa8mo ago

That's 18k a year, or about equal or cheaper than "outsourcing", minus the tax and legal ramifications.

I agree it's wasteful, but from a long-form view of what spending looks like (or at least should/used to look like). Those who see 1.5k/month as "saving" money typically only care about next quarter.

As the old adage goes: a thousand dollars saved this month is 100 thousand spent next year.

BobbyTables28mo ago

The author will be in upper management before they know it!

axus8mo ago

I like his point about more objectivity and zero ego. You don't have to worry about hurting an AI's feelings or your own when you throw away code.

awesome_dude8mo ago

But I still find myself needing (strongly) to let Claude know when it's made a breakthrough that would have been hard work on my own.

CharlesW8mo ago

Good creators tend to treat their tools with respect, and I can't imagine any reason we shouldn't feel gratitude toward our tools after a particularly satisfying session.

Also, there may be selfish reasons to do this as well: (1) "Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance" https://arxiv.org/abs/2402.14531 (2) "Three Things to Know About Prompting LLMs" https://sloanreview.mit.edu/article/three-things-to-know-abo...

groby_b8mo ago

Curious: Do you also laud your compiler for particularly good optimizations?

1 more reply

nzach8mo ago

One thing that I haven't seen a lot of people talk about is the relatively new model config "Opus Plan Mode: Use Opus 4.1 in plan mode, Sonnet 4 otherwise".

In my opinion this should be the default config. Increasing the quality of the plans gives you a much better experience using Claude Code.

pseudosavant8mo ago

This has been my experience too. I’m just not quite as far along as the author.

Detachment from the code has been excellent for me. Just started a v2 rewrite of something I’d never had done in the past. Mostly because it would have taken me too much time to try it out if I wrote it all by hand.

jedberg8mo ago

I'd like to share my journey with Claude (not code).

I fed Claude a copy of everything I've ever written on Hacker News. Then I asked it to generate an essay that sounds like me.

Out of five paragraphs I had to change one sentence. Everything else sounded exactly as I would have written it.

It was scary good.

into_ruin8mo ago

I'm doing a project in a codebase I'm not familiar with in a language I don't really know, and Claude Code has been amazing at _explaining_ thing to me. "Who calls this function," "how is this generated," etc. etc.

I'm not comfortable using it to generate code for this project, but I can absolutely see using it to generate code for a project I'm familiar with in a language I know well.

keeda8mo ago

Reid Hoffman, LinkedIn co-founder, has gone whole hog on that idea and has a literal AI clone of himself, trained on all his writings, videos and audio interviews -- complete with AI-generated deep-fake visuals and cloned voice:

https://www.linkedin.com/posts/reidhoffman_can-talking-with-...

I've watched a handful of videos with this "digital twin", and I don't know how much post-processing has gone into them, but it is scary accurate. And this was a year+ ago.

nh43215rgb8mo ago

$1000-1500/month for ai paid by employer... that's quite nice. I wonder how much would it cost to run couple of claude code instance to run 24/7 indefinitely. If company's got resources they might as well try that against their issues.

block_dagger8mo ago

The author doesn't make it clear why they switched from Cursor to Claude. Curious about what they can do with Claude that can't be done with Cursor. I use both a lot and find Cursor to be superior for the very large codebases I work in.

reissbaker8mo ago

Pretty much everyone I talk to prefers the opposite, and feels like Claude performs best inside the Claude Code harness and not the Cursor one. But I suppose different strokes for different folks...

Personally I'm a Neovim addict, so you can pry TUIs out of my cold dead hands (although I recognize that's not a preference everyone shares). I'm also not purely vibecoding; I just use it to speed up annoying tasks, especially UI work.

meerab8mo ago

Personal opinion:

Claude code is more user friendly than cursor with its CLI like interface. The file modifications are easy to view and it automatically runs psql, cd, ls , grep command. Output of the commands is shown in more user friendly fashion. Agents and MCPs are easy to organized and used.

block_dagger8mo ago

I feel just the opposite. I think Cursor's output is actually in the realm of "beautiful." It's well formatted and shows the user snippets of code and reasoning that helps the user learn. Claude is stuck in a terminal window, so reduced to monospaced bullet lines. Its verbose mode spits out lines of file listings and other context irrelevant to the user.

RomanPushkin8mo ago

It's easy: Cursor are resellers, they optimize your token usage, so they can make a profit. Claude is the final point, and they offer tokens for the cheapest price possible.

block_dagger8mo ago

I use Cursor in MAX mode because my employer pays for the tokens. I probably should have mentioned that in my OP. It makes a huge difference.

1 more reply

syspec8mo ago

Does this work for others when working in other domains? When creating a Swift application, I can't imagine creating 20 agents and letting them go to town. Same for the backend of such an application if it's in say, Java+Springboot

fragmede8mo ago

I've been using Claude with Swift for macOS and iOS apps. What problems do you forsee where you don't think it would work out to create 20 agents for Swift?

dakiol8mo ago

To all the engineers using claude code: how do you submit your (well, claude’s) to review? Say, you have a big feature/epic to implement. Typically (pre-ai) times you would split it in chunks and submit each chunk as PR to be reviewed. You don’t want to submit dozens of file changes because nobody would review it. Now with llms, one can easily explain the whole feature to the machine and they would output the whole code just fine. What do you do? You divide it manually for review submission? One chunk after another?

It’s way easier to let the agent code the whole thing if your prompt is good enough than to give instructions bit by bit only because your colleagues cannot review a PR with 50 file changes.

athrowaway3z8mo ago

Practically - you can commit it all after you're done and then tell it to tease apart the commit into multiple well documented logical steps.

"Ask the LLM" is a good enough solution to an absurd number of situations. Being open to questioning your approach - or even asking the LLM (with the right context) to question your approach has been valuable in my experience.

But from a more general POV, its something we'll have to spend the next decade figuring out. 'Agile'/scrum & friends is a sort of industry-wide standard approach, and all of that should be rethought - once a bit of the dust settles.

We're so early in the change that I haven't even seen anybody get it wrong, let alone right.

yodsanklai8mo ago

I split my diffs like I've always did so they can be reviewed by a human (or even an AI which won't understand 50 file changes).

The 50 file changes is most likely unsafe to deploy and unmaintainable.

Yoric8mo ago

I regularly write big MRs, then cut them into 5+ (sometimes 10+) smaller MRs. What does Claude Code change here?

dakiol8mo ago

The split seems artificial now. Before, an average engineer would produce code sequentially, chunk after chunk. Each chunk submitted only after the previous one was reviewed and approved. Today, one could submit the whole thing for review. Also, if machines can write it, why not let machines review it too? Seems weird not to do so.

2 more replies

edverma28mo ago

I built a tool to split up a single PR into multiple nice commits: https://github.com/edverma/git-smart-squash

bongodongobob8mo ago

Do whatever you want. Tell it to make different patches in chunks if you want. It'll do what you tell it to do.

josefrichter8mo ago

I'm almost sure that we all ended up at the same set of rules and steps how to get the best out of Claude - mine are almost identical, others' I know as well :-)

alessandru8mo ago

did this guy read that other paper about ai usage making people stupid?

how long until he falls from staff engineer back down to senior or something less?

BhavdeepSethi8mo ago

It's funny that the MIT paper HN is trending higher than this post, so it propped up before I read this article.

rester3248mo ago

> If I were to give advice from an engineer's perspective, if you're a technical leader considering AI adoption: >> Let your engineers adopt and test different AI solutions: AI-assisted coding is a skill that you have to practice to learn.

I am sorry, but this is so out of touch with reality. Maybe in the US most companies are willing to allocate you 1000 or 1500 USD/month/engineer, but I am sure that in many countries outside of the US not even a single line (or other type of) manager will allocate you such a budget.

I know for a fact that in countries like Japan you even need to present your arguments for a pizza party :D So that's all you need to know about AI adoption and what's driving it

bongodongobob8mo ago

Depends on the culture. I worked at a place that did $100 million in sales a year and if the cost was less than $5k for something we needed, management said just fuckin do it, don't even ask. I also worked at a place that did $2 billion a year and they required multi-level approval for MS project pro licenses. All depends.

Edit: Why is this downvoted? Different corp cultures have different ideas about what is worthwhile. Some places value innovation and experimentation and some places don't.

LtWorf8mo ago

I love how you are getting downvoted, probably by people who have never set foot outside the USA.

lordnacho8mo ago

I'm using Claude all the time now. It works, and I'm amazed it worked so easily for me. Here's what it looks like:

1) Summarize what I think my project currently does

2) Summarize what I think it should do

3) Give a couple of hints about how to do it

4) Watch it iterate a write-compile-test loop until it thinks it's ready

I haven't added any files or instructions anywhere, I just do that loop above. I know of people who put their Claude in YOLO mode on multiple sessions, but for the moment I'm just sitting there watching it.

Example:

"So at the moment, we're connecting to a websocket and subscribing to data, and it works fine, all the parsing tests are working, all good. But I want to connect over multiple sockets and just take whichever one receives the message first, and discard subsequent copies. Maybe you need a module that remembers what sequence number it has seen?"

Claude will then praise my insightful guidance and start making edits.

At some point, it will do something silly, and I will say:

"Why are you doing this with a bunch of Arc<RwLock> things? Let's share state by sharing messages!"

Claude will then apologize profusely and give reasons why I'm so wise, and then build the module in an async way.

I just keep an eye on what it tries, and it's completely changed how I code. For instance, I don't need to be fully concentrated anymore. I can be sitting in a meeting while I tell Claude what to do. Or I can be close to falling asleep, but still be productive.

abraxas8mo ago

I tried to follow the same pattern on a backend project written in Python/FastAPI and this has been mostly a heartache. It gets kind of close but then it seems to periodically go off the rails, lose its mind and write utter shit. Like braindead code that has no chance of working.

I don't know if this is a question of the language or what but I just have no good luck with its consistency. And I did invest time into defining various CLAUDE.md files. To no avail.

ryandrake8mo ago

What I find helpful in a large project is whenever Claude goes way off the rails, I correct it, and then tell it to update CLAUDE.md with instructions in its own words how to not do it again in the future. It doesn't stop the initial hallucinations and brainfarts, but it seems to be making the tool slowly better as it adds context for itself.

lordnacho8mo ago

Has this got anything to do with using a stronger typed language? I've heard that reported, not sure whether it's true since my python scripts tend to be short.

Does it end in a forever loop for you? I used to have this problem with other models.

1 more reply

wg08mo ago

I can second that. Even on plain CRUD with little to no domain logic.

pastage8mo ago

That is 150MWh per month in AI for a staff engineer. If we are doing a straight dollar to kWh conversion, plus/minus an order of magnitude.

cjonas8mo ago

Once thing I've noticed is the difference in code quality by language. I'm constantly disappointed by the output of python code. I have to correct it to follow even the most basic software development principles (DRY, etc).

Typescript on the other hand, seems to do much better on first pass. Still not always beautiful code, but much more application ready.

My hypothesis is that this is due to the billions LOC of Jupyter Notebook it was probably trained on :/

rcfox8mo ago

With Typescript, I find it pretty eager to just try `(foo as any).bar` when it gets the initial typing wrong. It also likes to redefine types in every file they're used instead of importing.

It will fix those if you catch them, but I haven't been able to figure out a prompt that prevents this in the first place.

__mharrison__8mo ago

There's a LOT of bad/newbie Python code floating around. I find that if I'm specific, it does a good job. (I'm also passing in my code/notebooks as context, so one would assume that it is attempting to mirror my style.)

furyofantares8mo ago

I've come around on something like this. I start by putting a little effort into a prompt and into providing context, but not a ton - and see where Claude Code gets with it. It might even get what I asked for working in terms of features, but it's garbage code. This is a vibe session, not caring about the code at all, or hardly at all.

I notice what worked and what didn't, what was good and what was garbage -- and also how my own opinion of what should be done changed. I have Claude Code help me update the initial prompt, help me update what should have been in the initial context, maybe add some of the bits that looked good to the initial context as well, and then write it all to a file.

Then I revert everything else and start with a totally blank context, except that file. In this session I care about the code, I review it, I am vigilant to not let any slop through. I've been trying for the second session to be the one that's gonna work -- but I'm open to another round or two of this iteration.

soperj8mo ago

and do you find this takes longer or shorter than just doing it yourself from scratch?

shinecantbeseen8mo ago

I’m with you. Sometimes it really just feels like we’re just tacking on the cognitive load of managing the drunk senior in addition to the problem of hand instead of just dealing with the problem at hand.

1 more reply

furyofantares8mo ago

Quite a bit shorter. Plus I can do the a good chunk of the work (first iteration) in contexts where I couldn't before, where I require less focus, and it uses less of my energy.

I think I can also end up with a better result, and having learned more myself. It's just better in a whole host of directions all at once.

I don't end up intimately familiar with the solution however. Which I think is still a major cost.

bongodongobob8mo ago

Not OP, I don't care if it's the same amount of time because I can do it drunk/while doing other things. Not sure why how long does it take is the be all end all for some people.

xentronium8mo ago

> The shift to Claude Code? That took just hours of use for me to become productive.

> This isn't failure; it's the process!

> The biggest challenge? AI can't retain learning between sessions

ai slop

drudolph9148mo ago

to throw my hat into the ring, I am in no way shy about using the AI tooling and I like using it, but I am happy we're finally seeing people talk about AI that matches with my personal reality with the tools.

for the record, I've been bullish on the tooling from the beginning

My dev-tooling AI journey has been chatGPT -> vscode + copilot -> early cursor adopter -> early claude + cursor adopter -> cursor agent with claude -> and now claude code

I've also spent a lot of time trying out self-hosted LLMs such as couple version of Qwen coder 2.5/3 32B, as well as deepseek 30B - and talking to them through the vscode continue.dev extension

My personal feelings are that the AI coding/tooling industry has seen a major plateau in usefulness as soon as agents became apart of the tooling. The reality is coding is a highly precise task, and LLMs down to the very core of the model architecture are not precise in the way coding needs them to be. and it's not that I don't think we won't one day see coding agents, but I think it will take a deep and complete bottom up kind of change and an possibly an entirely new model architecture to get us to what people imagine a coding agent is

I've accepted to just use claude w/ cursor and to be done with experimenting. the agent tooling just slows my engineering team down

I think the worst part about this dev tooling space is the comment sections on these kinds of articles is completely useless. it's either AI hype bots just saying non-sense, or the most mid an obvious takes that you here everywhere else. I've genuinely have become frustrated with all this vague advice and how the AI dev community talks about this domain space. there is no science, data, or reason as to why these things fail or how to improve it

I think anyone who tries to take this domain space seriously knows that there's limit to all this tooling, we're probably not going to see anything group breaking for a while, and there doesn't exist a person, outside the AI researchers at the the big AI companies, that could tell ya how to actually improve the performance of a coding agent

I think that famous vibe-code reddit post said it best

"what's the point of using these tools if I still need a software engineer to actually build it when I'm done prototyping"

sigmonsays8mo ago

every god damn time AI hallucinates a solution that is not real (in ChatGPT)

I havn't put a huge effort into learning to write prompts but in short, it seems easier to write the code myself than determine prompts. If you don't know every detail ahead of time and ask a slightly off question, the entire result will be garbage.

j / k navigate · click thread line to collapse

396 comments

spicyusername8mo ago

I guess we're just going to be in the age of this conversation topic until everyone gets tired of talking about it.

Every one of these discussions boils down to the following:

- LLMs are not good at writing code on their own unless it's extremely simple or boilerplate

- LLMs can be good at helping you debug existing code

- LLMs can be good at brainstorming solutions to new problems

- The code that is written by LLMs always needs to be heavily monitored for correctness, style, and design, and then typically edited down, often to at least half its original size

- LLMs utility is high enough that it is now going to be a standard tool in the toolbox of every software engineer, but it is definitely not replacing anyone at current capability.

- New software engineers are going to suffer the most because they know how to edit the responses the least, but this was true when they wrote their own code with stack overflow.

rafaelmn8mo ago

> but this was true when they wrote their own code with stack overflow.

fhd28mo ago

Like with any kind of learning, without a feedback loop (as tight as possible IMHO), it's not gonna happen. And there is always some kind of feedback loop.

Ultra short cycle: Pairing with a senior, solid manual and automated testing during development.

Reasonably short cycle: Code review by a senior within hours and for small subsets of the work ideally, QA testing by a seperate person within hours.

Borderline too long cycle: Code review of larger chunks of code by a senior with days of delay, QA testing by a seperate person days or weeks after implementation.

Terminally long feedback cycle: Critical bug in production, data loss, negative career consequences.

1 more reply

OvbiousError8mo ago

So much this. I see a 1000 lines super complicated PR that was whipped up in less than a day and I know they didn't read all of it, let alone understand.

1 more reply

dchftcs8mo ago

godelski8mo ago

  > you learned less about everything surrounding it.

1 more reply

chamomeal8mo ago

It's exhausting to hear about AI all the time but it's fun to watch history happen. In a decade we'll look back at all these convos and remember how wild of a time it was to be a programmer.

coldpie8mo ago

I'm thiiiis close to writing a Firefox extension that auto-hides any HN headline with an LLM/AI-related keyword in the title, just so I can find something interesting on here again.

4 more replies

red-iron-pine8mo ago

> Which is fine lol but I'm tired of seeing the exact same conversations.

makes me think the bots are providing these conversations

1 more reply

theshrike798mo ago

It's fun to see Vibe Coders discover the basics of project management in real time without ever reading a book :)

Exactly what you use to manage a team of actual human programmers.

automatic61318mo ago

>- LLMs utility is high enough that it is now going to be a standard tool in the toolbox of every software engineer, but it is definitely not replacing anyone at current capability.

This is a collosal malinvestment.

utyop228mo ago

Yeah eventually reality and fantasy have to converge.

Nobody knows when. But it will. TBH the biggest danger is that all the hopes and dreams aren't materialised and the appetite for high-risk investments dissipates.

We've had this period in which you can be money losing and its OK. But I believe we have passed the peak on that - and this is destined to blow up.

dawnerd8mo ago

It’s wasted so much time trying to make it write actual production quality code. The consistency and over-verbose nature kill it for me.

sunir8mo ago

All true if you one shot the code.

If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

The net is that unsupervised AI engineering isn’t really cheaper better or faster than human engineering right now. Does that mean in two years it will be? Possibly.

There will be a lot of optimizations in the message traffic, token uses, foundational models, and also just the Moore’s law of the hardware and energy costs.

I also gave it the SelfDocumentingCode pattern language that I wrote (on WikiWikiWeb) as a code review agent and quality improved tremendously again.

theshrike798mo ago

Currently it's just VC funded. The $20 packages they're selling are in no way cost-effective (for them).

That's why I'm driving all available models like I stole them, building every tool I can think of before they start charging actual money again.

By then local models will most likely be at a "good enough" level especially when combined with MCPs and tool use so I don't need to pay per token for APIs except for special cases.

1 more reply

zarzavat8mo ago

> If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Just an hour ago I asked Claude to find bugs in a function and it found 1 real bug and 6 hallucinated bugs.

One of the "bugs" it wanted to "fix" was to revert a change that I had made previously to fix a bug in code it had written.

Don't get me wrong, Claude is a fantastic productivity boost but letting it run around unsupervised would slow me down rather than speed me up.

oblio8mo ago

> and also just the Moore’s law of the hardware and energy costs.

What Moore's law?

MontyCarloHall8mo ago

It's almost as if you could recapitulate each of these discussions using an LLM!

theshrike798mo ago

> - The code that is written by LLMs always needs to be heavily monitored for correctness, style, and design, and then typically edited down, often to at least half its original size

But LLM code has a habit of being very verbose and covers every situation no matter how minuscule.

This is especially grating when you're doing a simple project for local use and it's bootstrapping something that's enterprise-ready :D

WorldMaker8mo ago

If you force the LLM to solve every test failure this also can lead to the same breakdown models as very junior developers coding to the tests rather than the problems, I've seen all of:

1) I broke the tests, guess I should delete them.

2) I broke the tests, guess the code I wrote was wrong, guess I should delete all of that code I wrote.

That last one can be particularly "fun" because already verbose LLM code skyrockets into baroque million line PRs when left truly unsupervised, and that PR still won't build or pass tests.

There's no true understanding by an LLM. Forcing it to lint/build can be important/useful, but still not a cure-all, and leads to such fun even more degenerate cases than hand-holding it.

lordgrenville8mo ago

Yes! Reminds me of one of my all-time favourite HN comments https://news.ycombinator.com/item?id=23003595

godelski8mo ago

specialist8mo ago

> ...until everyone gets tired of talking about [LLMs]

Small price to pay for shuffling Agile Manifesto off the stage.

dboreham8mo ago

My experience with the latest Claude Code has been: it's not nearly as bad as you say.

swframe28mo ago

Preventing garbage just requires that you take into account the cognitive limits of the agent. For example ...

1) Don't ask for large / complex change. Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.

2) For really complex steps, ask the model to write code to visualize the problem and solution.

I've seen other people blog about their tricks and tips. I do still see garbage results but not as high as 95%.

rco87868mo ago

enobrev8mo ago

> it doesn't seem to be making me any more efficient

That's been my experience.

2 more replies

jaggederest8mo ago

4 more replies

nostrademons8mo ago

hex4def68mo ago

FYI: You can force "Plan mode" by pressing shift-tab. That will prevent it from eagerly implementing stuff.

1 more reply

yahoozoo8mo ago

2 more replies

plaguuuuuu8mo ago

I've been using a few LLMs/agents for a while and I still struggle with getting useful output from it.

Mars0088mo ago

> What I really want is something like a front-end framework but for LLM prompting

dontlaugh8mo ago

At that point, why not just write the code yourself?

lucasyvas8mo ago

I reached this conclusion pretty quickly. With all the hand holding I can write it faster - and it’s not bragging, almost anyone experienced here could do the same.

Writing the code is the fast and easy part once you know what you want to do. I use AI as a rubber duck to shorten that cycle, then write it myself.

3 more replies

kyleee8mo ago

1 more reply

harrall8mo ago

I don’t do much of the deep prompting stuff but I find AI can write some code faster than I can and accurately most of the time. You just need to learn what those things are.

1 more reply

utyop228mo ago

I'm finding what's happening right now kinda bizarre.

The funny thing is - we need less. Less of everything. But an up-tick in quality.

But hey, good luck with that - ones thinking power is diminished overtime by interacing with LLMs etc.

1 more reply

MangoCoffee8mo ago

yodsanklai8mo ago

Actually, all good engineering principles which reduce cognitive load for humans work for AI as well.

2 more replies

alexsmirnov8mo ago

The best way is to create tests yourself, and block any attempts to modify them

MarkMarine8mo ago

1 more reply

jason_zig8mo ago

noosphr8mo ago

The people who build the models don't understand how to use the models. It's like asking people who design CPUs to build data-centers.

I've interviewed with three tier one AI labs and _no-one_ I talked to had any idea where the business value of their models came in.

2 more replies

nostrademons8mo ago

tombot8mo ago

Claude Code at least now lets you use its best model for planning mode and its cheapest model for coding mode.

1 more reply

MikeTheGreat8mo ago

Genuine question: What do you mean by " ask it to implement the plan in small steps"?

One option is to write "Please implement this change in small steps?" more-or-less exactly

I'm sure there's other options, too.

Benjammer8mo ago

My method is that I work together with the LLM to figure out the step-by-step plan.

1 more reply

conception8mo ago

ants_everywhere8mo ago

What I do is a step is roughly a reviewable commit.

Then I clear the context window and spawn workers to do the implementation.

com2kid8mo ago

> 1) Don't ask for large / complex change. Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.

I asked Claude Code to read a variable from a .env file.

It proceeded to write a .env parser from scratch.

I then asked it to just use Node's built in .env file parsing....

This was the 2nd time in the same session that it wrote a .env file parser from scratch. :/

Claude Code is amazing, but it'll goes off and does stupid even for simple requests.

NitpickLawyer8mo ago

Check your settings, they might be unable to read .env files as a guardrail.

1 more reply

theshrike798mo ago

It doesn't say no.

For me it built a full-ass YAML parser when it couldn't use Viper to parse the configuration correctly :)

ants_everywhere8mo ago

IMO by far the best improvement would be to make it easier for the agent to force the agent to use a success criterion.

adastra228mo ago

CuriouslyC8mo ago

1 more reply

paulcole8mo ago

> Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.

Tried this on a developer I worked with once and he just scoffed at me and pushed to prod on a Friday.

NitpickLawyer8mo ago

> scoffed at me and pushed to prod on a Friday.

that's the --yolo flag in cc :D

rvnx8mo ago

Your tips are perfect.

Most users will just give a vague tasks like: "write a clone of Steam" or "create a rocket" and then they blame Claude Code.

If you want AI to code for you, you have to decompose your problem like a product owner would do. You can get helped by AI as well, but you should have a plan and specifications.

Once your plan is ready, you have to decompose the problem into different modules, then make sure each modules are tested.

The issue is often with the user, not the tool, as they have to learn how to use the tool first.

wordofx8mo ago

> Most users will just give a vague tasks like: "write a clone of Steam" or "create a rocket" and then they blame Claude Code.

4 more replies

ccorcos8mo ago

biggc8mo ago

Thin sounds a lot like making a change yourself.

therein8mo ago

It appeals to some people because they'd rather manage a bot and get it to do something they told it to do rather than do it themselves.

rmonvfer8mo ago

I’d like to add: keep some kind of development documentation where you describe in detail the patterns and architecture of your application and it’s components.

Just my 2 cents.

salty_frog8mo ago

This is my algorithm for wetware llms.

whateveracct8mo ago

that sounds like just coding it yourself with extra steps

baq8mo ago

Exactly, then you launch ten copies of yourself and write code to manage that yourself, maybe.

renegat0x08mo ago

Huh, I thought that AI was made to be magic. Click and it generates code. Turns out it is like magic, but you are an apprentice, and still have to learn how to wield it.

dotancohen8mo ago

All sufficiently advanced technology...

rhubarbtree8mo ago

I want the code to have subsequently been deployed in production and demonstrably robust, without additional work outside of the livestream.

The livestream should include code review, test creation, testing, PR creation.

It should not be on a greenfield project, because nearly all coding is not.

There are lots of incentives for marketing at the grassroots level. I am totally open to changing my mind but I need evidence.

M4v3R8mo ago

Mind you I've never wrote a non-trivial game before in my life. It would take me weeks to do this on my own without any AI assistance.

Right now I'm working on a 3d world map editor for Final Fantasy VII that was also almost exclusively vibe-coded. It's almost finished and I plan a write up and a video about it when I'm done.

hvb28mo ago

> What matters is I manage to finish projects that I would not otherwise if not for the AI coding tools, so having them is a huge win for me.

I think the problem is in your definition of finishing a project.

1 more reply

sksrbWgbfK8mo ago

Unless you write tower defense games all day long for a living, I don't know how it's interesting.

rhubarbtree8mo ago

> Now of course you've made so many qualifiers in your post that you'll probably dismiss this as "not production", "not robust enough", "not clean" etc

Sure, my interest is whether it’s suitable for production use on an existing codebase, ie for what constitutes most of software engineering.

But - thanks for sharing, I will take a look and watch some of the stream.

infamousclyde8mo ago

I think he had a positive experience overall, but it was clear throughout the stream that he was not yielding control to a pure-agent workflow soon.

https://youtu.be/eZ7DVHAK8hw?si=vWW4kz2qiRRceNMQ

toth8mo ago

I think you shared the wrong link. Based on a quick youtube search I think you meant this one

https://youtu.be/EL7Au1tzNxE

rhubarbtree8mo ago

Thanks! This looks great (correct link in the child comment).

ochronus8mo ago

And your starry-eyed CEO is asking the same old question: How come everything takes so long when a 2-person team over two days was able to produce a shiny new thing?!. sigh

Could be used for early prototyping, though, before you hire your first engineers just to fire them 6 months later.

jf228mo ago

Yeah but you get the two days of hacking in 15 minutes.

And I highly doubt you spend months, as in 5+ weeks at the least making it production ready.

What even is "production readiness?" 100% fully unit tested and ready for planetary hyper scale or something? 95% of the human generated software I work on is awful but somehow makes people money.

1 more reply

coffeeri8mo ago

This video [0] is relevant, though it actually supports your point - it shows Claude Code struggling with non-trivial tasks and needing significant hand-holding.

I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

[0] https://www.youtube.com/watch?v=EL7Au1tzNxE

thecupisblue8mo ago

Great video! Even more, shows a few things - how good it is with such a niche language but also exposes some direct flaws.

Second, there was a lot of time/token waste messing around with git and git messages. Few tips I noticed that could help you in the workflow:

#1: Add a subagent for git that knows your style, so you don't poison direct claude context and spend less tokens/time fighting it.

#2: Claude has hooks, if your favorite language has a formatter like rust fmt, just use hooks to run rust fmt and similar.

#3: Limit what they test, as most LLM models tend to write overeager tests, including testing if "the field you set as null is null", wasting tokens.

2 more replies

Aeolun8mo ago

> I suspect videos meeting your criteria are rare because most AI coding demos either cherry-pick simple problems or skip the messy reality of maintaining real codebases.

Or we’re just having too much fun making stuff to make videos to convince people that are never going to be convinced.

1 more reply

simonw8mo ago

ku1ik8mo ago

I watched one of those videos and it was very underwhelming, imho not really selling Claude Code to anyone who isn’t convinced.

1 more reply

rhubarbtree8mo ago

I’d say they’re a good match, thanks for sharing! Will watch in detail.

MontyCarloHall8mo ago

stitched2gethr8mo ago

This contains some specific data with pretty graphs: https://youtu.be/tbDDYKRFjhk?t=623

2 more replies

rhubarbtree8mo ago

Yes, this would be really useful.

AI coding should be transforming OSS, and we should be able to get a rough idea of the scale of the speed up in development. It’s an ideal application area.

brookst8mo ago

So what percentage of human programmers, in the entire world, do you think contribute to meaningful projects like those?

1 more reply

stared8mo ago

I wouldn't dive for these. Vibe coding is a slot machine - sometimes you get wonderful results on the first prompt, more than often - not. So, a cherry-picked example is not a proof it works.

If you want me to show an example of vibe coding, I bet I can migrate someone's blog to Astro with Claude Code faster than a frontend engineer.

> It should not be on a greenfield project, because nearly all coding is not.

Well, Claude Code does not work the best for existing projects. (With some exceptions.)

sirpalee8mo ago

apercu8mo ago

Just reinforces my biases that LLMs are currently garbage for anything new and complicated. But they are a great interactive note taker and brainstorming tool.

vincent_builds8mo ago

Author here. I think that's a great idea.

Although I cannot show the live stream or the code, I am writing and deploying production code for a brownfield project.

Two recent production features:

2. Sentry monitoring wrapper for metering cron jobs - Reusable component wrapping all cron jobs with Sentry monitoring capabilities - Time: 1 day parallelled with other tasks vs 2 days focused

As you can probably tell, my work is not glamorous :D. It's all the head-scratching backend work, extending the existing system with more capabilities or to make it more robust.

mathieuh8mo ago

I actually don't think I've ever had AI solve a non-trivial problem by itself. I do find it useful but I always have to give it the breakthrough which it can then implement.

dewey8mo ago

One of these things where you just have to put in the work yourself for a while and see how it works for your workflow and project.

rhubarbtree8mo ago

That’s unusual though? I think programming languages, idioms, features - for example - are adopted by consensus, not by every programmer starting out from scratch and evaluating each one.

1 more reply

boesboes8mo ago

sunnyam8mo ago

I have the same opinion, but my worry with this attitude is that it's going to hold me back in the long run.

Would these principles not apply to regular developers as well? I suspect that most of my disappointment with these tools is that I haven't spend enough time learning how to use them correctly.

I don't want someone to just come and eat my cake because they've figured out how to make themselves productive with it.

1 more reply

adastral8mo ago

I personally have not watched much, but it sounds just like what you are looking for!

[0] https://www.youtube.com/watch?v=3MleDtXZUlM

rhubarbtree8mo ago

Yes, good enough for me, thanks. I look forward to watching it. This is particularly interesting as it’s a group stream.

sunir8mo ago

I’ve built an agent system to quality control the output following my engineering know how.

The difference is I run my own business so that matters to me more than my value or aptitude as an engineer.

sdeframond8mo ago

I think not.

The reason is about missing context. Such non-trivial problems have a lot of specific unwritten context. It takes a lot of effort to share that context. Often more than doing anything one self.

lysecret8mo ago

https://news.ycombinator.com/item?id=44159166

Kiro8mo ago

Very few people want to record themselves doing stuff or have an incentive to convince anyone except for winning internet arguments.

nosianu8mo ago

> Very few people .... have an incentive to convince anyone

We are already only talking about the subset the writes AI blog posts, not about all of humanity.

rhubarbtree8mo ago

Streaming coding is a popular and widespread activity, and it is usually nothing to do with “convincing” folks.

1 more reply

benterix8mo ago

I guess someone could make such a video, the question is, would anyone have the patience to watch it.

ochronus8mo ago

<fun> Just found the video you're looking for! https://www.youtube.com/watch?v=JeNS1ZNHQs8 </fun>

sneak8mo ago

Most of the code I write is greenfield projects. I’m pretty spoiled, I guess. Claude Code has helped me ship a lot of things I always wanted to build but didn’t have time to do.

wooque8mo ago

You got it wrong, the purpose of this blog post is not marketing Claude Code, but marketing their company. Writing about AI just happens to get more eyeballs.

coverj8mo ago

Are there even non-AI development streams/videos that meet this criteria?

brookst8mo ago

You’re coming at this from a highly biased and even angry position, which means I don’t think you’ll be satisfied with anything people can show you.

AI is already providing massive leverage to both amateur and professional developers. They use the tools differently (in my world the serious developers mostly use it for boilerplate and tests).

ale8mo ago

mindwok8mo ago

I've been building commercial codebases with Claude for the last few months and almost all of my input is on taste and what defines success. The code itself is basically disposable.

all28mo ago

> The code itself is basically disposable.

2 more replies

globular-toast8mo ago

If the code is disposable then where do all the rules and reasoning etc live apart from in your head?

1 more reply

baq8mo ago

> The code itself is basically disposable.

This is key. We’re in mass production of software era. It’s easier and cheaper to replace a broken thing/part than to fix it, things being some units of code.

sanitycheck8mo ago

Eh, Claude is like a magical spaniel that can read and write very quickly, with early-stage alzheimers, on amphetamines.

I'm on the fence regarding overall utility but $20/month could almost be worth it for a tool that can add a ton of debug logging in seconds, some months.

vincent_builds8mo ago

Hi Ale, author here. Skepticism is understandable, but trust me, I'm not just writing React boilerplate or refactoring.

I find it difficult to include examples because a lot of my work is boring backend work on existing closed-source applications. It's hard to share, but I'll give it a go with a few examples :)

----

Claude's evolution: - First pass: Completely wrong approach (DB triggers) - Second pass: Right direction, wrong abstraction - Third pass: Working implementation, we could iterate on

---- Second example: Sentry monitoring wrapper for cron jobs, a reusable component to help us observe our cronjob usage

----

bsder8mo ago

We have all these superpowered AI vibe coders, and yet open source projects still have vast backlogs of open issues.

Things that make you go "Hmmmmmm."

baq8mo ago

It’s a very different discussion when you’re building a product to sell.

TiredOfLife8mo ago

The projects that have those backlogs dont allow ai made code

bakugo8mo ago

Actually providing examples of real tasks given to the AI and the subsequent results would break the illusion and give people opportunities to question the hype. Can't have that.

We'll just keep getting submission after submission talking about how amazing Claude Code is with zero real world examples.

vincent_builds8mo ago

Two recent production features:

   The 3-attempt pattern was clear here:
   - Attempt 1: DB trigger approach - wouldn't scale for our requirements
   - Attempt 2: SQL detection but wrong interfaces, misunderstood counter vs gauge metrics
   - Attempt 3: Correct abstraction after explaining how values are stored and consumed

2. *Sentry monitoring wrapper for cron jobs* - Reusable component wrapping all cron jobs with monitoring - Time: 1 day parallel vs 2 days focused

Nothing glamorous, but they are real-world examples of changes I've deployed to production quicker because of Claude.

johnfn8mo ago

Really, zero real world examples? What about this?

https://news.ycombinator.com/item?id=44159166

willtemperley8mo ago

Yes exactly. Show us the code and we can evaluate the advice. Otherwise it’s just an advertisement.

dingnuts8mo ago

the kind of engineer who has been Salesified to the point that they write such drivel as "these learnings" instead of "lessons" in an article that allegedly has a technical audience.

it's funny because as I have gotten better as a dev I've gone backwards through his progression. when I was less experienced I relied on Google; now, just read the docs

juped8mo ago

Yeah, the trusty manual becomes #1 at around the same time as one starts actually engineering. You've entered the target audience!

1 more reply

asdev8mo ago

aronowb148mo ago

Agreed. I think this Anthropic article is a realistic take on what’s possible (focus on prototyping)

https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135a...

muzani8mo ago

jpollock8mo ago

Avoiding the boilerplate is part of the job as a software developer.

Abstracting the boilerplate is how you make things easier for future you.

globular-toast8mo ago

skydhash8mo ago

The one thing I’m always suspicious about is the actual mastery (programming language and computer usage) involved. You never see anyon describe the context of what they’ve been doing pre-llm.

conradfr8mo ago

What's weird for me is that most frameworks and tools usually include generators for boilerplate code anyway so not sure why wasting tokens/money on that is valuable.

resonious8mo ago

f311a8mo ago

Yeah, LLMs are pretty bad at planning maintainable architecture. They don’t refactor it when code is evolving and probably can’t do it due to context limitations.

meerab8mo ago

I have barely written any code since my switch to Claude Code! It's the best thing since sliced bread!

Here's what works for me:

- Detailed claude.md containing overall information about the project.

- Anytime Claude chooses a different route that's not my preferred route - ask my preference to be saved in global memory.

- Detailed planning documentation for each feature - Describe high-level functionality.

- As I develop the feature, add documentation with database schema, sample records, sample JSON responses, API endpoints used, test scripts.

- MCP, MCP, MCP! Playwright is a game changer

The more context you give upfront, the less back-and-forth you need. It's been absolutely transformative for my productivity.

Thank you Claude Code team!

f311a8mo ago

ryukoposting8mo ago

This sounds like my experiences with it. I'm writing embedded firmware in C and Rust. I'd describe further, but Claude seems incompetent at all aspects of this space.

1 more reply

meerab8mo ago

I am working on VideoToBe.com - and my stack is NextJS, Postgresql and FastAPI.

Claude code is amazing at producing code for this stack. It does excellent job at outputting ffmpeg, curl commands, linux shell script etc.

I have written detailed project plan and feature plan in MarkDown - and Claude has no trouble understanding the instructions.

I am curious - what is your usecase?

1 more reply

ethanwillis8mo ago

Personally, I give Claude a fully specified program as my prompt so that it gives me back a working program 100% of the time.

Really simple workflow!

Zee28mo ago

Ah, I’ve tried that one, but I must be doing something wrong. I give it a fully specified working program, and often times it gives me back one that only works 50% of the time!

jazzyjackson8mo ago

Does Claude Code provide some kind of "global memory" the llm refers to, or is this just a request you make within the the llm's context window? Just curious hadn't heard the use of the term

EDIT: I see, you're asking Claude to modify claude.md to track your preference there, right?

https://docs.anthropic.com/en/docs/claude-code/memory

meerab8mo ago

Yes. /init will initialize the project and save initial project information and preference.

Ask Claude to update the preference and document the moment you realize that claude has deviated away from the path.

bobbylarrybobby8mo ago

What does the playwright MCP accomplish for you? Is it basically a way for Claude to play with your app in the browser without having to write playwright tests?

mierz008mo ago

How have you been using Playwright MCP?

albingroen8mo ago

Jcampuzano28mo ago

Not to mention - while I know many don't like it, they may be able to achieve enough of a productivity boost to not require hiring as many of those crazy salaried devs.

Its literally a no-brainer. Thinking about it from just the individual cost factor is too simplified a view.

151558mo ago

Hardware companies routinely license individual EDA tool seats that cost more than numerous developer salaries - $1k/year is nothing if it improves productivity by any measurable amount.

saulpw8mo ago

The OP was saying it's $1k/mo. That's a 5-10% raise, which is a bit more than nothing.

3 more replies

AnotherGoodName8mo ago

everforward8mo ago

1 more reply

billllll8mo ago

Paying for Internet is not a great analogy imo. If you don't pay $1k/mo for Internet, you literally can't work.

What happens if you don't pay $1k/mo for Claude? Do you get an appreciable drop in productivity and output?

Genuinely asking.

EE84M3i8mo ago

Anthropic and OpenAI both have a high SSO/enterprise tier tax.

oblio8mo ago

The fast corporate internet connection is probably 1000$ for 100 developers or more...

albingroen8mo ago

And remember. This is on subsadised prices.

dajonker8mo ago

Exactly, makes it feel almost like an advertorial for Anthropic, who likely need most customers to pay 1000 bucks a month to break even.

sdesol8mo ago

astrange8mo ago

The high salaries make productivity improvements even more important.

beefnugs8mo ago

willtemperley8mo ago

Maybe I’m contrarian but I design and write most of my code and let LLMs do the reviews. Why?

First I know my problem space better than the LLM.

Another technique is to say “do this like <some good project> does it” but I suspect that might be close to copyright theft.

tkgally8mo ago

Anthropic just posted an interview with Boris Cherny, the creator of Claude Code. He also offers some ideas on how to use it.

“The future of agentic coding with Claude Code”

https://youtu.be/iF9iV4xponk

nikcub8mo ago

> budget for $1000-1500/month for a senior engineer going all-in on AI development.

Is this another case of someone using API keys and not knowing about the claude MAX plans? It's $100 or $200 a month, if you're not pure yolo brute-force vibe coding $100 plan works.

https://www.anthropic.com/max

vincent_builds8mo ago

For context: that's 1-2% of a senior engineer's fully loaded cost. The ROI is clear if it delivers even 10% productivity gain (we're seeing 2-3x on specific tasks).

I wouldn't be doing it if I didn't think it was value for money. I've always been a cost-conscious engineer who weighs cost/value, and with Claude, I am seeing the return.

imron8mo ago

> The ROI is clear if it delivers even 10% productivity gain

What if what feels like a productivity gain is actually a productivity loss?

https://mikelovesrobots.substack.com/p/wheres-the-shovelware...

(see link in the article to a study showing developers thought AI gave them a 20% gain in productivity, but measuring this showed they instead had a 20% loss)

reissbaker8mo ago

rolls-reus8mo ago

The MAX plan is a consumer plan, it’s not available with Teams or Enterprise. They introduced a premium team plan ($150) with Claude code access but not sure how much usage that bundles.

kbuchanan8mo ago

For me, working mostly in Planning Mode skips much of the initial misfires, and often leads to correct outcomes for the first edit.

mierz008mo ago

Recently I’ve been taking a step back and getting ChatGPT 5 to ask me questions to create a spec.

I refine that spec and then give that to planning mode and then go from there.

I’ve found if I jump straight into planning mode I miss some critical aspects of what ever it is I am building.

RomanPushkin8mo ago

makk8mo ago

I don’t understand the use of MCP described in the post.

What am I missing?

musbemus8mo ago

jbs7898mo ago

I just haven't heard others express the same over-engineering problem and wonder if this is a general observation or only shows up b/c my requests are quite simple.

(I have found that prompting it for the simplest or most efficient solution seems to help - sometimes taking 20+ lines down to 2-3, often more understandable.)

P.S. I tend to work with data and a web app for processes related to a small business, while not a formally trained developer.

chamomeal8mo ago

namesbc8mo ago

Spending $1500 per-month is a crazy wasteful amount of money

the_hoffa8mo ago

That's 18k a year, or about equal or cheaper than "outsourcing", minus the tax and legal ramifications.

I agree it's wasteful, but from a long-form view of what spending looks like (or at least should/used to look like). Those who see 1.5k/month as "saving" money typically only care about next quarter.

As the old adage goes: a thousand dollars saved this month is 100 thousand spent next year.

BobbyTables28mo ago

The author will be in upper management before they know it!

axus8mo ago

I like his point about more objectivity and zero ego. You don't have to worry about hurting an AI's feelings or your own when you throw away code.

awesome_dude8mo ago

But I still find myself needing (strongly) to let Claude know when it's made a breakthrough that would have been hard work on my own.

CharlesW8mo ago

Good creators tend to treat their tools with respect, and I can't imagine any reason we shouldn't feel gratitude toward our tools after a particularly satisfying session.

groby_b8mo ago

Curious: Do you also laud your compiler for particularly good optimizations?

1 more reply

nzach8mo ago

One thing that I haven't seen a lot of people talk about is the relatively new model config "Opus Plan Mode: Use Opus 4.1 in plan mode, Sonnet 4 otherwise".

In my opinion this should be the default config. Increasing the quality of the plans gives you a much better experience using Claude Code.

pseudosavant8mo ago

This has been my experience too. I’m just not quite as far along as the author.

jedberg8mo ago

I'd like to share my journey with Claude (not code).

I fed Claude a copy of everything I've ever written on Hacker News. Then I asked it to generate an essay that sounds like me.

Out of five paragraphs I had to change one sentence. Everything else sounded exactly as I would have written it.

It was scary good.

into_ruin8mo ago

I'm not comfortable using it to generate code for this project, but I can absolutely see using it to generate code for a project I'm familiar with in a language I know well.

keeda8mo ago

https://www.linkedin.com/posts/reidhoffman_can-talking-with-...

I've watched a handful of videos with this "digital twin", and I don't know how much post-processing has gone into them, but it is scary accurate. And this was a year+ ago.

nh43215rgb8mo ago

block_dagger8mo ago

reissbaker8mo ago

Pretty much everyone I talk to prefers the opposite, and feels like Claude performs best inside the Claude Code harness and not the Cursor one. But I suppose different strokes for different folks...

meerab8mo ago

Personal opinion:

block_dagger8mo ago

RomanPushkin8mo ago

It's easy: Cursor are resellers, they optimize your token usage, so they can make a profit. Claude is the final point, and they offer tokens for the cheapest price possible.

block_dagger8mo ago

I use Cursor in MAX mode because my employer pays for the tokens. I probably should have mentioned that in my OP. It makes a huge difference.

1 more reply

syspec8mo ago

fragmede8mo ago

I've been using Claude with Swift for macOS and iOS apps. What problems do you forsee where you don't think it would work out to create 20 agents for Swift?

dakiol8mo ago

It’s way easier to let the agent code the whole thing if your prompt is good enough than to give instructions bit by bit only because your colleagues cannot review a PR with 50 file changes.

athrowaway3z8mo ago

Practically - you can commit it all after you're done and then tell it to tease apart the commit into multiple well documented logical steps.

We're so early in the change that I haven't even seen anybody get it wrong, let alone right.

yodsanklai8mo ago

I split my diffs like I've always did so they can be reviewed by a human (or even an AI which won't understand 50 file changes).

The 50 file changes is most likely unsafe to deploy and unmaintainable.

Yoric8mo ago

I regularly write big MRs, then cut them into 5+ (sometimes 10+) smaller MRs. What does Claude Code change here?

dakiol8mo ago

2 more replies

edverma28mo ago

I built a tool to split up a single PR into multiple nice commits: https://github.com/edverma/git-smart-squash

bongodongobob8mo ago

Do whatever you want. Tell it to make different patches in chunks if you want. It'll do what you tell it to do.

josefrichter8mo ago

I'm almost sure that we all ended up at the same set of rules and steps how to get the best out of Claude - mine are almost identical, others' I know as well :-)

alessandru8mo ago

did this guy read that other paper about ai usage making people stupid?

how long until he falls from staff engineer back down to senior or something less?

BhavdeepSethi8mo ago

It's funny that the MIT paper HN is trending higher than this post, so it propped up before I read this article.

rester3248mo ago

I know for a fact that in countries like Japan you even need to present your arguments for a pizza party :D So that's all you need to know about AI adoption and what's driving it

bongodongobob8mo ago

Edit: Why is this downvoted? Different corp cultures have different ideas about what is worthwhile. Some places value innovation and experimentation and some places don't.

LtWorf8mo ago

I love how you are getting downvoted, probably by people who have never set foot outside the USA.

lordnacho8mo ago

I'm using Claude all the time now. It works, and I'm amazed it worked so easily for me. Here's what it looks like:

1) Summarize what I think my project currently does

2) Summarize what I think it should do

3) Give a couple of hints about how to do it

4) Watch it iterate a write-compile-test loop until it thinks it's ready

Example:

Claude will then praise my insightful guidance and start making edits.

At some point, it will do something silly, and I will say:

"Why are you doing this with a bunch of Arc<RwLock> things? Let's share state by sharing messages!"

Claude will then apologize profusely and give reasons why I'm so wise, and then build the module in an async way.

abraxas8mo ago

I don't know if this is a question of the language or what but I just have no good luck with its consistency. And I did invest time into defining various CLAUDE.md files. To no avail.

ryandrake8mo ago

lordnacho8mo ago

Has this got anything to do with using a stronger typed language? I've heard that reported, not sure whether it's true since my python scripts tend to be short.

Does it end in a forever loop for you? I used to have this problem with other models.

1 more reply

wg08mo ago

I can second that. Even on plain CRUD with little to no domain logic.

pastage8mo ago

That is 150MWh per month in AI for a staff engineer. If we are doing a straight dollar to kWh conversion, plus/minus an order of magnitude.

cjonas8mo ago

Typescript on the other hand, seems to do much better on first pass. Still not always beautiful code, but much more application ready.

My hypothesis is that this is due to the billions LOC of Jupyter Notebook it was probably trained on :/

rcfox8mo ago

With Typescript, I find it pretty eager to just try `(foo as any).bar` when it gets the initial typing wrong. It also likes to redefine types in every file they're used instead of importing.

It will fix those if you catch them, but I haven't been able to figure out a prompt that prevents this in the first place.

__mharrison__8mo ago

furyofantares8mo ago

soperj8mo ago

and do you find this takes longer or shorter than just doing it yourself from scratch?

shinecantbeseen8mo ago

1 more reply

furyofantares8mo ago

Quite a bit shorter. Plus I can do the a good chunk of the work (first iteration) in contexts where I couldn't before, where I require less focus, and it uses less of my energy.

I think I can also end up with a better result, and having learned more myself. It's just better in a whole host of directions all at once.

I don't end up intimately familiar with the solution however. Which I think is still a major cost.

bongodongobob8mo ago

Not OP, I don't care if it's the same amount of time because I can do it drunk/while doing other things. Not sure why how long does it take is the be all end all for some people.

xentronium8mo ago

> The shift to Claude Code? That took just hours of use for me to become productive.

> This isn't failure; it's the process!

> The biggest challenge? AI can't retain learning between sessions

ai slop

drudolph9148mo ago

for the record, I've been bullish on the tooling from the beginning

My dev-tooling AI journey has been chatGPT -> vscode + copilot -> early cursor adopter -> early claude + cursor adopter -> cursor agent with claude -> and now claude code

I've also spent a lot of time trying out self-hosted LLMs such as couple version of Qwen coder 2.5/3 32B, as well as deepseek 30B - and talking to them through the vscode continue.dev extension

I've accepted to just use claude w/ cursor and to be done with experimenting. the agent tooling just slows my engineering team down

I think that famous vibe-code reddit post said it best

"what's the point of using these tools if I still need a software engineer to actually build it when I'm done prototyping"

sigmonsays8mo ago

every god damn time AI hallucinates a solution that is not real (in ChatGPT)

j / k navigate · click thread line to collapse