My experience to date across the major LLMs is that they are quick to leap to complex solutions, and I find that the code often is much harder to maintain than if I were to do it myself.
But complex code is only part of the problem. Another huge problem I see is the rapid accumulation of technical debt. LLMs will confidently generate massive amounts of code with abstractions and design patterns that may be a good fit in isolation, but are absolutely the wrong pattern for the problem you're trying to solve or the system you're trying to build. You run into the "existing code pattern" problem that Sandi Metz talked about in her fantastic 2014 RailsConf talk, "All the little things" [0]:
> "We have a bargain to follow the pattern, and if the pattern is a good one then the code gets better. If the pattern is a bad one, then we exacerbate the problem."
Rapidly generating massive amounts of code with the wrong abstractions and design patterns is insidious because it feels like incredible productivity. You see it all the time in posts on e.g. Twitter or LinkedIn. People gushing about how quickly they are shipping products with minimal to zero other humans involved. But there is no shortcut to understanding or maintainability if you care about building sustainable software for the medium to long-term.
EDIT: Forgot to add link
I've built some rather complex systems:
Guish, a bi-directional CLI/GUI for constructing and executing Unix pipelines: https://github.com/williamcotton/guish
WebDSL, fast C-based pipeline-driven DSL for building web apps with SQL, Lua and jq: https://github.com/williamcotton/webdsl
Search Input Query, a search input query parser and React component: https://github.com/williamcotton/search-input-query
The greenfield projects turn into a mess very quickly, because if you let it code without any guidance (wrt documentation, interactivity, testability, modularity) it generates crap until you can't modify it. The greenfield project turns into legacy as fast as the agent can spit out new code.
This is an important point. Unconstrained code generation lets you witness accelerated codebase aging in real-time.
Easier to test, lower cognitive overload, and it's faster to onboard someone when they only need to understand a small part at a time.
I almost wonder if these LLMs can be used to assess the barrier to onboarding. If it gets confused and generates shitty suggestions, I wonder if that could be a good informal smoke alarm for trouble areas the next junior will run into.
You should not be structuring the code for an LLM alone, but I have found that trying to be very modular has helped both my code and my ability to utilize LLMs on top of it.
Eventually I ended up looking at the notebook and the extracted code side-by-side and carefully checking every line. Despite being split across dozens of cells, it would have been faster if I had started out by just manually copying the code out of each meaningful cell and pasted it all together.
These are big legacy projects where I didn’t write the code to begin with, so having an AI partner would have been really nice.
The example prompts are useful. They not only reduced the activation energy required for me to start installing this habit in my personal workflows, but also inspired the notion that I can build a library of good prompts and easily implement them by turning them into TextExpander snippets.
P.S.: Extra credit for the Insane Clown Posse reference!
One idea I really like here is asking the model to generate a todo list.
It looks like the sort of nonproductive yak-shaving you do when you're stuck or avoiding an unpleasant task--coasting, fooling around incrementally with your LLM because your project's fucked and you psychologically need some sense of progress.
The opposite of this is burnout--one of the things they don't tell you about successful projects with good tools is they induce much more burnout than doomed projects. There's a sort of Amdahl's Law in effect, where all the tooling just gives you more time to focus on the actual fundamentals of the product/project/problem you’re trying to address, which is stressful and mentally taxing even when it works.
Fucking around with LLM coding tools, otoh, is very fun, and like constantly clean-rebuilding your whole (doomed) project, gives you both some downtime and a sense of forward momentum--look how much the computer is chugging!
The reality testing to see if the tool is really helping is to sit down with a concrete goal and a (near) hard deadline. Every time I've tried to use an LLM under these conditions it just fails catastrophically--I don't just get stuck, I realize how basically every implicit decision embedded in the LLM output has an unacceptably high likelihood of being wrong, and I have an amount of debug cycles ahead of me exceeding the time to throw it all away and do it without the LLM by, like, an order of magnitude.
I'm not an LLM-coding hater and I've been doing AI stuff that's worked for decades, but current offerings I've tried aren't even close to productive compared to searching for code that already exists on the web.
It’s one of those things where a little upskilling can make a big impact. So many things in life need a bit of practice before they’re useful to you.
For starters, you need to change the default prompt in your editor to make it do what you want. If it does something annoying or weird, tell it in the prompt not to take that approach. For me, that was absurdly long, useless explanations. Now it's short and sweet.
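To make this concrete, a hypothetical set of default-prompt rules along these lines might look like the following (the exact wording is illustrative, not from any particular editor's defaults):

```text
- Be concise. Do not explain the code unless I ask.
- Prefer small, focused diffs over rewrites.
- Do not add new dependencies without asking first.
- Match the existing code style and naming conventions.
```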
ouch. You've thought about this, haven't you? Your ideas are intriguing to me, and I wish to subscribe to your newsletter.
I believe most people who struggle to be productive with language models simply haven’t put in the necessary practice to communicate effectively with AI. The issue isn’t with the intelligence of the models—it’s that humans are still learning how to use this tool properly. It’s clear that the author has spent time mastering the art of communicating with LLMs. Many of the conclusions in this post feel obvious once you’ve developed an understanding of how these models "think" and how to work within their constraints.
I’m a huge fan of the workflow described here, and I’ll definitely be looking into aider and repomix. I’ve had a lot of success using a similar approach with Cursor in Composer Agent mode, where Claude 3.5 Sonnet acts as my "code implementer." I strategize with larger reasoning models (like o1-pro, o3-mini-high, etc.) and delegate execution to Claude, which excels at making inline code edits. While it’s not perfect, the time savings far outweigh the effort required to review an "AI Pull Request."
Maximizing efficiency in this kind of workflow requires a few key things:
- High typing speed – Minimizing time spent writing prompts means maximizing time generating useful code.
- A strong intuition for "what’s right" vs. "what’s wrong" – This will probably become less relevant as models improve, but for now, good judgment is crucial.
- Familiarity with each model’s strengths and weaknesses – This only comes with hands-on experience.
Right now, LLMs don’t work flawlessly out of the box for everyone, and I think that’s where a lot of the complaints come from—the "AI haterade" crowd expects perfection without adaptation.
For what it’s worth, I’ve built large-scale production applications using these techniques while writing minimal code by hand myself.
Most of my experience using these workflows has been in the web dev domain, where there's an abundance of training data. That said, I’ve also worked in lower-level programming and language design, so I can understand why some people might not find models up to par in every scenario, particularly in niche domains.
Let’s be honest. The author was probably playing cookie clicker while this article was being written.
Does the time invested in the planning benefit you? Have you noticed fewer hallucinations? Have you saved time overall?
I’d be curious to hear because my current workflow is basically
1. Have idea
2. create-next-app + ShadCN + TailwindUI boilerplate
3. Cursor Composer on agent mode with Superwispr voice transcription
I’m gonna try the author’s workflow regardless, but would love to hear others’ opinions.
With all the layoffs in our sector, I wouldn't blame you if you didn't share it, so thank you for sharing.
Do your rules count as frequent steering and lead to increased 'accuracy', or is that the 'accuracy' you're seeing with your current workflow, rules and all?
I also have a scratchpad file that I tell the model it can update to reflect anything new it learns, so that gives it a crude form of memory as it works on the codebase. This does help it use internal utility APIs.
- small .cursorrules file explaining what I am trying to build and why at a very high level and my tech stack
- a DEVELOPMENT.md file which is just a to-do/issue list for me that I tell cursor to update before every commit
- a temp/ directory where I dump contextual md and txt files (chat logs discussing feature, more detailed issue specs, etc.)
- a separate snippet management app that has my commonly used request snippets (write commit message, ask me clarifying questions, update README, summarize chat for new session, etc.)
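As an illustration, a minimal .cursorrules file in that spirit might look like this (the contents are hypothetical):

```text
I am building a small web app for tracking reading lists.
Why: a personal tool, optimized for maintainability over features.
Stack: TypeScript, React, SQLite.
Keep modules small. Update DEVELOPMENT.md before every commit.
```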
Otherwise it's pretty much what your workflow is.
Most of these workflows are just context management workflows and in Cursor it's so simple to manage the context.
For large files I just highlight the code and cmd+L. For short files, I just add them all by using /+downarrow
I constantly feed context like this and then usually come to a good solution for both legacy and greenfield features/products.
If I don't come to a good solution it's almost always because I didn't think through my prompt well enough and/or I didn't provide the correct context.
For example, rather than using Plasmo for its browser extension boilerplate and packaging utilities, I’ve chosen to ask the LLM to set up all of that for me, as it won’t have any blindspots when tasked with debugging.
At some point, specialized code-gen transformer models should get really good at just spitting out the lowest level code required to perform the job.
Having 7 different instances of an LLM analyzing the same code base and making suggestions would not just be economically wasteful; wouldn't it also be impractical, or even dangerous?
Outside of RAG, which is a different thing, are there products that somehow "centralize" the context for a team, where all questions refer to the same codebase?
It did seem to take a while to index, even though my colleagues had been using Cursor for a while, so I'm likely misunderstanding something.
I ended up finishing my side projects when I kept these in mind, rather than focusing on elegant code for elegant code's sake.
It seems the key to using LLMs successfully is to make them create a specification and execution plan, through making them ask /you/ questions.
If this skill--specification and execution planning--is passed on to LLMs, along with coding, then are we essentially souped-up tester-analysts?
> if it doesn’t work, Q&A with aider to fix
I fix errors myself, because LLMs are capable of producing large chunks of really stupid/wrong code which needs to be reverted, and that's why it makes sense to see the code at least once.
Also, I used to find myself in situations where I tried to use an LLM for the sake of using an LLM to write code (a waste of time).
Because a lot of the benefit of LLMs is bringing up ideas or questions I am not thinking of right now, and this really does that. Typically this would happen as I dug through a topic, not beforehand. So that's a net benefit.
I also tried it and it worked a charm, the LLM did respect context and the step by step approach, poking holes in my ideas. Amazing work.
I still like writing code and solving puzzles in my mind, so I won't be doing the "execution" part. From there on, I mostly use LLMs as autocomplete, for when I'm stuck, or for obscure bug solving. Otherwise, I don't get any satisfaction from programming, having learnt nothing.
Your workflow is much more polished, will definitely try it out for my next project
is more polished? What's your workflow, banging rocks together?
For example, there is this common challenge: "count how many r letters are in strawberry". You can see the issue is not counting, but that the model does not know whether "rr" should be treated as a single "r", because it is not sure if you are counting r "letters" or r "sounds"; when you sound out the word, there is a single "r" sound where it is spelled with a double "r". So if you tell the model that a double "r" stands for 2 letters, it will get it right.
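For reference, the character-level answer is trivial to compute, which is why the question is really an ambiguity problem rather than a counting problem. A quick sketch of the two readings:

```python
import re

word = "strawberry"

# Letter-level count: every "r" character counts, so "rr" counts as 2.
print(word.count("r"))  # 3

# "Sound"-level reading: treat each maximal run of r's ("rr") as one.
print(len(re.findall(r"r+", word)))  # 2
```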
I have no idea if it works or not!
What am I doing wrong or what am I missing? My experience has been so underwhelming I just don’t understand the hype for why people use Claude over something else.
Sorry I know there are many models out there, and Claude is probably better than 99% of them. Can someone help me understand the value of it over o1/o3? I honestly feel like I like 4o better.
/frustration-rant
The key is to give it context so it can help you. For example, if you want it to help you with Spark configuration, give it the Spark docs. If you want it to help you write code, give it your codebase.
Tools like cursor and the like make this process very easy. You can also set up a local MCP server so the LLM can get the context and tools it needs on its own.
Thanks again!
I act as the director, creativity and ideas person. I have one LLM that implements, and a second LLM that critiques and suggests improvements and alternatives.
https://news.ycombinator.com/item?id=43057907
Other than that, a great article! Very insightful.
Software is going to be prompt wrangling with some acceptance testing. Then just prompt wrangling.
I don't have a lot of hope for the software profession to survive.
This i think is the grand vision -- what could it look like?
In my mind, programming should look like a map -- you can go anywhere, and there'll be things happening. And multiple people.
If anyone wants to work on this (or has comments), hit me up!
The real thing that sold me is that the entire workflow takes 10 minutes to plan, and then 10-15 minutes to execute (let's say a Python script of medium complexity). After a solid ~20-30 min I am largely done, no debugging necessary.
It would have taken me an hour or two to do the same script.
This means I can spend a lot more time with the fam, hacking on more things, and messing about.
Also what do you mean by “I really want someone to solve this problem in a way that makes coding with an LLM a multiplayer game. Not a solo hacker experience.” ?
Total tokens in: 26,729,994
Total tokens out: 1,553,284
Last month's Anthropic bill was $89.30.
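As a back-of-the-envelope sanity check on those numbers, assuming Sonnet-style rates of $3 per million input tokens and $15 per million output tokens (the rates are assumptions; actual billing depends on which models were used):

```python
# Rough API cost estimate from token counts. The per-token rates below
# are assumed, not the actual billed rates, which vary by model.
tokens_in = 26_729_994
tokens_out = 1_553_284

RATE_IN = 3.00 / 1_000_000    # assumed $/input token
RATE_OUT = 15.00 / 1_000_000  # assumed $/output token

cost = tokens_in * RATE_IN + tokens_out * RATE_OUT
print(f"${cost:.2f}")  # ≈ $103.49 at these assumed rates
```

The quoted $89.30 coming in below this estimate would be consistent with part of the traffic going to cheaper models or hitting prompt caching.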
--
I want to program with a team, together. not a group of people individually coding with an agent, and then managing merges. I have been playing a lot with merging team context - but haven't gotten too far yet.
Nonprofit X publishes outputs from competing AIs, which are not copyrightable.
Corp Y ingests content published by Nonprofit X.
As opposed to Vintage Pioneer code?
Legacy modern code would be anything from the last 5-10 years. Vintage Pioneer code (which I have both initialized and maintained) is more than 20 years old.
I am trying not to be a vintage pioneer these days.
So either you put the whole codebase into the context (which will mostly lead to problems, as tokens are limited), or you have some kind of summary of your current features, etc.
Or you do some kind of "black box" iterations, which I feel won’t be that useful for new features, as the model should know about current features etc?
What’s the way here?
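To make the trade-off concrete, here's a crude sketch of checking whether a codebase even fits in a context window, using the common ~4-characters-per-token heuristic (both the heuristic and the 200k window size are assumptions; real tokenizers and model limits vary):

```python
from pathlib import Path

def estimate_tokens(root: str, exts=(".py", ".ts", ".md")) -> int:
    """Very rough token estimate: total characters / 4."""
    total_chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.suffix in exts and p.is_file()
    )
    return total_chars // 4

CONTEXT_WINDOW = 200_000  # assumed window size; varies by model

tokens = estimate_tokens(".")
if tokens > CONTEXT_WINDOW:
    print(f"~{tokens} tokens: won't fit; summarize or select files")
else:
    print(f"~{tokens} tokens: whole codebase can go in context")
```

If the estimate blows the window, that's when a summary file or a tool like repomix (with file selection) earns its keep.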
Isn't the input token number way more limited than that?
This part is unclear to me in the "non-Greenfield" part of the article.
Iterating with aider on very limited scopes is easy; I've used it often. But what about understanding a whole repository and acting on it? Following imports to understand a TypeScript codebase as a whole?
I was just wondering how to give my edits back to in-browser tools like Claude or ChatGPT, but the idea of repomix is great, will try!
Although I have been flying a bit with Copilot in VSCode, so right now I essentially have two AIs: one for larger changes (in the browser), and one for minor code fixes (in VSCode).
It is linked to in the article - a brilliant utility from Simon.
Also, don't forget that your favorite AI tools can be of great help with the factors that cause us to make software: research, subject expertise, marketing, business planning, etc.
I will dig in again. It is an exciting idea.
that is the first thing you send to aider.
Also, there was a joke below, but you can do --yes-always and it will not ask for confirmation. I find it does a pretty good job.
yes | aider

“Over my skis” ~ in over my head.
“Over my skies” ~ very far overhead. In orbit maybe?
Is that correct? Never heard the expression before, but as a skier if you're over your skis you're in control of them, while if you're backseated the skis will control you.