In my experience with these tools, including the flagship models discussed here, this is a deal-breaking problem. If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself. The tricky thing is that unless you read and understand the generated code, you really have no idea whether you're progressing or regressing. You can ask the model to generate tests for you as well, but how can you be sure they're written correctly, or covering the right scenarios?
More power to you if you feel like you're being productive, but the difficult things in software development always come in later stages of the project[1]. The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%. I'm not trying to downplay their usefulness, or imply that they will never get better. I think current models do a reasonably good job of summarizing documentation and producing small snippets of example code I can reuse, but I wouldn't trust them for anything beyond that.
[1]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule
Same thing I do without an LLM: I try to fix it myself!
> If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself.
Definitely not in the cases I'm thinking about. This extends from "build me a boilerplate webapp that calls this method every time this form changes and put the output in that text box" (which would take me hours to learn how to do in any given web framework) to "find a more concise/idiomatic way to express this chain of if-statements in this language I'm unfamiliar with" (which I just wouldn't do if I don't much care to learn that particular language).
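To make that second case concrete, here's a hypothetical sketch (in Python, though the point is language-agnostic) of the kind of if-chain refactor I'd hand to an LLM:

```python
# Hypothetical example: a chain of if-statements...
def http_status_text(code):
    if code == 200:
        return "OK"
    elif code == 404:
        return "Not Found"
    elif code == 500:
        return "Internal Server Error"
    else:
        return "Unknown"

# ...and the more idiomatic dispatch-table form an LLM will
# typically suggest: a dict lookup with a default.
STATUS_TEXT = {200: "OK", 404: "Not Found", 500: "Internal Server Error"}

def http_status_text_idiomatic(code):
    return STATUS_TEXT.get(code, "Unknown")
```

The refactor is trivial to verify by eye, which is exactly why it's a low-risk thing to delegate.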
For the UI/boilerplate part, it's easy enough to tell if things are working or not, and for crucial components I'll at least write tests myself or even try to fully understand what it came up with.
I'd definitely never expect it to get the "business logic" (if you want to call it that for a hobby project) right, and I always double-check that myself, or outright hand-write it and only use the LLM for building everything around it.
> The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%.
What I enjoy most about programming is exactly solving complicated puzzles and fixing gnarly bugs, not doing things that could at least theoretically be abstracted into a framework (that actually saves labor and doesn't just throw it in an unknown form right back at me, as so many modern ones do) relatively easily.
LLMs more often than not allow me to get to that last 10% much faster than I normally would.
https://github.com/williamcotton/search-input-query
https://github.com/williamcotton/guish
Both are non-trivial but certainly fit within the context window, so they're not large projects. However, they are easily extensible thanks to the architecture I specified as I was building them!
The first contains a recursive descent parser for a search query DSL (and much more).
The second is a bidirectional GUI for bash pipelines.
Both operate at the AST level, guish powered by an existing bash parser.
The READMEs have animated gifs so you can see them in action.
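For a flavor of what a recursive-descent parser over a search grammar looks like, here's a toy sketch for a made-up grammar (illustrative only, not the grammar search-input-query actually implements):

```python
# Toy search-query parser. Grammar (assumed for illustration):
#   expr := term ('OR' term)*
#   term := atom ('AND' atom)*
#   atom := WORD | '(' expr ')'
def tokenize(s):
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] == "OR":
            rhs, i = term(i + 1)
            node = ("or", node, rhs)
        return node, i

    def term(i):
        node, i = atom(i)
        while i < len(tokens) and tokens[i] == "AND":
            rhs, i = atom(i + 1)
            node = ("and", node, rhs)
        return node, i

    def atom(i):
        if tokens[i] == "(":
            node, i = expr(i + 1)
            return node, i + 1  # skip the closing ')'
        return ("word", tokens[i]), i + 1

    node, _ = expr(0)
    return node

# parse(tokenize("cat AND (dog OR bird)"))
# -> ('and', ('word', 'cat'), ('or', ('word', 'dog'), ('word', 'bird')))
```

Each grammar rule maps to one small function, which is also what makes this style easy to extend piecewise with an LLM.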
When the LLM gets stuck I either take over the coding myself or come up with a plan to break up the requests into smaller sized chunks with more detail about the steps to take.
It takes a certain amount of skill to use these tools, both with how the tool itself works and definitely with the expertise of the person wielding the tool!
If you have these tools code to good abstractions and good interfaces, you can hide implementation details. Then you expose those interfaces to the LLM and make it simpler to build on.
Like, once you've got an AST it's pretty much downhill from there to build tools that operate on said AST.
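As a small illustration of that "downhill" claim, here's a sketch using Python's stdlib ast module (standing in for whatever AST your project exposes) where a useful tool, a function-name collector, is just a plain tree walk:

```python
import ast

# Any source string will do; this one is just for demonstration.
source = """
def greet(name):
    return "hi " + name

def farewell(name):
    return "bye " + name
"""

tree = ast.parse(source)

# Once the code is an AST, "tooling" is a tree traversal plus a filter.
func_names = [node.name for node in ast.walk(tree)
              if isinstance(node, ast.FunctionDef)]
print(func_names)  # ['greet', 'farewell']
```

Renamers, linters, and bidirectional GUIs like guish are the same shape: walk the tree, match nodes, act on them.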
It seems like a lot of people assume the process is that you give the AI a relatively high-level prompt describing the features, and you get back a fully functioning app that does everything you outlined.
In my experience (and I think what you are describing here), the initial feature-based prompt will often give you a (somewhat impressively) basic functioning app. But as you start iterating on that app, the high-level feature-based prompts stop working well pretty quickly. It then becomes more of an exercise in programming by proxy — you basically tell the AI what code to write/what changes are needed at a technical level in smaller chunks, and it saves you a lot of time by actually writing the proper syntax. The thing is, you still have to know how to program to accomplish this — arguably, you have to be a fairly decent programmer who can already break complicated tasks down into small, understandable chunks.
Furthermore, if you want the AI to write good code with a solid architecture, you pretty much have to tell it what to do at a technical level from the start — for example, I imagine the AI didn't come up with structuring things to work at the AST level on its own here — you knew that would give you a solid architecture to build on, so you told it to do that.
As someone who's already a half-decent programmer, I've found this process to be a pretty significant boon to my productivity. On the other hand, beyond the basic POC app, I have a hard time seeing it live up to the marketing hype of "Anyone can build an app using AI!" that's being constantly spewed.
LLMs are tools that need to be learned. Good prompts aren’t hard, but they do take some effort to build.
So my workflow is to just review every bit of code the assistant generates and sometimes I ask the assistant (I'm using Cody) to revisit a particular portion of the code. It usually corrects and spits out a new variant.
My experience has been nothing short of spectacular in using assistants for hobby projects, sometimes even for checking design patterns. I can usually submit a piece of code and ask if it follows a good pattern under the <given> constraints. I usually get a good recommendation that clearly points out the pros and cons of said pattern.
Outside that context, the better way to use these tools is as a superpowered Stack Overflow search. Don't know how ${library} expects you to ${thing} in ${language}? Rather than asking "I need to add a function in this codebase which..." and pasting the result into your code, ask "I need an example function which uses..." and use what it spits out as an example to integrate. Then you can ask "can I do it like..." and get some background on why you can/can't/should/shouldn't think about doing it that way. It's not 100% right or applicable, especially with every ${library}, ${thing}, and ${language}, but it's certainly faster to a good answer most of the time than SO or searching. Worst case failure? You've spent a couple minutes to find out you need to spend a lot of time reading through the docs to do your one-off thing yourself.
Even worse, the LLM will never tell you it doesn't know the answer, or that what you're trying to do is not possible, but will happily produce correct-looking code. It's not until you actually try it that you will notice an error, at which point you either go into a reprompt-retry loop, or just go read the source documentation. At least that one won't gaslight you with wrong examples (most of the time).
There are workarounds to this, and there are coding assistants that actually automate this step for you, and try to automatically run the code and debug it if something goes wrong, but that's an engineering solution to an AI problem, and something that doesn't work when using the model directly.
> Worst case failure? You've spent a couple minutes to find out you need to spend a lot of time reading through the docs to do your one-off thing yourself.
It's not a couple of minutes, though. How do you know you've reached the limit of what the LLM can do, versus not using the right prompt or not giving it enough context? The answer always looks to be _almost_ there, so I'm always hopeful I can get it to produce the correct output. I've spent hours of my day, in aggregate, coaxing the LLM toward the right answer. I want to rely on it precisely because I want to avoid looking at the documentation — which sometimes doesn't even exist or isn't good enough, at which point it's back to trawling the web and SO. If I'd known the LLM would waste my time, I could have done that from the beginning.
But I do appreciate that the output sometimes guides me in the right direction, or gives me ideas that I didn't have before. It's just that the thought of relying on this workflow to build fully-fledged apps seems completely counterproductive to me, but some folks seem to be doing this, so more power to them.
Then I had an idea: as it was a picture animation problem, I asked it to write it in CSS. Then I asked it to translate it to Python. Boom, it worked!
At this moment, I finally realized the value of knowing how to prompt. Most of the time it doesn't make a difference, but when things start to get complex, knowing how to speak with these assistants makes all the difference.