The first prompt (with o1) will get you 60% there, but then you have a different workflow. The prompts can reach a local minimum, where Claude/GPT-4/etc. just can't do any better. At that point you need to climb back out and try a different approach.
I recommend git branches to keep track of this. Keep a good working copy in main, and anytime you want to add a feature, make a branch. If you get it almost there, make another branch in case it goes sideways. The biggest issue with developing like this is that you are not a coder anymore; you are a puppet master of a very smart and sometimes totally confused brain.
This is one fact that people seem to severely under-appreciate about LLMs.
They're significantly worse at coding in many respects than even a moderately skilled and motivated intern, but for my hobby projects, until now I haven't had any intern who would even so much as take a stab at some of the repetitive or just not very interesting subtasks, let alone stick with them over and over again without getting tired of it.
In my experience of these tools, including the flagship models discussed here, this is a deal-breaking problem. If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself. The tricky thing is that unless you read and understand the generated code, you really have no idea whether you're progressing or regressing. You can ask the model to generate tests for you as well, but how can you be sure they're written correctly, or covering the right scenarios?
More power to you if you feel like you're being productive, but the difficult things in software development always come in later stages of the project[1]. The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%. I'm not trying to downplay their usefulness, or imply that they will never get better. I think current models do a reasonably good job of summarizing documentation and producing small snippets of example code I can reuse, but I wouldn't trust them for anything beyond that.
[1]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule
If software engineering should look like this, oh boy am I happy to be retiring in a mere 17 years (fingers crossed) and not having to spend more time on such work. No way any quality complex code can come from such an approach, and people complain about the quality of software now.
So you're basically bruteforcing development, a famously efficient technique for... anything.
site:github.com map comparison
I guess the difference is that my way uses dramatically less time and resources, but requires directly acknowledging the original coders instead of relying on the plagiarism-ish capabilities of regurgitating something through an LLM.
Or
Can you easily come up with many things that LLMs have no clue about and hence will fail at?
I’ve been using Sonnet 3.5 to code and I’ve managed to build multiple full fledged apps, including paid ones
Maybe they’re not perfect, but they work and I’ve had no complaints yet. They might not scale to become the next Facebook, but not everything has to scale
Going to some new place meant getting a map, looking at it, making a plan, following the plan, keeping track of where you were on the map, that sort of thing.
Then I traveled somewhere new, for the first time, with GPS and navigation software. It was quite impressive, and rather easier. I got to my destination the first time, without any problems. And each time after that.
But I did remark that I did not learn the route. The 10th time, the 50th time, I still needed the GPS to guide me. And without it, I would have to start the whole thing from scratch: get a map, make a plan, and so on.
Having done the "manual" navigation with maps lots of times before, it never worries me what I would do without a GPS. But if you're "born" with the GPS, I wonder what you do when it fails.
Are you not worried how you would manage your apps if for some reason the AIs were unavailable?
If anyone else is frustrated by this experience, I've found that changing the setting in Google Maps to have the map always point north has helped me actually build a mental model of directions. Instead of just following the line, it forced me to think about whether I'm going north, south, east, or west at each step of the directions.
Make hay while the sun shines, friends. It might not last forever, but neither will you!
I never worried about what would happen if the internet were to become unavailable. Given that it's become an essential service, I just trust that the powers that be will make sure to get it back up.
Prior to an iPhone I’d have the general lay of a city memorised within 10min of landing, using a paper tourist map, and probably never feel disoriented, let alone lost.
This morning I walked 2 blocks further than needed (of a 1 block walk) because I wasn’t at all oriented while following Google maps.
I won’t spell out the AI comparison, other than I think more “apps” will be created, and predictable “followed the GPS off a bridge” revelations.
Python/JS and their ecosystem replacing OS hosted C/C++ which replaced bare metal Assembly which replaced digital logic which replaced analog circuits which replaced mechanical design as the “standard goto tool” for how to create programs.
Starting with punchcard looms and Ada Lovelace maybe.
In every case we trade resource efficiency and lower level understanding for developer velocity and raise the upper bound on system complexity, capability, and somehow performance (despite the wasted efficiency).
>I played around a lot with code when I was younger. I built my first site when I was 13 and had a good handle on Javascript back when jQuery was still a pipe dream.
>Started with the Codecademy Ruby track which was pretty easy. Working through RailsTutorial right now.
posted on April 15, 2015, https://news.ycombinator.com/item?id=9382537
>I've been freelancing since I was 17. I've dabbled in every kind of online trade imaginable, from domain names to crypto. I've built and sold multiple websites. I also built and sold a small agency.
>I can do some marketing, some coding, some design, some sales, but I'm not particularly good at any of those in isolation.
posted on Jan 20, 2023, https://news.ycombinator.com/item?id=34459482
So I don't really understand where this claim of only "6 months of coding experience" is coming from, when you clearly have been coding on and off for multiple decades.
Find a way to work around it.
Everybody ships nasty bugs to production that they themselves might find impossible to debug. Everybody.
Thus they will do the very same thing you, I, or anybody else on this planet would do: find a second pair of eyes, virtually or not, paying or not.
Do you think you could maintain and/or debug someone else's application?
Most of the things I’ve built are fun things
See: GoUnfaked.com and PlaybookFM.com as examples
PlaybookFM.com is interesting because everything from the code to the podcasts to the logo are AI generated
It's a slightly orthogonal way of thinking about this but if you are solving real problems, you get away with so much shit, it's unreal.
Maybe Google is not gonna let you code monkey on their monorepo, but you do not have to care. There's enough not-google in the world, and enough real problems.
In fact, my main reason for not doing any web development is that I find the amount of layers of abstraction and needless complexity for something that should really be simple quite deterring.
I'm sure e.g. React and GraphQL allow people to think about web apps in really elegant and scalable ways, but the learning curve is just way more than I can justify for a side project or a one-off thing at work that will never have more than two or three users opening it once every few months.
The browser is a great place to build voice chat, 3d, almost any other experience. I expect a renewed interest in granting fuller capabilities to the web, especially background processing and network access.
How about we go back to thick clients, with LLMs the effort required to do that for multiple operating systems will also be reduced, no?
MetHacker.io (has a lot of features I had to remove because of X API’s new pricing - see /projects on it)
GoUnfaked.com
PlaybookFM.com
TokenAI.dev (working with blowfish to remove the warning flag)
Maybe I'm "holding it wrong" -- I mean using it incorrectly.
True, it renders quite interesting mockups and has React code behind it, but then try to get this into even a demoable state for your boss or colleagues...
Even a simple "please create a Dockerfile with everything I need in a directory to get this up and running"... doesn't work.
The Dockerfile doesn't work (my fault, maybe, for not saying I'm on ARM64), the app is misconfigured, files are in the wrong directories, key things are missing.
Again just my experience.
I find Claude interesting for generating ideas-- but I have a hard time seeing how a dev with six months experience could get multiple "paid" apps out with it. I have 20 years (bla, bla) experience and still find it requires outrageous hand holding for anything serious.
Again I'm not doubting you at all -- I'm just saying me personally I find it hard to be THAT productive with it.
My only complaints are:
a) that it's really easy to hit the usage limit, especially when refactoring across a half dozen files. One thing that'd theoretically be easyish to fix would be automatically updating files in the project context (perhaps with an "accept"/"reject" prompt) so that the model knows what the latest version of your code is without having to reupload it constantly.
b) it oscillating between being lazy in really annoying ways (giving largeish code blocks with commented omissions partway through) and supplying the full file unnecessarily and using up your usage credits.
My hope is that Jetbrains give up on their own (pretty limited) LLM and partner with Anthropic to produce a super-tight IDE native integration.
At least 95% of the code was generated by AI (I reached the limit so had to add final bits on my own).
POCs and demos are easy to build by anyone these days. The last 10% is what separates student projects from real products.
any engineer who has spent time in the trenches understands that fixing corner cases in code produced by inexperienced engineers consumes a lot of time.
in fact, poor overall design and lack of diligence tanks entire projects.
There’s a daily 2.5 million token limit that you can use up fairly quickly with 100K context
So they may very well have completed the whole program with Claude. It’s just the machine literally stopped and the human had to do the final grunt work.
What stops you from using AI to explain the code base?
I can't think of a worse llm than Claude.
Not necessarily because users can identify AI apps, but more because due to the lower barrier of entry - the space is going to get hyper-competitive and it'll be VERY difficult to distinguish your app from the hundreds of nearly identical other ones.
Another thing that worries me (because software devs in particular seem to take a very loose moral approach to plagiarism and basic human decency) is that it'll be significantly easier for a less scrupulous dev to find an app that they like, and use an LLM to instantly spin up a copy of it.
I'm trying not to be all gloom and doom about GenAI, because it can be really nifty to see it generate a bunch of boilerplate (YAML configs, dev opsy stuff, etc.) but sometimes it's hard....
Take this very post for example. Imagine an artist forum having daily front-page articles on AI, and most of the comments are curious and non-negative. That's basically what HackerNews is doing, but with developers instead. The huge culture difference is curious, and makes me happy with the posters on this site.
You attribute it to the difficulty of using AI coding tools. But tools that cut out the programmer and make development available to the layman have always existed: libraries, game engines, website builders, and now web app builders. You also attribute it to the flooding of the markets. But the website and mobile markets are famously saturated, and yet we continue making stuff there, because we want to (and because quality things make more money).
I instead attribute it to our culture of free sharing (what one might call "plagiarism"... of ideas?!), adaptability, and curiosity. And that makes me hopeful.
People don't seem to realize that the same thing is going to happen to regular app development once AI tooling gets even easier.
I am looking forward to this type of real time app creation being added into our OSs, browsers, phones and glasses.
What do you see that being used for?
Surely, polished apps written for others are going to be best built in professional tools that live independently of whatever the OS might offer.
So I assume you're talking about quick little scratch apps for personal use? Like an AI-enriched version of Apple's Automator or Shortcuts, or of shell scripts, where you spend a while coaching an AI to write the little one-off program you need instead of visually building a workflow or writing a simple script? Is that something you believe there's a high unmet need for?
This is an earnest question. I'm sincerely curious what you're envisioning and how it might supersede the rich variety of existing tools that seem to only see niche use today.
Once a class was full, you could still get in if someone who was selected for the classes changed their mind, which (at an unpredictable time) would result in a seat becoming available in that class until another student noticed the availability and signed up.
So I wrote a simple PHP script that loaded the page every 60 seconds to check, and the script would send me a text message if any of the classes I wanted suddenly had an opening. I would then run to a computer and try to sign up.
These are the kind of bespoke, single-purpose things that I presume AI coding could help the average person with.
“Send me a push notification when the text on this webpage says the class isn’t full, and check every 60 seconds”
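A minimal sketch of that kind of one-off watcher, in Python rather than the original PHP. The URL, the page's "Class Full" marker text, and the notification step are all assumptions here, not the original script:

```python
import time
import urllib.request

COURSE_URL = "https://example.edu/schedule"  # placeholder, not a real page

def seat_available(html: str) -> bool:
    # The original script matched text on the page; here we just look
    # for the absence of a hypothetical "Class Full" marker.
    return "Class Full" not in html

def watch(url: str, interval: int = 60) -> None:
    # Poll the page every `interval` seconds until a seat opens up.
    while True:
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        if seat_available(html):
            print("Seat open! Go sign up.")  # the original sent an SMS here
            return
        time.sleep(interval)
```

An LLM can plausibly produce something of this shape from the one-sentence prompt above; the fiddly part is still telling it what text on the page actually signals an opening.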
Ask a bird what flying is good for and their answer will be encumbered by reality.
Kind of the opposite of “everything looks like a nail”.
Two ideas: "For every picture of food I take, create a recipe to recreate it so I can make it at home in the future" or "Create an app where I can log my food for today and automatically calculate the calories based on the food I put in".
https://github.com/williamcotton/search-input-query
Why multi-pass? So multiple semantic errors can be reported at once to the user!
The most important factor here is that I've written lexers and parsers beforehand. I was very detailed in my instructions and put it together piece-by-piece. It took probably 100 or so different chats.
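The multi-pass payoff, collecting every semantic error instead of stopping at the first one, might look roughly like this in Python (the field names and error format are illustrative, not the repo's actual TypeScript API):

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    pos: int  # character offset in the query string

# Hypothetical schema; the real project checks against its own field list.
KNOWN_FIELDS = {"title", "author", "status"}

def check_fields(fields: list[Field]) -> list[str]:
    # Walk the whole parsed query and accumulate errors rather than
    # raising on the first unknown field.
    errors = []
    for f in fields:
        if f.name not in KNOWN_FIELDS:
            errors.append(f"unknown field '{f.name}' at position {f.pos}")
    return errors

errs = check_fields([Field("ttile", 0), Field("status", 9), Field("autor", 18)])
# Both misspelled fields are reported at once, not just the first.
```

Separating the parse pass from the semantic-check pass is what makes this possible: the parser keeps going, and the checker reports over the whole tree.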
Try it out with the GUI you see in the gif in the README:
git clone git@github.com:williamcotton/search-input-query.git
cd search-input-query/search-input-query-demo
npm install
npm run dev
It's even documented on their site:
https://support.anthropic.com/en/articles/9519189-project-vi...
Click the "Share" button in the upper right corner of your chat.
Click the "Share & Copy Link" button to create a shareable link and add the chat snapshot to your project’s activity feed.
/edit: i just checked. i think they had a regression? or at least i cannot see the button anymore. go figure. must be pretty recently, as i shared a chat just ~2-3 weeks ago
Started off with having it create funny random stories, to slowly creating more and more advanced programs.
It’s shocking how good 3.5 Sonnet is at coding, considering the size of the model.
We don't know the size of Claude 3.5 Sonnet or any other Anthropic model.
So pretty simple flow, totally not scalable for bigger projects.
I need to read and check Cursor AI which can also use Claude models.
In Django I had it create a backend, set up an admin user, create requirements.txt, and then do a whole frontend in Vue as a test. It can even do screen testing, and it tested what happens if it puts in a wrong login.
There are plenty of website builder tools that will glue third party maps. Even the raw Google Maps API website will generate an HTML page with customized maps.
Next obvious steps: make it understand large existing programs, learn from the style of the existing code while avoiding the bad style where it's present, and then contribute features or fixes to that codebase.
There are so many small tasks that I could, but until now almost never would, automate (whether because it's not worth the time [1] or because I just couldn't bring myself to do it, as I don't really enjoy it). A one-off bitmask parser at work here, a proof-of-concept webapp at home there; it's literally opened up a new world of quality-of-life improvements, in a purely quantitative sense.
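That one-off bitmask parser is exactly the sort of thing an LLM can knock out in one prompt. A hypothetical Python version (the flag names and bit values are made up here, not from any real protocol) might be just:

```python
# Hypothetical flag definitions; real ones would come from the format's spec.
FLAGS = {0x1: "READ", 0x2: "WRITE", 0x4: "EXEC", 0x8: "HIDDEN"}

def parse_mask(mask: int) -> list[str]:
    # Collect the name of every flag whose bit is set in the mask.
    return [name for bit, name in FLAGS.items() if mask & bit]

parse_mask(0x5)  # ["READ", "EXEC"]
```

Ten lines nobody wants to write by hand at 4pm on a Friday, but trivially worth having once it exists.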
It extends beyond UI and web development too: Very often I find myself thinking that there must be a smarter way to use CLI tools like jq, zsh etc., but considering how rarely I use them and that I do already know an ineffective way of getting what I need, up until now I couldn't justify spending the hours of going through documentation on the moderately high chance of finding a few useful nuggets letting me shave off a minute here and there every month.
The same applies to SQL: After plateauing for several years (I get by just fine for my relatively narrow debugging and occasional data migration needs), LLMs have been much better at exposing me to new and useful patterns than dry and extensive documentation. (There are technical documents I really do enjoy reading, but SQL dialect specifications, often without any practical motivation as to when to use a given construct, are really not it.)
LLMs have generally been great at that, but being able to immediately run what they suggest in-browser is where Claude currently has the edge for me. (ChatGPT Plus can apparently evaluate Python, but that's server-side only and accordingly doesn't really allow interactive use cases.)
We’re getting there with some of the smaller open source models, but we’re not quite there yet. I’m looking forward to where we’ll be in a year!
In many professions, $5000 for tools is almost nothing.
If you want to pay that <$1k up front to just say "it was always just on my machine, nobody else's," then more power to you. Most just prefer this "pay as you go for someone else to have set it up" model. That doesn't imply it's unattainable if you want to run it differently, though.
I know we all love dunking on how expensive Apple computers are, but for $5000 you would be getting a Mac Mini maxed out with an M4 Pro chip with a 14-core CPU, 20-core GPU, 16-core Neural Engine, 64GB of unified memory, an 8TB SSD, and 10 Gigabit Ethernet.
M4 MacBook Pros start at $1599.
What I think GP was overlooking is that newer mid-range models like Qwen2.5-Coder 32B produce more than usable output for this kind of scenario on much lower-end consumer (rather than prosumer) hardware, so you don't need the high-memory stuff to do this kind of task locally, even if you may need it for serious AI workloads or training.