But I always realize it's just smoke and mirrors - the actual quality of the code and the failure modes and stuff are just so much worse than claude and gemini.
And I write some code for my personal enjoyment, and I gave it to Claude 6-8 months back for improvement, it gave me a massive change log and it was quite risky so abandoned it.
I tried this again with Gemini last week, I was more prepared and asked it to improve class by class, and for whatever reasons I got better answers -- changed code, with explanations, and when I asked it to split the refactor in smaller steps, it did so. Was a joy working on this over the thanksgiving holidays. It could break the changes in small pieces, talk through them as I evolved concepts learned previously, took my feedback and prioritization, and also gave me nuanced explanation of the business objectives I was trying to achieve.
This is not to downplay claude, that is just the sequence of events narration. So while it may or may not work well for experienced programmers, it is such a helpful tool for people who know the domain or the concepts (or both) and struggle with details, since the tool can iron out a lot of details for you.
My goal now is to have another project for winter holidays and then think through 4-6 hour AI assisted refactors over the weekends. Do note that this is a project of personal interest so not spending weekends for the big man.
There is a learning curve with all of the LLM tools. It's basically required for everyone to go through the trough of disillusionment when you realize that the vibecoding magic isn't quite real in the way the influencers talk about it.
You still have to be involved in the process, steer it in the right direction, and review the output. Rejecting a lot of output and re-prompting is normal. From reading comments I think it's common for new users to expect perfection and reject the tools when it's not vibecoding the app for them autonomously. To be fair, that's what the hype influencers promised, but it's not real.
If you use it as an extension of yourself that can type and search faster, while also acknowledging that mistakes are common and you need to be on top of it, there is some interesting value for some tasks.
In other areas, it is as you say and you need to be on top of it constantly.
You're absolutely right re: the learning curve, and you're much more likely to hit an area where you need to be on top of it than one that it can do autonomously, at least without a lot of scaffolding in the form of sub-agents, and rules to follow, and agent loops with reviews etc., which takes a lot of time to build up, and often include a lot of things specific to what you want to achieve. Sorting through how much effort is worth it for those things for a given project will take time to establish.
That’s if I want quality. If I just want to prototype and don’t care, I’ll let it go. See what I like, don’t like and start over as detailed above.
The last article I could find on this is from 2020 though: https://www.cnbc.com/2020/04/06/new-jersey-seeks-cobol-progr...
JFC TLA OD...
So (again) we are just sharing anecdata
Somehow it doesn't get on my nerves (unlike Gemini with "Of course").
Interested, because I’ve been getting pretty good results with different tasks using the Codex.
Claude Sonnet 4.5 was able to figure out a way to resolve it eventually (around 7 fixes) and I let it create an rllib.md with all the fixes and pitfalls and am curious if feeding this file to the next experiment will lead to a one-shot. GPT-5 struggled more but haven't tried Codex on this yet so it's not exactly fair.
All done with Copilot in agent mode, just prompting, no specs or anything.
I thought it would be handy to use AI to make the code from the paper so a few months ago I tried to use Claude (not GPT, because I only have access to Claude) to recreate C++ code to implement the algorithms in this paper as practice for me in LLM use and it didn’t go well.
A few ideas how to make it work for you:
1. You gave a link to a PDF, but you did not describe how you provided the content of the PDF to the model. It might only have read the text with something like pdftotext, which for this PDF results in a garbled mess. It is safer to convert the pages to PNG (e.g. with pdftoppm) and let the model read it from the pages. A prompt like "Transcribe these pages as markdown." should be sufficient. If you can not see what the model did, there is a chance it made things up.
2. You used C++, but Python is much easier to write. You can tell the model to translate the code to C++ once it works in Python.
3. Tell the model to write unit tests to verify that the individual components work as intended.
4. Use Agent Mode and tell the model to print something and to judge whether the output is sensible, so it can debug the code.
The problem is that the "AI"s can cough up code examples based upon proprietary codebases that you, as an individual, have no access to. That creates a significant quality differential between coders who only use publicly available search (Google, Github, etc.) vs those who use "AI" systems.
Which makes sense for something that isn’t AI but LLM.