undefined | Better HN

0 pointsp1necone3mo ago0 comments

Every so often I try out a GPT model for coding again, and manage to get tricked by the very sparse conversation style into thinking it's great for a couple of days (when it says nothing and then finishes producing code with a 'I did x, y and z' with no stupid 'you're absolutely' right sucking up and it works, it feels very good).

But I always realize it's just smoke and mirrors - the actual quality of the code and the failure modes and stuff are just so much worse than claude and gemini.

0 comments

kshacker3mo ago

I am a novice programmer -- I have programmed for 35+ years now but I build and lose the skills moving between coder to manager to sales -- multiple times. Fresh IC since last week again :) I have coded starting with Fortran, RPG and COBOL and I have also coded Java and Scala. I know modern architecture but haven't done enough grunt work to make it work or to debug (and fix) a complex problem. Needless to say sometimes my eyes glaze over the code.

And I write some code for my personal enjoyment, and I gave it to Claude 6-8 months back for improvement, it gave me a massive change log and it was quite risky so abandoned it.

I tried this again with Gemini last week, I was more prepared and asked it to improve class by class, and for whatever reasons I got better answers -- changed code, with explanations, and when I asked it to split the refactor in smaller steps, it did so. Was a joy working on this over the thanksgiving holidays. It could break the changes in small pieces, talk through them as I evolved concepts learned previously, took my feedback and prioritization, and also gave me nuanced explanation of the business objectives I was trying to achieve.

This is not to downplay claude, that is just the sequence of events narration. So while it may or may not work well for experienced programmers, it is such a helpful tool for people who know the domain or the concepts (or both) and struggle with details, since the tool can iron out a lot of details for you.

My goal now is to have another project for winter holidays and then think through 4-6 hour AI assisted refactors over the weekends. Do note that this is a project of personal interest so not spending weekends for the big man.

Aurornis3mo ago

> I was more prepared and asked it to improve class by class, and for whatever reasons I got better answers

There is a learning curve with all of the LLM tools. It's basically required for everyone to go through the trough of disillusionment when you realize that the vibecoding magic isn't quite real in the way the influencers talk about it.

You still have to be involved in the process, steer it in the right direction, and review the output. Rejecting a lot of output and re-prompting is normal. From reading comments I think it's common for new users to expect perfection and reject the tools when it's not vibecoding the app for them autonomously. To be fair, that's what the hype influencers promised, but it's not real.

If you use it as an extension of yourself that can type and search faster, while also acknowledging that mistakes are common and you need to be on top of it, there is some interesting value for some tasks.

wiz21c3mo ago

For me the learning curve was learning to choose what is worth asking to Claude. After 3 months on it, I can reap the benefit: Claude produces the code I want right 80% of the time. I usually ask it: to create new functions from scratch (it truly shines at understanding the context of these functions by reusing other parts of the code I wrote), refactor code, create little tools (for example a chart viewer).

vidarh3mo ago

It really depends on what you're building. As an experiment, I started having Claude Code build a real-time strategy game a bit over a week ago, and it's done an amazing job, with me writing no code whatsoever. It's an area with lots of tutorials for code structure etc., and I'm guessing that helps. And so while I've had to read the code and tell it to refactor things, it has managed to do a good job of it with just relatively high level prodding, and produced a well-architected engine with traits based agents for the NPCs and a lot of well-functioning game mechanics. It started as an experiment, but now I'm seriously toying with building an actual (but small) game with it just to see how far it can get.

In other areas, it is as you say and you need to be on top of it constantly.

You're absolutely right re: the learning curve, and you're much more likely to hit an area where you need to be on top of it than one that it can do autonomously, at least without a lot of scaffolding in the form of sub-agents, and rules to follow, and agent loops with reviews etc., which takes a lot of time to build up, and often include a lot of things specific to what you want to achieve. Sorting through how much effort is worth it for those things for a given project will take time to establish.

1 more reply

boie00253mo ago

I appreciate this narrative; relatable to me in how I have experienced and watched others around me experience the last few years. It's as if we're all kinda-sorta following a similar "Dunning–Kruger effect" curve at the same time. It feels similar to growing up mucking around with a ppp connection and Netscape in some regards. I'll stretch it: "multimodal", meet your distant analog "hypermedia".

ikidd3mo ago

My problem with Gemini is how token hungry it is. It does a good job but it ends up being more expensive than any other model because it's so yappy. It sits there and argues with itself and outputs the whole movie.

mleo3mo ago

Breaking down requirements, functionality and changes into smaller chunks is going to give you better results with most of the tools. If it can complete smaller tasks in the context window, the quality will likely hold up. My go to has been to develop task documents with multiple pieces of functionality and sub tasks. Build one piece of functionality at a time. Commit, clear context and start the next piece of functionality. If something goes off the rails, back up to the commit, fix and rebase future changes or abandon and branch.

That’s if I want quality. If I just want to prototype and don’t care, I’ll let it go. See what I like, don’t like and start over as detailed above.

altmanaltman3mo ago

Interesting. From my experience, Claude is much better at stuff involving frontend design somehow compared to other models (GPT is pretty bad). Gemini is also good but often the "thinking" mode just adds stuff to my code that I did not ask it to add or modifies stuff to make it "better". It likes to 1 up on the objective a lot which is not great when you're just looking for it to do what you precisely asked it and nothing else.

bovermyer3mo ago

I have never considered trying to apply Claude/Gemini/etc. to Fortran or COBOL. That would be interesting.

kshacker3mo ago

I was just giving my history :) but yes I am sure this could actually get us out of the COBOL lock-in which requires 70 years old programmers to continue working.

The last article I could find on this is from 2020 though: https://www.cnbc.com/2020/04/06/new-jersey-seeks-cobol-progr...

1 more reply

Aurornis3mo ago

You can actually use Claude Code (and presumably the other tools) on non-code projects, too. If you launch claude code in a directory of files you want to work on, like CSVs or other data, you can ask it to do planning and analysis tasks, editing, and other things. It's fun to experiment with, though for obvious reasons I prefer to operate on a copy of the data I'm using rather than let Claude Code go wild.

2 more replies

tartoran3mo ago

I'm starting with Claude at work but did have an okay experience with OpenAi so far. For clearly delimited tasks it does produce working code more often than not. I've seen some improvement on their side compared to say, last year. For something more complex and not clearly defined in advance, yes, it does produce plausible garbage and it goes off the rails a lot. I was migrating a project and asked ChatGPT to analyze the original code base and produce a migration plan. The result seemed good and encouraging because I didn't know much about that project at that time. But I ended up taking a different route and when I finished the migration (with bits of help from ChatGPT) I looked at the original migration plan out of curiosity since I had become more familiar with the project by now. And the migration plan was an absolutely useless and senseless hallucination.

wahnfrieden3mo ago

Use Codex for coding work

herpdyderp3mo ago

On the contrary, I cannot use the top Gemini and Claude models because their outputs are so out place and hard to integrate with my code bases. The GPT 5 models integrate with my code base's existing patterns seamlessly.

ta126534213mo ago

Supply some relevant files of your codebase in the ClaudeAI project area in the right part of the browser. Usually it will understand your architecture, patterns, principles

herpdyderp3mo ago

I'm using AI in-editor, all the models have full access to my code base.

inquirerGeneral3mo ago

You realize on some level all of these sort of anecdotes, though, are simply random coincidence .

findjashua3mo ago

NME at all - 5.1 codex has been the best by far.

manmal3mo ago

How can you stand the excruciating slowness? Claude Code is running circles around codex. The most mundane tasks make it think for a minute before doing anything.

aschobel3mo ago

I use it on medium reasoning and it's decently quick. I only switch to gpt-5.1-codex-max xhigh for the most annoying problems.

wahnfrieden3mo ago

By learning to parallelize my work. This also solved my problem with slow Xcode builds.

1 more reply

findjashua3mo ago

i workshop a detailed outline w it first, and once i'm happy w the plan/outline, i let it run while i go do something else

pshirshov3mo ago

By my tests (https://github.com/7mind/jopa) Gemini 3 is somewhat better than Claude with Opus 4.5. Both obliterate Codex with 5.1

Incipient3mo ago

What's - roughly - your monthly spend when using ppt models? I only use fixed priced copilot, and my napkin maths says I'd be spending something crazy like $200/mo if I went ppt on the more expensive models.

2 more replies

viking1233mo ago

Codex is super cheap though even with the cheapest GPT subscription you get lots of tokens. I use 4.5 opus at work and codex at home tbh the differences are not that big if you know what you are doing.

andybak3mo ago

NME = "not my experience" I presume.

JFC TLA OD...

stevedonovan3mo ago

I've been getting great results from Codex. Can be a bit slow, but gets there. Writes good Rust, powers through integration test generation.

So (again) we are just sharing anecdata

sharyphil3mo ago

You're absolutely right!

Somehow it doesn't get on my nerves (unlike Gemini with "Of course").

jpalomaki3mo ago

Can you give some concrete example of programming problem task GPT fails to solve?

Interested, because I’ve been getting pretty good results with different tasks using the Codex.

gloosx3mo ago

Try to ask it to write some GLSL shaders. Just describe what you want to see and then try to run the shaders it outputs. It can output a UV-map or the simple gradient right, but when it comes to shaders a bit more complex it most of the time will not compile or run properly, sometimes mix GLSL versions, sometimes just straight make up things which don't work or output what you want.

kriro3mo ago

Library/API conflicts are the biggest pain point for me usually. Especially breaking changes. RLlib (currently 2.41.0) and Gymnasium (currently 0.29.0+) have ended in circles many times for me because they tend to be out of sync (for multi-agent environments). My go to test now is a simple hello world type card game like war, competitive multi-agent with rllib and gymnasium (pettingzoo tends to cause even more issues).

Claude Sonnet 4.5 was able to figure out a way to resolve it eventually (around 7 fixes) and I let it create an rllib.md with all the fixes and pitfalls and am curious if feeding this file to the next experiment will lead to a one-shot. GPT-5 struggled more but haven't tried Codex on this yet so it's not exactly fair.

All done with Copilot in agent mode, just prompting, no specs or anything.

throwaway311313mo ago

I posted this example before but academic papers on algorithms often have pseudo code but no actual code.

I thought it would be handy to use AI to make the code from the paper so a few months ago I tried to use Claude (not GPT, because I only have access to Claude) to recreate C++ code to implement the algorithms in this paper as practice for me in LLM use and it didn’t go well.

https://users.cs.duke.edu/~reif/paper/chen/graph/graph.pdf

threeducks3mo ago

I just tried it with GPT-5.1-Codex. The compression ratio is not amazing, so not sure if it really worked, but at least it ran without errors.

A few ideas how to make it work for you:

1. You gave a link to a PDF, but you did not describe how you provided the content of the PDF to the model. It might only have read the text with something like pdftotext, which for this PDF results in a garbled mess. It is safer to convert the pages to PNG (e.g. with pdftoppm) and let the model read it from the pages. A prompt like "Transcribe these pages as markdown." should be sufficient. If you can not see what the model did, there is a chance it made things up.

2. You used C++, but Python is much easier to write. You can tell the model to translate the code to C++ once it works in Python.

3. Tell the model to write unit tests to verify that the individual components work as intended.

4. Use Agent Mode and tell the model to print something and to judge whether the output is sensible, so it can debug the code.

1 more reply

cmarschner3mo ago

Completely failed for me running the code it changed in a docker container i keep running. Claude did it flawlessly. It absolutely rocks at code reviews but ir‘s terrible in comparison generating code

peab3mo ago

It really depends on what kind of code. I've found it incredible for frontend dev, and for scripts. It falls apart in more complex projects and monorepos

logicchains3mo ago

I find for difficult questions math and design questions GPT5 tends to produce better answers than Claude and Gemini.

munk-a3mo ago

Could you clarify what you mean by design questions? I do agree that GPT5 tends to have a better agentic dispatch style for math questions but I've found it has really struggled with data model design.

bsder3mo ago

At this point you are now forced to use the "AI"s as code search tools--and it annoys me to no end.

The problem is that the "AI"s can cough up code examples based upon proprietary codebases that you, as an individual, have no access to. That creates a significant quality differential between coders who only use publicly available search (Google, Github, etc.) vs those who use "AI" systems.

dieortin3mo ago

How would the AIs have access to proprietary codebases?

milutinovici3mo ago

Microsoft owns github

CheeseFromLidl3mo ago

Same experience here. The more commonly known the stuff it regurgitates is, the fewer errors. But if you venture into RF electronics or embedded land, beware of it turning into a master of bs.

Which makes sense for something that isn’t AI but LLM.

j / k navigate · click thread line to collapse

0 comments

kshacker3mo ago

And I write some code for my personal enjoyment, and I gave it to Claude 6-8 months back for improvement, it gave me a massive change log and it was quite risky so abandoned it.

Aurornis3mo ago

> I was more prepared and asked it to improve class by class, and for whatever reasons I got better answers

wiz21c3mo ago

vidarh3mo ago

In other areas, it is as you say and you need to be on top of it constantly.

1 more reply

boie00253mo ago

ikidd3mo ago

mleo3mo ago

That’s if I want quality. If I just want to prototype and don’t care, I’ll let it go. See what I like, don’t like and start over as detailed above.

altmanaltman3mo ago

bovermyer3mo ago

I have never considered trying to apply Claude/Gemini/etc. to Fortran or COBOL. That would be interesting.

kshacker3mo ago

I was just giving my history :) but yes I am sure this could actually get us out of the COBOL lock-in which requires 70 years old programmers to continue working.

The last article I could find on this is from 2020 though: https://www.cnbc.com/2020/04/06/new-jersey-seeks-cobol-progr...

1 more reply

Aurornis3mo ago

2 more replies

tartoran3mo ago

wahnfrieden3mo ago

Use Codex for coding work

herpdyderp3mo ago

ta126534213mo ago

Supply some relevant files of your codebase in the ClaudeAI project area in the right part of the browser. Usually it will understand your architecture, patterns, principles

herpdyderp3mo ago

I'm using AI in-editor, all the models have full access to my code base.

inquirerGeneral3mo ago

You realize on some level all of these sort of anecdotes, though, are simply random coincidence .

findjashua3mo ago

NME at all - 5.1 codex has been the best by far.

manmal3mo ago

How can you stand the excruciating slowness? Claude Code is running circles around codex. The most mundane tasks make it think for a minute before doing anything.

aschobel3mo ago

I use it on medium reasoning and it's decently quick. I only switch to gpt-5.1-codex-max xhigh for the most annoying problems.

wahnfrieden3mo ago

By learning to parallelize my work. This also solved my problem with slow Xcode builds.

1 more reply

findjashua3mo ago

i workshop a detailed outline w it first, and once i'm happy w the plan/outline, i let it run while i go do something else

pshirshov3mo ago

By my tests (https://github.com/7mind/jopa) Gemini 3 is somewhat better than Claude with Opus 4.5. Both obliterate Codex with 5.1

Incipient3mo ago

2 more replies

viking1233mo ago

andybak3mo ago

NME = "not my experience" I presume.

JFC TLA OD...

stevedonovan3mo ago

I've been getting great results from Codex. Can be a bit slow, but gets there. Writes good Rust, powers through integration test generation.

So (again) we are just sharing anecdata

sharyphil3mo ago

You're absolutely right!

Somehow it doesn't get on my nerves (unlike Gemini with "Of course").

jpalomaki3mo ago

Can you give some concrete example of programming problem task GPT fails to solve?

Interested, because I’ve been getting pretty good results with different tasks using the Codex.

gloosx3mo ago

kriro3mo ago

All done with Copilot in agent mode, just prompting, no specs or anything.

throwaway311313mo ago

I posted this example before but academic papers on algorithms often have pseudo code but no actual code.

https://users.cs.duke.edu/~reif/paper/chen/graph/graph.pdf

threeducks3mo ago

I just tried it with GPT-5.1-Codex. The compression ratio is not amazing, so not sure if it really worked, but at least it ran without errors.

A few ideas how to make it work for you:

2. You used C++, but Python is much easier to write. You can tell the model to translate the code to C++ once it works in Python.

3. Tell the model to write unit tests to verify that the individual components work as intended.

4. Use Agent Mode and tell the model to print something and to judge whether the output is sensible, so it can debug the code.

1 more reply

cmarschner3mo ago

peab3mo ago

It really depends on what kind of code. I've found it incredible for frontend dev, and for scripts. It falls apart in more complex projects and monorepos

logicchains3mo ago

I find for difficult questions math and design questions GPT5 tends to produce better answers than Claude and Gemini.

munk-a3mo ago

bsder3mo ago

At this point you are now forced to use the "AI"s as code search tools--and it annoys me to no end.

dieortin3mo ago

How would the AIs have access to proprietary codebases?

milutinovici3mo ago

Microsoft owns github

CheeseFromLidl3mo ago

Same experience here. The more commonly known the stuff it regurgitates is, the fewer errors. But if you venture into RF electronics or embedded land, beware of it turning into a master of bs.

Which makes sense for something that isn’t AI but LLM.

j / k navigate · click thread line to collapse