Official OpenAI GPT-5 coding examples repo: https://github.com/openai/gpt-5-coding-examples (https://news.ycombinator.com/item?id=44826439)
GitHub leak: https://news.ycombinator.com/item?id=44826439
I recently used OpenAI models to generate OCaml code, and it was eye-opening how much even reasoning models are still just copy-and-paste machines. The code was full of syntax errors, and the models clearly lacked a basic understanding of which functions are in the stdlib versus which come from popular (in OCaml terms) libraries.
Maybe GPT-5 is the great leap and I'll have to eat my words, but this experience really made me more pessimistic about AI's potential and the future of programming in general. I'm hoping that in 10 years niche languages are still a thing, and the world doesn't converge toward writing everything in JS just because AIs make it easier to work with.
Agreed. The models also break down on code that isn't even that complex, as long as it's not web/JavaScript. I was playing with Gemini CLI the other day and had it try to make a simple Avalonia GUI app in C#/.NET. It kept going around in circles and couldn't even get a basic starter project to build, so I can imagine how much it'd struggle with OCaml or other more "obscure" languages.
This makes the tech even less useful where it'd be most helpful: on internal, legacy codebases, enterprisey stuff, and stacks that don't have numerous examples on GitHub to train from.
Or anything that breaks the norm, really.
I recently wrote something where I updated a variable using atomic primitives. Because it was inside a hot path, I read the value without using atomics, as it was okay for the value to be stale. I handed the model the code because I had a question about something unrelated, and it wouldn't stop changing this piece of code to use atomic reads. Even when I prompted it not to change the code, or explained why this was fine, it wouldn't stop.
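For anyone who hasn't seen this pattern, here is a rough sketch of the general shape, not the commenter's actual code. It assumes Rust, and the `HitCounter` type and its method names are made up for illustration. Safe Rust has no plain, non-atomic read of a shared atomic, so a `Relaxed` load stands in for the unsynchronized read; on common targets such as x86-64 that typically compiles down to an ordinary load. The memory ordering on the write side is also an assumption, since the comment doesn't say which one was used.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter; the name and layout are illustrative only.
struct HitCounter {
    hits: AtomicU64,
}

impl HitCounter {
    const fn new() -> Self {
        Self { hits: AtomicU64::new(0) }
    }

    /// Writer side: an atomic read-modify-write, so concurrent
    /// increments from several threads are never lost.
    fn record(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    /// Hot-path reader: the cheapest load available. It imposes no
    /// ordering on surrounding memory operations, so the value may lag
    /// behind the latest write, which is acceptable when staleness is
    /// tolerated by design.
    fn approx_hits(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }
}
```

A caller would share one `HitCounter` across threads (for example behind an `Arc`), call `record()` from the writers, and call `approx_hits()` from the hot path. "Fixing" the reader by switching to a stronger ordering such as `SeqCst` would not make the count any more correct here; it would only add cost on some architectures.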
Isn't that the rub though? It's not an ex nihilo "intelligence"; it's whatever stuff it's trained on and can derive completions from.
Maybe I spend too much time rage-baiting myself reading X threads, and that's why I feel the need to emphasize that AI isn't what they make it out to be.
A more useful demonstration, like making large, meaningful changes to a large, complicated codebase, would be much harder to evaluate, since you need to be familiar with the existing system to judge the quality of the transformation.
Would be kinda cool to instead see diffs of nontrivial patches to the Ruby on Rails codebase or something.
This seems to impress the mgmt types a lot, e.g. "I made a WHOLE APP!", when most of this is basically frameworks and tech that had crappy bootstrapping to begin with (React and JS are rife with this, in spite of their popularity).
Will be interesting to see what pushing it harder does – what the new ceiling is. 88% on aider polyglot is pretty good!