0 points · tekacs · 7mo ago · 0 comments

For those who haven't seen, a little bit of early stuff:

Official OpenAI gpt-5 coding examples repo: https://github.com/openai/gpt-5-coding-examples (https://news.ycombinator.com/item?id=44826439)

GitHub leak: https://news.ycombinator.com/item?id=44826439

0xFEE1DEAD · 7mo ago

I wish they wouldn't use JS to demonstrate the AI's coding abilities - the internet is full of JS code and at this point I expect them to be good at it. Show me examples in complex (for lack of a better word) languages to impress me.

I recently used OpenAI models to generate OCaml code, and it was eye-opening how much even reasoning models are still just copy-and-paste machines. The code was full of syntax errors, and the models clearly lacked a basic understanding of which functions are in the stdlib and which come from popular (in OCaml terms) libraries.

Maybe GPT-5 is the great leap and I'll have to eat my words, but this experience really made me more pessimistic about AI's potential and the future of programming in general. I'm hoping that in 10 years niche languages are still a thing, and the world doesn't converge toward writing everything in JS just because AIs make it easier to work with.

thewebguyd · 7mo ago

> I wish they wouldn't use JS to demonstrate the AI's coding abilities - the internet is full of JS code and at this point I expect them to be good at it. Show me examples in complex (for lack of a better word) languages to impress me.

Agreed. The models break down on code that isn't even that complex, if it's not web/JavaScript. I was playing with Gemini CLI the other day and had it try to make a simple Avalonia GUI app in C#/.NET. It kept going around in circles and couldn't even get a basic starter project to build, so I can imagine how much it'd struggle with OCaml or other more "obscure" languages.

This makes the tech even less useful where it'd be most helpful - on internal, legacy codebases, enterprisey stuff, stacks that don't have numerous examples on GitHub to train from.

0xFEE1DEAD · 7mo ago

> on internal, legacy codebases, enterprisey stuff

Or anything that breaks the norm really.

I recently wrote something where I updated a variable using atomic primitives. Because it was inside a hot path, I read the value without using atomics, as it was okay for the value to be stale. I handed the model the code because I had a question about something unrelated, and it wouldn't stop changing this piece of code to use atomic reads. Even when I prompted it not to change the code, or explained why this was fine, it wouldn't stop.
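
For readers unfamiliar with the pattern, here is a minimal sketch in Rust. It is not the commenter's code, which is not shown in the thread, and the `Stats` type, its method names, and `run_writers` are hypothetical. One substitution to flag: a truly non-atomic read that races with an atomic write is a data race in Rust, so the sketch uses a `Relaxed` load instead, which imposes no ordering and is probably the closest well-defined stand-in for the stale-tolerant read described above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared counter, used only for illustration.
pub struct Stats {
    hits: AtomicU64,
}

impl Stats {
    pub fn new() -> Self {
        Stats { hits: AtomicU64::new(0) }
    }

    /// Writer side: an atomic read-modify-write, so concurrent updates are not lost.
    pub fn record_hit(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    /// Hot-path reader: a relaxed load with no ordering guarantees.
    /// The returned value may lag behind concurrent writers (assumed acceptable here).
    pub fn hits_maybe_stale(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }
}

/// Spawns `threads` writers that each record `per_thread` hits,
/// waits for all of them, and returns the final count.
pub fn run_writers(threads: usize, per_thread: u64) -> u64 {
    let stats = Arc::new(Stats::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let s = Arc::clone(&stats);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    s.record_hit();
                }
            })
        })
        .collect();
    // Joining every writer makes all of their increments visible to this thread.
    for h in handles {
        h.join().unwrap();
    }
    stats.hits_maybe_stale()
}
```

While writers are still running, `hits_maybe_stale` can return any value the counter has held recently; only after every writer has been joined is the total exact. On mainstream hardware a relaxed load of an aligned word typically compiles to an ordinary load instruction, so the hot path pays little or nothing for it.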

gedy · 7mo ago

> the internet is full of JS code and at this point I expect them to be good at it.

Isn't that the rub though? It's not an ex nihilo "intelligence", it's whatever stuff it's trained on and can derive completions from.

0xFEE1DEAD · 7mo ago

Yes, for me it is and it was even before this experience. But, you know, there's a growing crowd that believes AI is almost at AGI level and that they'll vibe code their way to a Fortune 100 company.

Maybe I spend too much time rage baiting myself reading X threads and that's why I feel the need to emphasize that AI isn't what they make it out to be.

kgeist · 7mo ago

The snake game they showcased - if you ask Qwen3-coder-30b to generate a snake game in JS, it generates the exact same layout, the exact same two buttons below, and the exact same text under the two buttons. It just regurgitates its training data.

_flux · 7mo ago

I used ChatGPT to convert an old piece of OCaml code of mine to Rust and while it didn't really work—and I didn't expect it to—it seemed a very reasonable starting point to actually do the rest of the work manually.

robotpepi · 7mo ago

I've tried programming in Mathematica and SageMath with many models; they're terrible, even with lots of hints.

rkozik1989 · 7mo ago

Honestly, why would anyone find this information useful? Creating a brand new greenfield project is a terrible test, because literally anything it outputs passes as long as it looks good and works along the happy path. Coding with LLMs falls apart in situations where complex reasoning is required, such as debugging issues in a service where there's either no framework in use or the authors have significantly modified a framework to better suit their needs.

hombre_fatal · 7mo ago

Yeah, I guess it's just the easiest thing to generate and evaluate.

A more useful demonstration, like making large meaningful changes to a large complicated codebase, would be much harder to evaluate, since you need to be familiar with the existing system to judge the quality of the transformation.

Would be kinda cool to instead see diffs of nontrivial patches to the Ruby on Rails codebase or something.

gedy · 7mo ago

> Honestly, why would anyone find this information useful?

This seems to impress the mgmt types a lot, e.g. "I made a WHOLE APP!", when most of this is basically just frameworks and tech that had crappy bootstrapping to begin with (React and JS are rife with this, in spite of their popularity).

cuuupid · 7mo ago

These are honestly pretty disappointing :/ this quality was possible with Claude Code months ago

tekacs (OP) · 7mo ago

Yep, agreed -- the repo is talking about 'one prompt with an agentic coding platform', but... at least here there's nothing particularly new.

Will be interesting to see what pushing it harder does – what the new ceiling is. 88% on aider polyglot is pretty good!
