This is a fantastic piece, very timely, evidently well-researched, and also well-written. Judging by the little that I know, it's accurate. Thank you for doing the work and sharing it with the world.
OpenAI may be in a more tenuous competitive position than many people realize. Recent anecdotal evidence suggests the company has lost its lead in the AI race to Anthropic.[b]
Many people here, on HN, who develop software prefer Claude, because they think it's a better product.[c]
Is your understanding of OpenAI's current competitive position similar?
---
[a] You may want to provide proof online that you are who you say you are: https://en.wikipedia.org/wiki/On_the_Internet%2C_nobody_know...
[b] https://www.latimes.com/business/story/2026-04-01/openais-sh...
[c] For example, there are 2x more stories mentioning Claude than ChatGPT on HN over the past year. Compare https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru... to https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...
The piece captures some of the anxieties within OpenAI right now about their competitive position. This obviously ebbs and flows but of late there has been much focus on Anthropic's relative position. We of course mention the allegations of "circular deals" and concerns about partners taking on debt.
I was asking more about your informed view on how OpenAI's technology, products, and roadmap are perceived, particularly by customers and partners, in comparison to those of competitors.
If you have an opinion about that, everyone here would love to hear about it.
As it turns out, and what I’m kind of going with for this LLM shit, is that it’ll play out exactly how you think it will. The companies are all too big to fail, with billionaire backers who would rather commit fraud than lose money.
He's had so many conversations that he likely has a sense of how perceptions of the company and its offerings have changed.
I'm curious.
The problem is sometimes on paper everything people like Sam Altman do is legal, despite it harming so many. We've literally had a major RAM producer pull off the consumer RAM market. I feel like Sam Altman should be investigated and heavily scrutinized. He kind of is the biggest bubble in the AI bubble, we're letting him fester too far into it too, and these circular deals have seemingly somewhat stopped for now, but it might only get worse.
I guess my question was more, if the article author was the judge of fate or morality, what should happen?
As to AI and Sam, I think it’s too early to tell what effects will be. So we should adopt non judgement, build good ourselves and see what unfolds.
No comment on the CEO: I just find the product superior in everything but UI/UX and conversation. It's better at quality code.
Both codex and Claude code fail when it comes to extremely sophisticated programming for distributed systems
For the few times I've used both models side by side on more typical tasks (not so much web stuff, which I don't do much of, but more conventional Python scripts, CLI utilities in C, some OpenGL), they seem much more evenly matched. I haven't found a case where Claude would be markedly superior since Codex 5.2 came out, but I'm sure there are plenty. In my view, benchmarks are completely irrelevant at this point, just use models side by side on representative bits of your real work and stick with what works best for you. My software engineer friends often react with disbelief when I say I much prefer Codex, but in my experience it is not a close comparison.
Is there one that you prefer for, i dunno, physics?
Gemini seems to be the worst of the three, and some open-weight models are not too bad (like Kimi k2.5). Cursor is still pretty good, and copilot just really really sucks.
LLMs aren't able to achieve 100% correctness of every line of code. But luckily, 100% correctness is not required for debugging. So its better at that sort of thing. Its also (comparatively) good at reading lots and lots of code. Better than I am - I get bogged down in details and I exhaust quickly.
An example of broad work is something like: "Compile this C# code to webassembly, then run it from this go program. Write a set of benchmarks of the result, and compare it to the C# code running natively, and this python implementation. Make a chart of the data add it to this latex code." Each of the steps is simple if you have expertise in the languages and tools. But a lot of work otherwise. But for me to do that, I'd need to figure out C# webassembly compilation and go wasm libraries. I'd need to find a good charting library. And so on.
I think its decent at debugging because debugging requires reading a lot of code. And there's lots of weird tools and approaches you can use to debug something. And its not mission critical that every approach works. Debugging plays to the strengths of LLMs.
Last one is from yesterday: https://news.ycombinator.com/item?id=47660925
I've been working on a wide range of relatively projects and I find that the latest GPT-5.2+ models seem to be generally better coders than Opus 4.6, however the latter tends to be better at big picture thinking, structuring, and communicating so I tend to iterate through Opus 4.6 max -> GPT-5.2 xhigh -> GPT-5.3-Codex xhigh -> GPT-5.4 xhigh. I've found GPT-5.3-Codex is the most detail oriented, but not necessarily the best coder. One interesting thing is for my high-stakes project, I have one coder lane but use all the models do independent review and they tend to catch different subsets of implementation bugs. I also notice huge behavioral changes based on changing AGENTS.md.
In terms of the apps, while Claude Code was ahead for a long while, I'd say Codex has largely caught up in terms of ergonomics, and in some things, like the way it let's you inline or append steering, I like it better now (or where it's far, far, ahead - the compaction is night and day better in Codex).
(These observations are based on about 10-20B/mo combined cached tokens, human-in-the-loop, so heavy usage and most code I no longer eyeball, but not dark factory/slop cannon levels. I haven't found (or built) a multi-agent control plane I really like yet.)
I do regular evaluation of both codex and Claude (though not to statistical significance) and I’m of the opinion there is more in group variance on outcome performance than between them.
Codex has been consistently better on almost every level.
* (an open source framework for 2D games in Godot 4.6 GDScript, mostly using AI to review existing code)
I enjoy using CC more and use it for non coding tasks primarily, but for anything complex (honestly most of what I do is not that complex), I feel like I am trading future toil for a dopamine hit.
There are two types of vaccine be coders. Those who review the code generated and those who don’t.
Either because they don’t understand code at all, or because they don’t have time and don’t care.
Code quality is only one factor. Naive vibe coders, who don’t code otherwise, rate performance based on output alone.
https://xcancel.com/RonanFarrow/status/2041127882429206532#m
(i found your comment surprising based on my daily hn reading recollection - i mostly read top N daily and feel i only occassionally see codex stories).
Unfortunately it probably doesn't even matter here on HN considering how brigaded down this story is predictably getting.
But yeah, it was a fantastic piece.