I didn't understand how bad it was until this weekend, when I sat down and tried GPT-5, first without thinking mode and then with it. It misunderstood sentences, generated crazy things, and lost track of everything. It was completely beyond how bad I thought it could possibly be.
I'd fiddled with stories before because I saw that LLMs had trouble with them, but I did not understand that this was where we were in NLP. At first I couldn't fully believe it, because these models don't fail to follow instructions like this when you talk about programming.
This extends to analyzing discussions: it simply misunderstands what people say. If you try this kind of thing, you will realise the degree to which these are just sequence models, with no ability to think, really short effective attention spans, and no ability to operate within a context. I experimented with stories set in established settings, and the model repeatedly generated things that were impossible in those settings. Do this for a while and their character as sequence models that do not really integrate information across different sequences becomes apparent.
Sure, those models are cheaper, but we also don't really know how an ecosystem with a stale LLM and an up-to-date RAG index would behave once the context drifts far enough, because no one is solving that problem at the moment.
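To make that concrete, here is a minimal toy sketch of the failure mode. Everything in it is hypothetical (no real vendor API; a lookup table stands in for the model's parametric knowledge, keyword overlap stands in for a vector store). The point it illustrates: when the retrieved context contradicts the model's stale prior, nothing in the pipeline says which one wins.

    # Toy sketch of the "stale LLM + fresh RAG" drift problem (hypothetical names).

    FROZEN_MODEL_KNOWLEDGE = {
        # What the model "learned" before its training cutoff.
        "current_flagship": "GPT-4",
    }

    RETRIEVAL_INDEX = [
        # Documents added after the cutoff; the index keeps drifting.
        "Press release: GPT-5 is now the flagship model.",
    ]

    def retrieve(query: str) -> list[str]:
        """Naive keyword-overlap retrieval standing in for a vector store."""
        terms = set(query.lower().split())
        return [doc for doc in RETRIEVAL_INDEX
                if terms & set(doc.lower().split())]

    def answer(query: str) -> str:
        """A frozen 'model' that must reconcile a stale prior with fresh context."""
        context = retrieve(query)
        prior = FROZEN_MODEL_KNOWLEDGE["current_flagship"]
        # The unsolved part: which source should win, and how does behaviour
        # degrade as the gap between prior and index keeps growing?
        if context and prior not in context[0]:
            return f"conflict: prior says {prior!r}, context says {context[0]!r}"
        return prior

    print(answer("what is the flagship model"))

Running it prints the conflict case: the frozen prior and the fresh index disagree, and the reconciliation policy is exactly the thing no one is specifying.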
In other words, yes, GPT-X might work well enough for most people, but the newer demo for ShinyNewModelZ is going to pull GPT-X's customers away regardless, even when both fulfill the customers' needs. There is a persistent need for advancement (or at least marketing that suggests as much) in order to end the churn cycle with positive numbers.
I have major doubts that this can be done without pushing features or SOTA models, or without straight-up lying or deception.
https://arstechnica.com/information-technology/2025/08/opena...