And also, there are unrelated complaints of "GPT-5 can't solve the same problems 4 did". Those were very real too, and meant OpenAI did a wrong thing.
Correct, but that's true for all bugs.
In this case, the deeper bug was the AI having a training reward model based too much on user feedback.
If you have any ideas how anyone might know what "too much" is in a training reward, in advance of trying it, everyone in AI alignment will be very interested, because that's kinda a core problem in the field.
When it was introduced, the question to ask wasn't "will it go wrong" - it was "how exactly" and "by how much". Reward hacking isn't exactly a new idea in ML - and we knew with certainty that it was applicable to human feedback for years too. Let alone a proxy preference model made to mimic the preferences of an average user based on that human feedback. I get that alignment is not solved, but this wasn't a novel, unexpected pitfall.
When the GPT-4o sycophancy debacle was first unfolding, the two things that came up in AI circles were "they trained on user feedback, the stupid fucks" and "no fucking way, even the guys at CharacterAI learned that lesson already".
Guess what. They trained on user feedback. They completely fried the AI by training it on user feedback. How the fuck that happened at OpenAI and not at Bob's Stupid Sexy Chatbots is anyone's guess.
I think OpenAI is only now beginning to realize how connected some people are to their product and that the way their models behave has a huge impact.
1) Alters your trust value for correctness. I would assume some trust it more because it sounds aware like a human and is trained on a lot of data, and some trust it less because a robot should just output the data you asked for.
2) When asking questions, turning the temperature up was meant to improve variability and being more "lifelike", which of course would mean not return the most probable tokens during inference, meaning (even) less accuracy.
A third one being confidently outputting answers even when none exist was of course a more fundamental issue with the technology, but was absolutely made worse by having an extra page of useless flowery output.
I can't say I predicted this specific effect, but it was very obvious from the get-go that there was no upside to those choices.
Instead it sounds like they rushed to release this as quickly as possible, skipping all sorts of testing, and people died as a result.