I hope this reflects actual capability and isn't just an artifact of optimizing for benchmark scores. Likewise, it would be disheartening, though not unheard of, for them to degrade the previous model's performance when releasing a new one, so that the upgrade feels like that much more of an upgrade. I know this has been the case with cell phones (even though the claim is that it is unintentional), and I can't help but think the same could be true here.
All of this comes amid news that GPT-5, based on a new underlying model, is not far off, and that GPT-4 (and GPT-4o) may take over the role GPT-3.5-turbo currently fills for most apps that are trying to keep down the cost of using the service.