Heard of similar experiences from RL acquaintances, where a prompt worked reliably for hundreds of requests per day for several months - and then suddenly the model started to make mistakes, ignore parts of the prompt, etc when a newer model was released.
I agree, it doesn't have to be deliberate malice like intentionally nerfing a model to make people switch to the newer one - it might just be that less resources are allocated to the older model once the newer one is available and so the inference parameters change - but some effect at the release of a newer model seems to be there.