We talked about this a few weeks ago, so it can't be the first time you're hearing about it :) [1] You hadn't heard about it before because it really only affected API customers running services with calls that required outputs of roughly 2.5k+ tokens in a single message. That's pretty much just the small subset of AI founders/developers in the long-form content space. And out of those, only the ones using Sonnet 3.5 for these tasks at the time, which is an even smaller number: back then it wasn't as favoured as it is now, especially for content. It's also expensive for high-output tasks, so it was only relevant to high-margin services, mostly B2B. Most of us in that small group don't post online about this kind of thing; we urgently work around it, as we did back then. Still, as I showed you, some others did post about it.
The only plausible explanations were either internal system prompt changes or an update to the actual model. Since the only evals that shifted sharply were those expecting 2.5k+ token outputs, with all the short ones unchanged, and the change was effectively 100% consistent, a stealth model update is unlikely, though not impossible.
[1] https://news.ycombinator.com/item?id=44844311