It's all anecdata--I'm convinced anecdata is the least bad way to evaluate these models, benchmarks don't work--but this is the behavior I've come to expect from earlier Claude models as well, especially after several back and forth passes where you rejected the initial answers. I don't think it's new.