My history of closely scrutinizing Google products doesn't go back very far, so I couldn't say where the line between their old and new approach might be drawn. It might also be something that varied between product teams...
Another possibility is that direct (verbal) user feedback is discounted as "anecdotal" and left out when making "data-driven" design decisions. But as you mentioned, this is all speculation without an insider's POV.
My intuition, however, is that their data-driven technique over-optimizes the various minutiae while allowing broader flaws to persist. I'd chalk that up to a lack of design vision to guide the testing.