A third option is that they tested each change in isolation, then grabbed all of the high-performing ones and lumped them together. Just because each change may have been an improvement within the context of the old UI doesn't mean it's still good once it's put together with all the other changes.
That said, I have trouble believing that changing the buttons from text labels to icons could possibly have tested as an improvement. I'm with the OP; I have to mouse over every button and read the tooltip to find the Report Spam one.