I recently did an internship at one of these big companies, doing ML. I'm a researcher but had a production role. Coming in everything was really weird to me from how they setup their machines to training and evaluation. I brought up that the way they were measuring their performance was wrong and could tell they overfit their data. They didn't believe me. But then it came to be affecting my role. So I fixed it, showed them, and then they were like "oh thanks, but we're moving on to transformers now." Main part of what I did is actually make their model robust and actually work on their customer data! (I constantly hear that "industry is better because we have customers so it has to work" but I'm waiting to see things work like promised...) Of course, their transformer model took way more to train and had all the same problems, but were hidden a few levels deeper due to them dramatically scaling data and model size.
I knew the ML research community had been overly focused on benchmarks but didn't realize how much worse it was in production environments. It just seems that metric hacking is the explicitly stated goal here. But I can't trust anyone to make ML models that themselves are metric hackers. The part that got me though is that I've always been told by industry people that if I added value to the company and made products better that the work (and thus I) would be valued. I did in an uncontestable manner, and I did not in an uncontestable way. I just thought we could make cool products AND make money at the same time. Didn't realize there was far more weight to the latter than the former. I know, I'm naive.