That was the logical next step, and we started on it, but it required exporting more historical data from the HR system and also filtering out anyone who had started as an intern. Sounds simple, but in practice it's anything but. For reference, data extraction, cleaning and filtering in that project took at least an order of magnitude more time than anything related to machine learning.
The project eventually lost steam and got abandoned.
>Do you still suspect a skewed result?
Absolutely. My personal intuition is that there is very little correlation between resumes and candidate quality. If that is true, any seemingly accurate predictions would be the result of a similar problem. Testing this hypothesis was a large part of why I agreed to work on the project in the first place.