Basically, it can probably be shown that if you hire from a completely random sample of resumes vs. from a random sample of people who can produce a working fizz-buzz program, the candidates from the latter group will perform better on average. But if you then filter the fizz-buzz group by “can they solve this negative-base math problem?” you’ll remove a disproportionate number of candidates who would have turned out to be excellent for your organization, and your final “super-candidate” pool will actually be weaker (for your org / that position) than the average of everyone who could solve fizz-buzz.
I think the research shows that you should filter on what you know for sure provides a true signal, then select randomly from the pool that passed your known filters. If anyone can help me find anything relevant about this, I’d appreciate it.
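Here’s a toy Monte-Carlo sketch of that argument in Python. Every number in it is a made-up assumption, not data, including the key one: that skill at the extra puzzle is slightly anti-correlated with on-the-job fit (say, because puzzle-grinding time trades off against shipping time). If you flip that assumption the effect disappears, which is exactly what makes it the contested part.

    import random
    from statistics import mean

    random.seed(1)

    def candidate():
        # "fit" is the true (unobservable) quality for your specific role
        fit = random.gauss(0.0, 1.0)
        # fizz-buzz: a noisy but genuinely fit-correlated competence screen
        fizzbuzz = (fit + random.gauss(0.0, 1.0)) > -0.5
        # the puzzle: mostly orthogonal, assumed *slightly* anti-correlated
        # with fit; this trade-off is an assumption, not a measured fact
        puzzle = (random.gauss(0.0, 1.0) - 0.3 * fit) > 1.5
        return fit, fizzbuzz, puzzle

    pool = [candidate() for _ in range(100_000)]
    everyone = [f for f, fb, pz in pool]
    fizzed = [f for f, fb, pz in pool if fb]
    supered = [f for f, fb, pz in pool if fb and pz]

    print(f"random resumes:  mean fit {mean(everyone):+.3f}  (n={len(everyone)})")
    print(f"fizz-buzz pool:  mean fit {mean(fizzed):+.3f}  (n={len(fizzed)})")
    print(f"'super' pool:    mean fit {mean(supered):+.3f}  (n={len(supered)})")

The point isn’t the exact numbers: stacking a weakly-relevant filter on top of a relevant one shrinks the pool without improving it, and if the filter selects for the wrong trait even slightly, the “super” pool comes out worse than the fizz-buzz pool it was drawn from.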
Too many orgs treat hiring like the “secretary problem”, but that requires the assumption that you can grade every candidate accurately on a continuous scale. We can’t yet do that with software engineers - there’s no way to say someone is “80% awesome” vs. “93% awesome”.
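To see how much the classic 1/e strategy leans on that assumption, here’s a quick sketch (again, all parameters invented): skip the first n/e candidates, then hire the first one who beats everyone seen so far. With exact scores it picks the true best roughly 37% of the time; add grading noise and it degrades fast.

    import math
    import random

    random.seed(1)

    def run(n=100, noise=0.0, trials=10_000):
        # classic 1/e rule: reject the first n/e, then take the first
        # candidate who beats every score observed so far
        cutoff = int(n / math.e)
        wins = 0
        for _ in range(trials):
            true = [random.random() for _ in range(n)]
            # what you can actually measure: true quality plus grading noise
            seen = [t + random.gauss(0.0, noise) for t in true]
            best_early = max(seen[:cutoff])
            hire = next((i for i in range(cutoff, n) if seen[i] > best_early),
                        n - 1)  # forced to take the last candidate otherwise
            wins += true[hire] == max(true)
        return wins / trials

    for noise in (0.0, 0.1, 0.3):
        print(f"grading noise {noise}: hired the true best "
              f"{run(noise=noise):.0%} of runs")

If your measurement of candidates is noisy, the optimal-stopping math stops applying, and “hire the best one we’ve seen” quietly becomes “hire whoever happened to score well on our noisy instrument”.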
I'm not a huge fan of coding tests, and I have a lot of sympathy for those who refuse to do them, but it's a case of "It's not you, it's everyone else".
This is how 99% of developers work. I don't believe anyone who says they write their code perfectly, under stress, every single time.