And 40 years ago in the debate over letting kids use a calculator in school instead of calculating by hand.
A footballer follows a very constrained set of rules. If the footballer were allowed to use a car, then 1) it would be easy to score a point, 2) the field would be ruined, and 3) people wouldn't pay to watch or support the team.
If a programmer uses ChatGPT to get a handle on a task with a new API, and saves a day of futzing around, how is that NOT relevant to job performance? (For the sake of argument, let's say that the experiment showed that the API doesn't scale well enough, resulting in a decision to scrap that approach entirely and use a different API.)
If the evaluation doesn't make sense, then the conditions placed on the evaluation don't really matter.
Similarly, "assimilated the material" only makes sense if the live coding interview really does cover "the material". To use your analogy, measuring a football prospect’s cycling times isn't that good a test of football-playing skills. I mean, yes, there's some overlap, but there are more useful ways to test those skills.
And one of the example tests was "handle scores for bowling", which is far from most work-related issues.
"consistent process that’s fair for everyone."
The author addressed this idea at several points, including "Any belief that a live coding interview is a consistently reliable way to make an objective assessment represents willful ignorance at best."
Picking a name out of a hat containing potential employee names is also consistent and fair.
Just because it's easy to measure doesn't mean it's an effective predictor.