That's not necessary. You can teach a computer to recognize anomalies and route those to humans. Repeated failures is an obvious one.
> A computer’s “instinct”, meanwhile, when generating a low-confidence classification output, is to just still generate that output
That's a poorly designed system. Human failure.
If we could get pure-AI systems to be “confused by default” like humans are, such that they insist on emitting “unknown” classification-states whether you ask for them or not, they’d be a lot more like humans, and maybe I wouldn’t see humans as having such an advantage here.
Yes, I'm certain that training ML models to separately classify low-confidence outputs, and getting a human in the loop to handle these cases, is a well-known technique in ML-participant business workflow engine design. But I'm not talking about ML-participant business workflow engine design; I'm talking about the lower-level of raw ML-model architecture. I'm talking about adversarial systems component design here: trying to create ML model architectures which assume the business-workflow-engine designer is an idiot or malfeasant, and which force the designer to do the right thing whether they like it or not. (Because, well, look at most existing workflow systems. Is this design technique really as "well-known" as you say? It's certainly not universal; let alone considered part of the Engineering "duty and responsibility" of Systems Engineers—the things they, as Engineers, have to check for in order to sign off on the system; the things they'd be considered malfeasant Engineers if they forget about.)
What I'm saying is that it would be sensible to have models for which it is impossible to ask them to make a purely enumerative classification with no option for "I don't know" or "this seems like an exceptional category that I recognize, but where I haven't been trained well-enough to know what answer I should give about it." Models that automatically train "I don't know" states into themselves — or rather, where every high-confidence output state of the system "evolves out of" a base "I don't know" state, such that not just weird input, but also weird combinations of normal input that were unseen in the training data, result in "I don't know." (This is unlike current ML linear approximators, where you'll never see a model that is high-confidence about all the individual elements of something, but low-confidence about the combination of those elements. Your spam filtering engine should be confused the first time it sees GTUBE and the hacked-in algorithmic part of it says "1.0 confidence, that's spam." It should be confused by its own confidence in the face of no individual elements firing. You should have to train it that that's an allowed thing to happen—because in almost all other situations where that would happen, it'd be a bug!)
Ideally, while I'm dreaming, the model itself would also have a sort of online pseudo-training where it is fed back the business-workflow process result of its outputs — not to learn from them, but rather to act as a self-check on the higher-level workflow process (like line-of-duty humans do!) where the model would "get upset" and refuse to operate further, if the higher-level process is treating the model's "I don't know" signals no differently than its high-confidence signals (i.e. if it's bucketing "I don't know" as if it meant the same as some specific category, 100% of the time.) Essentially, where the component-as-employee would "file a grievance" with the system. The idea would be that a systems designer literally could not create a workflow with such models as components, but avoid having an "exceptional situation handling" decision-maker component (whether that be a human, or another AI with different knowledge); just like the systems designer of a factory that employs real humans wouldn't be able to tell the humans to "shut up and do their jobs" with no ability to report exceptional cases to a supervisor, without that becoming a grievance.
When designing a system with humans as components, you're forced to take into account that the humans won't do their jobs unless they can bubble up issues. Ideally, IMHO, ML models for use in business-process workflow automation would have the same property. You shouldn't be able to tell the model to "shut up and decide."
(And while a systems designer could be bullheaded and just switch to a simpler ML architecture that never "refuses to decide", if we had these hypothetical "moody" ML models, we could always then do what we do for civil engineering: building codes, government inspectors, etc. It's hard/impractical to check a whole business rules engine for exhaustive human-in-the-loop conditions; but it's easy/practical enough to just check that all the ML models in the system have architectures that force human-in-the-loop conditions.)