Hey, thank you for the link to your article. I've read it thoroughly and I couldn't agree more. And that was written two and a half years ago, before the AI "explosion" that we saw later.
Actually, checking against confidence is something that we've tried to play with, but to my knowledge there is no model that lets you score speech confidence against a specific text. Public APIs like MS ProjectOxford.ai can return a confidence, but it is measured against the "recognised" text, not against a predefined text.
Going further, this kind of approach can be very effective on words and short sentences, but I'd really love to see which specific phones the learner is failing, which would help in analysing full speaking exercises.
It works, but I am sure it should be possible to do better.
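
To make the phone-level idea a bit more concrete, here is a minimal sketch, assuming some recogniser can return the sequence of phones it heard (that part is hypothetical; I don't have a specific API in mind). The `phone_errors` helper and the example phone lists are made up for illustration. It simply aligns the expected phones with the recognised ones using Python's standard `difflib` and reports where they differ.

```python
from difflib import SequenceMatcher


def phone_errors(expected, recognised):
    """Align two phone sequences and list the spans where they differ.

    Both arguments are lists of phone labels (hypothetical ARPAbet-style
    strings here). Returns (operation, expected_phones, recognised_phones)
    tuples, where operation is 'replace', 'delete' or 'insert'.
    """
    matcher = SequenceMatcher(a=expected, b=recognised, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((tag, expected[i1:i2], recognised[j1:j2]))
    return errors


# Illustrative data only: the target word is "think", the learner says "sink".
expected = ["TH", "IH", "NG", "K"]
recognised = ["S", "IH", "NG", "K"]
print(phone_errors(expected, recognised))
```

Counting these tuples over a full speaking exercise would give a rough picture of which phones a learner struggles with most, although the result is only as good as the phone recogniser behind it.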