
snadal 9y ago

We've worked with CMU Sphinx in the past too, and the advances in this area over the last few months are absolutely amazing.

A little bit off-topic, but do you know of any recent work or papers on speech recognition in the language teaching area? (I mean analysing and rating a speaker's accuracy, detecting incorrect pronunciation of phones, and so on.)


bmc7505 9y ago

> Do you know of any recent work or papers on speech recognition in the language teaching area?

What you're describing is called "speech verification". Language education is an application I'm personally very interested in, and one that almost no one discusses in the speech community (I assume because of machine translation), so if you find any research papers please let me know! I wrote a little about it: http://breandan.net/2014/02/09/the-end-of-illiteracy/

The task is actually much simpler than STT. You display some text on the screen, wait for an audio sample, then check the model's confidence that the sample matches the text. If the confidence is lower than some threshold, then you play the correct pronunciation through the speaker. The trick is doing this rapidly, so a fast local recognizer is key. I've got a little prototype on Android, and it's pretty neat for learning new words. I'd like to get it working for reading recitation, but that's a lot of work.
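A minimal sketch of that loop in Python, assuming you already have some local recognizer that can score an audio sample against a known text. The names `record`, `score_fn`, `play_reference` and the default threshold of 0.7 are hypothetical placeholders for illustration, not the API of any real library or of the Android prototype:

```python
from typing import Callable


def practice_word(
    target_text: str,
    record: Callable[[], bytes],
    score_fn: Callable[[bytes, str], float],
    play_reference: Callable[[str], None],
    threshold: float = 0.7,  # assumed value; tune per recognizer
) -> bool:
    """Run one round of the verification loop for a single prompt.

    record         -- hypothetical: waits for and returns an audio sample
    score_fn       -- hypothetical: confidence in [0, 1] that the audio
                      matches target_text
    play_reference -- hypothetical: plays the correct pronunciation
    Returns True if the attempt was accepted, False otherwise.
    """
    audio = record()
    confidence = score_fn(audio, target_text)
    if confidence < threshold:
        # Below the threshold: let the learner hear the correct version.
        play_reference(target_text)
        return False
    return True
```

Passing the recognizer in as `score_fn` keeps the loop independent of whichever engine does the scoring, which matters here because the loop is only useful if that call is fast.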

snadal (OP) 9y ago

Hey, thank you for the link to your article. I've read it thoroughly and I couldn't agree more. And it was written two and a half years ago, before the AI "explosion" that we saw later.

Actually, checking against confidence is something we've tried to play with, but to my knowledge there is no model that lets you measure speech confidence against a specific text. Public APIs like MS ProjectOxford.ai can return a confidence, but it is for the "recognised" text, not for a predefined text.

Going further, this kind of approach can be very effective for words and short sentences, but I'd really love to see which specific phones the learner is getting wrong, which would help in analysing full speaking exercises.
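To make the per-phone idea concrete, here is a rough sketch in Python. It assumes a recognizer that can align the audio to the phone sequence of the target text and return one confidence per phone; that capability, the `weak_phones` helper, the 0.5 threshold and the sample scores are all assumptions made up for illustration:

```python
from typing import List, Tuple


def weak_phones(
    phone_scores: List[Tuple[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from any real system
) -> List[Tuple[int, str, float]]:
    """Return (position, phone, score) for every phone scored below threshold.

    phone_scores is assumed to come from an aligner that pairs each phone
    of the target text with a confidence in [0, 1].
    """
    return [
        (i, phone, score)
        for i, (phone, score) in enumerate(phone_scores)
        if score < threshold
    ]


# Illustrative, made-up scores for the word "hello" (HH AH L OW).
example = [("HH", 0.9), ("AH", 0.3), ("L", 0.8), ("OW", 0.45)]
flagged = weak_phones(example)
# flagged == [(1, "AH", 0.3), (3, "OW", 0.45)]
```

Keeping the position alongside the phone would let a full speaking exercise point back to the exact spot in the sentence where the learner slipped.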

It works, but I am sure it should be possible to do better.
