My first step would be to cut the input audio into 5-minute chunks and run automatic speech recognition (ASR) on each one to produce a rough draft transcript. Each draft chunk is then posted automatically to Amazon Mechanical Turk for proofreading and editing. Turkers earn points for good work, which qualifies them for premium tasks that pay more.
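As a rough sketch of the chunking step, here is how the cut points could be computed in Python. The function name `chunk_boundaries` and the idea of working with offsets in seconds are my own assumptions for illustration; actually cutting the audio and calling an ASR engine or the Mechanical Turk API is left out.

```python
from typing import List, Tuple

CHUNK_SECONDS = 5 * 60  # 5-minute chunks, as described above


def chunk_boundaries(total_seconds: float,
                     chunk_seconds: float = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording.

    Hypothetical helper: the last chunk is simply shorter when the
    recording length is not a multiple of chunk_seconds.
    """
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")

    boundaries = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


if __name__ == "__main__":
    # A 12-minute recording yields two full chunks and one 2-minute remainder.
    for start, end in chunk_boundaries(12 * 60):
        print(f"{start:.0f}s to {end:.0f}s")
```

A real prototype would probably want to move each cut point to a nearby pause so that words are not split across chunks, but fixed offsets are enough to get started.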
The resulting audio and text can be used to improve the acoustic models of the speech recognition engine, so the automatic transcripts get better over time and less proofreading and editing is required. It would also be possible to train several classes of speaker-independent acoustic models, e.g. one for adult female speakers with a German accent. Languages other than English are possible too.
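One way to check whether the automatic transcripts really do get better over time would be to compare each ASR draft with the version the Turkers hand back. The sketch below uses word error rate (WER), a standard ASR metric, as a stand-in for "how much editing was needed"; choosing WER and the function name `word_error_rate` are my assumptions, not part of the plan above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the corrected transcript (reference)
    and the ASR draft (hypothesis), divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the draft
            insertion = curr[j - 1] + 1  # extra word in the draft
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    corrected = "the cat sat on the mat"
    asr_draft = "the cat sat on a mat"
    print(word_error_rate(corrected, asr_draft))  # one substitution in six words
```

Tracking this number per chunk over the weeks would show whether retraining the acoustic models is actually reducing the proofreading effort.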
This service would be very similar to castingwords.com, but faster and cheaper because it uses self-improving speech recognition technology.
Please let me know what you think. I'm planning to implement a simple prototype in Seattle during the next few weeks. Want to brainstorm with me over beer or coffee? We could be co-founders if we work well together.