I am doing exactly this. As a matter of fact, I am working on something eerily similar: I took a year off just to muck about with my own project and see what happens, particularly in the field of user input.
Seems like we're thinking very similarly about this problem, as my approach would also be to have a constant on-screen HUD.
Initially my target was gamepads and other devices viable for Virtual Reality. To demonstrate the universality of the method, however, I am now working on a musical input method, where musical notes can turn into keystrokes.
When you think about it, having a HUD on screen gives a lot of room to implement clever algorithms that can speed up the process. Think Huffman trees and other techniques that compress the number of necessary keystrokes, so frequent characters need fewer inputs.
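To make the Huffman idea concrete, here's a minimal sketch (not the author's actual implementation) of building a prefix-free code table from symbol frequencies; with a HUD you could display these codes live, so common letters need only a short input sequence:

```python
import heapq
from collections import Counter

def huffman_codes(frequencies):
    """Build a prefix-free Huffman code table {symbol: bitstring}."""
    # Heap items: (total weight, unique tiebreaker, {symbol: code-so-far}).
    # The tiebreaker keeps the heap from ever comparing two dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, lo = heapq.heappop(heap)   # lightest subtree
        w2, _, hi = heapq.heappop(heap)   # second lightest
        # Merge: prepend "0" to one side's codes, "1" to the other's
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Frequent letters end up with codes no longer than rare ones
freqs = Counter("the quick brown fox jumps over the lazy dog the end")
codes = huffman_codes(freqs)
print(codes)
```

For an input method, the "bits" would map to two gamepad buttons, two musical notes, or any other binary choice the HUD can show.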
What I have in the pipeline right now is an FFT-based algorithm that extracts notes from the microphone, which opens it up to basically all instruments. The FFT part was surprisingly simple, at least for a single note; the tricky parts will likely be chords and noisy environments...
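The single-note case really is just "find the loudest frequency bin." Here's a rough sketch of that idea, assuming monophonic input and NumPy (the note-naming scheme and function name are my own, not from the original post):

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def detect_note(samples, sample_rate):
    """Find the dominant frequency via FFT and map it to the nearest note name."""
    windowed = samples * np.hanning(len(samples))  # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    peak_hz = freqs[np.argmax(spectrum)]
    # Semitone distance from A4 (440 Hz), converted to a MIDI note number
    midi = int(round(69 + 12 * np.log2(peak_hz / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

# Synthetic test tone: one second of A4 (440 Hz)
rate = 44100
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440.0 * t)
print(detect_note(tone, rate))  # A4
```

This is exactly where the caveats above bite: a chord has several peaks, and an instrument's strong harmonics can out-shout the fundamental, so a real version would need peak picking smarter than a single `argmax`.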
What's so cool about far-out ideas like this, done for fun, is that someone somewhere might find a use case for them. Say, for switching note cheat sheets hands-free while playing a guitar, or something similar.