> ... is that the gist of it?
Yes, it is.
I'd like to improve the speech recognition and was hoping for some advice on that.
Another possibility is to add a semantic layer with NLP, or to use another library such as Kaldi (http://kaldi-asr.org/).
One more detail: the WAV file is serialized in JSON (as an array).
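
To make the JSON part concrete, here is a rough sketch of how such a payload could be turned back into WAV data before it is handed to a recognizer. It assumes the array holds the complete file as byte values (0 to 255), RIFF header included; if the array actually holds PCM samples, the decoding step would be different. The names `wav_bytes_from_json` and `make_silent_wav` are made up for this example and are not part of any library.

```python
import io
import json
import wave


def wav_bytes_from_json(payload: str) -> bytes:
    """Decode a JSON array of byte values into raw WAV file bytes.

    Assumption: the array contains the whole file, header included.
    """
    values = json.loads(payload)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array")
    # bytes() raises ValueError if any integer is outside 0..255
    return bytes(values)


def make_silent_wav(n_frames: int = 160, rate: int = 16000) -> bytes:
    """Build a tiny mono 16-bit WAV in memory, used only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


# Round trip: WAV bytes -> JSON array -> WAV bytes
original = make_silent_wav()
payload = json.dumps(list(original))
decoded = wav_bytes_from_json(payload)

# Read the header back to check the audio format survived the round trip
with wave.open(io.BytesIO(decoded), "rb") as w:
    params = (w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())
```

Reading the header back like this is also a cheap way to confirm the sample rate and channel count of the incoming audio, since a mismatch with what the recognizer expects is a common cause of poor results.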