Strange, I can see it with no problems. Probably because I use Vim quite a bit, and its commands already read a lot like natural language:
Copy two words
Select line
Paste before word
etc.
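To make the Vim comparison concrete, here's a rough sketch of how phrases like those could be turned into Vim normal-mode keystrokes. The keystrokes themselves (`y`, `d`, `c`, `w`, `}`, `V`, `P`) are real Vim commands, but the phrase tables and the `to_keys` function are made up for illustration, and mapping "paste before word" to `P` assumes the cursor is already at the start of the word.

```python
# Hypothetical phrase tables; only the Vim keystrokes on the right are real.
VERBS = {"copy": "y", "delete": "d", "change": "c"}
COUNTS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
OBJECTS = {"word": "w", "words": "w", "paragraph": "}", "paragraphs": "}"}

# Whole phrases that don't fit the verb/count/object pattern.
# "paste before word" -> "P" assumes the cursor sits at the start of the word.
FIXED = {"select line": "V", "paste before word": "P"}


def to_keys(phrase: str) -> str:
    """Translate a spoken phrase into Vim normal-mode keystrokes."""
    words = phrase.lower().split()
    fixed = FIXED.get(" ".join(words))
    if fixed is not None:
        return fixed

    if not words or words[0] not in VERBS or len(words) < 2:
        raise ValueError(f"unrecognised command: {phrase!r}")
    verb, rest = words[0], words[1:]

    # Optional spoken count, e.g. "two".
    count = 1
    if rest[0] in COUNTS:
        count = COUNTS[rest[0]]
        rest = rest[1:]

    if len(rest) != 1 or rest[0] not in OBJECTS:
        raise ValueError(f"unrecognised command: {phrase!r}")

    # Vim grammar: operator, optional count, motion (e.g. y2w).
    return VERBS[verb] + (str(count) if count > 1 else "") + OBJECTS[rest[0]]


print(to_keys("Copy two words"))
print(to_keys("Select line"))
```

The point is just that the spoken form and the Vim form share the same verb, count, object structure, so the translation is little more than a lookup.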
Opening apps is even simpler: "open spotify". Compare the complexity and time required to say those two words against moving your hand to the mouse, moving the mouse to a 100x100 pixel target, and clicking twice within 100ms. Even compare it against using "Cmd-Space Spotify".
It'd require a learning period, but so does, for example, teaching the concept of the mouse to someone who's only ever used a tablet.
EDIT: And I'll copy this from another of my posts: getting good voice control won't take our keyboards and mice away from us.