Voice interfaces actually remind me a lot of command-line interfaces: if you have a working "rune" on the tip of your tongue (e.g., "Set a timer for 10 minutes", "Play <exact title rune that gets the song you want>"), it's great. But as you say, it's not always that easy to figure out new "runes". LLMs should be somewhat better for that, though.
The LLM is phenomenal at figuring out what you want, but it still has to map it to the schema of the tool. So while the job of figuring out the working “rune” is offloaded from you to the LLM, it doesn’t solve the fundamental problem: the available “runes” are likely brittle and insufficient for any given task, even when the LLM knows exactly what you want to do.
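To make the schema bottleneck concrete, here's a minimal sketch (hypothetical tool name and fields, in the general style of LLM function-calling schemas): the LLM can understand both requests below equally well, but only one of them has a valid mapping onto the tool's "rune":

```python
# Hypothetical tool schema, in the style used for LLM function calling.
# The LLM translates free-form intent into a call matching this shape.
set_timer_schema = {
    "name": "set_timer",
    "parameters": {
        "type": "object",
        "properties": {
            # Only whole minutes are expressible -- the schema, not the
            # LLM's comprehension, is the limiting factor.
            "minutes": {"type": "integer"},
        },
        "required": ["minutes"],
    },
}

def set_timer(minutes: int) -> str:
    """Stub tool implementation behind the schema."""
    return f"Timer set for {minutes} minutes"

# "Set a timer for 10 minutes" maps cleanly onto the schema:
print(set_timer(10))

# "Set a timer until my laundry finishes" does not: the LLM understands
# the request perfectly, but no combination of the available parameters
# can express it. The "rune" is too brittle for the task.
```

The point being that adding an LLM moves the translation burden, not the expressiveness ceiling of the tools underneath.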