Someone has actually already built a voice-controlled drone with the platform: https://github.com/trancept/snips_bebop/ :)
It runs on a Raspberry Pi 3 or any Linux, OS X, iOS, or Android device, and it works in English, French, German, Japanese, Spanish, and Italian, with more languages coming soon!
If you decide to use it, we'd love to feature what you're building on our blog :)
Consider all of the infrastructure required for this application. The Pi sends a request to Alexa, which sends a request to a Lambda function, which sends a request to SQS, which is polled by the Pi. Each of those AWS services likely has its own subsystems and a load balancer in front of it. Single-device, offline processing is so much cleaner when there's no need for general connectivity.
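To make the comparison concrete, here is a rough Python sketch of the two paths. It is only a model: the function names and hop labels are made up for illustration, an in-memory `queue.Queue` stands in for SQS, and no real Alexa or AWS calls are made.

```python
from queue import Queue


def handle_locally(utterance):
    """Offline path: the device turns the utterance into a command itself."""
    return f"drone:{utterance}"


def cloud_round_trip(utterance, sqs):
    """Cloud path: walk the hops described above, recording each one.

    `sqs` is a plain in-memory queue used as a stand-in for SQS
    (an assumption for illustration, not the AWS API).
    """
    hops = []
    hops.append("pi -> alexa")        # device sends audio or text to Alexa
    hops.append("alexa -> lambda")    # Alexa invokes the skill handler
    sqs.put(utterance)                # handler enqueues the command
    hops.append("lambda -> sqs")
    message = sqs.get_nowait()        # device polls the queue
    hops.append("sqs -> pi (poll)")
    return handle_locally(message), hops


result, hops = cloud_round_trip("land", Queue())
```

Both paths end at the same local handler; the cloud version just adds four network hops in front of it, each of which can fail or add latency.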
I used a natural language parser of my own creation, written in Python (this was before intent classification was a big thing), Nuance speech recognition, and a JavaScript (Node.js) interface to the AR.Drone.
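For a sense of what a hand-rolled parser in such a pipeline might look like, here is a minimal rule-based sketch in Python. It is not the parser described above: the patterns, action names, and the `parse_command` function are all hypothetical, and it assumes the speech recognizer has already produced a text transcript.

```python
import re
from typing import Optional

# Hypothetical phrase patterns mapped to hypothetical drone actions.
PATTERNS = [
    (re.compile(r"\btake ?off\b"), "takeoff"),
    (re.compile(r"\bland\b"), "land"),
    (re.compile(r"\b(?:go|move|fly) (up|down|left|right|forward|back)\b"), "move"),
]


def parse_command(transcript: str) -> Optional[dict]:
    """Map a recognized transcript to a command dict, or None if nothing matches."""
    text = transcript.lower().strip()
    for pattern, action in PATTERNS:
        match = pattern.search(text)
        if match:
            command = {"action": action}
            if match.groups():
                # Only the "move" pattern captures a direction.
                command["direction"] = match.group(1)
            return command
    return None
```

The resulting dict would then be handed to whatever layer talks to the drone; unmatched transcripts return `None` so the caller can ignore them or ask the user to repeat.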
Great job.