The attack described in the video isn't a sound the phone produces and then picks up itself (most phones already ignore what they play back), but rather a sound played by a laptop and picked up by the phone.
Further, the attack uses a sentence that doesn't sound to a human like "Ok Google" or "Hey Siri" or whatever the wake phrase is.