A/D conversion itself wouldn't add much at the sample level (at a 48 kHz sample rate, it's 1/48000 s, or about 21us per sample). However, packetisation will: a 256-byte packet of 128 two-byte samples takes 128/48000 s = about 2.67ms to fill at the transmitter, and the receiver won't notify until the full packet has been received. So a naive digital approach would be about 5.3ms (2 x 2.67ms) even before anything else.
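The packetisation arithmetic can be checked for any sample rate and packet size with a rough sketch like the one below. The function names are made up for illustration, and the factor of two is just the naive model described above (one packet time to fill the buffer at the transmitter, one more before the receiver is notified), not a measurement of any real link.

```python
def packet_fill_time_ms(sample_rate_hz: float, samples_per_packet: int) -> float:
    """Time needed to collect one packet's worth of samples, in milliseconds."""
    sample_period_s = 1.0 / sample_rate_hz  # e.g. 1/48000 s per sample
    return samples_per_packet * sample_period_s * 1e3


def naive_digital_latency_ms(sample_rate_hz: float, samples_per_packet: int) -> float:
    """Naive estimate: one packet time at the transmitter plus one at the receiver.

    Assumption: the receiver only sees data once a whole packet has arrived,
    and the packet takes as long to arrive as it took to fill.
    """
    return 2.0 * packet_fill_time_ms(sample_rate_hz, samples_per_packet)


if __name__ == "__main__":
    rate_hz = 48_000
    samples = 128  # 256 bytes at two bytes per sample
    print(f"fill time:     {packet_fill_time_ms(rate_hz, samples):.2f} ms")
    print(f"naive latency: {naive_digital_latency_ms(rate_hz, samples):.2f} ms")
```

Smaller packets cut this figure proportionally, at the cost of more per-packet overhead.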
A pure analogue approach (modulated RF), on the other hand, shouldn't have any human-detectable delay. It's effectively distance / speed of light plus a bit of phase shift (additional delay) introduced by the electronics, which should be only a handful of microseconds in total.
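For a sense of scale, the analogue figure can be estimated the same way. This is only a sketch: `electronics_delay_us` is a hypothetical parameter standing in for whatever phase shift the actual circuitry adds, and the free-space speed of light is assumed for the RF path.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # free-space value


def analogue_delay_us(distance_m: float, electronics_delay_us: float = 0.0) -> float:
    """Propagation time over distance_m plus an assumed electronics delay, in microseconds."""
    propagation_us = distance_m / SPEED_OF_LIGHT_M_PER_S * 1e6
    return propagation_us + electronics_delay_us


if __name__ == "__main__":
    # 300 m of free space is roughly one microsecond of propagation delay.
    print(f"{analogue_delay_us(300.0):.3f} us")
```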