You are probably right. I think some M3-based parts come with a DSP, which could be leveraged for the voice portions. I'd imagine they also have a TI CC3000-like module with the whole TCP/IP stack on board to handle the comms. Not necessarily the CC3000 itself, as TI offers the same functionality in physically larger modules at a lower cost. I'd also say TI because of SmartConfig, part of their SimpleLink line, which lets you get a device onto Wi-Fi without entering the SSID/password on the device itself. If I understand it correctly, you'd just need to download the Amazon app to handle the initial connection.