The discussion in the paper is nuanced on that point and does not make that claim as far as I read it. Section 2.2 (page 2):
> The content of users’ speech can reveal sensitive information (e.g., private conversations) and the voice signals can be processed to infer potentially sensitive information about the user (e.g., age, gender, health [82]). Amazon aims to limit some of these privacy issues through its platform design choices [4]. Specifically, to avoid snooping on sensitive conversations, *voice input is only recorded when a user utters the wake word*, e.g., Alexa. Further, only processed transcriptions of voice input (not the audio data) is shared with third party skills, instead of the raw audio [32]. However, despite these design choices, prior research has also shown that smart speakers often misactivate and unintentionally record conversations [59]. In fact, there have been several real-world instances where smart speakers recorded user conversations, without users ever uttering the wake word [63].