Today, they can probably do the same via "Siri" or "OK Google" ... ?
The Siri thing isn't new, though. You used to be able to shout at Siri through someone's phone when you were on speaker, but I think Apple implemented some voice recognition restrictions.
Yeah.. the "consumer as product" category
I'd guess most speaker-generated audio would be from a compressed source. Lossy audio compression generally cuts off frequencies that we cannot hear. When we speak, though, we presumably generate sound in those ranges too. So a live voice could be told apart from a speaker by checking whether those frequencies are present in the recording or not.
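A rough sketch of that check, in Python. Everything specific here is an assumption for illustration: the 16 kHz cutoff, the 0.001 threshold, and the function names are made up, real codecs cut off at different points, and the microphone would have to sample fast enough to capture the band at all. It just measures what fraction of the signal's energy sits above the cutoff.

```python
import cmath
import math


def high_band_ratio(samples, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz (naive DFT, O(n^2))."""
    n = len(samples)
    total = 0.0
    high = 0.0
    # A real signal only needs bins 1..n/2; bin 0 (DC offset) is skipped.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


def looks_band_limited(samples, sample_rate, cutoff_hz=16000.0, threshold=0.001):
    """Guess that audio went through a low-pass stage (assumed cutoff/threshold)."""
    return high_band_ratio(samples, sample_rate, cutoff_hz) < threshold
```

In practice you'd run this on short frames with a proper FFT, and a quiet room or a cheap microphone could easily produce false positives, so treat it as a hint rather than a test.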
Since the device knows what its own speaker is outputting, the matching part of the microphone input can be stripped. This only works when both the output of the speaker and the input of the microphone are known.
It can't be used to determine whether another speaker, such as a TV, generated the sound, because the device has no copy of that output to compare against.
Visual aid on echo cancellation: http://i.imgur.com/m2LSIz9.png
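For anyone curious what the stripping step can look like, here's a minimal sketch using a normalized LMS adaptive filter, which is one common way to do echo cancellation and not necessarily what any particular phone uses. The tap count, step size, and the toy echo path in the demo are all assumed values for illustration.

```python
import random


def simulate_echo(reference, path):
    """Toy room model: mic hears the speaker signal filtered by 'path'."""
    return [
        sum(h * reference[n - k] for k, h in enumerate(path) if n - k >= 0)
        for n in range(len(reference))
    ]


def nlms_echo_cancel(reference, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent speaker samples, newest first
    cleaned = []
    for ref_sample, mic_sample in zip(reference, mic):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # what remains once the echo is removed
        norm = sum(x * x for x in history) + eps
        # Nudge the filter toward whatever would have cancelled this sample.
        weights = [w + step * error * x / norm for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Demo with made-up numbers: white-noise speaker output, 3-tap echo path.
random.seed(0)
speaker = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = simulate_echo(speaker, [0.0, 0.6, 0.3])
cleaned = nlms_echo_cancel(speaker, mic)
```

Note that the filter needs `speaker` as an input. A TV in the same room never shows up in that reference signal, so its sound passes straight through into `cleaned`.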