Not OP, but I would never want to use this if I were running a security-minded business. Why would it be okay to send my entire dataset to the Microsoft cloud just to run a few simple queries? How long is the data stored on your servers? Since it’s operating on the data directly, it’s not possible to anonymize or redact the contents beforehand.
Practically speaking, if you are running a recent version of Windows (I don't remember exactly when this trend started, with 2000? XP?), you have basically no control over what data you send to them or when. You can try to block the traffic, but that will interfere with normal Windows operations. You can try to investigate and play a cat-and-mouse game with Microsoft, but they will always be one step ahead of you, unless you decide to turn off automatic updates and make your system less secure.
It makes little difference whether it's a VM or bare metal; what matters is network connectivity. If it's on, you lose control over the data leaving your computer. On Linux, the BSDs, and similar systems, you can control outbound traffic very precisely.