I've seen this kind of spying accusation made a lot, based on misinterpretation of very scant evidence, and without something more concrete I think that's almost certainly what's happening here. A far more likely explanation IMO is that the daemon is doing something like checking for regional availability of the Live Text feature (still somewhat problematic, but nowhere near the same ballpark), as partially suggested by this commenter: https://infosec.exchange/@yProd/109698545121198396
Incredibly lazy blog post IMO. If you're going to write an article and make a video for an infosec site, take the time to MITM the connection so you can avoid purely speculative, tinfoil-hat reporting. Apple likely does not make this easy, but anything is possible once SIP is disabled.
Analytics and Siri Suggestions are off. I don't use iCloud.
Text recognition models would likely be served from an Apple CDN, not api.smoot.apple.com.
I don't know what it's sending (an API hostname suggests some dynamic server code, not just a file download), but it should not be sending anything at all. I don't want it to, and I never consented to such transmission.
I didn't make the claim that file information is being sent because I didn't want to publish anything but facts. I have not done any RE on the binary itself as yet.
> I didn't make the claim that file information is being sent because I didn't want to publish anything but facts.
When you say "Apple Has Begun Scanning Your Local Image Files Without Consent" what 95% of people will hear is exactly the claim that scanned data is being sent to Apple. I don't think you can in good conscience say that you're only publishing facts if you are aware of the rate of misinterpretation and don't attempt to clarify.
Ironically, you're doing exactly what you're accusing Apple of: saying technically truthful things that lead people to believe something different (which is, as far as we know, not factual).
The post makes big accusations and extrapolations without proof or research, based on a web request whose contents this 'security researcher' didn't even see. A quick web search reveals mediaanalysisd has been a part of macOS since at least 2017.
It is disappointing to see Louis Rossmann blindly repeating any random claim from any random person. This is the same person who created a 'standard' (https://consoledonottrack.com) and spammed a bunch of popular projects with it with an entitled attitude.
I am not a fan of the Apple Tim Cook is leading, but let's be reasonable and put down the pitchforks for a moment. A single web request does not immediately equate to your files being scanned without consent. Louis should know better, and you should not believe any random crap just because he repeats it.
But it doesn't seem to work. According to the aforementioned post, Apple system binaries are using cert pinning so it's difficult to intercept the network requests that they make. The suggestion was setting an environment variable to politely ask them to log their requests. I don't think mediaanalysisd respects this variable, however.
Is there a hosts entry I can add to block this behavior?
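For what it's worth, here is a minimal sketch of what such a hosts entry could look like, assuming the only endpoint involved is api.smoot.apple.com (the hostname mentioned in this thread). The helper name `hosts_block_lines` is hypothetical, and whether mediaanalysisd actually honors /etc/hosts is untested here, so treat it as a starting point rather than a confirmed fix.

```python
# Hypothetical helper: builds /etc/hosts lines that null-route a hostname.
# Assumption: the daemon resolves names through the system resolver and
# therefore respects /etc/hosts. This has not been verified.
def hosts_block_lines(hostname):
    # 0.0.0.0 and :: are the IPv4 and IPv6 unspecified addresses,
    # commonly used in hosts files to make a name unreachable.
    return ["0.0.0.0 " + hostname, ":: " + hostname]


if __name__ == "__main__":
    # Prints the lines to append to /etc/hosts (requires root to edit).
    print("\n".join(hosts_block_lines("api.smoot.apple.com")))
```

The printed lines would be appended to /etc/hosts by hand. If the daemon talks to other hostnames as well, those would need their own entries.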
https://sneak.berlin/20230115/macos-scans-your-local-files-n...
They wanted to be friends of the police state, and they are friends of China.
Not normal behaviour
Now Apple just moved the search to the OS via an API call to its server, and people are noticing the traffic.
When I worked in telecom, if there was a hit on an image it was reported to legal. Legal contacted the feds. The feds contacted the local PD of the user. The PD would send a cop in to pick up a burned CD. The server would zip all the user's data and burn it onto a DVD. We wouldn't touch the DVD; the cop would walk into the datacenter, hit eject, and collect the DVD. No chain of custody issues.
I'm not sure how photo hosting services doing this for the past 2 decades is related to this when the author of the post explicitly mentions he doesn't use Apple cloud services or products that would trigger such behaviour. This was the OS analysing someone's images, stored locally on their personal computer, and calling back to an API for no discernible reason.