It won't catch anything but the dumbest of dumb criminals, because those who care about CSAM can surely figure out a better way to share images, or find a way to obfuscate their images enough to bypass the system (the lower the false positive rate, the easier it must be to trick the system).
So what's left when all the criminals this is supposed to catch have figured it out?
False positives. Only false positives.
Is it really worth turning personal devices into snitches that don't even do a good job of protecting children?
Also, numbers about false positives must be taken with a grain of salt because of the non-uniform distribution of perceptual hashes. It might be that your random vacation photos and kitty pics have a 1-in-a-million chance of a false positive, but someone who happens to (say) live in an apartment laid out very similarly to a scene in pictures appearing in the CSAM database may have a massively higher chance of false positives for photos taken in their home.
Dumb is a pretty accurate description of a large fraction of criminals. For the most part you only get smart criminals when you are talking about crimes where you have to be smart to even plan and carry out the crime.
Decompress and downsample. Drop the least significant bit or two, maybe do it in the DCT domain instead. Then SHA256 the result. It'll preserve matching for at least some cases of recompression and downsampling, but finding an unrelated image that matches is as hard as attacking SHA256; the only false positives that could be found would be from erroneous database entries.
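A minimal sketch of that pipeline, assuming Pillow for decoding; the target size and the number of dropped bits are arbitrary illustrative choices:

    import hashlib
    from PIL import Image

    def normalized_sha256(path):
        # Decode, downsample to a fixed size, convert to grayscale.
        img = Image.open(path).convert("L").resize((64, 64))
        # Drop the two least significant bits of every pixel.
        data = bytes(p & 0xFC for p in img.tobytes())
        # Exact cryptographic hash of the normalized pixels: finding an
        # unrelated matching image is as hard as attacking SHA-256.
        return hashlib.sha256(data).hexdigest()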
Is there any reading on that? I'd love it to be true.
Yes, because the point is not to protect children. It's to get everyone used to the idea that their content is being monitored. Once that is accomplished, other forms of monitoring can and will be added.
I'd also appreciate if Apple let me know if my false positives were reviewed and found to not be CSAM.
How can we be sure they won’t cut costs by increasing worker load? I could see them giving each reviewer less time to review individual pictures before passing it on to law enforcement.
Trolls will be able to easily use tools to slightly modify ambiguous adult porn to collide with a "known CP hash".
A human reviewer will see a blurry grayscale derivative of adult pornographic content and hit "report" every time.
False. Apple's proposed system leaks the cryptographic keys needed to decode the images conditional on a threshold of matches of the faulty NeuralHash perceptual hash.
Matching these hashes results in otherwise-encrypted, highly confidential data being decodable by Apple, accessible on their servers to the relevant staff along with anyone who compromises or coerces them.
It's true that in any arms race, a given advance gets adapted to. This will surely catch a bunch of people up front and then a pretty small number thereafter as the remainder learn to avoid iPhones. But that's how arms races work. You could say that about almost any advance in fighting CSAM.
Source: I've met a few white collar criminals.
Apparently that better way is by using Facebook. Facebook made 20.3 million reports to NCMEC in 2020.
https://www.missingkids.org/content/dam/missingkids/gethelp/...
"We found that more than 90% of this content was the same as or visually similar to previously reported content. And copies of just six videos were responsible for more than half of the child exploitative content we reported in that time period."
"we evaluated 150 accounts that we reported to NCMEC for uploading child exploitative content in July and August of 2020 and January 2021, and we estimate that more than 75% of these people did not exhibit malicious intent (i.e. did not intend to harm a child). Instead, they appeared to share for other reasons, such as outrage or in poor humor (i.e. a child’s genitals being bitten by an animal)."
Based on this, I wouldn't conclude that FB is the platform where pedos go to share their stash of child porn.
Their numbers also include Instagram, which I believe is quite popular among teenagers? I wonder how likely it is for teens' own selfies and group pics to get flagged and reported to NCMEC.
(https://about.fb.com/news/2021/02/preventing-child-exploitat...)
Which appears to have resulted in what... 5 prosecutions?
Given the reported numbers of illegal images detected by similar systems within Facebook and Google, I think it is very clear that this will catch a lot of illegal content.
So closer to 1/10M. The reporting threshold is made artificially higher by requiring more than one positive.
But anyway, that's beside the point.
A perceptual hash is not uniformly distributed; it's not a random number. Likewise for photos taken in a specific setting; they do not approach the randomness of a set of random images.
So someone snapping photos in a setting that has features similar to a set of photos in the CSAM database may risk a massively higher false positive rate. It's no longer a million-sided die; it could be a thousand-sided die when your outputs happen to be clustered around similar values due to a similar setting.
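A toy way to see this: if a shared setting (hypothetically) pins down most of the hash bits, collisions stop being astronomically rare. The 16-bit figure below is made up purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    # Well-spread hashes: effectively 62 bits of entropy here.
    uniform = rng.integers(0, 2**62, size=n)
    # Hashes of photos from a similar setting: imagine only 16 of
    # the bits actually vary.
    clustered = rng.integers(0, 2**16, size=n)
    print(n - len(set(uniform.tolist())))    # ~0 collisions
    print(n - len(set(clustered.tolist())))  # >130,000 collisions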
But I can't say I care about false positives. To me the system is bad either way.
I really doubt this. In the long term, a few people Apple wants to frame will surely slip into the mix. If Apple didn't want Trump to win, a CSAM flag a week before the election might do it.
This includes the vast majority of pedophiles.
According to Apple only images that will be uploaded to iCloud will be scanned.
If this is the case there is zero reason to scan locally and you can just scan the uploaded image once it is on the server.
Apple has not implemented E2E nor has it released a statement indicating this will be implemented in the future.
Whereas if the CSAM scanning were performed exclusively in the cloud, protection under the 4th Amendment would not exist, as it would likely fall under the third-party doctrine.
Now I'm not saying the US Government would let mere unconstitutionality get in the way of any surveillance program. But Apple would. Don't you think Apple would be itching for another opportunity to flex in public? Especially now, with their reputation on the line? Apple would love nothing more than to have more opportunities like they got with the San Bernardino iPhone.
Scanning makes phones a greater threat, and also erodes the expectation of privacy that is a legal barrier to surveillance.
Introducing this scanner all of a sudden doesn't negate the expectation of privacy, because privacy is how the service was sold and marketed. There is an implied warranty of merchantability of how this service functions.
The government can't compel warrantless searches of Apple. 3rd party doctrine means Apple can search your iCloud, and can give it away if they choose. Same as how Apple can search your phone if you run their software, and can give away whatever they find if they choose.
There's no reason to assume this isn't already happening, the software being closed source and proprietary. The question to ask is, what are we going to do about it?
In summary, I’m guessing they tried to invent a way where their server software never has to decrypt and analyze original photos, so they stay encrypted at rest.
https://www.apple.com/legal/privacy/law-enforcement-guidelin...
(Note: I have worked with law enforcement in the past specifically on a case involving Apple and two iCloud accounts. You submit a PDF of the valid warrant to Apple. Apple sends two emails: one with the iCloud data, encrypted, and a second with the decryption key.)
Surely that's just the data, but resized?
I suppose folks who don’t like privacy implications can downgrade to an iPhone 4 and maybe it will not support the feature.
And for the suspicious, it's of course much easier to notice if Apple changes their algorithms when they run on-device.
You’re having a house party. Because of the pandemic, you’d rather people who have COVID not attend. You can’t trust everyone to get vaccinated or get tested beforehand. So, you decided to set up a rapid-test system, just to be sure.
Would you rather test in your kitchen or your driveway?
If contagion wasn't a factor, I'd rather test in the kitchen, it's cozier.
Are you suggesting CSAM will infect more unwilling victims if it gets into a private iCloud account?
The system is specifically designed so that colliding images do not pose a threat to the user.
NeuralHash and the CSAM scanning is grotesque, but please, criticize it for what it is, not some bullshit that is easily dismissed as technical ignorance.
If it's a critical part of the system, then it should be inspected thoroughly. If Apple claims a minuscule chance of a hash collision, and the reality is that collisions are relatively common, that significantly changes the requirements for the backend system, which Apple keeps secret. We have every right to believe, based on public info, that Apple was expecting NeuralHash to be almost fool-proof, leaving the backend system to be a rubber stamp. This would be tragic.
Now, how well this NeuralHash does preserve privacy is a different question, and /not/ one that is being answered by the original post here. In fact, I've not seen anybody look at the hash distribution over natural images, which would be an actual argument against the system.
Are they?
0. Most importantly: the existence of a preimage attack makes Apple's system completely useless for its original purpose. The NeuralHash collider allows the producers and distributors of CSAM material to ensure that nearly all of the next generation of CSAM will suffer from hash collisions with perfectly innocent images. Two weeks after it was deployed, Apple's CSAM scanning is now _only_ an attack vector and a privacy risk. Thanks to the preimage attack, it's now completely useless for its nominal function! Apple put a lot of effort into a system that reduced the privacy and security of all their customers, and made the company itself more exposed to the whims of governments. And for no gain whatsoever.
1. There are no known perceptual hash functions on which preimage attacks are difficult. Barring a major "secret cryptographic breakthrough", Apple's second hash function is not resistant to preimage attacks either. In fact, the second algorithm is almost certainly easier to attack than NeuralHash itself, since it has to work on the "visual derivative", a fixed-size low-resolution thumbnail of the original image.
2. But isn't Apple's second algorithm kept secret, making it difficult to perform preimage attacks against it? No.
First of all, the second algorithm cannot be kept secret. Apple doesn't have its own CSAM database (the whole point is that they don't want to deal with CSAM on their servers!), so the algorithm has to be shared with multiple organizations which do have such databases, so that they can pre-compute the hashes that Apple will match against. Due to Apple's policy, some of these organizations will be located outside the US [1]. Chances are, the hash function will leak: Apple won't know if and when that happens.
Secondly, this _is_ security by obscurity. Some people argue that keeping the hash algorithm secret is similar to keeping a cryptographic key secret. This is not the case. Of course, any security system relies on keeping _something_ secret, but these secret somethings are not created equal. The secret keys of cryptographic algorithms are designed to satisfy Kerckhoffs's assumption [2]. This means that the key, as long as it remains secret, should be sufficient to protect the confidentiality and integrity of your system, even if your adversary knows everything else apart from the key, including the details of the algorithm you use, the hardware you have, and even all your previous plaintexts and ciphertexts (inputs and outputs).
The second hash does not have this property at all. Keeping the algorithm secret does not ensure the confidentiality or integrity of Apple's system. E.g. if somebody gets access to a reasonable number of inputs-output examples, that allows them to train their own model which behaves similarly enough to let them find perceptual hash collisions, even if they don't know the exact details of the original algorithm. This is incredibly hard for cryptographic hashes, but very easy for perceptual hashes, since a small change in the input should cause only a small change in the output of the perceptual hash algorithm. So, to maintain security, Apple doesn't have to keep just the hash algorithm (or its configuration parameters) secret, but all the inputs and outputs as well. This is bad: the fewer and simpler the secrets that one must keep to ensure system security, the easier it is to maintain system security.
Finally, the second hash algorithm is unlikely to be original (NeuralHash was original, and by all accounts it was a massive effort). If an attacker successfully guesses that Apple's secret algorithm H is closely related to a known algorithm, say PhotoDNA, they will probably be able to make a transfer attack against it. By engineering a PhotoDNA collision on the resized thumbnail (e.g. via a resizing attack, extensively discussed in a previous thread [3]), they have a reasonable chance of generating an H-collision as well. How good does that chance have to be? Well, something like 5% is more than enough! The attacker needs to produce a certain number of NeuralHash collisions (say 30 images) to get through the first threshold of Apple's algorithm. But after that, Apple will decode all the thumbnails in the user's safety vouchers: the attacker only needs one of those 30 to get through the second hash. Given a sufficiently high probability of hash collisions, this can be achieved "blindly".
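A back-of-the-envelope check of that blind second stage, using the assumptions above (30 decoded thumbnails, a 5% per-thumbnail transfer probability, independence between thumbnails):

    # Chance that at least one of the 30 decoded thumbnails also
    # collides on the secret second hash:
    p_single = 0.05
    n = 30
    print(1 - (1 - p_single) ** n)  # ~0.785, i.e. roughly 78.5%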
3. It's incredibly easy to come up with these kinds of attacks. Even the HN audience could come up with several reasonable plans, and could point out several reasonable issues, in two weeks. People who do malice for a living will have a much easier time with it. Even if somehow all the plans presented on HN turned out to be unviable, it will not take long for someone to stumble upon something practical. Any reassurance that Apple could provide at this point is fake. Cf. the timelines for real security: it took 17 years to come up with an analogous attack against SHA-1 [4], and two years after that to turn it into something that can be exploited in practice [5]. The existence of a preimage attack made Apple's system completely useless for its original purpose in two weeks. It's now just a security and privacy hole, with no other function. Keeping it around would be a travesty, even if it was difficult to exploit. But it's not.
[1] https://www.itnews.com.au/news/apple-to-only-seek-abuse-imag...
[2] https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle
[3] https://news.ycombinator.com/item?id=28236102
[4] https://security.googleblog.com/2017/02/announcing-first-sha...
[5] https://www.zdnet.com/article/sha-1-collision-attacks-are-no...
Let's set aside the questions of where you got all these hashes to generate collisions with, and how you got 30 of these mangled images into your victim's camera roll without them noticing. And let's also set aside whether your victim's device is an iPhone with iCloud Photo Library enabled (and has sufficient storage). I still don't get what these mangled images have achieved, other than giving the manual review team something other than child porn to look at.
Seems to me like it'd be easier to just find actual child porn, print it out, place it somewhere in the victim's house and then report it to the police.
That’s a really interesting attack vector I hadn’t seen mentioned previously.
Most people are talking about the potential for adversarial images to be sent to users. If they were instead injected into the database itself (either by poisoning real CSAM or social engineering) that would have far wider ramifications.
I wonder what the most widely-saved pornographic images are across iCloud users.
If actual CSAM were perturbed to match the hash of, say, images from the celebrity nude leak a few years back and added to the database then thousands of users could be sent to “human review”. Since the images are actually explicit how would the human reviewers know not to flag them to authorities?
(1) Review by apple staff (2) Access and leaking by other apple staff (3) Access by hackers who have compromised their system (4) Access by parties coercing apple/staff, including via national security letters.
All of which compromise the privacy of the user. This matters or the neuralhash comparison wouldn't exist in the first place.
Totally agree that the whole system is grotesque-- but that doesn't stop it also being grotesque in every detail as well. The fact that there are false positives when they easily could have designed a system that had none (at the expense of increased false negatives) shows that Apple doesn't especially value customer privacy even if you accept their vigilante privacy invasion. The fact that it's possible to construct adversarial false positives and that their reports didn't disclose this fact shows they either don't know what they're doing or they're not being honest about the risks (or both).
This algorithm doesn't even give exact matches for the same image on different hardware.
https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX
Note: Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.
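The root cause is easy to demonstrate: floating-point addition isn't associative, so different hardware (or just a different accumulation order) produces slightly different numbers, and a 200-layer network compounds those discrepancies until a hash bit near a decision boundary flips. A minimal illustration:

    import numpy as np

    x = np.random.default_rng(0).standard_normal(100_000).astype(np.float32)
    # Same numbers, different summation order, (usually) different result:
    print(x.sum(), x[::-1].sum(), x.sum() == x[::-1].sum())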
Apple explained in their technical summary [0] that they'll only flag an account if a certain number of hashes match. They estimated the likelihood of false positives there (they don't explain which dataset was used, but it was non-CSAM naturally) at 1 in a trillion [1].
In the very unlikely event where that 1-in-a-trillion occurrence happens, they have manual operators check each of these photos. They also have a private model (unavailable to the public) to double-check these perceptual hashes, which is also used before alerting authorities.
[0] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni... [1] https://www.zdnet.com/article/apple-to-tune-csam-system-to-k...
The main advantage of using exact collision is that you can then blind the perceptual hash with a cryptographic hash and avoid any leak of information. (Taking for example sha256 of this perceptual hash won't allow any attacker to get any information on the features from the hash, but if the perceptual hash are the same then the input of the sha256 is the same and therefore the output of the sha256 is the same).
This is important because it alleviates the risks of an eventual leak of the database, as Apple never touched and compared sensitive content, only cryptographic hashes of the perceptual hashes.
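As a sketch (plain SHA-256 over the raw perceptual hash, a simplification of whatever blinding scheme Apple actually uses):

    import hashlib

    def blinded(perceptual_hash: bytes) -> bytes:
        # Equal perceptual hashes still compare equal after SHA-256, so
        # exact matching keeps working, but the output reveals nothing
        # about the image features encoded in the input.
        return hashlib.sha256(perceptual_hash).digest()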
Some other systems, like PhotoDNA, rely on the euclidean distance between features being less than a threshold to register a match, which makes it possible to quantify how far an image is from CSAM, but means that the hash leaks some information about the original content.
Accidental Tech Podcast - A Storm of Asterisks https://atp.fm/443
I haven't listened to the follow up episode yet.
I still have zero opinion on this photo scanning kerfuffle. I just don't know enough. Of all the "hot takes" on this issue, ATP's has been the most comprehensive. So appreciated.
https://daringfireball.net/2021/08/apple_child_safety_initia...
1. What does “exact” mean to you in this context?
2. What else is more interesting about a hashing algorithm used to identify things, other than its collision rate?
Re 2: if hashes are to be matched approximately rather than exactly (for example, considered a match if they differ in fewer than 3 bits out of 96), then the most interesting thing is how many collisions you can find when you compare them like that.
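For concreteness, approximate matching of 96-bit hashes by Hamming distance would look something like this (the 3-bit tolerance is just the example from above):

    def hamming(a: int, b: int) -> int:
        # Number of differing bits between two 96-bit hashes.
        return bin(a ^ b).count("1")

    def approx_match(a: int, b: int) -> bool:
        # "Match" if the hashes differ in fewer than 3 bits out of 96.
        return hamming(a, b) < 3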
Apple's private set intersection, which leaks the keys to decrypt the images conditional on a NeuralHash match, requires an exact match.
They probably didn't realize they got different results on different toolchains/devices, since they target a mono-culture and the whole subsystem shows fairly little careful thought went into it. They could easily make an exact integerized version which would be consistent.
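A sketch of what one layer of such an integerized version could look like: quantize to int8 and accumulate in int64, so every device gets bit-identical results (the shift-based rescaling is a made-up choice):

    import numpy as np

    def int_layer(w_q, x_q, shift=8):
        # w_q, x_q: int8 weights/activations. Integer arithmetic has
        # no rounding, so this is deterministic across all hardware.
        acc = w_q.astype(np.int64) @ x_q.astype(np.int64)
        return np.clip(acc >> shift, -128, 127).astype(np.int8)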
It would still be broken. :)
Not sympathetic to a rival gangster? OK, let's find an innocent victim: not a rival criminal, but an innocent witness whom our protag wants to intimidate. The gangster wants to intimidate the witness but can't get at them, so he cooks up a scheme to convince the witness that the police are in his pocket. Exactly as above, causing the police to investigate the witness's phone.
Another one might be: a certain government wants to identify opposition groups using images associated with them. Apple is not keen to be associated with that, but the government can simply generate fake child porn (remember, programmatically generated CP is just as illegal) colliding with each image of interest.
I would think surreptitiously placing actual child porn on a rival's phone/computer would be much, much more effective.
Cybercriminals could likely do all this remotely. Phish for apple account login, upload images. Done.
they can either just send the police an anonymous message or set up a child porn web site and have it ‘accidentally’ leak its password database, and make sure your email address is in it.
According to the U.S. law, key snippets of which are quoted on the Stratechery blog (by Ben Thompson), Apple isn’t obligated to scan for CSAM. It’s only obligated to act on CSAM if it finds them.
While it’s good for Apple to scan on its systems (iCloud) like Facebook, Google and other companies do on their servers, it’s inappropriate to do it on individual devices, which starts with the assumption that anyone who has iCloud photos enabled is a potential CSAM hoarder and needs to pay with their device’s battery life and time for the scanning to happen and report back. It’s a sort of micro-robbery that Apple is doing on the devices when there is no legal compulsion to do so.
Everything else on trusting Apple's NeuralHash or the sanctity of the NCMEC hashes comes later, IMO.
I sincerely hope Apple realizes that it’s got a dud solution on hand, eats humble pie (which it’s usually not capable of) and ditches this whole thing. I know a lot of egos at Apple are at stake here. But doing the right thing matters for a company that claims that “privacy is a fundamental human right” and has a CEO who’s a member of a marginalized/discriminated community and understands the risks of these efforts.
I would even challenge the justification to do this on servers, unless the data is public. If it's behind a personal login, you might as well consider it personal property/data. I find the distinction of where data is stored not very meaningful.
Allowing things to be searched for criminal content just because it's not in your immediate physical sphere makes no sense. It doesn't work like that in the physical world either. When I send a letter, and it leaves my house, no authority has the right to check its contents without a legitimate reason. Likewise, if I put stuff in a storage box in some warehouse, no authority can search it without a warrant.
Note that I'm talking about personal storage (iCloud, Gmail), not public social networks like Facebook.
Our major privacy blunder was accepting scanning of private data in any context. The fight should be for the absolute privacy of personal data. Where the scanning happens is mostly irrelevant.
It's unclear if their claimed threshold of 30 is before or after the false positives they intentionally introduce. I'm going to guess it's before.
That is not bad. As a tool to filter down what Apple's human reviewers need to look at, this is pretty good.
Ultimately these images will make it to a human reviewer who can make a call as they would in any flagging system.
Could a backend server side system do a more precise hash (96 bits is not a ton) prior to human review?
Imagine that you play a game of craps against an online casino. The casino throws a virtual six-sided die, secretly generated using Microsoft Excel's random number generator. Your job is to predict the result. If you manage to predict the result 100 times in a row, you win and the casino will pay you $1000000000000 (one trillion dollars). If you ever fail to predict the result of a throw, the game is over, you lose and you pay the casino $1 (one dollar).
A casino that makes no adversarial assumptions about the clientele could argue as follows: the probability that you accidentally win the game is much less than one in one trillion, so this game is very safe, and the House Edge is excellent [3]. But this number is very misleading: it's based on naive assumptions that are completely meaningless in an adversarial context. Some of the clientele will cheat. If your adversary has a decent knowledge of mathematics at the high school level, the serial correlation in Excel's generator comes into play [4], and the relevant probability is no longer less than 1/1000000000000. In fact, the probability that the client will win is closer to 1/216 instead! When faced with a class of adversarial math majors, a casino that offers this game will promptly go bankrupt. With Apple's CSAM detection, you get to be that casino.
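To make the analogy concrete: the outputs of a linear congruential generator are fully determined by its state, so once an adversary recovers it (the linked paper [4] shows how, from a handful of outputs), every future "throw" is known. A toy version, using glibc-style constants rather than Excel's actual generator:

    M, A, C = 2**31, 1103515245, 12345

    def throws(state):
        while True:
            state = (A * state + C) % M
            yield state % 6 + 1  # a "die throw"

    casino, attacker = throws(42), throws(42)  # attacker recovered the state
    assert all(next(casino) == next(attacker) for _ in range(100))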
(reposted based on my comment on last week's thread [1])
[1] https://news.ycombinator.com/item?id=28236102
[2] https://blog.roboflow.com/neuralhash-collision/
[3] https://wizardofodds.com/gambling/house-edge/
[4] How to crack a linear congruential generator? http://www.reteam.org/papers/e59.pdf
But I really wonder why you think this is an important objection, do you think a lot of people want to go to the "get flagged for child porn" casino?
This system is unwanted because it puts a spy literally in your house and in your hands. It's bad enough that cloud everything blurs the line between what's yours and what's mine. Placing any law enforcement tech on a user's own device takes that line between "public" and "private" and completely erases it.
This alone should be bad enough, but some people are rather trusting. Showing that the spy is also tripping balls both exposes additional risks and emphasizes that Apple neither has their best interest at heart nor is putting adequate care into their actions. The latter gives people reason to question apple's claims of additional protection mechanisms that are non-falsifiable.
It's already public knowledge that Apple has 2 more systems (some server-side verification and a manual check later) to prevent false positives. So what's the point of researching collisions in NeuralHash?
Now, I can't really call something I voluntarily uploaded to Apple's servers a "private picture". But that's just a matter of perspective, and I understand that many people would disagree with me on this.
On the natural hash collisions (of which there are two), we have objects of similar shape against a solid background. It seems that a natural hash collision of a CSAM image would be unlikely (or if it does occur, it would be something that perhaps is also an infringing image).
As for the synthetic hash collisions, there are visible artifacts in the picture that, if you compare with the original picture, make the overlay of the original picture on the synthetically generated hash collision obvious. Could people get tricked into downloading memes¹ with synthetically generated hash collisions? Sure, people are idiots. But I'm guessing the majority of folks will look at the picture and say, this is a sh*t picture in this meme and download something else.
1. And that, of course, assumes that meme hosters don't apply similar scanning techniques to what they serve up.
NO. Adversarial preimages can be created that look like perfectly normal images. Please stop repeating this falsehood.
Here are some examples I generated (with a link to more):
https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issue...
Try to find a SHA256 collision.
Anywhere, ever, in the history of mankind.
This isn't for lack of looking. A lot of very smart people have looked for them. If you find one, I bet you'll be eligible for a tenured faculty slot at a good university, if not more. A whole world of secure systems would need to be re-engineered.
Hypothetical collisions of course exist, by the pigeonhole principle, just not in the real world.
Can two collisions really be called a catalog?
It could in the future be used to e.g. improve these algorithms.
Am I correct in that the primary reason folks are so upset is that the system could (probably) be easily modified such that -any- content could invoke legal action? That the main problem is really the scanning at all, and not so much the chance of attack by an individual actor, but by a government?
That said, Nextcloud is my backend and I do not upload anything to iCloud (except for MS authenticator 2fa backups), so I'm safe right?
You do know that they don't see the whole photo at full megapixel resolution? They're just given "a visual derivative" of the photo for checking.
Also, you really think that the persons tasked with this process are just randos off the street and not vetted specifically?
And where do you get the "visual derivative" information? Apple sure didn't communicate that to me. All I know is some person may look at my pics at some point.
We are not speaking about a situation where an arbitrary picture is misclassified.
We are speaking about a situation where an innocent picture involving a naked or not fully clothed child is deemed similar to a non-innocent picture of a naked or not fully clothed child.
Now you might argue that there should never be a picture of a naked or not fully clothed child of any form on any phone, but IMHO that is short-sighted, discriminating, and at best shows you don't know much about the world and other cultures.
Let's list some simple reasons such a thing could happen first:
- Photos meant for a doctor, or a living partner, to ask if something is normal or a problem. In many different ways.
- Photos of little children bathing or similar, which e.g. a dad sends to their mom who is currently on a business trip.
- etc.
A reason people are less aware of is that not all countries are as stuck up about nakedness, especially within the family. In some it's totally normal that family members, independent of age and gender, walk through the apartment naked before or after taking a shower. Similarly, if you weren't indoctrinated with shame about the naked body, you might do things like visit a nude beach with your family (meeting other families and taking advantage of it often being less crowded), and in turn normal, innocent family beach pictures contain naked children. On its own that's not a problem. But with Apple's approach, stuff like this is likely to trigger both systems Apple announced and wrongly label your whole family as pedophiles...
Yeah two sticks (ski and nail) are visually similar on a white background. Why is this news to anyone?
EDIT: if you are going to downvote please leave a comment unless you are just downvoting for wrong think.
Besides, you could probably "naturally" obtain such colliding images by photographing similar-looking objects against a white (or generally featureless) background. Furthermore, it suggests/demonstrates that similar-looking images with similar backgrounds can lead to unexpected collisions in practice (i.e. "naturally"), even without assuming an adversarial scenario.
Are you sure that, if you take a picture of a naked body part, it won't collide with anything that looks similar in their database?
It is unlikely that there is a collision of a benign image with the database, and even if that happens, it is not some automatic process that just sends cops to your house to raid it.
Of course we can get a bunch of collisions with essentially the same images. I don't get why this is so magical: just squint your eyes and I'm sure there are two objects within your reach that could be made to collide. But that isn't a gotcha on any level.
At any rate, IANAL, but I'm pretty sure you can't be convicted based on a hash alone. If you get busted for possession of a picture of a nematode and you can show the jury it's just a picture of an axe that has the same value when run through this algorithm, you'll be fine. And there's a decent chance prosecutors won't chase down individuals who will just have a single collision in their photo library with this tech in the first place - people who have dozens or hundreds will be much more interesting.
The more interesting technical question for me is: do collisions transfer across models? or how to find collisions that transfer across models?
But unnatural image collisions, or bad images in the database, and similar are a different matter, and they have been the main critique point from the get-go as far as I can tell.
I wouldn't be surprised if some flat-chested, short, fully adult (e.g. 30-year-old) woman does some sexting and goes from 0 to >40 collisions in a month. Not because of arbitrary collisions, but because of the similarity some of her sexting pictures might have with the ones from a 14-year-old but older-looking girl (who e.g. was coerced, and whose pictures ended up in the database).
E: Better yet, only run the second hash if you have a collision, which should be very rare.
I find it amusing that they probably ran this tool against a set of millions or even billions of images and this is the best they could come up with. They are practically praising Apple here lmao