It won't catch anything but the dumbest of dumb criminals, because those who care about CSAM can surely figure out a better way to share images, or find a way to obfuscate their images enough to bypass the system (the lower the false positive rate, the easier it must be to trick the system).
So what's left when all the criminals this is supposed to catch have figured it out?
False positives. Only false positives.
Is it really worth turning personal devices into snitches that don't even do a good job of protecting children?
Also, numbers about false positives must be taken with a grain of salt because of the non-uniform distribution of perceptual hashes. It might be that your random vacation photos and kitty pics have a 1-in-a-million chance of a false positive, but someone who happens to (say) live in an apartment laid out very similarly to a scene in pictures appearing in the CSAM database may have a massively higher chance of false positives for photos taken in their home.
Dumb is a pretty accurate description of a large fraction of criminals. For the most part you only get smart criminals when you are talking about crimes where you have to be smart to even plan and carry out the crime.
Decompress and downsample. Drop the least significant bit or two, maybe do it in the DCT domain instead. Then SHA256 the result. It'll preserve matching for at least some cases of recompression and downsampling, but finding an unrelated image that matches is as hard as attacking SHA256; the only false positives that could be found would be from erroneous database entries.
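A minimal sketch of that pipeline, assuming Pillow for decoding; the target size and the number of dropped bits are arbitrary illustrative choices:

    import hashlib
    from PIL import Image

    def normalized_sha256(path):
        # Decode, downsample to a fixed size, convert to grayscale.
        img = Image.open(path).convert("L").resize((64, 64))
        # Drop the two least significant bits of every pixel.
        data = bytes(p & 0xFC for p in img.tobytes())
        # Exact cryptographic hash of the normalized pixels: finding an
        # unrelated matching image is as hard as attacking SHA-256.
        return hashlib.sha256(data).hexdigest()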
Is there any reading on that? I'd love it to be true.
Yes, because the point is not to protect children. It's to get everyone used to the idea that their content is being monitored. Once that is accomplished, other forms of monitoring can and will be added.
I'd also appreciate if Apple let me know if my false positives were reviewed and found to not be CSAM.
How can we be sure they won’t cut costs by increasing worker load? I could see them giving each reviewer less time to review individual pictures before passing it on to law enforcement.
Trolls will be able to easily use tools to slightly modify ambiguous adult porn to collide with a "known CP hash".
A human reviewer will see a blurry grayscale derivative of adult pornographic content and hit "report" every time.
False. Apple's proposed system leaks the cryptographic keys needed to decode the images conditional on a threshold of matches of the faulty NeuralHash perceptual hash.
Matching these hashes results in otherwise-encrypted, highly confidential data being decodable by Apple, accessible on their servers to the relevant staff along with anyone who compromises or coerces them.
It's true that in any arms race, a given advance gets adapted to. This will surely catch a bunch of people up front and then a pretty small number thereafter as the remainder learn to avoid iPhones. But that's how arms races work. You could say that about almost any advance in fighting CSAM.
Source: I've met a few white collar criminals.
Apparently that better way is by using Facebook. Facebook made 20.3 million reports to NCMEC in 2020.
https://www.missingkids.org/content/dam/missingkids/gethelp/...
"We found that more than 90% of this content was the same as or visually similar to previously reported content. And copies of just six videos were responsible for more than half of the child exploitative content we reported in that time period."
"we evaluated 150 accounts that we reported to NCMEC for uploading child exploitative content in July and August of 2020 and January 2021, and we estimate that more than 75% of these people did not exhibit malicious intent (i.e. did not intend to harm a child). Instead, they appeared to share for other reasons, such as outrage or in poor humor (i.e. a child’s genitals being bitten by an animal)."
Based on this, I wouldn't conclude that FB is the platform where pedos go to share their stash of child porn.
Their numbers also include Instagram, which I believe is quite popular among teenagers? I wonder how likely it is for teens' own selfies and group pics to get flagged and reported to NCMEC.
(https://about.fb.com/news/2021/02/preventing-child-exploitat...)
Which appears to have resulted in what... 5 prosecutions?
Given the reported numbers of illegal images detected by similar systems within Facebook and Google, I think it is very clear that this will catch a lot of illegal content.
So closer to 1/10M. The reporting threshold is made artificially higher by requiring more than one positive.
But anyway, that's beside the point.
A perceptual hash is not uniformly distributed; it's not a random number. Likewise for photos taken in a specific setting; they do not approach the randomness of a set of random images.
So someone snapping photos in a setting that has features similar to a set of photos in the CSAM database may risk a massively higher false positive rate. It's no longer a million-sided die; it could be a thousand-sided die when your outputs happen to be clustered around similar values due to a similar setting.
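A toy way to see this: if a shared setting (hypothetically) pins down most of the hash bits, collisions stop being astronomically rare. The 16-bit figure below is made up purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    # Well-spread hashes: effectively 62 bits of entropy here.
    uniform = rng.integers(0, 2**62, size=n)
    # Hashes of photos from a similar setting: imagine only 16 of
    # the bits actually vary.
    clustered = rng.integers(0, 2**16, size=n)
    print(n - len(set(uniform.tolist())))    # ~0 collisions
    print(n - len(set(clustered.tolist())))  # >130,000 collisions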
But I can't say I care about false positives. To me the system is bad either way.
I really doubt this. In the long term, a few people Apple wants to frame will surely slip into the mix. If Apple didn't want Trump to win, a CSAM flag a week before the election might do it.
This includes the vast majority of pedophiles.
According to Apple only images that will be uploaded to iCloud will be scanned.
If this is the case there is zero reason to scan locally and you can just scan the uploaded image once it is on the server.
Apple has not implemented E2E nor has it released a statement indicating this will be implemented in the future.
Whereas if the CSAM scanning were performed exclusively in the cloud, protection under the 4th Amendment would not exist, as it would likely fall under the third-party doctrine.
Now I'm not saying the US Government would let mere unconstitutionality get in the way of any surveillance program. But Apple would. Don't you think Apple would be itching for another opportunity to flex in public? Especially now, with their reputation on the line? Apple would love nothing more than to have more opportunities like they got with the San Bernardino iPhone.
Scanning makes phones a greater threat, and also erodes the expectation of privacy that is a legal barrier to surveillance.
Introducing this scanner all of a sudden doesn't negate the expectation of privacy, because privacy is how the service was sold and marketed. There is an implied warranty of merchantability of how this service functions.
The government can't compel warrantless searches of Apple. 3rd party doctrine means Apple can search your iCloud, and can give it away if they choose. Same as how Apple can search your phone if you run their software, and can give away whatever they find if they choose.
There's no reason to assume this isn't already happening, the software being closed source and proprietary. The question to ask is, what are we going to do about it?
In summary, I’m guessing they tried to invent a way where their server software never has to decrypt and analyze original photos, so they stay encrypted at rest.
https://www.apple.com/legal/privacy/law-enforcement-guidelin...
(Note: I have worked with law enforcement in the past specifically on a case involving Apple and two iCloud accounts. You submit a PDF of the valid warrant to Apple. Apple sends two emails: one with the iCloud data, encrypted, and a second with the decryption key.)
Surely that's just the data, but resized?
I suppose folks who don’t like privacy implications can downgrade to an iPhone 4 and maybe it will not support the feature.
And for the suspicious, it's of course much easier to notice if Apple changes their algorithms when they run on-device.
You’re having a house party. Because of the pandemic, you’d rather people who have COVID not attend. You can’t trust everyone to get vaccinated or get tested beforehand. So, you decided to set up a rapid-test system, just to be sure.
Would you rather test in your kitchen or your driveway?
If contagion wasn't a factor, I'd rather test in the kitchen, it's cozier.
Are you suggesting CSAM will infect more unwilling victims if it gets into a private iCloud account?
The system is specifically designed so that colliding images do not pose a threat to the user.
NeuralHash and the CSAM scanning is grotesque, but please, criticize it for what it is, not some bullshit that is easily dismissed as technical ignorance.
If it's a critical part of the system, then it should be inspected thoroughly. If Apple claims a minuscule chance of a hash collision, and the reality is that collisions are relatively common, that significantly changes the requirements for the backend system, which Apple keeps secret. We have every right to believe, based on public info, that Apple was expecting NeuralHash to be almost fool-proof, leaving the backend system to be a rubber stamp. This would be tragic.
Now, how well this NeuralHash does preserve privacy is a different question, and /not/ one that is being answered by the original post here. In fact, I've not seen anybody look at the hash distribution over natural images, which would be an actual argument against the system.
Are they?
0. Most importantly: the existence of a preimage attack makes Apple's system completely useless for its original purpose. The NeuralHash collider allows the producers and distributors of CSAM material to ensure that nearly all of the next generation of CSAM will suffer from hash collisions with perfectly innocent images. Two weeks after it was deployed, Apple's CSAM scanning is now _only_ an attack vector and a privacy risk. Thanks to the preimage attack, it's now completely useless for its nominal function! Apple put a lot of effort into a system that reduced the privacy and security of all their customers, and made the company itself more exposed to the whims of governments. And for no gain whatsoever.
1. There are no known perceptual hash functions on which preimage attacks are difficult. Barring a major "secret cryptographic breakthrough", Apple's second hash function is not resistant to preimage attacks either. In fact, the second algorithm is almost certainly easier to attack than NeuralHash itself, since it has to work on the "visual derivative", a fixed-size low-resolution thumbnail of the original image.
2. But isn't Apple's second algorithm kept secret, making it difficult to perform preimage attacks against it? No.
First of all, the second algorithm cannot be kept secret. Apple doesn't have its own CSAM database (the whole point is that they don't want to deal with CSAM on their servers!), so the algorithm has to be shared with multiple organizations which do have such databases, so that they can pre-compute the hashes that Apple will match against. Due to Apple's policy, some of these organizations will be located outside the US [1]. Chances are, the hash function will leak: Apple won't know if and when that happens.
Secondly, this _is_ security by obscurity. Some people argue that keeping the hash algorithm secret is similar to keeping a cryptographic key secret. This is not the case. Of course, any security system relies on keeping _something_ secret, but these secret somethings are not created equal. The secret keys of cryptographic algorithms are designed to satisfy Kerckhoffs's assumption [2]. This means that the key, as long as it remains secret, should be sufficient to protect the confidentiality and integrity of your system, even if your adversary knows everything else apart from the key, including the details of the algorithm you use, the hardware you have, and even all your previous plaintexts and ciphertexts (inputs and outputs).
The second hash does not have this property at all. Keeping the algorithm secret does not ensure the confidentiality or integrity of Apple's system. E.g. if somebody gets access to a reasonable number of inputs-output examples, that allows them to train their own model which behaves similarly enough to let them find perceptual hash collisions, even if they don't know the exact details of the original algorithm. This is incredibly hard for cryptographic hashes, but very easy for perceptual hashes, since a small change in the input should cause only a small change in the output of the perceptual hash algorithm. So, to maintain security, Apple doesn't have to keep just the hash algorithm (or its configuration parameters) secret, but all the inputs and outputs as well. This is bad: the fewer and simpler the secrets that one must keep to ensure system security, the easier it is to maintain system security.
Finally, the second hash algorithm is unlikely to be original (NeuralHash was original, and by all accounts it was a massive effort). If an attacker successfully guesses that Apple's secret algorithm H is closely related to a known algorithm, say PhotoDNA, they will probably be able to make a transfer attack against it. By engineering a PhotoDNA collision on the resized thumbnail (e.g. via a resizing attack, extensively discussed in a previous thread [3]), they have a reasonable chance of generating an H-collision as well. How good does that chance have to be? Well, something like 5% is more than enough! The attacker needs to produce a certain number of NeuralHash collisions (say 30 images) to get through the first threshold of Apple's algorithm. But after that, Apple will decode all the thumbnails in the user's safety vouchers: the attacker only needs one of those 30 to get through the second hash. Given a sufficiently high probability of hash collisions, this can be achieved "blindly".
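A back-of-the-envelope check of that blind second stage, using the assumptions above (30 decoded thumbnails, a 5% per-thumbnail transfer probability, independence between thumbnails):

    # Chance that at least one of the 30 decoded thumbnails also
    # collides on the secret second hash:
    p_single = 0.05
    n = 30
    print(1 - (1 - p_single) ** n)  # ~0.785, i.e. roughly 78.5%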
3. It's incredibly easy to come up with these kinds of attacks. Even the HN audience could come up with several reasonable plans, and could point out several reasonable issues, in two weeks. People who do malice for a living will have a much easier time with it. Even if somehow all the plans presented on HN turned out to be unviable, it will not take long for someone to stumble upon something practical. Any reassurance that Apple could provide at this point is fake. Cf. the timelines for real security: it took 17 years to come up with an analogous attack against SHA-1 [4], and two years after that to turn it into something that can be exploited in practice [5]. The existence of a preimage attack made Apple's system completely useless for its original purpose in two weeks. It's now just a security and privacy hole, with no other function. Keeping it around would be a travesty, even if it was difficult to exploit. But it's not.
[1] https://www.itnews.com.au/news/apple-to-only-seek-abuse-imag...
[2] https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle
[3] https://news.ycombinator.com/item?id=28236102
[4] https://security.googleblog.com/2017/02/announcing-first-sha...
[5] https://www.zdnet.com/article/sha-1-collision-attacks-are-no...
Let's set aside the questions of where you got all these hashes to generate collisions with, and how you got 30 of these mangled images into your victim's camera roll without them noticing. And let's also set aside whether your victim's device is an iPhone with iCloud Photo Library enabled (and has sufficient storage). I still don't get what these mangled images have achieved, other than giving the manual review team something other than child porn to look at.
Seems to me like it'd be easier to just find actual child porn, print it out, place it somewhere in the victim's house and then report it to the police.
That’s a really interesting attack vector I hadn’t seen mentioned previously.
Most people are talking about the potential for adversarial images to be sent to users. If they were instead injected into the database itself (either by poisoning real CSAM or social engineering) that would have far wider ramifications.
I wonder what the most widely-saved pornographic images are across iCloud users.
If actual CSAM were perturbed to match the hash of, say, images from the celebrity nude leak a few years back and added to the database then thousands of users could be sent to “human review”. Since the images are actually explicit how would the human reviewers know not to flag them to authorities?
(1) Review by apple staff (2) Access and leaking by other apple staff (3) Access by hackers who have compromised their system (4) Access by parties coercing apple/staff, including via national security letters.
All of which compromise the privacy of the user. This matters or the neuralhash comparison wouldn't exist in the first place.
Totally agree that the whole system is grotesque-- but that doesn't stop it also being grotesque in every detail as well. The fact that there are false positives when they easily could have designed a system that had none (at the expense of increased false negatives) shows that Apple doesn't especially value customer privacy even if you accept their vigilante privacy invasion. The fact that it's possible to construct adversarial false positives and that their reports didn't disclose this fact shows they either don't know what they're doing or they're not being honest about the risks (or both).
This algorithm doesn't even give exact matches for the same image on different hardware.
https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX
Note: Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.
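The root cause is easy to demonstrate: floating-point addition isn't associative, so different hardware (or just a different accumulation order) produces slightly different numbers, and a 200-layer network compounds those discrepancies until a hash bit near a decision boundary flips. A minimal illustration:

    import numpy as np

    x = np.random.default_rng(0).standard_normal(100_000).astype(np.float32)
    # Same numbers, different summation order, (usually) different result:
    print(x.sum(), x[::-1].sum(), x.sum() == x[::-1].sum())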
Apple explained in their technical summary [0] that they'll only flag an account if a certain number of hashes match. They estimated the likelihood of false positives there (they don't explain which dataset was used, but it was non-CSAM naturally) at 1 in a trillion [1].
In the very unlikely event where that 1-in-a-trillion occurrence happens, they have manual operators check each of these photos. They also have a private model (unavailable to the public) to double-check these perceptual hashes, which is also used before alerting authorities.
[0] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni... [1] https://www.zdnet.com/article/apple-to-tune-csam-system-to-k...
The main advantage of using exact collision is that you can then blind the perceptual hash with a cryptographic hash and avoid any leak of information. (Taking for example sha256 of this perceptual hash won't allow any attacker to get any information on the features from the hash, but if the perceptual hash are the same then the input of the sha256 is the same and therefore the output of the sha256 is the same).
This is important because it alleviates the risks of an eventual leak of the database, as Apple never touched and compared sensitive content, only cryptographic hashes of the perceptual hashes.
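As a sketch (plain SHA-256 over the raw perceptual hash, a simplification of whatever blinding scheme Apple actually uses):

    import hashlib

    def blinded(perceptual_hash: bytes) -> bytes:
        # Equal perceptual hashes still compare equal after SHA-256, so
        # exact matching keeps working, but the output reveals nothing
        # about the image features encoded in the input.
        return hashlib.sha256(perceptual_hash).digest()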
Some other systems, like PhotoDNA, rely on the euclidean distance between features being less than a threshold to register a match, which makes it possible to quantify how far an image is from CSAM, but means that the hash leaks some information about the original content.
Accidental Tech Podcast - A Storm of Asterisks https://atp.fm/443
I haven't listened to the follow up episode yet.
I still have zero opinion on this photo scanning kerfuffle. I just don't know enough. Of all the "hot takes" on this issue, ATP's has been the most comprehensive. So appreciated.
https://daringfireball.net/2021/08/apple_child_safety_initia...
1. What does “exact” mean to you in this context?
2. What else is more interesting about a hashing algorithm used to identify things, other than its collision rate?
Re 2: if hashes are to be matched approximately rather than exactly (for example, considered a match if they differ in fewer than 3 bits out of 96), then the most interesting thing is how many collisions you can find when you compare them like that.
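For concreteness, approximate matching of 96-bit hashes by Hamming distance would look something like this (the 3-bit tolerance is just the example from above):

    def hamming(a: int, b: int) -> int:
        # Number of differing bits between two 96-bit hashes.
        return bin(a ^ b).count("1")

    def approx_match(a: int, b: int) -> bool:
        # "Match" if the hashes differ in fewer than 3 bits out of 96.
        return hamming(a, b) < 3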
Apple's private set intersection, which leaks the keys to decrypt the images conditional on a NeuralHash match, requires an exact match.
They probably didn't realize they got different results on different toolchains/devices, since they target a mono-culture and the whole subsystem shows fairly little careful thought went into it. They could easily make an exact integerized version which would be consistent.
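A sketch of what one layer of such an integerized version could look like: quantize to int8 and accumulate in int64, so every device gets bit-identical results (the shift-based rescaling is a made-up choice):

    import numpy as np

    def int_layer(w_q, x_q, shift=8):
        # w_q, x_q: int8 weights/activations. Integer arithmetic has
        # no rounding, so this is deterministic across all hardware.
        acc = w_q.astype(np.int64) @ x_q.astype(np.int64)
        return np.clip(acc >> shift, -128, 127).astype(np.int8)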
It would still be broken. :)
Not sympathetic to a rival gangster? OK, let's find an innocent victim: not a rival criminal, but an innocent witness whom our protag wants to intimidate. The gangster wants to intimidate the witness but can't get at them, so he cooks up a scheme to convince the witness that the police are in his pocket. Exactly as above, causing the police to investigate the witness's phone.
Another one might be: a certain government wants to identify opposition groups using images associated with them. Apple is not keen to be associated with that, but the government can simply generate fake child porn (remember, programmatically generated CP is just as illegal) colliding with each image of interest.
I would think surreptitiously placing actual child porn on a rival's phone/computer would be much, much more effective.
Cybercriminals could likely do all this remotely. Phish for apple account login, upload images. Done.
they can either just send the police an anonymous message or set up a child porn web site and have it ‘accidentally’ leak its password database, and make sure your email address is in it.
According to the U.S. law, key snippets of which are quoted on the Stratechery blog (by Ben Thompson), Apple isn’t obligated to scan for CSAM. It’s only obligated to act on CSAM if it finds them.
While it’s good for Apple to scan on its systems (iCloud) like Facebook, Google and other companies do on their servers, it’s inappropriate to do it on individual devices, which starts with the assumption that anyone who has iCloud photos enabled is a potential CSAM hoarder and needs to pay with their device’s battery life and time for the scanning to happen and report back. It’s a sort of micro-robbery that Apple is doing on the devices when there is no legal compulsion to do so.
Everything else on trusting Apple's NeuralHash or the sanctity of the NCMEC hashes comes later, IMO.
I sincerely hope Apple realizes that it’s got a dud solution on hand, eats humble pie (which it’s usually not capable of) and ditches this whole thing. I know a lot of egos at Apple are at stake here. But doing the right thing matters for a company that claims that “privacy is a fundamental human right” and has a CEO who’s a member of a marginalized/discriminated community and understands the risks of these efforts.
I would even challenge the justification to do this on servers, unless the data is public. If it's behind a personal login, you might as well consider it personal property/data. I find the distinction of where data is stored not very meaningful.
Allowing things to be searched for criminal content just because it's not in your immediate physical sphere makes no sense. It doesn't work like that in the physical world either. When I send a letter, and it leaves my house, no authority has the right to check its contents without a legitimate reason. Likewise, if I put stuff in a storage box in some warehouse, no authority can search it without a warrant.
Note that I'm talking about personal storage (iCloud, Gmail), not public social networks like Facebook.
Our major privacy blunder was accepting scanning of private data in any context. The fight should be for the absolute privacy of personal data. Where the scanning happens is mostly irrelevant.
It's unclear if their claimed threshold of 30 is before or after the false positives they intentionally introduce. I'm going to guess it's before.
That is not bad. As a tool to filter down what Apple's human reviewers need to look at, this is pretty good.
Ultimately these images will make it to a human reviewer who can make a call as they would in any flagging system.
Could a backend server side system do a more precise hash (96 bits is not a ton) prior to human review?
Imagine that you play a game of craps against an online casino. The casino throws a virtual six-sided die, secretly generated using Microsoft Excel's random number generator. Your job is to predict the result. If you manage to predict the result 100 times in a row, you win and the casino will pay you $1000000000000 (one trillion dollars). If you ever fail to predict the result of a throw, the game is over, you lose and you pay the casino $1 (one dollar).
A casino that makes no adversarial assumptions about the clientele could argue as follows: the probability that you accidentally win the game is much less than one in one trillion, so this game is very safe, and the House Edge is excellent [3]. But this number is very misleading: it's based on naive assumptions that are completely meaningless in an adversarial context. Some of the clientele will cheat. If your adversary has a decent knowledge of mathematics at the high school level, the serial correlation in Excel's generator comes into play [4], and the relevant probability is no longer less than 1/1000000000000. In fact, the probability that the client will win is closer to 1/216 instead! When faced with a class of adversarial math majors, a casino that offers this game will promptly go bankrupt. With Apple's CSAM detection, you get to be that casino.
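To make the analogy concrete: the outputs of a linear congruential generator are fully determined by its state, so once an adversary recovers it (the linked paper [4] shows how, from a handful of outputs), every future "throw" is known. A toy version, using glibc-style constants rather than Excel's actual generator:

    M, A, C = 2**31, 1103515245, 12345

    def throws(state):
        while True:
            state = (A * state + C) % M
            yield state % 6 + 1  # a "die throw"

    casino, attacker = throws(42), throws(42)  # attacker recovered the state
    assert all(next(casino) == next(attacker) for _ in range(100))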
(reposted based on my comment on last week's thread [1])
[1] https://news.ycombinator.com/item?id=28236102
[2] https://blog.roboflow.com/neuralhash-collision/
[3] https://wizardofodds.com/gambling/house-edge/
[4] How to crack a linear congruential generator? http://www.reteam.org/papers/e59.pdf
But I really wonder why you think this is an important objection, do you think a lot of people want to go to the "get flagged for child porn" casino?
This system is unwanted because it puts a spy literally in your house and in your hands. It's bad enough that cloud everything blurs the line between what's yours and what's mine. Placing any law enforcement tech on a user's own device takes that line between "public" and "private" and completely erases it.
This alone should be bad enough, but some people are rather trusting. Showing that the spy is also tripping balls both exposes additional risks and emphasizes that Apple neither has their best interest at heart nor is putting adequate care into their actions. The latter gives people reason to question apple's claims of additional protection mechanisms that are non-falsifiable.
It's already public knowledge that Apple has 2 more systems (some server-side verification and a manual check later) to prevent false positives. So what's the point of researching collisions in NeuralHash?
Now, I can't really call something I voluntarily uploaded to Apple's servers a "private picture". But that's just a matter of perspective, and I understand that many people would disagree with me on this.
On the natural hash collisions (of which there are two), we have objects of similar shape against a solid background. It seems that a natural hash collision of a CSAM image would be unlikely (or if it does occur, it would be something that perhaps is also an infringing image).
As for the synthetic hash collisions, there are visible artifacts in the picture that, if you compare with the original picture, make the overlay of the original picture on the synthetically generated hash collision obvious. Could people get tricked into downloading memes¹ with synthetically generated hash collisions? Sure, people are idiots. But I'm guessing the majority of folks will look at the picture and say, this is a sh*t picture in this meme and download something else.
1. And that, of course, assumes that meme hosters don't apply similar scanning techniques to what they serve up.
NO. Adversarial preimages can be created that look like perfectly normal images. Please stop repeating this falsehood.
Here are some examples I generated (with a link to more):
https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issue...
Try to find a SHA256 collision.
Anywhere, ever, in the history of mankind.
This isn't for lack of looking. A lot of very smart people have looked for them. If you find one, I bet you'll be eligible for a tenured faculty slot at a good university, if not more. A whole world of secure systems would need to be re-engineered.
Hypothetical collisions of course exist, by the pigeonhole principle, just not in the real world.
Can two collisions really be called a catalog?
It could in the future be used to e.g. improve these algorithms.
Am I correct in that the primary reason folks are so upset is that the system could (probably) be easily modified such that -any- content could invoke legal action? That the main problem is really the scanning at all, and not so much the chance of attack by an individual actor, but by a government?
That said, Nextcloud is my backend and I do not upload anything to iCloud (except for MS authenticator 2fa backups), so I'm safe right?
You do know that they don't see the whole photo at full megapixel resolution? They're just given "a visual derivative" of the photo for checking.
Also, you really think that the persons tasked with this process are just randos off the street and not vetted specifically?
And where do you get the "visual derivative" information? Apple sure didn't communicate that to me. All I know is some person may look at my pics at some point.
We are not speaking about a situation where an arbitrary picture is misclassified.
We are speaking about a situation where an innocent picture involving a naked or not fully clothed child is deemed similar to a non-innocent picture of a naked or not fully clothed child.
Now you might argue that there should never be a picture of a naked or not fully clothed child of any form on any phone, but IMHO that is short-sighted, discriminating, and at best shows you don't know much about the world and other cultures.
Let's list some simple reasons such a thing could happen first:
- Photos meant for a doctor, or a living partner, to ask if something is normal or a problem. In many different ways.
- Photos of little children bathing or similar, which e.g. a dad sends to their mom who is currently on a business trip.
- etc.
A reason people are less aware of is that not all countries are as stuck up about nakedness, especially within the family. In some it's totally normal that family members, independent of age and gender, walk through the apartment naked before or after taking a shower. Similarly, if you weren't indoctrinated with shame about the naked body, you might do things like visit a nude beach with your family (meeting other families and taking advantage of it often being less crowded), and in turn normal, innocent family beach pictures contain naked children. On its own that's not a problem. But with Apple's approach, stuff like this is likely to trigger both systems Apple announced and wrongly label your whole family as pedophiles...
Yeah two sticks (ski and nail) are visually similar on a white background. Why is this news to anyone?
EDIT: if you are going to downvote please leave a comment unless you are just downvoting for wrong think.
Besides, you could probably "naturally" obtain such colliding images by photographing similar-looking objects against a white (or generally featureless) background. Furthermore, it suggests/demonstrates that similar-looking images with similar backgrounds can lead to unexpected collisions in practice (i.e. "naturally"), even without assuming an adversarial scenario.
Are you sure that, if you take a picture of a naked body part, it won't collide with anything that looks similar in their database?
It is unlikely that there is a collision of a benign image with the database, and even if that happens, it is not some automatic process that just sends cops to your house to raid it.
Of course we can get a bunch of collisions with essentially the same images. I don't get why this is so magical: just squint your eyes and I'm sure there are two objects within your reach that could be made to collide. But that isn't a gotcha on any level.
At any rate, IANAL, but I'm pretty sure you can't be convicted based on a hash alone. If you get busted for possession of a picture of a nematode and you can show the jury it's just a picture of an axe that has the same value when run through this algorithm, you'll be fine. And there's a decent chance prosecutors won't chase down individuals who will just have a single collision in their photo library with this tech in the first place - people who have dozens or hundreds will be much more interesting.
The more interesting technical question for me is: do collisions transfer across models? or how to find collisions that transfer across models?
But unnatural image collisions, or bad images in the database, and similar are a different matter, and they have been the main critique point from the get-go as far as I can tell.
I wouldn't be surprised if some flat-chested, short, fully adult (e.g. 30-year-old) woman does some sexting and goes from 0 to >40 collisions in a month. Not because of arbitrary collisions, but because of the similarity some of her sexting pictures might have with the ones from a 14-year-old but older-looking girl (who e.g. was coerced, and whose pictures ended up in the database).
E: Better yet, only run the second hash if you have a collision, which should be very rare.
I find it amusing that they probably ran this tool against a set of millions or even billions of images and this is the best they could come up with. They are practically praising Apple here lmao