But they can, because they're matching the hashes against the ones provided by NCMEC, not directly against CSAM itself (which presumably stays under some kind of lock and key at NCMEC).
Just as you can test whether you get false positives against a bunch of MD5 hashes that Fred provides, without ever knowing the contents of his documents.
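To make the analogy concrete, here's a minimal sketch of that test: we only ever hold Fred's digests (the hash values below are just well-known MD5 test vectors standing in for his list), never his documents, yet we can still check our own files against them.

```python
import hashlib
from pathlib import Path

# Hypothetical digest list Fred provides; his documents stay private.
FRED_HASHES = {
    "9e107d9d372bb6826bd81d3542a419d6",
    "e4d909c290d0fb1ca068ffaddf22cbd0",
}

def md5_of(path: Path) -> str:
    """Hex MD5 digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def matches(paths):
    """Return the files whose digest appears in Fred's list."""
    return [p for p in paths if md5_of(p) in FRED_HASHES]
```

Any file flagged here is a match (or a hash collision); everything else is a measurable false-positive rate, all without access to the originals.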
How does anyone ever actually fight the nasty stuff? This problem structure of "how do I catch examples of A if possessing examples of A is illegal?" must apply in many places and ways.
They don't need to train a model to detect the actual data set. They need to train a model to follow a pre-defined algorithm.
No idea if they did (or will), but I do expect it’s possible.
Sounds like that's what they did, since they say they're matching against hashes that NCMEC generated from their 200k-image CSAM corpus.
[edit: Ah, in the PDF someone else linked, "First, Apple receives the NeuralHashes corresponding to known CSAM from the above child-safety organizations."]
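For illustration, a toy sketch of matching a device-side perceptual hash against a provided list. This is not Apple's actual protocol (which does the comparison blindly via private set intersection); it just shows the basic idea. `neural_hash` is assumed to exist and map an image to a fixed-width integer, and the Hamming-distance threshold is a made-up parameter, since perceptual hashes of near-duplicate images differ in only a few bits.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fixed-width hashes."""
    return bin(a ^ b).count("1")

def is_known(image_hash: int, known_hashes, threshold: int = 4) -> bool:
    """Flag a hash within `threshold` bits of any provided hash.

    `known_hashes` plays the role of the NeuralHashes received from
    the child-safety organizations; the underlying images are never seen.
    """
    return any(hamming(image_hash, k) <= threshold for k in known_hashes)
```

The matching side only ever needs the hash list, which is the point of the whole thread: the corpus itself never leaves the organization.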