Can you explain how these theoretical political memes hash-match to an image in the NCMEC database, and then also pass the visual check?
> "No, this misses the point completely. You cannot easily trigger any automated systems merely by taking photos of 17.9 year olds and sending them to people."
Did I say "taking"? I am talking about sending (theoretical) actual images from the NCMEC database. This is functionally identical to the "attack" you describe.
Yes, I can. This is just one possible strategy: there are many others that do different things, or the same things in a different order.
You use the collider [1] and one of the many scaling attacks ([2] [3] [4], just the ones linked in this thread) to create an image that matches the hash of a reasonably fresh CSAM image currently circulating on the Internet, and resizes to some legal sexual or violent image. Note that knowing such a hash and having such an image are both perfectly legal. Moreover, since the resizing (the creation of the visual derivative) is done on the client, you can tailor your scaling attack to the specific resampling algorithm.
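A minimal sketch of the scaling-attack principle those links describe, in the simplest toy case: the attacker knows the exact resampler (here a stride-based nearest-neighbour downsampler, an assumption for illustration) and overwrites only the pixels the resampler will actually read, leaving ~98% of the decoy untouched.

```python
import numpy as np

def nn_downsample(img, out_h, out_w):
    """Toy nearest-neighbour resampler: sample the source at fixed strides."""
    h, w = img.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return img[np.ix_(rows, cols)]

def plant_attack(decoy, target):
    """Embed `target` into `decoy` so that nn_downsample recovers `target`."""
    h, w = decoy.shape
    th, tw = target.shape
    out = decoy.copy()
    rows = (np.arange(th) * h) // th
    cols = (np.arange(tw) * w) // tw
    out[np.ix_(rows, cols)] = target   # overwrite only the sampled pixels
    return out

decoy = np.zeros((512, 512), dtype=np.uint8)            # innocuous full-size image
target = np.random.randint(0, 256, (64, 64), np.uint8)  # what the reviewer will see
crafted = plant_attack(decoy, target)
assert np.array_equal(nn_downsample(crafted, 64, 64), target)
```

Real resamplers (bilinear, bicubic) average neighbourhoods rather than sampling single pixels, so the published attacks solve a small optimization problem instead of a direct overwrite; the client-side knowledge of the exact algorithm is what makes this tailoring possible.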
Eventually, someone will make a CyberTipline report about the actual CSAM image whose hash you used, and the image (being a genuine CSAM image) will make its way into the NCMEC hash database. You will even be able to tell precisely when this happens, since you have the client-side half of the PSI database, and you can execute the NeuralHash algorithm.
You can start circulating the meme before or after this step. Repeat until you have circulated enough photos to make sure that many people in the targeted group have exceeded the threshold.
Note that the memes will trigger automated CSAM matches, and pass the Apple employee's visual inspection: due to the safety voucher system, Apple will not inspect the full-size images at all, and they will have no way of telling that the NeuralHash is a false positive.
Okay, perhaps the three-thumbnails example was unclear. I didn't mean to illustrate any specific attack with it, just to convey why it's difficult to tell legal and potentially illegal content apart based on thumbnails (i.e. why a reviewer would have to click "possible CSAM" even if the thumbnail looks like "vanilla" sexual or violent content that probably depicts adults). I'd splice in a sentence to clarify this, but I can't edit that particular comment anymore.
Ok yeah, I do agree this scaling attack potentially makes this feasible, if it essentially allows you to present a completely different image to the reviewer than to the user. Has anyone done this yet? i.e. an image that NeuralHashes to a target hash, and also scale-attacks to a target image, but looks completely different.
(Perhaps I misunderstood your original post, but this seems to be a completely different scenario to the one you originally described with reference to the three thumbnails)
This attack doesn't work. If the resized image doesn't match the CSAM image your NeuralHash mimicked, then when Apple runs its private perceptual hash, the hash value won't match the expected value and it will be ignored without any human looking at it.
We have no reason to believe that Apple's second, secret perceptual hash provides any meaningful protection against such attacks. At best, we can hope that it'll allow early detection of attacks in a few cases, but chances are that's the best it can do. We might not ever learn: Apple now has a very strong incentive not to admit to any evidence of abuse or to any faults in their algorithm.
(Sorry, this is going to be long. I know you understand most/all of this stuff already; it's mostly there to provide a bit of context for other users reading our exchange.)
The term "hash function" is a bit of a misnomer here. When people hear "hash", they tend to think of cryptographic hash functions, such as SHA256 or BLAKE3. When two messages have the same hash value, we say that they collide. Fortunately, cryptographic hash functions have several good properties: for example, there is no known way to generate a message that yields a given predetermined hash value, no known way to find two different messages with the same hash value, and no known way to make a small change to a message without changing the corresponding hash value. These properties make cryptographic hash functions secure, trustworthy and collision-resistant even in the face of powerful adversaries. Generally, when you decide to use two unrelated cryptographic hash algorithms instead of one, executing preimage attacks against both hashes becomes much more difficult for the adversary.
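The "small change changes everything" property (the avalanche effect) is easy to demonstrate with the standard library:

```python
import hashlib

a = hashlib.sha256(b"pay Alice $100").hexdigest()
b = hashlib.sha256(b"pay Alice $101").hexdigest()  # one character changed

# Avalanche effect: count how many of the 256 output bits differ.
diff = bin(int(a, 16) ^ int(b, 16)).count("1")
print(diff)  # roughly half of all 256 bits flip
```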
However, as you know, the hash functions that Apple uses for identifying CSAM images are not "cryptographic hash functions" at all. They are "perceptual hash functions". The purpose of a perceptual hash is the exact opposite of a cryptographic hash: two images that humans see/hear/perceive (hence the term perceptual) to be the same or similar should have the same perceptual hash. There is no known perceptual hash function that remains secure and trustworthy in any sense in the face of (even unsophisticated) adversaries. In particular, preimage attacks against perceptual hashes are very easy, compared to the same attacks against cryptographic hashes.
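To make "preimage attacks are very easy" concrete, here is a toy average hash (aHash, a classic perceptual hash; deliberately simplified, not Apple's algorithm): because the hash is designed so that similar-looking images map to the same bits, an attacker can construct an image for any target hash almost by definition.

```python
import numpy as np

def average_hash(img, size=8):
    """Toy aHash: block-average down to size x size, threshold at the mean."""
    h, w = img.shape
    blocks = img.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).flatten()

def preimage(target_bits, size=8, block=8):
    """Trivial preimage: paint each block dark or bright to match each bit."""
    tiles = np.where(target_bits.reshape(size, size), 255.0, 0.0)
    return np.kron(tiles, np.ones((block, block)))  # upscale to 64x64

rng = np.random.default_rng(0)
target = rng.integers(0, 2, 64).astype(bool)  # any 64-bit hash we want to hit
forged = preimage(target)
assert np.array_equal(average_hash(forged), target)
```

Real perceptual hashes (NeuralHash included) need an optimization loop rather than a direct construction, but the underlying tension is the same: robustness to perceptual changes is exactly what makes preimages cheap.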
Using two unrelated cryptographic hashes meaningfully increases resistance to collision and preimage attacks. Using ROT13 twice does not increase security in any meaningful sense. Using two perceptual hashes, while not as bad, is still much closer to the "using ROT13 twice for added security" than to the "using multiple cryptographic hashes" end.
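The ROT13 analogy is literal: applying it twice is the identity function, so "double encryption" with it adds exactly nothing.

```python
import codecs

msg = "Using ROT13 twice for added security"
# ROT13 is its own inverse: encoding twice returns the original message.
assert codecs.encode(codecs.encode(msg, "rot13"), "rot13") == msg
```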
Finding a SHA1 collision took 22 years, and there are still no effective preimage attacks against it. Creating the NeuralHash collider took a single week. More importantly, even if you were to use two unrelated perceptual hash functions, executing preimage attacks against both hashes need not become much more difficult for the adversary: easy * easy is still easy. Layering cryptography upon cryptography is meaningful, but only as long as one of the layers is actually difficult to attack. This is not the case for perceptual hashes.

In fact, in many similar contexts, these adversarial attacks tend to transfer: if they work against one technique or model, they often work against other models as well [3]. In the attack discussed above, the adversary has nearly full control over the "visual derivative", so even a very unsophisticated adversary can subject the target thumbnail itself to the collider before performing the resizing attack, and hope that the perturbation transfers against the second hash. If the second hash is a variant of NeuralHash (somewhat likely; it could even be NeuralHash performed on the thumbnail itself, we don't know anything about it!), or if it's an ML model trained on the same or similar datasets (quite likely), or if it's one of the known algorithms (say, PhotoDNA), then some amount of transfer is likely to happen.

And given an adversary that is going to distribute a large number of photos anyway, a 10% success rate is more than enough. Given the reduced search space (fixed-size thumbnails, almost certainly smaller than 64x64 for legal reasons), a 10% success rate is completely plausible even with these naive approaches. An adversary that has some (even very little) information about the second hash algorithm can do much more sophisticated things, and perform much better.
But what if we boldly rule out all transfer results? Doesn't Apple keep their algorithm secret?! Can we think of the weights (coefficients) of the second perceptual hash as some kind of secret key in the cryptographic sense? Alas, no. Apple would have to make sure that all the outputs of the secret perceptual hash are kept secret as well. Due to the way perceptual hashing algorithms work, they provide a natural training gradient: access to sufficiently many input-output examples is probably enough to train a high-fidelity "clone" that allows one to generate adversarial examples and perform successful preimage attacks, even if the weights of the clone are completely different from the secret weights of the original network. This can be done with standard black-box techniques [4]. It's much harder (but nowhere near crypto-hard, still perfectly plausible) to pull this off when the attacker only sees one bit of output (match or no match).

A single compromised Apple employee can gather enough data to do this given the ability to observe some inputs and outputs, even if said employee has no access to the innards or the magic numbers. The hash algorithm is kept secret because if it weren't, an attack would be completely trivial: but an adversary does not need to learn this secret to mount an effective attack.
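A toy illustration of the cloning idea, with a hypothetical stand-in for the secret hash (a random linear projection followed by sign thresholding; real perceptual hashes are deep networks, but the extraction principle is the same): the attacker never sees the secret weights, only input/output pairs, yet recovers a clone that agrees with the secret function on fresh inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
d, bits, n = 256, 16, 20_000

# "Secret" hash: sign of a random linear projection. Never shown to the attacker.
W_secret = rng.normal(size=(bits, d))

X = rng.normal(size=(n, d))          # inputs the attacker submits
Y = np.sign(X @ W_secret.T)          # hash bits the attacker observes

# For Gaussian inputs, E[x * sign(w . x)] is proportional to w, so plain
# correlation of inputs with observed output bits recovers each row's direction.
W_clone = Y.T @ X / n

X_new = rng.normal(size=(500, d))    # fresh, unseen inputs
agree = np.mean(np.sign(X_new @ W_clone.T) == np.sign(X_new @ W_secret.T))
print(agree)                         # high agreement: a usable clone
```

Once the clone agrees this often, adversarial examples crafted against the clone have a good chance of working against the secret original, which is the black-box attack pattern referenced above.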
These are just two scenarios. There are many others. "Nobody has ever demonstrated such an attack working end-to-end" is not a good defense: it's been two weeks since the system was rolled out, and once an attack is executed, we probably won't learn about it for years to come. But the attacker can be rewarded way before "due process" kicks in: e.g. if a victim ever gets a job where they need to obtain a security clearance, the Background Investigation Process will reveal their "digital footprint", almost certainly including the fact that the NCMEC got a report about them, even if the FBI never followed up on it. That will prevent them from being granted interim determination, and will probably lead to them being denied a security clearance. If you pull off this attack on your political opponents, you can prevent them from getting government jobs, possibly without them ever learning why. And again, this is one single proposed attack. There were at least 6 different attacks proposed by regular HN users in the recent threads!
As a more general observation, cryptography tends to be resistant to attacks only if one can say things such as "the adversary cannot be successful unless they know some piece of information k, and we have very good mathematical reasons (e.g. computational hardness) to believe that they can't learn k". The technology is flawed: even the state of the art in perceptual hashing does not satisfy this criterion. Currently, perceptual hashes are at best technicool gadgets, but layering technicool upon technicool cannot make the system more secure. And Apple's system is a high-profile target if there ever was one.
Barring a major breakthrough in perceptual hashing (one that Apple decided to keep secret and leave out of both whitepapers), the claim that the secret second hash will prevent collision attacks is not justified. The chances of such a secret breakthrough are very slim: it'd be like learning that SpaceX has already built a base on the Moon and has been doing regular supply runs with secret spaceships. Vaguely plausible in theory (SpaceX has people who do rocketry, Apple has people who do cybersecurity), but vanishingly unlikely in practice.
And that's before we mention that the mere existence of the collider made the entire exercise completely pointless: the real pedos can now use the collider to effectively anonymize their CSAM drops, making sure that all of their content collides with innocent photos, and ensuring that none of the images will be picked up by NeuralHash anyway. For all practical purposes, Apple's CSAM detection is now _only_ an attack vector, and nothing else.
The first half of your post is predicated on the noise added to generate hash A under NeuralHash also being likely to produce a specific hash B under some unknown perceptual hashing function (which they specifically call out [1] as independent of NeuralHash, precisely because they don't want to make this easy, so speculating it might be NeuralHash run again is incorrect). Hash A is generated via thousands of iterations of an optimization function, guessing and checking to produce a 96-bit number. What shows that same noise would produce an identical match when run through a completely different hashing function that is designed very differently, specifically to avoid these attacks? Just one bit of difference will prevent a match. Nothing you've linked to shows any likelihood of that being anywhere close to 10 percent.
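To quantify this argument under its own assumption, namely that the second hash behaves like an independent random function of the image (exactly the assumption the parent comment disputes via transfer attacks): hitting a specific 96-bit value by luck is hopeless.

```python
# Chance that noise tuned for hash A also hits a specific value of an
# independent, random-looking 96-bit hash B, and how long pure guessing takes.
p_single = 2.0 ** -96          # chance per crafted image
tries_per_sec = 10 ** 9        # a generous attacker budget (assumed figure)
secs_per_year = 31_557_600
years = 1 / (p_single * tries_per_sec * secs_per_year)
print(f"{years:.1e}")          # ~2.5e12 years on average
```

The whole disagreement, then, reduces to whether the independence assumption holds for two perceptual hashes trained on similar data, not to the arithmetic.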
For the second part: yes, if an Apple engineer (one with access to this code) leaked the internal hash function they used, or a bunch of example image-to-hash-value pairs, that would allow these adversarial attacks.
Until you can show an example or paper where the same adversarial image generates a specific hash value for two unrelated perceptual hash functions, with one being hidden, it is not right to predict a high likelihood of that first scenario being possible.
Here's a thought exercise: how long would it have taken researchers to generate a hash collision with that dog image if NeuralHash weren't public and you received no immediate feedback that you were "right" or getting closer along the way?
> Until you can show an example or paper where the same adversarial image generates a specific hash value for two unrelated perceptual hash functions, with one being hidden, it is not right to predict a high likelihood of that first scenario being possible.
"There is no paper attacking ROT13 done twice, therefore it must be secure". Usually, it's on the one proposing the protocol to make a case for its security. Doubly so when it's supposed to last a long time, a lot of people are interested in attacking it, and successful attacks can put people in harm's way.
You know what, if you think that this is difficult, feel free to pick an existing perceptual hash function H, cough up some money, and we'll announce a modest prize (say $4000) on HN for the first person to have a working collision attack for NeuralHash+H. H will run on a scaled-down thumbnail, and we'll keep the precise identity of the algorithm secret. If the challenge gets any traction, but nobody succeeds within 40 days, I'll pay you $4000 for your effort. If you're right, this should be easy money. (cf SHA1, which lasted 22 years)
Heck, if Apple claims that this is difficult (afaict they don't; it would be unwise), they might even join in with their own preimage challenge for $$$. It'd be a no-brainer, a simple and cheap way of generating good publicity.
They claim their H is resistant to adversarial attacks, so they are claiming this to be difficult.
If I took an exact public perceptual hash function implementation and used that as H in your contest, it might be possible for a researcher attacking all public perceptual hash functions to stumble on the right one within 40 days.
I agree with you that we are trusting Apple to implement this competently. This isn’t something that can be proved to work mathematically where nothing about the implementation has to be kept secret.
So, worst case, everything you say could come true, but to imply that it is likely is wrong.
This leaves open the question of how the image gets on the device of the victim. You would have to craft a very specific image that the victim is likely to save, and the existence of such a specially crafted file would completely exonerate them.
2. Generate an objectionable image with the same hash as the target's photo. (This is obviously illegal.)
3. Submit the objectionable image to the government database.
Now the target's photo will be flagged until manually reviewed.
This doesn't sound impossible as a targeted attack, and if done on a handful of images that millions of people might have saved (popular memes?) it might even grind the manual reviews to a halt. But maybe I'm not understanding something in this (very bad idea) system.
This requires the attacker to handle CSAM, which defeats the benefit. The risk in all these cases is that as soon as you actually handle CSAM, the attack is void, since you're now genuinely guilty of the crime yourself (and very few will cross that line).
The point, though, is that this scanning is something the victim's Apple phone does; the attacker's own device does nothing of the sort. So the goal is to send hash-collided images over non-Apple channels (email) where there is a reasonably good chance the image makes its way into the victim's global device photo store and into automatic iCloud uploads.
Sending an MMS would work, for example, or a picture over Signal that the recipient then saves outside of Signal (a meme).
In all these cases, the original sender doesn't have an Apple device: so they're not getting scanned by the same algorithm, but more importantly their device is not spying on them. Importantly too: they've done nothing illegal.
But: the victim is getting flagged by their own device. And the victim has to have their device seized and analysed to determine (1) that it's not CSAM, and (2) that they were genuinely sent the flagged images and aren't trying to divert attention by getting themselves falsely pinged up front. But then (3) the sender has committed no crime. There's no reason, or even grounds, to investigate them, because by the time the victim has dealt with law enforcement, it's been established that no one possessed anything illegal.
It's the digital equivalent of a sock of cat litter testing positive as methamphetamine, except it turned up in your McDonald's drive-through order.
The goal is not to get convictions, the goal is harassment.
Perhaps that's true in the narrowest sense, but aren't the odds of generating a colliding file so low as to all but rule out coincidence and therefore strongly indicate premeditated cyber-attack (which is illegal)?
If I were law enforcement, at the very least I'd want to keep tabs on these sources of false positives. Probably easy enough to convince a judge that someone capable of the "tech wizardry" to collide a hash can un-collide one too, and therefore more thorough/invasive search warrants of the source are justified.
Your argument is "the technology is flawed, therefore let's also arrest anyone we suspect of generating false positives".
Like security researchers. Or the people currently inspecting the algorithm. And also frankly what are you going to do about overseas adversaries? The most likely people looking at how to exploit this would explicitly be state-sponsored Russian hackers - this is right up the alley of their desire to be able to cause low level chaos without committing to a serious attack.
And at the end of the day you've still succeeded: the point is that by the time you've established it was spurious, the target has already been through the legal wringer. The legal wringer is the point.