> This is a false-positive rate of 2 in 2 trillion image pairs (1,431,168^2). Assuming the NCMEC database has more than 20,000 images, this represents a slightly higher rate than Apple had previously reported. But, assuming there are less than a million images in the dataset, it's probably in the right ballpark.
The number of images in that database could well be far in excess of a million. According to NCMEC [1], 65.4 million files were reported to them in 2020, and "[s]ince the program inception in 2002, CVIP [child victim identification project] has reviewed more than 330 million images and videos."
Of course many of those were duplicates, but I would be entirely unsurprised if there were more than a million original files.
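To make the scaling concrete, here is a rough back-of-the-envelope sketch. It assumes the per-pair collision rate from the quoted test (2 collisions in 1,431,168^2 pairs) applies uniformly, and that the chance of a given image falsely matching grows linearly with database size. Both are simplifying assumptions of mine, and the function name is made up for illustration.

```python
# Figures taken from the quoted passage.
TEST_IMAGES = 1_431_168
COLLISIONS = 2

# Number of image pairs compared in the quoted test (about 2 trillion).
pairs = TEST_IMAGES ** 2

# Estimated chance that one arbitrary pair of distinct images collides.
per_pair_rate = COLLISIONS / pairs


def expected_false_match_rate(db_size: int) -> float:
    """Approximate chance that a single innocent image falsely matches
    at least one entry in a database of db_size hashes.

    Linear approximation: valid while per_pair_rate * db_size is small.
    """
    return per_pair_rate * db_size


if __name__ == "__main__":
    # Database sizes: the two bounds from the quote, plus NCMEC's
    # "330 million reviewed" figure as a loose upper bound.
    for db_size in (20_000, 1_000_000, 330_000_000):
        rate = expected_false_match_rate(db_size)
        print(f"{db_size:>11,} hashes: {rate:.3g} per image")
```

Under these assumptions the per-image rate scales directly with the number of hashes, so going from a million entries to tens or hundreds of millions raises it by the same factor.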
[1] https://www.missingkids.org/ourwork/impact