No, I don't know what that means. Presumably it is some sort of thumbnail, maybe color inverted or something.
> does Apple's implementation flag on any of those? Who knows - they're not going to refer to anything specific about how they got "1 in a trillion"
I assume they've tested NeuralHash on big datasets of innocuous pictures, and gotten some sort of bound on the probability of false positives p, and then chosen N such that p^N << 10^-12, and furthermore imposed some condition on the "distance" between offending images (to ensure some semblance of independence). At least that's what I'd do after thinking about the problem for a minute.
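That back-of-the-envelope threshold choice can be sketched in a few lines. The rate p = 10^-3 below is an assumed placeholder, not Apple's published figure, and this ignores the independence caveat (which is exactly what the distance condition is meant to address):

```python
from fractions import Fraction

# Assumed per-image false-positive rate (placeholder, not a real figure).
p = Fraction(1, 1000)
# Target joint false-positive probability: 1 in a trillion.
target = Fraction(1, 10**12)

# Smallest N such that p**N < target, assuming the N matches
# are independent events (the "distance" condition is what's
# supposed to make that assumption roughly hold).
N = 1
while p ** N >= target:
    N += 1

print(N)  # 5: (1/1000)**5 = 10**-15 < 10**-12
```

Using `Fraction` keeps the arithmetic exact; with floats, `0.001 ** 4` lands near the 10^-12 boundary and rounding could flip the comparison.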
> I assume they've tested NeuralHash on big datasets of innocuous pictures, and gotten some sort of bound on the probability of false positives p, and then chosen N such that p^N << 10^-12
What's interesting about this faulty argument is that it hinges on the assumption that "innocuous pictures" is a well-defined space that you can use for testing and get reliable predictions from.
A neural network does classification by drawing a complex boundary between one large set and another in a high-dimensional feature space. The problem is that those features can, and often do, include incidental things like lighting, subject placement, and so forth. This often works because your target data set really does uniquely have feature X: you can get a result showing that your system reliably finds X, but when you go out into the real world, you run into those incidental features everywhere else too.
I don't know exactly how NeuralHash works, but I'd presume it has the same fundamental limitations. It has to find images even after they've been put through simple filters that change every individual pixel, so it's hard to see how it wouldn't also flag a picture A that looks like picture B if you squint.