Who knows why, but I started thinking about image hashes a week ago. MD5 hashes, the kind IB uses, are good for checking the integrity of a file, but cut out even a single pixel and it treats the result as a whole new file... That’s not good for images. Ideally, you’d want the computer to determine similarity between two images from the shapes and colors, not individual bits.
And it’s fucking HARD. Fuzzy logic is way beyond my coding skills. But I had to try.
My idea was basically to scale pictures down to 8×8 grids and build a hash from there. I start from the top-left corner and move through each pixel of each RGB channel, asking one question: is this channel value lighter or darker than the one before it?
So I get a string of bits (00110110...) for each channel, and eventually a number representing the overall changes in each channel.
Something like 8426430550391263784-7825201099380511380-7825728728327139888 for my latest “Snapshot”... Red-Green-Blue as three 64-bit values.
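The idea above can be sketched in a few lines of plain Python. This is my own minimal reconstruction, not the actual script: it assumes the 8×8 thumbnail’s pixels have already been extracted as a flat list of (r, g, b) tuples (with PIL that would come from resizing and calling `getdata()`), and it compares each channel value to the previous one in reading order.

```python
def channel_hash(channel_values):
    """Hash one channel of an 8x8 thumbnail: each bit records whether a
    value is lighter than the previous one, scanning left to right,
    top to bottom (63 comparisons for 64 pixels)."""
    bits = 0
    for prev, cur in zip(channel_values, channel_values[1:]):
        bits = (bits << 1) | (1 if cur > prev else 0)
    return bits

def image_hash(pixels):
    """pixels: flat list of 64 (r, g, b) tuples from an 8x8 thumbnail.
    Returns one integer per channel, like the R-G-B values above."""
    return tuple(channel_hash([p[c] for p in pixels]) for c in range(3))
```

A steadily brightening red channel, for instance, hashes to all ones, while flat channels hash to zero.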
I found that I could rescale a photo and still get a near-identical match (0–3 bits off). Wholly different pictures differed by about 20–30 bits on average (Snapshot and Jätte bra were 25 bits apart).
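Counting how many bits two hashes differ by is just XOR plus a popcount per channel. A small sketch of that comparison (my own helper, assuming the three-integer tuples from the hashing step):

```python
def bit_difference(hash_a, hash_b):
    """Count differing bits across the three channel hashes:
    XOR each pair, then count the set bits."""
    return sum(bin(a ^ b).count("1") for a, b in zip(hash_a, hash_b))
```

So identical hashes give 0, and each flipped comparison adds one to the score.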
What’s good about this approach is that it’s relatively simple to calculate, and that it’s relative to the image itself. What’s still bad is that the result depends on the scaling function, which probably isn’t universal. Ideally I’d have averaged the pixels within each region myself, but for testing purposes I let the imaging library deal with the interpolation.
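That region-averaging idea is easy enough to sketch without any library. This is a naive box average I wrote for illustration (not part of the script): it assumes a flat grayscale pixel list whose width and height divide evenly by 8, and averages each block into one output value.

```python
def box_downscale(gray, w, h, size=8):
    """Naive area average: split a w*h flat grayscale list into
    size*size blocks and average each block, independent of any
    library's interpolation filter. Assumes w and h are exact
    multiples of size."""
    bw, bh = w // size, h // size
    out = []
    for by in range(size):
        for bx in range(size):
            block = [gray[(by * bh + y) * w + (bx * bw + x)]
                     for y in range(bh) for x in range(bw)]
            out.append(sum(block) // len(block))
    return out
```

Since every pixel contributes equally to exactly one output value, the result doesn’t depend on which resampling filter a given imaging library happens to use.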
Bugh.. Head spinning.
You can download the .py here. You'll need the PIL imaging library. Just drag and drop an image on the script and it'll calculate a hash. Drag two and it'll calculate the difference.