Similarity / Dissimilarity

We’ll mix it up this time. That’s right: not I, but we, because you’re going to get involved in this one. One simple question, many answers. Take your time to think about it before reading on. Here comes the question:

Are the following pictures similar?

Let’s start with the basics: They are digital pictures shown on a website, saved in the same digital format. What you cannot know: they also have the same number of pixels along the long side. To be exact, 1000 pixels, like all pictures here, so that they do not take up too much space (and do not get stolen – but who would steal them anyway…). Color information is stored in the Adobe RGB color space; however, since all of them are black and white, the values for red, green, and blue are identical within every pixel anyway.

Not so fast, you will object. And you are right to do so: While the long side is always 1000 pixels, the pictures share neither the same aspect ratio nor the same orientation. And it is hardly a stretch to claim that no two pictures have a single pixel in common. So already after these first simple investigations we see a problem emerging: They are similar to a certain degree, identical in some respects, but disparate with respect to other criteria.

Let’s dive deeper: Are the images we see here the actual images, and do they carry or show any of our reality at all? First of all, they do not show the full data gathered, since the originals are much larger and, in their RAW format, also encode color and a variety of additional information. And in any case, they are just one arbitrary representation of reality without any real connection to it. One of endlessly many possible portrayals of reality. While this does not directly touch on the original question, it is important to keep in mind when we search for and interpret their similarity.

Lastly, how do the pictures appear as a whole? Even if the individual pixels differ between all images, combined they create patterns that can be alike. The pixels combine into a variety of forms, which, in turn, are perceived differently by different viewers. Waves, scales, oscillations, geometrical forms. And are the pixels really creating these patterns, or does the viewer infer them? Can we infer different patterns from the same picture?

I could go on for a while, but it’s getting too long. Let’s move on to the second question: Now, we also need to quantify the difference between every pair of pictures. On a scale from 0 to 100, how different are these two?

And what about these?

I think you get my point. We can create an endless list of metrics and choose what we think is best. We can apply these metrics to these simple images, or we can gather more data, larger pictures, RAW data, and then apply the metrics. We can weight and combine metrics to generate an overall similarity score, and we can try to assess how it performs in comparison to other scores. We can compare pairs of pictures and create a hierarchy of similarity. But the result will never be the same when done by different people. And in the end, it’s quite arbitrary. Do we look at pixels, color, form, format, derived patterns, provoked emotions?
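To make this a bit more tangible, here is a minimal sketch in Python. Everything in it is made up for illustration: the two tiny 4×4 "pictures", the function names `pixel_difference` and `histogram_difference`, and the 50/50 weighting are my assumptions, not the metrics anyone should use. It scores the same pair of pictures on a scale from 0 to 100 in two different ways and then combines the two scores.

```python
def pixel_difference(a, b):
    """Mean absolute difference per pixel, scaled to 0..100 (gray values 0..255)."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    total = sum(abs(x - y) for x, y in zip(flat_a, flat_b))
    return 100 * total / (255 * len(flat_a))


def histogram_difference(a, b, bins=4):
    """Share of pixels that would have to change brightness bin, scaled to 0..100."""
    def hist(img):
        counts = [0] * bins
        for row in img:
            for v in row:
                counts[min(v * bins // 256, bins - 1)] += 1
        return counts

    ha, hb = hist(a), hist(b)
    n = sum(ha)
    return 100 * sum(abs(x - y) for x, y in zip(ha, hb)) / (2 * n)


# Two hypothetical 4x4 grayscale pictures: vertical stripes, and the same
# stripes shifted sideways by one pixel.
stripes = [[0, 255, 0, 255] for _ in range(4)]
shifted = [[255, 0, 255, 0] for _ in range(4)]

pixel_score = pixel_difference(stripes, shifted)     # every pixel differs
hist_score = histogram_difference(stripes, shifted)  # same amount of black and white

# An arbitrary 50/50 weighting; any other weighting is just as defensible.
combined = 0.5 * pixel_score + 0.5 * hist_score
print(pixel_score, hist_score, combined)
```

Pixel by pixel, the two pictures could not be more different. By their brightness distribution, they are identical. The combined score lands in the middle only because I chose the weights that way.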

I spend most of my day doing such arbitrary comparisons. Not between images, but between DNA strands. Instead of pixels I am looking at sequences of A, C, G, and T. Depending on the chosen metric, a variety of results emerges. There is no correct metric, no correct similarity measure. There is no correct way to describe reality, nor to analyze and exploit it. There is an infinite number of ways, and every single one produces a different result.
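A small sketch of what that can look like, again in Python. The two sequences are invented for this example, and the three measures (Hamming distance, edit distance, and a Jaccard similarity over 3-mers) are simply common textbook choices I picked for illustration, not a description of any particular pipeline.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def kmer_jaccard(a, b, k=3):
    """Jaccard similarity of the sets of k-mers (substrings of length k)."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


# Two hypothetical sequences: the second is the first shifted by one base.
seq_a = "ACGTACGTAC"
seq_b = "CGTACGTACA"

print(hamming_distance(seq_a, seq_b))  # 10: every position differs
print(edit_distance(seq_a, seq_b))     # 2: drop the leading A, append an A
print(kmer_jaccard(seq_a, seq_b))      # 0.8: four of five distinct 3-mers are shared
```

Same pair of sequences, three verdicts: completely different, almost identical, and mostly alike.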

But fortunately, in the end, it somehow seems to work – at least sometimes, when it solves a problem in biology research or medicine; but most of the time I don’t get how.

