Adrian
1 min read · Feb 14, 2021

A distance is a measure associated with a set, and together they form a metric space in which that distance makes sense. Typically one considers Euclidean space. However, distances can be defined on almost any type of object, including physical ones, as long as the metric axioms hold.
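To make the metric-space idea concrete, here is a minimal Python sketch that spot-checks the metric axioms on a handful of sample points. The axioms checked are non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. The helper `satisfies_metric_axioms` and the sample points are hypothetical. A finite check like this can only refute the axioms; it cannot prove them.

```python
import itertools
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def squared_euclidean(p, q):
    """Squared Euclidean distance: useful in practice, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def satisfies_metric_axioms(d, points, tol=1e-12):
    """Spot-check the metric axioms for distance d on a finite sample of points.

    Hypothetical helper: passing does not prove d is a metric,
    but failing shows that d is not one.
    """
    for x, y in itertools.product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:                         # non-negativity
            return False
        if (abs(dxy) <= tol) != (x == y):      # d(x, y) = 0 iff x = y
            return False
        if abs(dxy - d(y, x)) > tol:           # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


points = [(0, 0), (1, 0), (2, 0), (3, 4), (-2, 5)]
print(satisfies_metric_axioms(euclidean, points))          # True
print(satisfies_metric_axioms(squared_euclidean, points))  # False: 4 > 1 + 1 along the x-axis
```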

Probably the most important factors in picking a metric are the data's type (numeric, categorical, binary), the space considered, and the intrinsic characteristics of the model or algorithm.

Except for the metrics whose range is bounded by definition, all the others are sensitive to variations in scale. Still, the need to normalize the data is not necessarily a downside.
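The sketch below illustrates scale sensitivity with hypothetical data: age in years and yearly income. In the raw Euclidean distance, the income column dominates. Min-max normalization brings both features into the [0, 1] range so that each contributes comparably. This is one common choice among several normalization schemes.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns are mapped to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical records: (age in years, yearly income)
people = [[25, 50_000], [60, 51_000], [26, 90_000]]

raw = [euclidean(people[0], p) for p in people[1:]]
scaled_people = min_max_normalize(people)
scaled = [euclidean(scaled_people[0], p) for p in scaled_people[1:]]

print(raw)     # ~[1000.6, 40000.0]: the 35-year age gap barely registers
print(scaled)  # ~[1.0003, 1.0004]: both features now weigh in
```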

The definition of distance works in three dimensions as well as in n dimensions. However, higher dimensions and related concepts such as hyperspace are harder to grasp and visualize.
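The Euclidean formula carries over unchanged from three to n dimensions: it is the square root of the sum of squared coordinate differences. Only the number of terms in the sum grows. Here is a minimal Python sketch, cross-checked against the standard library's `math.dist` (available since Python 3.8).

```python
import math


def euclidean(p, q):
    """Euclidean distance for points of any (equal) dimension n."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((0, 0, 0), (1, 2, 2)))  # 3.0 in three dimensions

p5, q5 = (1, 2, 3, 4, 5), (2, 3, 4, 5, 6)
print(euclidean(p5, q5), math.dist(p5, q5))  # both sqrt(5), about 2.236, in five dimensions
```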

One can modify cosine similarity to take the vectors' lengths into account as well. Since cosine similarity depends only on the angle between two vectors, I suppose it has more to do with geometric rotations.
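Plain cosine similarity ignores length, so scaling a vector does not change the result. The sketch below shows this behavior alongside one possible length-aware variant, which multiplies the cosine by the ratio of the shorter norm to the longer one. That variant (`length_aware_similarity`) is a hypothetical illustration of the idea, not a standard definition.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))


def length_aware_similarity(a, b):
    """Hypothetical variant: scale the cosine by how similar the lengths are."""
    na, nb = norm(a), norm(b)
    return cosine_similarity(a, b) * min(na, nb) / max(na, nb)


a, b = [1.0, 0.0], [2.0, 0.0]  # same direction, different lengths
print(cosine_similarity(a, b))        # 1.0: the length difference is invisible
print(length_aware_similarity(a, b))  # 0.5: the length difference now counts
```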

Hamming distance requires vectors of equal length, but shorter vectors can easily be padded to the maximum length by adding zeros upfront.
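Here is a minimal sketch for bit strings, which pads the shorter string with leading zeros. The `fill` parameter is an illustrative addition. Padding on the left leaves the numeric value of a binary number unchanged, which is why the zeros go upfront.

```python
def hamming(a: str, b: str, fill: str = "0") -> int:
    """Hamming distance after left-padding the shorter string with `fill`."""
    width = max(len(a), len(b))
    a, b = a.rjust(width, fill), b.rjust(width, fill)
    # Count the positions where the padded strings differ
    return sum(x != y for x, y in zip(a, b))


print(hamming("1011101", "1001001"))  # 2
print(hamming("101", "1101"))         # 1: "101" is compared as "0101"
```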

Chebyshev distance, which takes the largest absolute difference across coordinates, is probably a better approximation when one dimension dominates and the others can be neglected.
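The sketch below compares Chebyshev and Euclidean distance on a hypothetical pair of points where the first coordinate dominates. When the remaining differences are small, the two values are close, and Chebyshev simply reports the dominant difference.

```python
import math


def chebyshev(p, q):
    """Chebyshev (L-infinity) distance: the largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# One dimension drives the difference; the others barely move.
p, q = (0, 0, 0), (10, 1, 1)
print(chebyshev(p, q))  # 10
print(euclidean(p, q))  # ~10.0995, close to the Chebyshev value
```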

The haversine formula gives the distance between two points along a geodesic (a great circle) on a sphere. From a mathematical standpoint, that geodesic is also a line. Even though the Earth is not a perfect sphere, a sphere can be a good approximation, depending on the case. In the end, the models built with the various metrics are approximations as well.
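Here is a minimal haversine sketch that assumes a spherical Earth with a mean radius of 6371 km. The constant `EARTH_RADIUS_KM` is an assumption: a different radius or an ellipsoidal model would give slightly different results.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the sphere is an approximation


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


# A quarter of the equator: (0, 0) to (0, 90) should be pi/2 * R
print(haversine(0, 0, 0, 90))  # ~10007.5 km
```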

Written by Adrian

IT professional/blogger with more than 24 years of experience in IT: Software Engineering, BI & Analytics, Data, Project, Quality, Database & Knowledge Management