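A distance is a function defined on a set, and together the two form a metric space in which the respective distance makes sense. Formally, the function must be non-negative, be zero only between identical points, be symmetric, and satisfy the triangle inequality. Typically, one considers the Euclidean space; however, distances can be defined over any type of objects.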
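To make the metric-space requirements concrete, here is a minimal Python sketch that checks the four metric axioms (non-negativity, identity of indiscernibles, symmetry, triangle inequality) on a small sample of points. The helper names `euclidean` and `satisfies_metric_axioms` are hypothetical, and a check on a finite sample can only refute, never prove, that a function is a metric.

```python
import itertools
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def satisfies_metric_axioms(dist, points, tol=1e-9):
    """Check the metric axioms on a finite sample of points.

    Passing on a sample does not prove that `dist` is a metric;
    failing on any sample shows that it is not.
    """
    for p, q in itertools.product(points, repeat=2):
        d = dist(p, q)
        if d < -tol:  # non-negativity
            return False
        if (p == q) != (abs(d) <= tol):  # zero distance only for identical points
            return False
        if abs(d - dist(q, p)) > tol:  # symmetry
            return False
    for p, q, r in itertools.product(points, repeat=3):
        if dist(p, r) > dist(p, q) + dist(q, r) + tol:  # triangle inequality
            return False
    return True


sample = [(0, 0), (3, 4), (1, 1), (-2, 5)]
print(satisfies_metric_axioms(euclidean, sample))  # True
```

The squared Euclidean distance, for instance, fails the triangle inequality on the same sample (25 > 2 + 13 via the point (1, 1)), so it is not a metric even though it is often used like one.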

Probably the most important aspects when picking a metric are the data's type (numeric, categorical, binary), the space considered, and the model's or algorithm's intrinsic characteristics.

Except for the metrics whose range is bounded by definition, all the others are sensitive to scale variations, so the data usually needs to be normalized first. This need for normalization is not necessarily a downside.
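A minimal sketch of scale sensitivity, assuming hypothetical records of (age in years, income in currency units): with raw values the income column dominates the Euclidean distance, and the nearest neighbour of the first record changes once min-max normalization rescales every column to [0, 1]. The data and the function names (`min_max_normalize`, `nearest_to_first`) are illustrative only.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_max_normalize(rows):
    """Rescale each column to [0, 1] (min-max normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # constant columns get a span of 1.0 to avoid division by zero
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def nearest_to_first(rows):
    """Index of the row closest to rows[0], excluding rows[0] itself."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))


# Hypothetical (age, income) records
people = [(25, 50_000), (60, 50_500), (27, 52_000), (40, 90_000)]
print(nearest_to_first(people))                     # 1: income differences dominate
print(nearest_to_first(min_max_normalize(people)))  # 2: age now weighs as much as income
```

Min-max normalization is only one option; standardization (subtracting the mean and dividing by the standard deviation) is a common alternative.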

The definition of distance works in three dimensions as well as in n dimensions; however, higher dimensions and related concepts like hyperspace can be more difficult to grasp and visualize.
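The same Euclidean formula, the square root of the sum of squared coordinate differences, applies unchanged whatever the number of dimensions. A minimal Python sketch (the function name is illustrative):

```python
import math


def euclidean(p, q):
    """Euclidean distance in n dimensions: sqrt(sum_i (p_i - q_i)^2)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((0, 0, 0), (1, 2, 2)))               # 3.0 in three dimensions
print(euclidean((1, 1, 1, 1, 1), (0, 0, 0, 0, 0)))   # sqrt(5) in five dimensions
```

In Python 3.8 and later, the standard library's `math.dist(p, q)` computes the same quantity.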

One can modify the cosine similarity to take the vectors' lengths into account. I suppose the cosine similarity has more to do with geometric rotations, since it depends only on the angle between the vectors.
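The sketch below shows that cosine similarity ignores length and depends only on the angle, so rotating both vectors by the same angle leaves it unchanged. `length_aware_cosine` is one hypothetical way to bring length back in (scaling by the ratio of the shorter to the longer norm); it is an illustrative assumption, not a standard definition. All functions assume non-zero vectors.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """cos(theta) = a.b / (|a| |b|); depends only on the angle between a and b."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def length_aware_cosine(a, b):
    """Hypothetical variant: scale the cosine by the shorter/longer norm ratio."""
    na, nb = norm(a), norm(b)
    return cosine_similarity(a, b) * min(na, nb) / max(na, nb)


def rotate(v, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


a, b = (1.0, 0.0), (3.0, 0.0)
print(cosine_similarity(a, b))    # 1.0: same direction, length ignored
print(length_aware_cosine(a, b))  # about 0.333: the length difference now counts
```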

For the Hamming distance, vectors of different lengths can easily be brought to the maximum length by adding zeros upfront.
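A minimal sketch of zero-padding before computing the Hamming distance, assuming the inputs are bit strings that represent binary numbers (so leading zeros do not change their value); the function name is illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance after left-padding the shorter bit string with zeros."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    # count the positions where the two padded strings differ
    return sum(x != y for x, y in zip(a, b))


print(hamming("1011", "111"))  # 2: "1011" vs "0111"
```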

The Chebyshev distance, which keeps only the largest coordinate difference, is probably a better approximation for the cases in which one dimension drives the show while the others can be neglected.
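The sketch below compares the Minkowski distance of order r (Manhattan for r = 1, Euclidean for r = 2) with the Chebyshev distance, which is its limit as r grows. The points are hypothetical, chosen so that the first dimension dominates.

```python
def minkowski(p, q, r):
    """Minkowski distance of order r: (sum_i |p_i - q_i|^r)^(1/r)."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


def chebyshev(p, q):
    """Chebyshev distance: the largest coordinate difference (limit r -> infinity)."""
    return max(abs(a - b) for a, b in zip(p, q))


p, q = (0, 0, 0), (10, 1, 1)  # hypothetical points; dimension 1 dominates
print(minkowski(p, q, 1))  # 12.0 (Manhattan)
print(minkowski(p, q, 2))  # Euclidean, slightly above 10
print(chebyshev(p, q))     # 10
```

As r increases, the small differences contribute less and less, which is why the Chebyshev distance suits cases where a single dimension dominates.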

The haversine formula gives the distance between two points along a geodesic of the sphere (a great circle), which from a mathematical standpoint is the counterpart of a straight line on a curved surface. Even if the Earth is not a perfect sphere, a sphere can, depending on the case, be a good approximation. In the end, the models built with the various metrics are approximations as well.
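A minimal sketch of the haversine formula for the great-circle distance between two (latitude, longitude) points in degrees. It assumes a spherical Earth with the conventional mean radius of 6371 km; `haversine_km` is an illustrative name.

```python
import math

# Conventional mean Earth radius; an assumption, since the Earth is not a perfect sphere
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # clamp to guard against tiny floating-point overshoot above 1
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


print(haversine_km(0.0, 0.0, 0.0, 1.0))  # about 111.19 km: one degree along the equator
```

Where the spherical approximation is not good enough, ellipsoidal models such as Vincenty's formulae on the WGS 84 ellipsoid are the usual refinement.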


IT professional/blogger with more than 19 years experience in IT - Software Engineering, BI & Analytics, Data, Project, Quality, Database & Knowledge Management

Adrian
