Euclidean distance in Python

Calculating Euclidean Distance with NumPy

Euclidean distance is a fundamental distance metric pertaining to systems in Euclidean space.

Euclidean space is the classical geometric space you get familiar with in math class, typically limited to 3 dimensions, though it can also be defined for any non-negative integer dimension.

Euclidean distance is the length of the shortest line between two points in Euclidean space.

The name comes from Euclid, who is widely recognized as "the father of geometry", as this was the only space people at the time would typically conceive of. Over time, other types of space have been studied in physics and mathematics, such as affine space, and non-Euclidean spaces and geometries are very unintuitive to our everyday perception.

In Euclidean space of any dimension, the shortest path between two points is the straight line segment joining them. In high-dimensional spaces, however, the Euclidean distances between points tend to become similar to one another (one symptom of the so-called curse of dimensionality), so Euclidean distance isn’t always the most useful metric when dealing with many dimensions. Here, we’ll focus on 2D and 3D Euclidean space.

Distance for high-dimensional data is often measured with other metrics, such as Manhattan distance.

Generally speaking, Euclidean distance is widely used in developing 3D worlds, as well as in Machine Learning algorithms that rely on distance metrics, such as K-Nearest Neighbors. Typically, Euclidean distance represents how similar two data points are, assuming some clustering based on other data has already been performed.

Mathematical Formula

The mathematical formula for calculating the Euclidean distance between 2 points in 2D space:
$$
d(p,q) = \sqrt{(q_1-p_1)^2 + (q_2-p_2)^2}
$$
The formula is easily adapted to 3D space, as well as any dimension:
$$
d(p,q) = \sqrt{(q_1-p_1)^2 + (q_2-p_2)^2 + (q_3-p_3)^2 + \dots + (q_n-p_n)^2}
$$
The general formula can be simplified to:
$$
d(p,q) = \sqrt{\sum_{i=1}^{n} (q_i-p_i)^2}
$$
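The general formula maps almost directly onto plain Python. The sketch below is our own illustration rather than a library function: the name euclidean_distance is hypothetical, and it assumes both points are sequences of numbers with the same length.

```python
import math

def euclidean_distance(p, q):
    """Hypothetical helper: square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum (q_i - p_i)^2 over every dimension i, then take the square root
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # 2D: 5.0
print(euclidean_distance((0, 0, 0), (3, 3, 3)))  # 3D: square root of 27
```

The same function handles 2D, 3D or any other number of dimensions, which is exactly what the summation form of the formula expresses.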
A sharp eye may notice the similarity between Euclidean distance and Pythagoras’ Theorem:
$$
C^2 = A^2 + B^2
$$

There is in fact a relationship between these: Euclidean distance is calculated via Pythagoras’ Theorem, given the Cartesian coordinates of two points.

Because of this, Euclidean distance is sometimes also called Pythagorean distance, though the former name is much better known.
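To make the link concrete: in 2D, the coordinate differences are the two legs of a right triangle, and the distance between the points is its hypotenuse. A small sketch with the points (1, 2) and (4, 6), chosen purely for illustration, using Python’s standard math.hypot() as a cross-check:

```python
import math

p = (1, 2)
q = (4, 6)

# Legs of the right triangle: the differences along each axis
a = q[0] - p[0]  # 3
b = q[1] - p[1]  # 4

# Pythagoras: c^2 = a^2 + b^2, so the hypotenuse c is the Euclidean distance
c = math.sqrt(a ** 2 + b ** 2)
print(c)                 # 5.0
print(math.hypot(a, b))  # 5.0, same result via the standard library
```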

Note: The two points are vectors, but the output should be a scalar (which is the distance).


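We’ll be using NumPy to calculate this distance for two points, and the same approach works for both 2D and 3D spaces. First, here are the two points we’ll use, (0, 0, 0) and (3, 3, 3), plotted in 3D with Matplotlib: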

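```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers '3d' on older Matplotlib

# Create a figure with a single 3D subplot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Plot the two points
ax.scatter(0, 0, 0)
ax.scatter(3, 3, 3)
plt.show()
```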

Calculating Euclidean Distance in Python with NumPy

First, we’ll need to install the NumPy library, for example with pip install numpy.

Now, let’s import it and set up our two points, with the Cartesian coordinates as (0, 0, 0) and (3, 3, 3):

```python
import numpy as np

# Initializing the points
point_1 = np.array((0, 0, 0))
point_2 = np.array((3, 3, 3))
```

Now, instead of performing the calculation manually, let’s utilize the helper methods of NumPy to make this even easier!

np.sqrt() and np.sum()

The operations and mathematical functions required to calculate Euclidean distance are pretty simple: subtraction, squaring, addition, and the square root function. Multiple additions can be replaced with a sum, as well:
$$
d(p,q) = \sqrt{\sum_{i=1}^{n} (q_i-p_i)^2}
$$

NumPy provides us with a np.sqrt() function, representing the square root function, as well as a np.sum() function, which represents a sum. With these, calculating the Euclidean Distance in Python is simple and intuitive:

```python
# Get the square of the difference of the 2 vectors
square = np.square(point_1 - point_2)
# Get the sum of the square
sum_square = np.sum(square)
```

This gives us a pretty simple result: sum_square is equal to 27, since each of the three coordinates differs by 3 and 3² + 3² + 3² = 27. All that’s left is to get the square root of that number:

```python
# The last step is to get the square root and print the Euclidean distance
distance = np.sqrt(sum_square)
print(distance)  # roughly 5.196, the square root of 27
```

In true Pythonic spirit, this can be shortened to just a single line:


```python
distance = np.sqrt(np.sum(np.square(point_1 - point_2)))
```

You can also use Python’s built-in pow() and sum() functions instead, though they require you to handle the input a bit more manually, which NumPy conveniently abstracts away: pow() only works with scalars (so each element of the array has to be processed individually), and it takes a second argument, the power to which you’re raising the number.

This approach, though, intuitively looks more like the formula we’ve used before:

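```python
import math

# Square each coordinate difference with the built-in pow(), add them up with
# the built-in sum(), then take the square root
distance = math.sqrt(sum(pow(a - b, 2) for a, b in zip(point_1, point_2)))
print(distance)
```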

np.linalg.norm()

The np.linalg.norm() function represents a mathematical norm. In essence, the norm of a vector is its length. This length doesn’t necessarily have to be measured in the Euclidean sense; other norms define length differently. The Euclidean distance between two points is the L2 norm (also known as the Euclidean norm) of the vector of their differences, and for 1-D input norm() computes the L2 norm by default: its ord parameter defaults to None, which for vectors means the 2-norm.


If you were to set the ord parameter to some other value p, you’d calculate other p-norms. For instance, the L1 norm of the difference vector is the Manhattan distance!
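For illustration, here is how the ord parameter changes the result for the same two points, (0, 0, 0) and (3, 3, 3): ord=1 sums the absolute differences (Manhattan distance), ord=2 gives the Euclidean distance, and ord=np.inf takes the largest absolute difference (often called the Chebyshev distance).

```python
import numpy as np

point_1 = np.array((0, 0, 0))
point_2 = np.array((3, 3, 3))
diff = point_1 - point_2

manhattan = np.linalg.norm(diff, ord=1)       # 3 + 3 + 3 = 9
euclidean = np.linalg.norm(diff, ord=2)       # sqrt(9 + 9 + 9), about 5.196
chebyshev = np.linalg.norm(diff, ord=np.inf)  # max(3, 3, 3) = 3

print(manhattan, euclidean, chebyshev)
```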

With that in mind, we can use the np.linalg.norm() function to calculate the Euclidean distance easily, and much more cleanly than using other functions:

```python
distance = np.linalg.norm(point_1 - point_2)
print(distance)
```

This prints the L2 (Euclidean) distance, the same value of roughly 5.196 as before.
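np.linalg.norm() also accepts an axis argument, so a single call can compute many distances at once. A sketch, assuming a small made-up array of 3D points stored one per row and measured against one query point; broadcasting subtracts the query point from every row:

```python
import numpy as np

# Hypothetical data: four 3D points, one per row
points = np.array([
    [0, 0, 0],
    [3, 3, 3],
    [1, 2, 2],
    [0, 0, 5],
])
query = np.array([0, 0, 0])

# Subtract the query from every row (broadcasting), then take each row's L2 norm
distances = np.linalg.norm(points - query, axis=1)
print(distances)

# Index of the point closest to the query, skipping the query itself at index 0
nearest = np.argmin(distances[1:]) + 1
print(nearest)
```

This vectorized pattern is the core of a brute-force nearest-neighbor search, such as the one used by K-Nearest Neighbors.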

L2 normalization and L1 normalization are heavily used in Machine Learning to normalize input data.
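As a rough sketch of what this means in practice: with L2 normalization each sample (row) is divided by its own L2 norm, so every row ends up with length 1, while L1 normalization divides by the sum of absolute values instead. The data below is made up; in real projects a library helper such as scikit-learn’s sklearn.preprocessing.normalize is typically used.

```python
import numpy as np

# Hypothetical input data: each row is a sample, each column a feature
data = np.array([
    [3.0, 4.0],
    [1.0, 0.0],
    [6.0, 8.0],
])

# L2 norm of every row; keepdims=True keeps shape (3, 1) so division broadcasts per row
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
l2_normalized = data / row_norms

# L1 normalization: divide each row by the sum of its absolute values
l1_normalized = data / np.linalg.norm(data, ord=1, axis=1, keepdims=True)

print(l2_normalized)
print(np.linalg.norm(l2_normalized, axis=1))  # every row now has length 1
```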

np.dot()

We can also use a Dot Product to calculate the Euclidean distance. In Mathematics, the Dot Product of two equal-length vectors is obtained by multiplying their corresponding elements and summing the results, which gives a single number: a scalar value. Because of the return type, it’s sometimes also known as the "scalar product". This operation is also often called the inner product of the two vectors.

To calculate the dot product between 2 vectors you can use the following formula:
$$
\vec{p} \cdot \vec{q} = p_1 q_1 + p_2 q_2 + \dots + p_n q_n = \sum_{i=1}^{n} p_i q_i
$$
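A quick numeric check of the formula, with two arbitrary example vectors: the dot product is the sum of the element-wise products, which np.dot() computes directly.

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

# Sum of element-wise products: 1*4 + 2*5 + 3*6 = 32
manual = sum(p_i * q_i for p_i, q_i in zip(p, q))
with_numpy = np.dot(p, q)

print(manual, with_numpy)  # 32 32
```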

With NumPy, we can use the np.dot() function, passing in two vectors.

If we calculate the Dot Product of the difference between both points with that same difference, we get the sum of the squared coordinate differences, which is the square of the Euclidean distance between the two points. Taking the square root of that number nets us the distance we’re searching for:

```python
# Take the difference between the 2 points
diff = point_1 - point_2
# Take the dot product of the difference with itself to get the sum of the squares
sum_square = np.dot(diff, diff)
# Get the square root of the result
distance = np.sqrt(sum_square)
print(distance)
```

Of course, you can shorten this to a one-liner as well:

```python
distance = np.sqrt(np.dot(point_1 - point_2, point_1 - point_2))
print(distance)
```

Using the Built-In math.dist()

Python has a built-in function in the math module, math.dist(), that calculates the Euclidean distance between two points in any number of dimensions. However, it is only available in Python 3.8 or later.

math.dist() takes in two parameters, which are the two points, and returns the Euclidean distance between those points.

Note: The two points must have the same number of dimensions (e.g. both in 2D or both in 3D space).
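A short sketch of both points, with arbitrary example coordinates: math.dist() works for any number of dimensions, and passing points of different lengths raises a ValueError (Python 3.8+).

```python
import math

d_2d = math.dist((0, 0), (3, 4))              # 5.0
d_4d = math.dist((0, 0, 0, 0), (1, 1, 1, 1))  # 2.0
print(d_2d, d_4d)

# Points with different numbers of dimensions are rejected
try:
    math.dist((0, 0), (1, 2, 3))
    raised = False
except ValueError as error:
    raised = True
    print("ValueError:", error)
```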

Now, to calculate the Euclidean Distance between these two points, we just chuck them into the dist() method:

```python
import math

distance = math.dist(point_1, point_2)
print(distance)
```

Conclusion


Euclidean distance is used in many contexts within data mining, machine learning, and several other fields, and is one of the fundamental distance metrics.


sklearn.metrics.pairwise.euclidean_distances

Compute the distance matrix between each pair from a vector array X and Y.

For efficiency reasons, the Euclidean distance between a pair of row vectors x and y is computed as:

dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)) 

This formulation has two advantages over other ways of computing distances. First, it is computationally efficient when dealing with sparse data. Second, if one argument varies but the other remains unchanged, then dot(x, x) and/or dot(y, y) can be pre-computed.
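A rough NumPy sketch of this formulation applied to whole arrays at once. It is our own illustration, not scikit-learn’s actual implementation; the function name pairwise_euclidean is hypothetical, and its Y_norm_squared argument only mirrors the idea of the documented parameter. The row norms (X**2).sum(axis=1) and (Y**2).sum(axis=1) are the terms that can be pre-computed, and one matrix product supplies dot(x, y) for every pair.

```python
import numpy as np

def pairwise_euclidean(X, Y=None, Y_norm_squared=None):
    """Hypothetical helper: all row-to-row distances via the expanded formula."""
    X = np.asarray(X, dtype=np.float64)
    Y = X if Y is None else np.asarray(Y, dtype=np.float64)

    # dot(x, x) for every row of X, as a column of shape (n_samples_X, 1)
    XX = (X ** 2).sum(axis=1)[:, np.newaxis]
    # dot(y, y) for every row of Y; may be passed in if already computed
    if Y_norm_squared is None:
        Y_norm_squared = (Y ** 2).sum(axis=1)
    YY = np.asarray(Y_norm_squared, dtype=np.float64).reshape(1, -1)

    # dot(x, y) for every pair at once, shape (n_samples_X, n_samples_Y)
    XY = X @ Y.T

    squared = XX - 2 * XY + YY
    # Rounding can push tiny squared distances slightly below zero; clip them
    np.maximum(squared, 0, out=squared)
    return np.sqrt(squared)


X = [[0, 1], [1, 1]]
print(pairwise_euclidean(X))            # distances between rows of X
print(pairwise_euclidean(X, [[0, 0]]))  # distance of each row to the origin

# Re-using pre-computed norms when the same Y is queried repeatedly
Y = np.array([[0.0, 0.0], [2.0, 0.0]])
Y_sq = (Y ** 2).sum(axis=1)
print(pairwise_euclidean(X, Y, Y_norm_squared=Y_sq))
```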

However, this is not the most precise way of doing this computation, because this equation potentially suffers from “catastrophic cancellation”. Also, the distance matrix returned by this function may not be exactly symmetric as required by, e.g., scipy.spatial.distance functions.
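To see the cancellation problem in action, here is a small experiment with values we picked for illustration: two nearby points far from the origin, stored in single precision. The exact size of the error depends on the inputs and on floating-point details, so treat this as a demonstration rather than a benchmark.

```python
import numpy as np

# Two nearby points far from the origin, stored in single precision (float32)
x = np.array([100000.0], dtype=np.float32)
y = np.array([100000.1], dtype=np.float32)

# Direct approach: subtract first, so no large intermediate values appear
direct = np.sqrt(np.sum((x - y) ** 2))

# Expanded approach: dot(x, x) - 2 * dot(x, y) + dot(y, y)
# Each term is about 1e10 while the true squared distance is about 0.01,
# so float32 rounding errors in the big terms swamp the actual answer
expanded_sq = np.dot(x, x) - 2 * np.dot(x, y) + np.dot(y, y)
expanded = np.sqrt(max(float(expanded_sq), 0.0))  # clip negatives caused by rounding

print(direct, expanded)
```

The direct version recovers the true distance of about 0.1, while the expanded version returns a value far from it.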

Parameters:

X of shape (n_samples_X, n_features)

An array where each row is a sample and each column is a feature.

Y of shape (n_samples_Y, n_features), default=None

An array where each row is a sample and each column is a feature. If None, the method uses Y=X.

Y_norm_squared array-like of shape (n_samples_Y,) or (n_samples_Y, 1) or (1, n_samples_Y), default=None

Pre-computed dot-products of vectors in Y (e.g., (Y**2).sum(axis=1)). May be ignored in some cases; see the note below.

squared bool, default=False

Return squared Euclidean distances.

X_norm_squared array-like of shape (n_samples_X,) or (n_samples_X, 1) or (1, n_samples_X), default=None

Pre-computed dot-products of vectors in X (e.g., (X**2).sum(axis=1)). May be ignored in some cases; see the note below.
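Returning squared distances skips the square root, which is safe whenever only the ordering of distances matters (for example, when looking for nearest neighbors), because the square root is monotonically increasing. A small NumPy sketch with made-up points, illustrating that both versions rank the candidates identically:

```python
import numpy as np

# Hypothetical data: candidate points (rows) and one query point
candidates = np.array([[2.0, 2.0], [0.5, 0.0], [3.0, -1.0]])
query = np.array([0.0, 0.0])

squared = ((candidates - query) ** 2).sum(axis=1)  # squared Euclidean distances
distances = np.sqrt(squared)                       # ordinary Euclidean distances

# The square root preserves order, so the ranking is identical
print(np.argsort(squared), np.argsort(distances))
```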

Returns:

distances ndarray of shape (n_samples_X, n_samples_Y)

Returns the distances between the row vectors of X and the row vectors of Y.

Distances between pairs of elements of X and Y.

To achieve better accuracy, X_norm_squared and Y_norm_squared may be unused if they are passed as np.float32.

```python
>>> from sklearn.metrics.pairwise import euclidean_distances
>>> X = [[0, 1], [1, 1]]
>>> # distance between rows of X
>>> euclidean_distances(X, X)
array([[0., 1.],
       [1., 0.]])
>>> # get distance to origin
>>> euclidean_distances(X, [[0, 0]])
array([[1.        ],
       [1.41421356]])
```
