Python numpy cosine similarity

Cosine Similarity in Python – How to Calculate

Cosine similarity measures the similarity between two vectors of an inner product space by calculating the cosine of the angle between the two vectors.

Cosine similarity is one of the most widely used similarity measures.

In Python, it can be calculated with the dot() function of the NumPy package and the norm() function of numpy.linalg.

Cosine Similarity Formula

For two vectors, A and B, the Cosine Similarity in Python is calculated as:

\text{Cosine Similarity} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}
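
As a rough illustration of the formula above, the sketch below computes it term by term in plain Python, without NumPy. The helper name cosine_similarity_plain and the sample vectors are assumptions made for this example only.

```python
import math


def cosine_similarity_plain(a, b):
    """Cosine similarity of two equal-length sequences (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: sum of the element-wise products
    dot_product = sum(ai * bi for ai, bi in zip(a, b))
    # Denominator: product of the two Euclidean norms
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot_product / (norm_a * norm_b)


# b is a scaled copy of a, so the result should be very close to 1.0
a = [1, 2, 3]
b = [2, 4, 6]
print(cosine_similarity_plain(a, b))
```

Scaling a vector changes its length but not its direction, which is why the two sample vectors come out as (almost exactly) identical under this measure.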

In this article, we will discuss how to calculate cosine similarity in Python and work through several examples.

Using Numpy for Cosine Similarity

We will use the NumPy library to calculate the cosine similarity between two vectors.

If NumPy is not installed, install it first from the Windows command prompt, for example with pip install numpy.

Let's look at how to calculate cosine similarity with the Python code below.

Calculate Cosine Similarity in Python

Let's assume we have the following data:

    x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
    y = [0, 0, 1, 1, 1, 1, 0, 0, 0]

Using the numpy.array() function, we create x and y as arrays of the same length.

    # import modules
    import numpy as np
    from numpy import dot
    from numpy.linalg import norm

    # define arrays
    x = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0])
    y = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0])

    # calculate cosine similarity
    result = dot(x, y) / (norm(x) * norm(y))
    print("The Cosine Similarity between two vectors is:", result)

In the above code, we import the NumPy package to use the dot() and norm() functions to calculate cosine similarity.

Using dot(x, y)/(norm(x)*norm(y)), we calculate the cosine similarity between the two vectors x and y.

The output of the above code is:

    The Cosine Similarity between two vectors is: 0.5
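
Since cosine similarity is the cosine of the angle between the vectors, the value 0.5 can be turned back into an angle. The following is a minimal sketch of that conversion; the variable names are chosen for this illustration.

```python
import numpy as np
from numpy.linalg import norm

x = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0])
y = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0])

similarity = np.dot(x, y) / (norm(x) * norm(y))

# clip guards against tiny floating-point overshoot outside [-1, 1]
angle_degrees = np.degrees(np.arccos(np.clip(similarity, -1.0, 1.0)))

print(similarity, angle_degrees)
```

For these two vectors the angle comes out at about 60 degrees, which matches a cosine of 0.5.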

Calculate Cosine Similarity between arrays of same length in Python

In this example, we calculate the cosine similarity between two randomly generated arrays of the same length with the code below.

    # import modules
    import numpy as np
    from numpy import dot
    from numpy.linalg import norm

    # define arrays of 100 random integers in the range 0 to 9
    x = np.random.randint(10, size=100)
    y = np.random.randint(10, size=100)

    # calculate cosine similarity
    result = dot(x, y) / (norm(x) * norm(y))
    print("The Cosine Similarity between two vectors is:", result)

In the above code, we use numpy.random.randint() to create two random arrays of size 100.

Using dot(x, y)/(norm(x)*norm(y)), we calculate the cosine similarity between the two vectors x and y.

Because the arrays are random, the value differs on every run. One sample output:

    The Cosine Similarity between two vectors is: 0.6373168018459651
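
If a repeatable result is wanted, one option is to seed the random generator. The sketch below assumes NumPy's newer Generator API (np.random.default_rng) instead of np.random.randint; the function name random_pair_similarity and the seed value are hypothetical choices for this example.

```python
import numpy as np
from numpy.linalg import norm


def random_pair_similarity(seed, size=100):
    """Cosine similarity of two seeded random integer vectors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # integers(0, 10) draws values from 0 to 9, like np.random.randint(10)
    x = rng.integers(0, 10, size=size)
    y = rng.integers(0, 10, size=size)
    return np.dot(x, y) / (norm(x) * norm(y))


first = random_pair_similarity(seed=0)
second = random_pair_similarity(seed=0)
print(first, second)  # the same seed gives the same value twice
```

Because all components are non-negative, the result always lies between 0 and 1.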

Calculate Cosine Similarity between arrays of different length in Python

In this example, we try to calculate the cosine similarity between two randomly generated arrays of different lengths with the code below.

    import numpy as np
    from numpy import dot
    from numpy.linalg import norm

    # define arrays
    x = np.random.randint(10, size=90)   # length = 90
    y = np.random.randint(10, size=100)  # length = 100

    # calculate cosine similarity
    result = dot(x, y) / (norm(x) * norm(y))
    print("The Cosine Similarity between two vectors is:", result)

The above code raises a ValueError because the arrays have different lengths.

Note: cosine similarity cannot be calculated between arrays of different lengths.

The error raised by the above code is shown below.

    ValueError: shapes (90,) and (100,) not aligned: 90 (dim 0) != 100 (dim 0)
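
One way to handle this case is to check the shapes before dividing and to catch the error where the function is called. The sketch below is only an illustration; the helper name checked_cosine_similarity and its error message are assumptions, not part of NumPy.

```python
import numpy as np
from numpy.linalg import norm


def checked_cosine_similarity(x, y):
    """Cosine similarity with an explicit shape check (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    return np.dot(x, y) / (norm(x) * norm(y))


try:
    checked_cosine_similarity(np.ones(90), np.ones(100))
except ValueError as error:
    message = str(error)
    print("Could not compare the vectors:", message)
```

Raising the error early keeps the message in terms of the caller's data instead of the internals of dot().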

Conclusion

I hope you found this step-by-step tutorial on calculating cosine similarity in Python, with its examples, educational and helpful.

With the NumPy package, cosine similarity can be calculated using the dot() and norm() functions.

Sklearn Cosine Similarity : Implementation Step By Step

We can import the cosine_similarity function from sklearn.metrics.pairwise. It calculates the cosine similarity between NumPy arrays. In this article, we will implement cosine similarity step by step.

sklearn cosine similarity: Python –

Suppose you have two documents of different sizes. How would you compare them or measure how similar they are? Cosine similarity is a metric that lets you measure the similarity of the documents.

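Cosine Similarity formula

The formula for finding the cosine similarity of two vectors A and B is:

\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}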

We will implement this function in various small steps. Let’s start.

Step 1: Importing package –

First, we import the cosine_similarity function from the sklearn.metrics.pairwise module. We also import NumPy for array creation.

Step 2: Vector Creation –

Second, to demonstrate the cosine similarity function, we need vectors. Here the vectors are NumPy arrays, so let's create two of them.

Step 3: Cosine Similarity-

Finally, once we have the vectors, we can call cosine_similarity() and pass both of them. It calculates the cosine similarity between the two. In general the value lies in [-1, 1]; for vectors with only non-negative components, such as word counts or TF-IDF weights, it lies in [0, 1]. A value of 0 means the vectors are orthogonal and share nothing, while a value of 1 means they point in exactly the same direction.
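
To make the range of values concrete, the sketch below compares one reference vector with three others: one in the same direction, one perpendicular and one opposite. The vectors are made up for this illustration. With components that can be negative, the result can also fall below 0.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# cosine_similarity expects 2-D input: one row per vector
reference = np.array([[1.0, 2.0]])
others = np.array([
    [2.0, 4.0],    # same direction as the reference
    [-2.0, 1.0],   # perpendicular to the reference
    [-1.0, -2.0],  # opposite direction
])

# One row of scores: reference compared with each row of "others"
scores = cosine_similarity(reference, others)
print(scores)
```

The three scores come out close to 1, 0 and -1 respectively.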

Complete code with output-

Let’s put the code from each step together. Here it is-
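
The article's own listing is not preserved here, so the following is only a sketch of how the three steps could fit together. The two vectors are an assumption: they were picked because they give a similarity of about 0.45227, and they are not confirmed to be the original data.

```python
# Step 1: imports
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Step 2: example vectors (assumed values), each stored as a single-row 2-D array
array_vec_1 = np.array([[12, 41, 60, 11, 21]])
array_vec_2 = np.array([[40, 11, 4, 11, 14]])

# Step 3: cosine_similarity returns a 2-D array, here with a single entry
result = cosine_similarity(array_vec_1, array_vec_2)
score = result[0, 0]
print(score)
```

Note that cosine_similarity() works on 2-D arrays, so a 1-D vector has to be reshaped with reshape(1, -1) before it is passed in.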

Here we have used two different vectors. After applying this function, we got a cosine similarity of around 0.45227, which indicates that the vectors are neither very similar nor very different. In a real scenario, we use text embeddings as NumPy vectors. We can use TF-IDF, a count vectorizer, FastText, BERT, etc. to generate the embeddings.
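
As a small illustration of the TF-IDF route, the sketch below turns three short made-up sentences into TF-IDF vectors with scikit-learn's TfidfVectorizer and compares them pairwise. The sentences are invented for this example and the default vectorizer settings are assumed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]

# Each document becomes one row of a sparse TF-IDF matrix
tfidf_matrix = TfidfVectorizer().fit_transform(documents)

# Passing a single matrix compares every row with every other row
similarity_matrix = cosine_similarity(tfidf_matrix)
print(similarity_matrix.round(2))
```

The first two sentences share several words and so score higher with each other than with the third, which has no words in common with them.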

Conclusion –

Cosine similarity is one of the most common ways to measure the similarity between documents. It works regardless of document size, because it depends on the direction of the vectors and not on their magnitude. We can also implement it without the sklearn module, but that is more tedious; sklearn simplifies the task. I hope this article has made the implementation clear. If you found any information gaps, please let us know. You may also comment below.

Frequently Asked Questions

1. What is cosine similarity?

Cosine similarity is used in information retrieval and text mining. It calculates the similarity between two vectors. If you have two documents and want to find the similarity between them, you represent each document as a vector and compute the cosine of the angle between the two vectors.

2. How does cosine similarity work?

Let's say you have two documents. A common way to check whether they are similar is to compute the cosine similarity between their vector representations. Its value is between -1 and 1. If the value is 1 or close to 1, the documents are very similar; if it is close to 0, they have little in common; and if it is close to -1, their vectors point in opposite directions.

If you have a large number of documents, you can use their pairwise cosine similarities to cluster them. This helps you find related documents.

3. What are the applications of cosine similarity?

There are many real-life applications of cosine similarity. You can use it for information retrieval, for example to measure the similarity between documents. Further examples are listed below.

1. Text Mining and Natural Language Processing (NLP)

2. Document Similarity and Plagiarism Detection

3. Image Similarity

4. Collaborative Filtering

5. Clustering and Dimensionality Reduction

6. Audio Analysis

Thanks
Data Science Learner Team

Cosine Similarity in Python

Cosine similarity measures the cosine of the angle between two non-zero vectors of an inner product space. In Python, the cosine similarity is calculated by taking the dot product of the two vectors and dividing it by the product of their magnitudes. Python provides different modules, such as “scikit-learn”, “scipy”, etc., for calculating the cosine similarity of 1-D or 2-D vectors.
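
The restriction to non-zero vectors matters in practice: a zero vector has magnitude 0, so the division is undefined. The sketch below shows one possible way to guard against that case; the helper name safe_cosine_similarity and the choice to return None are assumptions made for this illustration.

```python
import numpy as np


def safe_cosine_similarity(v1, v2):
    """Cosine similarity, or None when either vector has zero magnitude (hypothetical helper)."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    denominator = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denominator == 0.0:
        # The angle to a zero vector is undefined
        return None
    return float(np.dot(v1, v2) / denominator)


print(safe_cosine_similarity([0, 0, 0], [1, 2, 3]))  # None
print(safe_cosine_similarity([1, 0], [1, 0]))        # 1.0
```

Other conventions are possible, such as raising an error or returning 0; which one fits depends on the application.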

Several methods can be used to calculate the cosine similarity in Python:

Method 1: Using Numpy Module

The Numpy module provides the functions “np.array()” and “np.dot()”, and its “numpy.linalg” submodule provides “norm()”; together they can be used to calculate the cosine similarity in Python. Let’s understand it with the following examples:

Example 1: Finding the Cosine Similarity of Two 1-D Vectors

In the code below, the cosine similarity between two “1-D” vectors is calculated using functions of the Numpy module:

    import numpy as np
    from numpy.linalg import norm

    v1 = np.array([2, 1, 2, 3, 2, 9])
    v2 = np.array([3, 4, 2, 4, 5, 5])

    # dot product divided by the product of the two norms
    print(np.dot(v1, v2) / (norm(v1) * norm(v2)))

  • The “np.array()” function creates a “1-D” vector from the list passed as its argument.
  • The “np.dot()” function accepts the vectors “v1” and “v2” as arguments and returns their dot product.
  • The “norm()” function accepts a vector as an argument and returns its norm.
  • The dot product of the two vectors is divided by the product of the two vector norms to get the cosine similarity.

Running this code prints the cosine similarity of the two “1-D” vectors, approximately 0.819.

Example 2: Finding the Cosine Similarity of Two 2-D Vectors

In the code below, the cosine similarity between the corresponding rows of two “2-D” arrays is calculated using the Numpy module:

    import numpy as np
    from numpy.linalg import norm

    v1 = np.array([[1, 2, 3], [3, 2, 1], [-2, 1, -3]])
    v2 = np.array([[4, 2, 4], [2, -2, 5], [3, 4, -4]])

    # row-wise dot products divided by the row-wise norm products
    print(np.sum(v1 * v2, axis=1) / (norm(v1, axis=1) * norm(v2, axis=1)))

  • The “np.array()” function is used to create the “2-D” arrays.
  • The expression “v1*v2” multiplies the arrays element by element, and “np.sum()” with “axis=1” adds the products along each row, giving one dot product per row.
  • Next, the value returned by “np.sum()” is divided by the product of the row-wise norms. This returns the cosine similarity of each pair of corresponding rows.

Running this code prints one cosine similarity per row pair, approximately 0.891, 0.326 and 0.417.
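
The row-wise calculation compares only rows with the same index. If every row of “v1” should be compared with every row of “v2”, one common approach is to normalize the rows and multiply the matrices. The sketch below is an illustration of that idea; the variable names are chosen for this example.

```python
import numpy as np
from numpy.linalg import norm

v1 = np.array([[1, 2, 3], [3, 2, 1], [-2, 1, -3]])
v2 = np.array([[4, 2, 4], [2, -2, 5], [3, 4, -4]])

# Scale every row to unit length; keepdims keeps the shape (3, 1) for broadcasting
v1_unit = v1 / norm(v1, axis=1, keepdims=True)
v2_unit = v2 / norm(v2, axis=1, keepdims=True)

# Entry [i, j] is the cosine similarity between row i of v1 and row j of v2
pairwise = v1_unit @ v2_unit.T

# The row-wise result from the example above sits on the diagonal
row_wise = np.sum(v1 * v2, axis=1) / (norm(v1, axis=1) * norm(v2, axis=1))
print(pairwise)
print(row_wise)
```

The diagonal of the pairwise matrix matches the row-wise values, while the off-diagonal entries cover the remaining row combinations.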

Method 2: Using scikit-learn Module

The “scikit-learn” module provides the function “cosine_similarity()” to calculate the cosine similarity of the input vectors:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    v1 = np.array([22, 34, 56, 78, 97])
    v2 = np.array([32, 25, 35, 56, 77])

    # cosine_similarity expects 2-D input, so each vector is reshaped to one row
    print(cosine_similarity(v1.reshape(1, -1), v2.reshape(1, -1)))

  • The “np.array()” function is used to create the vectors.
  • The “reshape(1,-1)” call turns each “1-D” vector into a single-row “2-D” array, which is the input shape “cosine_similarity()” expects.
  • The “cosine_similarity()” function is used to calculate the cosine similarity of the input vectors “v1” and “v2”.

Running this code prints a 1×1 array containing the “cosine similarity” of the two “1-D” vectors, approximately 0.987.
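
The same scikit-learn module also offers “cosine_distances()”, which returns 1 minus the cosine similarity. The sketch below checks that relationship on the vectors from the example; the variable names are chosen for this illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances, cosine_similarity

v1 = np.array([22, 34, 56, 78, 97]).reshape(1, -1)
v2 = np.array([32, 25, 35, 56, 77]).reshape(1, -1)

similarity = cosine_similarity(v1, v2)
distance = cosine_distances(v1, v2)

# For each pair, distance and similarity add up to 1
print(similarity, distance, similarity + distance)
```

A small distance therefore corresponds to a high similarity, and the two functions can be used interchangeably with this conversion.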

Method 3: Using scipy Module

The scipy module provides the function “spatial.distance.cosine()”, which returns the cosine distance between two vectors. Subtracting it from 1 gives the cosine similarity:

    import numpy as np
    from scipy import spatial

    v1 = np.array([22, 34, 56, 78, 97])
    v2 = np.array([32, 25, 35, 56, 77])

    # cosine distance = 1 - cosine similarity
    cosine_distance = spatial.distance.cosine(v1, v2)
    print(1 - cosine_distance)

  • The “spatial.distance.cosine()” function takes the two vectors as arguments and returns their cosine distance, which is 1 minus the cosine similarity.
  • The value returned by “spatial.distance.cosine()” is subtracted from “1” to recover the cosine similarity.

Running this code prints the “cosine similarity” computed with the “1-spatial.distance.cosine()” method, approximately 0.987.

Conclusion

To calculate the cosine similarity in Python, you can use functions from the “Numpy”, “scipy”, and “scikit-learn” modules. The “Numpy” module provides “np.dot()” and “norm()” to compute the dot product and the norms of the vectors, from which the cosine similarity follows. The “cosine_similarity()” function of “scikit-learn” computes it directly, and the “spatial.distance.cosine()” function of the “scipy” module returns the cosine distance, which is subtracted from 1 to get the similarity. This guide presented these ways to get the cosine similarity.
