Numpy correlation confusion [duplicate]


February 2019





I want to compute the correlation between two arrays. For this, I would like to use NumPy.

I used the numpy.correlate function on a small example:

>>> import numpy as np
>>> a = [1, 2, 3]
>>> b = [2, 3, 4]
>>> np.correlate(a, b)
array([20])

I don't really know how to interpret that result. What I would like to have is a number between -1 and 1 to indicate the correlation, with 1 meaning the arrays are positively related and -1 meaning the arrays are negatively related.

How can I get this number?

1 answer


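You're using the wrong function. numpy.correlate computes an unnormalized cross-correlation (a sliding sum of element-wise products), not a correlation coefficient. You're looking for numpy.corrcoef, which calculates the correlation coefficient.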
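A minimal sketch of what np.correlate actually computed, assuming NumPy's default mode="valid": with two equal-length inputs there is only one full overlap, so the result is a single sum of element-wise products. Centering each array and scaling it to unit length before correlating turns that same sum into a value bounded by -1 and 1. The names a_unit, b_unit and r are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=float)
b = np.array([2, 3, 4], dtype=float)

# Default mode="valid": equal-length inputs overlap in exactly one position,
# so the output has one element, the plain sum of products 1*2 + 2*3 + 3*4.
raw = np.correlate(a, b)
print(raw)

# Subtract the mean, then divide by the Euclidean norm of what remains.
a_unit = (a - a.mean()) / np.linalg.norm(a - a.mean())
b_unit = (b - b.mean()) / np.linalg.norm(b - b.mean())

# The same sliding dot product on the normalized arrays is the Pearson
# correlation coefficient (up to floating-point rounding).
r = np.correlate(a_unit, b_unit)[0]
print(r)
```

This is only meant to explain where the 20 comes from; in practice np.corrcoef does the centering and scaling for you.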

>>> import numpy as np
>>> a = [1, 2, 3]
>>> b = [2, 3, 4]
>>> np.corrcoef(a, b)
array([[1., 1.],
       [1., 1.]])

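As mentioned by Hooked, this returns a matrix: the correlation coefficient matrix, i.e. the covariance matrix normalized by the standard deviations. The diagonal entries are each array's correlation with itself (always 1), and the coefficient between a and b is the off-diagonal entry, np.corrcoef(a, b)[0, 1].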
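A short sketch of pulling the single coefficient out of that matrix. The third array c is a hypothetical example added here to show a negative relationship; it is not from the question.

```python
import numpy as np

a = [1, 2, 3]
b = [2, 3, 4]
c = [3, 2, 1]  # hypothetical: decreases while a increases

# With two 1-D inputs, corrcoef treats each as a variable and returns a 2x2 matrix.
m_ab = np.corrcoef(a, b)
# Diagonal: each array with itself (always 1). Off-diagonal: a with b.
r_ab = m_ab[0, 1]

r_ac = np.corrcoef(a, c)[0, 1]

print(f"a vs b: {r_ab:+.3f}")  # positive relationship
print(f"a vs c: {r_ac:+.3f}")  # negative relationship
```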

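np.corrcoef already returns Pearson product-moment correlation coefficients. If you also want a p-value, you can use pearsonr from scipy.stats (older code imports it from scipy.stats.stats, a private module that is now deprecated). Hooked's answer here is a proper implementation of this method.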
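For comparison, a NumPy-only sketch of the Pearson formula itself. It is not Hooked's code, and the function name pearson_r is hypothetical; it simply checks its result against np.corrcoef. If SciPy is installed, scipy.stats.pearsonr(a, b) returns the same coefficient together with a p-value.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")
    # Deviations from the mean.
    xm = x - x.mean()
    ym = y - y.mean()
    # sqrt(sum(xm^2) * sum(ym^2)); zero means one input is constant.
    denom = np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant input")
    return float(np.sum(xm * ym) / denom)

r = pearson_r([1, 2, 3], [2, 3, 4])
# Should agree with the off-diagonal entry of np.corrcoef.
assert np.isclose(r, np.corrcoef([1, 2, 3], [2, 3, 4])[0, 1])
print(r)
```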