Imagine that you have a matrix `X` with dimensions `(N x k)` and a vector `y` of size `(1 x k)`, and you want to calculate the linear correlation coefficient between `y` and each row of `X` to obtain a vector of correlation coefficients, `r`, of size `(N x 1)` .

The first way you might consider doing this is in a loop with the `np.corrcoef` function, which returns the linear correlation coefficient between two vectors:

r = np.zeros((N,1))
for i in xrange(0,N):
r[k] = np.corrcoef(X[i,:], y)[0,1]

If `N` is large, or if you need to perform this calculation many times (with an outer loop wrapped around it), this will be very slow. We can time it as follows:

for t in xrange(0,10): # average 10 timings
start = time.time()
r = np.zeros((10000,1))
for k in xrange(0,10000):
r[k] = np.corrcoef(vic_runoff[k,:], obs_runoff)[0,1]
end = time.time()
times[t] = end-start
print np.mean(times)

In this example with `N = 10000`, the average time is **0.48 seconds** per calculation of the `r` vector. Too slow!

How can we speed this up? Some NumPy functions, such as `sum` and `mean`, have an `axis` argument to perform the operation along a particular dimension, but `corrcoef` does not (presumably because people often want the correlations between the rows of the matrix `X`, rather than the correlation of all rows with respect to a single vector). Recall the formula for the sample correlation coefficient between 1D vectors `x` and `y`:

linear correlation coefficient (from Wikipedia)

We can write a function using NumPy’s vectorized arithmetic to compute these values all at once rather than in a loop. For example, `np.multiply(X,y)` (also given by `X*y`) performs element-wise multiplication of the vector `y` over all rows of the matrix `X`. The function might look something like this:

def vcorrcoef(X,y):
Xm = np.reshape(np.mean(X,axis=1),(X.shape[0],1))
ym = np.mean(y)
r_num = np.sum((X-Xm)*(y-ym),axis=1)
r_den = np.sqrt(np.sum((X-Xm)**2,axis=1)*np.sum((y-ym)**2))
r = r_num/r_den
return r

If we try timing it as before:

for t in xrange(0,10): # average 10 timings
start = time.time()
r = vcorrcoef(vic_runoff,obs_runoff)
end = time.time()
times[t] = end-start
print np.mean(times)

The average time is only **0.0017 seconds**—a reduction of about 99.6%, or a 280x speedup compared to the non-vectorized approach. Not bad! It’s often said that Python is slow, but the matrix operations in NumPy can clearly be used to our advantage.