Matrices addition in numpy

Question

I need to calculate the sum of a list of matrices, however, I can't use np.sum, even with axis=0, I don't know why. The current solution is a loop, but is there a better way for that?

import numpy as np
SAMPLE_SIZES = [10, 100, 1000, 10000]
ITERATIONS = 1
MEAN = np.array([1, 1])
COVARIANCE = np.array([[1, 0.5], [0.5, 1]])
for sample_size in SAMPLE_SIZES:
    max = -1
    for i in range(ITERATIONS):
        xs = np.random.multivariate_normal(MEAN, COVARIANCE, size=sample_size)
        sigma = [[0, 0], [0, 0]]
        for x in xs:
            sigma += np.outer((x-MEAN), (x-MEAN)) / (sample_size-1)

In the code above, can I replace the last loop using some numpy function? I guess using a loop would be not efficient if the data is very large.

Tell us about xs. Array? List? Shape ,dtype?. Is MEAN a scalar? — hpaulj
– hpaulj, Commented Sep 14, 2016 at 7:03
Please show us your erroneous implementation with np.sum, and the error message. — Praveen
– Praveen, Commented Sep 14, 2016 at 7:14

Community · Accepted Answer · 2017-05-23 10:27:24Z

2

Read up about numpy broadcasting.

xs = np.random.multivariate_normal(MEAN, COVARIANCE, size=sample_size)

xs now has shape (sample_size, 2), which means you can just subtract MEAN directly. You now need to take the outer product between xs - MEAN and xs - MEAN while adding over the sample_size axis. This is best done using np.einsum:

>>> sigma = np.einsum('ij,ik->jk', xs - MEAN, xs - MEAN) / sample_size
>>> sigma    
array([[ 1.00216043,  0.49549231],
       [ 0.49549231,  1.00004423]])

An alternative is to use broadcasting:

>>> sigma = np.sum((xs - MEAN)[:, :, np.newaxis]
                   * (xs - MEAN)[:, np.newaxis, :], axis=0) / sample_size

Though the broadcasting solution seems easier to understand, np.einsum will usually be more efficient than broadcasting.

Additional note: Note that I've divided by sample_size, and not by sample_size - 1. This is because for estimating the covariance matrix of a random variable with known mean, you need to divide by sample_size. Use sample_size - 1 when you are estimating the mean from the same dataset as well, and using it in your covariance estimate. Otherwise your covariance estimate will be biased.

edited May 23, 2017 at 10:27

CommunityBot

11 silver badge

answered Sep 14, 2016 at 18:04

Praveen

7,2723 gold badges47 silver badges65 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

1a1a11a Over a year ago

Thank you for your help, could you explain a little bit more the additional note? Actually this is part of my homework assignment, I just followed the equation given by professor. But I would like to hear it if this is an error

Praveen Over a year ago

@1a1a11a I'd have to give you a whole course on estimation to explain why... Read about unbiased estimators, and then try to derive the unbiased estimator of variance for a gaussian random variable with known mean, and for one with unknown mean.

Paul Brodersen · Accepted Answer · 2016-09-14 15:15:02Z

0

If you just want to compute the empirical covariance then I would suggest using numpy.cov(xs.T).

Otherwise, the last 3 lines can be replaced by:

xm = xs - np.mean(xs, axis=0)
sigma = np.inner(xm.T. xm.T) / (sample_size-1)

answered Sep 14, 2016 at 15:15

Paul Brodersen

12.7k26 silver badges51 bronze badges

1 Comment

1a1a11a Over a year ago

Thank you for your help, but the answer above is more relevant.

Collectives™ on Stack Overflow

Matrices addition in numpy

2 Answers 2

2 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related