How to get sum of values in a numpy array based on another array with repetitive indices

Question

data_values = np.random.rand(10)
data_ind = np.random.randint(0,10,10)
    
data_values = array([0.81444589, 0.57734696, 0.54130794, 0.22339518, 0.916973  ,
                     0.14956333, 0.74504583, 0.36218693, 0.17958372, 0.47195214])

data_ind = array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3])

Desired output:

0 - 0.916973
1 - 0.36218693
2 - 0.54130794 + 0.22339518
3 - 0.47195214
4 - 0.17958372
5 - 0.57734696
6 - 0.14956333 + 0.74504583
7 - 0.81444589

output = array([0.916973, 0.36218694, 0.7647031, 0.47195214, 0.17958371, 0.577347, 0.89460915, 0.8144459], dtype=float32)

I have written it the long way:

nodal_values = np.zeros(8, dtype=np.float32)  
for nodes in range(8):  
    nodal_values[nodes] = np.sum(data_values[np.where(data == nodes)[0]])

The above method takes a lot of time, whereas

a = ((np.mgrid[:M,:N] == b)[0] * c).sum(axis=1)

raises a memory error for large data with millions of elements.

I am looking for an optimized way.

Camilo Martínez M. · Accepted Answer · 2021-04-20 22:41:41Z

Please check out the Stack Overflow question guidelines so you can ask better questions and format them properly.

Options

Original code

This is what you want to optimize for large values of N (I took the liberty of editing your code so that it has no hardcoded values, and I fixed a typo: data_ind instead of data):

data_values = np.random.rand(N) 
data_ind = np.random.randint(0, N, N)

xsize = data_ind.max() + 1
nodal_values = np.zeros(xsize, dtype=np.float32)  
for nodes in range(xsize):  
    nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])

Slightly better version (for readability)

I wrote the following version, which improves readability and avoids np.where:

idx = np.arange(xsize)[:, None] == data_ind
nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)] # Python list
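Since the question mentions memory errors, it may help to look at what the broadcast comparison above builds. The snippet below is a minimal sketch using the question's ten-element example; it only inspects the boolean mask and is not part of the original answer.

```python
import numpy as np

data_ind = np.array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3])
xsize = data_ind.max() + 1

# Comparing a column of node numbers against the index array broadcasts
# to a full (xsize, N) boolean matrix: one row per node.
idx = np.arange(xsize)[:, None] == data_ind

print(idx.shape)               # (8, 10)
print(idx.nbytes)              # 80 bytes: one byte per boolean entry
print(np.flatnonzero(idx[2]))  # [2 3]: positions whose index equals 2
```

The mask grows as xsize times N, so with millions of entries and many distinct indices it is likely to run into the same kind of memory limit as the mgrid approach in the question.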

Much better version

I adapted the accepted answer here (be sure to check it out to understand it better) by @Divakar to your case:

_, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
nodal_values = np.bincount(idx, data_values) # NumPy array (float64), one entry per unique index
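When the indices are already non-negative integers, as they are in the question, np.bincount can also be called on data_ind directly, without np.unique. This is a variation on the version above rather than the linked answer's exact method. The minlength argument is optional here and is included only to make the output length explicit.

```python
import numpy as np

data_values = np.array([0.81444589, 0.57734696, 0.54130794, 0.22339518, 0.916973,
                        0.14956333, 0.74504583, 0.36218693, 0.17958372, 0.47195214])
data_ind = np.array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3])

xsize = data_ind.max() + 1

# Each value is added to the bin named by its index; indices that never
# occur keep a sum of 0.0, so position i always corresponds to node i.
nodal_values = np.bincount(data_ind, weights=data_values, minlength=xsize)

print(nodal_values.shape)  # (8,)
print(nodal_values)
```

The result is float64. If float32 is needed, as in the original loop, it can be converted afterwards with nodal_values.astype(np.float32).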

Comparison

Using your original values:

data_values = np.array([0.81444589, 0.57734696, 0.54130794, 0.22339518, 0.916973, 0.14956333, 0.74504583, 0.36218693, 0.17958372, 0.47195214])
data_ind = np.array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3])

I got the following performance using the timeit module (mean ± std. dev. of 7 runs, 10000000 loops each):

Original code: 49.2 ± 11.1 ns
Much better version: 45.2 ± 4.98 ns
Slightly better version: 36.4 ± 2.81 ns

For really small values of N (i.e., 1 to 10), there is no significant difference. For large N, however, the choice is clear: both versions with for-loops take too long, while the vectorized implementation is extremely fast.
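Before comparing speed, it is worth confirming that the loop and the vectorized approach agree. The following is a minimal sketch under a few assumptions: the seed and N = 1000 are arbitrary choices, and the loop accumulates in float64 so both results are compared in the same precision. Because np.unique compacts the indices, the sums are scattered back to their original positions before the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
N = 1000
data_values = rng.random(N)
data_ind = rng.integers(0, N, N)
xsize = data_ind.max() + 1

# Loop version: one masked sum per node.
loop_sums = np.zeros(xsize)
for node in range(xsize):
    loop_sums[node] = data_values[data_ind == node].sum()

# Vectorized version: np.unique relabels the indices as 0..k-1,
# so compact_sums has one entry per index that actually occurs.
uniq, inv = np.unique(data_ind, return_inverse=True)
compact_sums = np.bincount(inv, weights=data_values)

# Scatter the compact sums back so that position i belongs to node i.
vectorized_sums = np.zeros(xsize)
vectorized_sums[uniq] = compact_sums

print(np.allclose(loop_sums, vectorized_sums))  # True
```

If some index between 0 and data_ind.max() never occurs, compact_sums is shorter than the loop result, which is why the scatter step is needed for a position-by-position comparison.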

Code to test it out

import numpy as np
import timeit
import matplotlib.pyplot as plt

def original_code():
    xsize = data_ind.max() + 1
    nodal_values = np.zeros(xsize, dtype=np.float32)
    for nodes in range(xsize):
        nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])

def much_better():
    _, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
    nodal_values = np.bincount(idx, data_values)

def slightly_better():
    xsize = data_ind.max() + 1
    idx = np.arange(xsize)[:, None] == data_ind
    nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)]

sizes = [i*5 for i in range(1, 7)]
original_code_times = np.zeros((len(sizes),))
slightly_better_times = np.zeros((len(sizes),))
much_better_times = np.zeros((len(sizes),))
for i, N in enumerate(sizes):
    print(N)
    data_values = np.random.rand(N)
    data_ind = np.random.randint(0, N, N)

    # Divided by 100 repeats to get average
    original_code_times[i] = timeit.timeit(original_code, number=100) / 100
    much_better_times[i] = timeit.timeit(much_better, number=100) / 100
    slightly_better_times[i] = timeit.timeit(slightly_better, number=100) / 100

# Multiply by 1000 to get everything in ms
original_code_times *= 1000
slightly_better_times *= 1000
much_better_times *= 1000

# %%
plt.figure(dpi=120)
plt.title("Small N's")
plt.plot(sizes, original_code_times, label="Original code")
plt.plot(sizes, slightly_better_times, label="Slightly better")
plt.plot(sizes, much_better_times, label="Much better")
plt.ylabel("Time [ms]")
plt.xlabel("N")
plt.xticks(sizes)
plt.legend()
plt.savefig("small_N.png", dpi=120)
plt.show()
plt.close()
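The script above only covers N from 5 to 30. To see the behaviour for larger inputs, a sketch along the following lines could be used. The helper names (loop_version, vectorized_version, time_versions), the seed, and N = 2000 are hypothetical choices for illustration, and no timings are quoted here because they depend on the machine.

```python
import timeit
import numpy as np

def loop_version(data_values, data_ind):
    # Same logic as original_code, but taking its inputs as arguments.
    xsize = data_ind.max() + 1
    out = np.zeros(xsize, dtype=np.float32)
    for node in range(xsize):
        out[node] = np.sum(data_values[np.where(data_ind == node)[0]])
    return out

def vectorized_version(data_values, data_ind):
    # Same logic as much_better.
    _, idx = np.unique(data_ind, return_inverse=True)
    return np.bincount(idx, data_values)

def time_versions(N, repeats=3):
    """Return the average time per call, in ms, for each version."""
    rng = np.random.default_rng(0)  # arbitrary seed
    data_values = rng.random(N)
    data_ind = rng.integers(0, N, N)
    results = {}
    for name, func in [("loop", loop_version), ("vectorized", vectorized_version)]:
        total = timeit.timeit(lambda: func(data_values, data_ind), number=repeats)
        results[name] = 1000 * total / repeats
    return results

timings = time_versions(2000)
for name, ms in timings.items():
    print(f"N=2000 {name}: {ms:.3f} ms")
```

Raising N further makes the gap more visible, at the cost of a longer run for the loop version.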

I hope this helps anyone who may stumble upon this.
