0
data_values = np.random.rand(10)
data_ind = np.random.randint(0,10,10)
    
data_values = (array([0.81444589, 0.57734696, 0.54130794, 0.22339518, 0.916973  ,
            0.14956333, 0.74504583, 0.36218693, 0.17958372, 0.47195214]),
    
data_ind = array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3]))

Desired output:

0 - 0.91693   
1 - 0.36218693  
2 - 0.54130794 + 0.22339518  
3 - 0.47195214  
4 - 0.17958372  
5 - 0.57734696  
6 -  0.14956333 + 0.74504583  
output = array([0.916973, 0.36218694, 0.7647031, 0.47195214, 0.17958371, 0.577347, 0.89460915, 0.8144459], dtype=float32)

I have written a long way

nodal_values = np.zeros(8, dtype=np.float32)  
for nodes in range(8):  
    nodal_values[nodes] = np.sum(data_values[np.where(data == nodes)[0]])

The above method takes lot of time, whereas

a = ((np.mgrid[:M,:N] == b)[0] * c).sum(axis=1)

gives memory error for large data with millions.

I am looking for an optimized way.

1 Answer 1

0

Please checkout stackoverflow question guidelines in order to ask better questions, as well as properly format them.


Options

Original code

This is what you want to optimize for large values of N (I took the liberty of editing your code so that it does not have hardcoded values and fixed a typo, data_values instead of data):

data_values = np.random.rand(N) 
data_ind = np.random.randint(0, N, N)

xsize = data_ind.max() + 1
nodal_values = np.zeros(xsize, dtype=np.float32)  
for nodes in range(xsize):  
    nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])

Slightly better version (for readability)

I created the following version which improves readability and takes away the use of np.where:

idx = np.arange(xsize)[:, None] == data_ind
nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)] # Python list

Much better version

I implemented the accepted answer in here (be sure to check it out to understand it better) by @Divakar to your case:

_, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
nodal_values = np.bincount(idx, data_values) # Same shape and type as your version

Comparison

Using your original values:

data_values = np.array([0.81444589, 0.57734696, 0.54130794, 0.22339518, 0.916973, 0.14956333, 0.74504583, 0.36218693, 0.17958372, 0.47195214])
data_ind = np.array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3])

I got the following performance using timeit module (mean ± std. dev. of 7 runs, 10000000 loops each):

Original code: 49.2 +- 11.1 ns
Much better version: 45.2 +- 4.98 ns
Slightly better version: 36.4 +- 2.81 ns

For really small values of N, i.e, 1 to 10, there is no significant difference. However, for big ones, there is no question as to which one to use; both versions with for-loops take too long, while the vectorized implementation does it extremely fast.

Small N comparison Big N comparison

Code to test it out

import numpy as np
import timeit
import matplotlib.pyplot as plt

def original_code():
    xsize = data_ind.max() + 1
    nodal_values = np.zeros(xsize, dtype=np.float32)
    for nodes in range(xsize):
        nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])

def much_better():
    _, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
    nodal_values = np.bincount(idx, data_values)

def slightly_better():
    xsize = data_ind.max() + 1
    idx = np.arange(xsize)[:, None] == data_ind
    nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)]

sizes = [i*5 for i in range(1, 7)]
original_code_times = np.zeros((len(sizes),))
slightly_better_times = np.zeros((len(sizes),))
much_better_times = np.zeros((len(sizes),))
for i, N in enumerate(sizes):
    print(N)
    data_values = np.random.rand(N)
    data_ind = np.random.randint(0, N, N)

    # Divided by 100 repeats to get average
    original_code_times[i] = timeit.timeit(original_code, number=100) / 100
    much_better_times[i] = timeit.timeit(much_better, number=100) / 100
    slightly_better_times[i] = timeit.timeit(slightly_better, number=100) / 100

# Multiply by 1000 to get everything in ms
original_code_times *= 1000
slightly_better_times *= 1000
much_better_times *= 1000

# %%
plt.figure(dpi=120)
plt.title("Small N's")
plt.plot(sizes, original_code_times, label="Original code")
plt.plot(sizes, slightly_better_times, label="Slightly better")
plt.plot(sizes, much_better_times, label="Much better")
plt.ylabel("Time [ms]")
plt.xlabel("N")
plt.xticks(sizes)
plt.legend()
plt.savefig("small_N.png", dpi=120)
plt.show()
plt.close()

I hope this helps anyone who may stumble upon this.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.