
I would like to generate a numpy array by performing a sum of indexed values from another array

For example, given the following arrays:

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

I would like to set a new array indexed_sum in the following way:

indexed_sum = np.zeros((2, 2), dtype=values.dtype)  # output allocated separately
for i in range(row_indices.size):
    indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]

Such that:

indexed_sum = np.array([[0, 2], [4, 11]])

However, since this is a python loop and these arrays can be very large, this takes an unacceptable amount of time. Is there an efficient numpy method that I can use to accomplish this?

  • Your output is always a 2x2 array? Commented Jan 25, 2022 at 16:29
  • No. One can assume much larger arrays. The output is created separately, of different size from the input arrays, and I can effectively constrain the index values to make sure none are outside the valid range of the output array. Commented Jan 25, 2022 at 16:36
  • I think yours is a good solution; any other way would come at a high memory cost (in terms of ndarray sizes). I think you could modify the code a little to make it faster, but not by much. Commented Jan 25, 2022 at 17:04

4 Answers


You might find success with numba, another Python package. I timed the following two functions in a Jupyter notebook with %timeit. Results are below:

import numba
import numpy as np


# Your loop, but in a function.
def run_sum(row_indices, col_indices, values):
    indexed_sum = np.zeros((row_indices.max() + 1, col_indices.max() + 1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
    return indexed_sum


# Your loop with a numba decorator.
@numba.jit(nopython=True)  # note you may be able to parallelize too
def run_sum_numba(row_indices, col_indices, values):
    indexed_sum = np.zeros((row_indices.max() + 1, col_indices.max() + 1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
    return indexed_sum

My example data to have something bigger to chew on:

row_id_big = np.random.randint(0, 100, size=(1000,))
col_id_big = np.random.randint(0, 100, size=(1000,))
values_big = np.random.randint(0, 10, size=(1000,))

Results:

%timeit run_sum(row_id_big, col_id_big, values_big)
# 1.04 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit run_sum_numba(row_id_big, col_id_big, values_big)
# 3.85 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The loop with the numba decorator is a couple hundred times faster in this example. I'm not positive about the memory usage compared to your example. I had to initialize a numpy array to have somewhere to put the data, but if you have a better way of doing that step you might be able to improve performance further.

A note with numba: the first call compiles the function, so you need to run your loop once before you see the major speed improvements. You can warm up the JIT on a toy example like yours here and then get the same speedup on your real data.
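To illustrate that warm-up pattern, here's a minimal sketch (the `accumulate` name and the no-op fallback for a missing numba install are my own additions, not from the answer): compilation is triggered once on tiny arrays, and later calls reuse the compiled code.

```python
import numpy as np

try:
    import numba
    jit = numba.njit  # compiles the function to machine code on its first call
except ImportError:   # no-op fallback so the sketch still runs without numba
    jit = lambda f: f

@jit
def accumulate(rows, cols, vals, out):
    # same accumulation loop as above, over pre-flattened arrays
    for i in range(rows.size):
        out[rows[i], cols[i]] += vals[i]
    return out

# warm-up call on toy data: triggers (and caches) the compilation
warm = accumulate(np.array([1, 1, 1, 0, 1, 1]),
                  np.array([0, 0, 1, 1, 1, 1]),
                  np.array([2, 2, 3, 2, 4, 4]),
                  np.zeros((2, 2)))
# subsequent calls on large arrays skip compilation and run at full speed
```

Timing only the calls after the warm-up is also what makes `%timeit` numbers like those above fair to compare.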


2 Comments

This is great. Here are my times after adding the decorator, with out_size = 50 and in_shape = (5000, 5000): OP method with decorator: 0.2307071130053373; OP method: 22.997501948993886
Yeah, numba is great for when you can't avoid loops and can't use any of the built-in vectorized operations. It basically compiles your loop on the spot into something a lot faster.

Given the tradeoff between speed and memory usage, I think your method is well suited. But you can still make it faster:

  1. Avoid flattening the arrays inside the loop; this will save you some time.
  2. Instead of using .flat[:] or .flatten(), use .ravel() (it returns a view rather than a copy, so it's faster).
  3. Also avoid for i in range(...); just zip the values of interest (see method3).

Here's a solution that will speed things up:

r_i_f = row_indices.ravel()
c_i_f = col_indices.ravel()
v_f = values.ravel()

indexed_sum = np.zeros((row_indices.max() + 1, col_indices.max() + 1))
for i, j, v in zip(r_i_f, c_i_f, v_f):
    indexed_sum[i, j] += v

For a comparison, here's some toy code (correct any details that are off, and let me know if it works well for you):




def method1(values,row_indices,col_indices):
    """OP method"""
    indexed_sum = np.zeros((row_indices.max()+1,col_indices.max()+1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
    return indexed_sum

def method2(values,row_indices,col_indices):
    """just raveling before loop. Time saved here is considerable"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    
    indexed_sum = np.zeros((row_indices.max()+1,col_indices.max()+1))
    for i in range(row_indices.size):
        indexed_sum[r_i_f[i], c_i_f[i]] += v_f[i]
    return indexed_sum

def method3(values,row_indices,col_indices):
    """raveling, then avoiding range(...), just zipping;
    the time saved here is small but comes at no cost"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    
    indexed_sum = np.zeros((row_indices.max()+1,col_indices.max()+1))
    for i,j,v in zip(r_i_f,c_i_f,v_f):
        indexed_sum[i, j] += v
    return indexed_sum




from time import perf_counter
import numpy as np


out_size = 50
in_shape = (5000,5000)

values = np.random.randint(10,size=in_shape)
row_indices = np.random.randint(out_size,size=in_shape)
col_indices = np.random.randint(out_size,size=in_shape)


t1 = perf_counter()
v1 = method1(values,row_indices,col_indices)
t2 = perf_counter()
v2 = method2(values,row_indices,col_indices)
t3 = perf_counter()
v3 = method3(values,row_indices,col_indices)
t4 = perf_counter()

print(f"method1: {t2-t1}")
print(f"method2: {t3-t2}")
print(f"method3: {t4-t3}")

Outputs for values of shape 5000x5000 and output shaped as 50x50:

method1: 23.66934896100429
method2: 14.241692076990148
method3: 11.415708078013267


Additionally, here's a comparison between flattening methods (on my computer):

q = np.random.randn(5000,5000)
t1 = perf_counter()
q1 = q.flatten()
t2 = perf_counter()
q2 = q.ravel()
t3 = perf_counter()
q3 = q.reshape(-1)
t4 = perf_counter()
q4 = q.flat[:]
t5 = perf_counter()
#print times:
print(f"q.flatten: {t2-t1}")
print(f"q.ravel: {t3-t2}")
print(f"q.reshape(-1): {t4-t3}")
print(f"q.flat[:]: {t5-t4}")
Outputs:

q.flatten: 0.043878231997950934
q.ravel: 5.550700007006526e-05
q.reshape(-1): 0.0006349250033963472
q.flat[:]: 0.08832104799512308
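The timing gap has a simple explanation: `ravel` returns a view of the original data when the array is contiguous (and `reshape(-1)` does likewise), while `flatten` and `flat[:]` always allocate a fresh copy. A quick check with `np.shares_memory`, as a sketch:

```python
import numpy as np

q = np.random.randn(100, 100)

# ravel returns a view of a contiguous array: no data is copied
print(np.shares_memory(q, q.ravel()))    # True

# flatten always allocates a fresh copy
print(np.shares_memory(q, q.flatten()))  # False
```

This is also why `reshape(-1)` lands close to `ravel` in the timings above.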

2 Comments

This is useful, and it represents a four-fold improvement in runtime on real data, but it's still much longer than I'd like for a simple operation (and exceeds my acceptable runtime constraints).
Go with @Engineero's solution; I think it's what you need.

There's a lot of options for this, but they're all reinventing a wheel that kinda already exists.

import numpy as np
from scipy import sparse

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

What you want is the built-in behavior for the scipy sparse matrices:

arr = sparse.coo_matrix((values.flat, (row_indices.flat, col_indices.flat)))

Which yields a sparse data structure:

>>> arr
<2x2 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in COOrdinate format>

But you can convert it back to a numpy array easily:

>>> arr.A
array([[ 0,  2],
       [ 4, 11]])
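For completeness, plain NumPy can do the same duplicate-summing accumulation without scipy via `np.add.at`, which is unbuffered, so repeated (row, col) pairs all contribute rather than overwriting each other. A sketch on the question's arrays (note that `np.add.at` is correct but not especially fast on large inputs, where the sparse route usually wins):

```python
import numpy as np

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

indexed_sum = np.zeros((2, 2), dtype=values.dtype)
# unbuffered in-place add: duplicate (row, col) pairs accumulate correctly
np.add.at(indexed_sum,
          (row_indices.ravel(), col_indices.ravel()),
          values.ravel())
print(indexed_sum)  # [[ 0  2]
                    #  [ 4 11]]
```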

Comments


There are some good answers here, but in the end I cheated and wrote an extension module method using the numpy C API, which runs in the trivial time that I wanted.

The code is precisely as boring as one would expect, but since an answer would seem incomplete without some, here is the core of it. It does make some unfortunate assumptions about typing that I mean to fill in with time.

/* Assumes C-contiguous arrays: C int indices, C double values/output. */
int* row_data = (int*)PyArray_DATA(row_indices);
int* col_data = (int*)PyArray_DATA(col_indices);
double* value_data = (double*)PyArray_DATA(values);
double* output_data = (double*)PyArray_DATA(sum_obj);

for(int i = 0; i < input_rows; ++i)
{
    for(int j = 0; j < input_cols; ++j)
    {
        long output_row = row_data[i*input_cols+j];
        long output_col = col_data[i*input_cols+j];
        output_data[output_row*out_col_count+output_col] += value_data[i*input_cols+j];
    }
}

1 Comment

The more I think about it, the longer this code has to go. It doesn't take into account any kind of ordering or strides, and it needs more work before it can be truly generic, though it worked well for my immediate need.
