
I would like to generate a numpy array by performing a sum of indexed values from another array

For example, given the following arrays:

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

I would like to set a new array indexed_sum in the following way:

indexed_sum = np.zeros((2, 2), dtype=values.dtype)  # output allocated separately
for i in range(row_indices.size):
    indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]

Such that:

indexed_sum = np.array([[0, 2], [4, 11]])

However, since this is a python loop and these arrays can be very large, this takes an unacceptable amount of time. Is there an efficient numpy method that I can use to accomplish this?

  • Your output is always a 2x2 array? Commented Jan 25, 2022 at 16:29
  • No. One can assume much larger arrays. The output is created separately, of different size from the input arrays, and I can effectively constrain the index values to make sure none are outside the valid range of the output array. Commented Jan 25, 2022 at 16:36
  • I think yours is a good solution; any other way would come at a high memory cost (in terms of ndarray sizes). I think you could modify the code a little to make it faster, but not by much. Commented Jan 25, 2022 at 17:04

4 Answers


You might find success with numba, another Python package. I timed the following two functions in a Jupyter notebook with %timeit. Results are below:

import numba
import numpy as np


# Your loop, but in a function.
def run_sum(row_indices, col_indices, values):
    indexed_sum = np.zeros((row_indices.max() + 1, col_indices.max() + 1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
    return indexed_sum


# Your loop with a numba decorator.
@numba.jit(nopython=True)  # note you may be able to parallelize too
def run_sum_numba(row_indices, col_indices, values):
    indexed_sum = np.zeros((row_indices.max() + 1, col_indices.max() + 1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
    return indexed_sum

My example data to have something bigger to chew on:

row_id_big = np.random.randint(0, 100, size=(1000,))
col_id_big = np.random.randint(0, 100, size=(1000,))
values_big = np.random.randint(0, 10, size=(1000,))

Results:

%timeit run_sum(row_id_big, col_id_big, values_big)
# 1.04 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit run_sum_numba(row_id_big, col_id_big, values_big)
# 3.85 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The loop with the numba decorator is a couple hundred times faster in this example. I'm not positive about the memory usage compared to your example. I had to initialize a numpy array to have somewhere to put the data, but if you have a better way of doing that step you might be able to improve performance further.

A note with numba: the first call compiles the function, so you need to run your loop once before you see the major speed improvements. You can warm up the JIT on a toy example like yours here and then get the same speedup on your real data.
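To illustrate that warm-up pattern, here's a minimal sketch (the `accumulate` name and the no-op fallback for a missing numba install are my own additions, not from the answer): compilation is triggered once on tiny arrays, and later calls reuse the compiled code.

```python
import numpy as np

try:
    import numba
    jit = numba.njit  # compiles the function to machine code on its first call
except ImportError:   # no-op fallback so the sketch still runs without numba
    jit = lambda f: f

@jit
def accumulate(rows, cols, vals, out):
    # same accumulation loop as above, over pre-flattened arrays
    for i in range(rows.size):
        out[rows[i], cols[i]] += vals[i]
    return out

# warm-up call on toy data: triggers (and caches) the compilation
warm = accumulate(np.array([1, 1, 1, 0, 1, 1]),
                  np.array([0, 0, 1, 1, 1, 1]),
                  np.array([2, 2, 3, 2, 4, 4]),
                  np.zeros((2, 2)))
# subsequent calls on large arrays skip compilation and run at full speed
```

Timing only the calls after the warm-up is also what makes `%timeit` numbers like those above fair to compare.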


2 Comments

This is great. Here are my times after adding the decorator, with out_size = 50 and in_shape = (5000, 5000): OP method with decorator: 0.2307071130053373; OP method: 22.997501948993886
Yeah, numba is great for when you can't avoid loops and can't use any of the built-in vectorized operations. It basically compiles your loop on the spot into something a lot faster.

Given the tradeoff between speed and memory usage, I think your method is well suited. But you can still make it faster:

  1. Avoid flattening the arrays inside the loop; this will save you some time.
  2. Instead of using .flat[:] or .flatten(), use .ravel() (it returns a view rather than a copy, so it's faster).
  3. Also avoid for i in range(...); just zip the values of interest (see method3).

Here's a solution that will speed things up:

r_i_f = row_indices.ravel()
c_i_f = col_indices.ravel()
v_f = values.ravel()

indexed_sum = np.zeros((row_indices.max() + 1, col_indices.max() + 1))
for i, j, v in zip(r_i_f, c_i_f, v_f):
    indexed_sum[i, j] += v

For a comparison, here's some toy code (correct any details that are off, and let me know if it works well for you):




def method1(values,row_indices,col_indices):
    """OP method"""
    indexed_sum = np.zeros((row_indices.max()+1,col_indices.max()+1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
    return indexed_sum

def method2(values,row_indices,col_indices):
    """just raveling before loop. Time saved here is considerable"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    
    indexed_sum = np.zeros((row_indices.max()+1,col_indices.max()+1))
    for i in range(row_indices.size):
        indexed_sum[r_i_f[i], c_i_f[i]] += v_f[i]
    return indexed_sum

def method3(values,row_indices,col_indices):
    """raveling, then avoiding range(...), just zipping;
    the time saved here is small but comes at no cost"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    
    indexed_sum = np.zeros((row_indices.max()+1,col_indices.max()+1))
    for i,j,v in zip(r_i_f,c_i_f,v_f):
        indexed_sum[i, j] += v
    return indexed_sum




from time import perf_counter
import numpy as np


out_size = 50
in_shape = (5000,5000)

values = np.random.randint(10,size=in_shape)
row_indices = np.random.randint(out_size,size=in_shape)
col_indices = np.random.randint(out_size,size=in_shape)


t1 = perf_counter()
v1 = method1(values,row_indices,col_indices)
t2 = perf_counter()
v2 = method2(values,row_indices,col_indices)
t3 = perf_counter()
v3 = method3(values,row_indices,col_indices)
t4 = perf_counter()

print(f"method1: {t2-t1}")
print(f"method2: {t3-t2}")
print(f"method3: {t4-t3}")

Outputs for values of shape 5000x5000 and output shaped as 50x50:

method1: 23.66934896100429
method2: 14.241692076990148
method3: 11.415708078013267


Additionally, here's a comparison between flattening methods (on my computer):

q = np.random.randn(5000,5000)
t1 = perf_counter()
q1 = q.flatten()
t2 = perf_counter()
q2 = q.ravel()
t3 = perf_counter()
q3 = q.reshape(-1)
t4 = perf_counter()
q4 = q.flat[:]
t5 = perf_counter()
#print times:
print(f"q.flatten: {t2-t1}")
print(f"q.ravel: {t3-t2}")
print(f"q.reshape(-1): {t4-t3}")
print(f"q.flat[:]: {t5-t4}")
Outputs:

q.flatten: 0.043878231997950934
q.ravel: 5.550700007006526e-05
q.reshape(-1): 0.0006349250033963472
q.flat[:]: 0.08832104799512308
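The timing gap has a simple explanation: `ravel` returns a view of the original data when the array is contiguous (and `reshape(-1)` does likewise), while `flatten` and `flat[:]` always allocate a fresh copy. A quick check with `np.shares_memory`, as a sketch:

```python
import numpy as np

q = np.random.randn(100, 100)

# ravel returns a view of a contiguous array: no data is copied
print(np.shares_memory(q, q.ravel()))    # True

# flatten always allocates a fresh copy
print(np.shares_memory(q, q.flatten()))  # False
```

This is also why `reshape(-1)` lands close to `ravel` in the timings above.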

2 Comments

This is useful, and it represents a four-fold improvement in runtime on real data, but it's still much longer than I'd like for a simple operation (and exceeds my acceptable runtime constraints).
Go with @Engineero's solution; I think it's what you need.

There's a lot of options for this, but they're all reinventing a wheel that kinda already exists.

import numpy as np
from scipy import sparse

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

What you want is the built-in behavior for the scipy sparse matrices:

arr = sparse.coo_matrix((values.flat, (row_indices.flat, col_indices.flat)))

Which yields a sparse data structure:

>>> arr
<2x2 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in COOrdinate format>

But you can convert it back to a numpy array easily:

>>> arr.A
array([[ 0,  2],
       [ 4, 11]])
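For completeness, plain NumPy can do the same duplicate-summing accumulation without scipy via `np.add.at`, which is unbuffered, so repeated (row, col) pairs all contribute rather than overwriting each other. A sketch on the question's arrays (note that `np.add.at` is correct but not especially fast on large inputs, where the sparse route usually wins):

```python
import numpy as np

row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])

indexed_sum = np.zeros((2, 2), dtype=values.dtype)
# unbuffered in-place add: duplicate (row, col) pairs accumulate correctly
np.add.at(indexed_sum,
          (row_indices.ravel(), col_indices.ravel()),
          values.ravel())
print(indexed_sum)  # [[ 0  2]
                    #  [ 4 11]]
```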

Comments


There are some good answers here, but in the end I cheated and wrote an extension module method using the numpy C API, which runs in the trivial time that I wanted.

The code is precisely as boring as one would expect, but since an answer would seem incomplete without some, here is the core of it. It does make some unfortunate assumptions about typing that I mean to fill in with time.

/* Assumes C-contiguous arrays: C int indices, C double values/output. */
int* row_data = (int*)PyArray_DATA(row_indices);
int* col_data = (int*)PyArray_DATA(col_indices);
double* value_data = (double*)PyArray_DATA(values);
double* output_data = (double*)PyArray_DATA(sum_obj);

for(int i = 0; i < input_rows; ++i)
{
    for(int j = 0; j < input_cols; ++j)
    {
        long output_row = row_data[i*input_cols+j];
        long output_col = col_data[i*input_cols+j];
        output_data[output_row*out_col_count+output_col] += value_data[i*input_cols+j];
    }
}

1 Comment

The more I think about it, the longer this code has to go. It doesn't take into account any kind of ordering or strides, and it needs more work before it can be truly generic, though it worked well for my immediate need.
