
Below is code with the functionality I want, on some simple sample data. I binned the data using np.digitize and then computed a column index based on this question. bin_idx is known to never decrease, in case that helps. How can I fill the 2D array by indexing, without an explicit loop? One complication is that the number of values in each row/bin varies. I will later compute different statistics on each bin/row; max is just an example.

import numpy as np

x = np.arange(10)
bin_idx = np.array([0, 0, 0, 1, 2, 3, 3, 4, 4, 4])
col_idx = np.array([0, 1, 2, 0, 0, 0, 1, 0, 1, 2])

binned = np.ones((bin_idx[-1]+1, np.max(col_idx)+1)) * np.nan
for i in range(len(x)):
    binned[bin_idx[i], col_idx[i]] = x[i]
print(binned)
row_max = np.nanmax(binned, 1)
print(row_max)
  • Aside: if you're working with data, pandas might be more natural than working with bare numpy; here, you're basically reimplementing something like df.pivot(1,2,0).max(axis=1). Commented Feb 25, 2016 at 16:16

1 Answer


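NumPy's integer array indexing lets you pass arrays as indices, so the whole loop collapses into a single assignment: each x[i] is written to position (bin_idx[i], col_idx[i]). Also check out np.full, used below to create the NaN-filled binned array directly instead of multiplying np.ones by np.nan.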

binned = np.full((bin_idx[-1]+1, np.max(col_idx)+1), np.nan)
binned[bin_idx, col_idx] = x
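
As a quick sanity check, here is a minimal self-contained sketch that runs the vectorized assignment on the sample data from the question and compares it with the explicit loop. The names binned_loop, starts and row_max_reduceat are mine, not from the question. The last part is an optional alternative that is not part of the answer above: since bin_idx never decreases, a ufunc's reduceat can compute a per-bin statistic straight from x without building the padded array, assuming you only need one row per occupied bin.

```python
import numpy as np

x = np.arange(10)
bin_idx = np.array([0, 0, 0, 1, 2, 3, 3, 4, 4, 4])
col_idx = np.array([0, 1, 2, 0, 0, 0, 1, 0, 1, 2])
shape = (bin_idx[-1] + 1, col_idx.max() + 1)

# Reference result: the explicit loop from the question.
binned_loop = np.full(shape, np.nan)
for i in range(len(x)):
    binned_loop[bin_idx[i], col_idx[i]] = x[i]

# Vectorized version: one assignment using integer array indexing.
binned = np.full(shape, np.nan)
binned[bin_idx, col_idx] = x

# assert_array_equal treats NaNs in matching positions as equal.
np.testing.assert_array_equal(binned, binned_loop)

row_max = np.nanmax(binned, axis=1)
print(row_max)  # [2. 3. 4. 6. 9.]

# Optional alternative (assumes bin_idx is non-decreasing):
# find where each bin starts, then reduce each contiguous segment of x.
starts = np.flatnonzero(np.r_[True, np.diff(bin_idx) > 0])
row_max_reduceat = np.maximum.reduceat(x, starts)
print(row_max_reduceat)  # [2 3 4 6 9]
```

Note that the indexed assignment relies on every (bin_idx, col_idx) pair being unique, which holds here by construction. If pairs repeated, only one of the competing values would survive, so the padded array would silently drop data.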