apply function to unique values of NumPy array

Question

I have a function f which I would like to apply to all elements of an arbitrarily-shaped and -ordered NumPy array x. Because the function evaluation is expensive and x may contain duplicate values, I first reduce x to unique values, a one-dimensional array xu.

xu, ind = np.unique(x, return_inverse=True)

I then create an array for the function values

yu = np.full(len(xu), np.nan)

and fill in this array by applying f elementwise.

I would now like to create an array y of the same shape as x, so that corresponding entries contain the result of the function. My attempt:

y = np.full(x.shape, np.nan)
y[ind] = yu

This fails if x isn't already one-dimensional. (You may guess that I'm used to Matlab, where linear indexing of a multidimensional array works.) What I would need for this is a one-dimensional view on y which I can apply [ind] = to, to assign to the correct elements.

Question 1: Is there such a one-dimensional view on a multidimensional array?

Alternatively, I could create y as one-dimensional, assign values, and then reshape.

y = np.full(x.size, np.nan)
y[ind] = yu
y = np.reshape(y, x.shape)

This seems to work, but I'm unsure whether I have to account for the storage order of x.

Question 2: Does ind returned by np.unique always follow 'C' order, which is default for np.reshape, or does it depend on the internal structure of x?

Your title has nothing to do with the actual question. You can remove the entire section about applying the function, and your question will still be the same. — Mad Physicist
– Mad Physicist, Commented Jan 14, 2021 at 23:41

Mad Physicist · Accepted Answer · 2021-01-15 21:41:42Z

The indices for np.unique operates on a raveled array. This is documented under the first parameter:

Unless axis is specified, this will be flattened if it is not already 1-D.

Ravelling/flattening always happens in C order, regardless of the memory layout. Flattening is just raveling that guarantees a copy. That means that it creates a copy when your array is not in C order:

>>> x = np.zeros((3, 3), order='F')
>>> x.ravel().base is x
False
>>> y = np.zeros((3, 3))
>>> y.ravel().base is y
True

x.ravel() is equivalent to x.reshape(-1). That means that if you can unravel the result with something like flat_y.reshape(original_x_shape):

xu, ind = np.unique(x, return_inverse=True)
yu = np.zeros_like(xu)
for i in range(len(xu)):
    yu[i] = fn(xu[i])
y_flat = yu[ind]
y = y_flat.reshape(x.shape)

Since you are reshaping a contiguous buffer, y and y_flat share the same memory:

>>> y.base is y_flat
True

Fancy indexing, as in the expression y_flat = yu[ind] will always make a copy, since you can't tell if the data is contiguous or not in the general case.

Part of the reason that linear indexing always works in MATLAB is that it guarantees contiguous arrays, always stored in column-major order. Numpy maintains a length in strides in each dimension, so it supports non-contiguous arrays. That allows numpy to do things like transpose an array, or get simple slices from it, without making a copy of the underlying data.

On a side note, if you want to avoid explicitly calling reshape on y, can call it on ind instead:

xu, ind = np.unique(x, return_inverse=True)
yu = np.zeros_like(xu)
for i in range(len(xu)):
    yu[i] = fn(xu[i])
y = yu[ind.reshape(x.shape)]

Collectives™ on Stack Overflow

apply function to unique values of NumPy array

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related