1

I have a function f which I would like to apply to all elements of an arbitrarily-shaped and -ordered NumPy array x. Because the function evaluation is expensive and x may contain duplicate values, I first reduce x to unique values, a one-dimensional array xu.

xu, ind = np.unique(x, return_inverse=True)

I then create an array for the function values

yu = np.full(len(xu), np.nan)

and fill in this array by applying f elementwise.

I would now like to create an array y of the same shape as x, so that corresponding entries contain the result of the function. My attempt:

y = np.full(x.shape, np.nan)
y[ind] = yu

This fails if x isn't already one-dimensional. (You may guess that I'm used to Matlab, where linear indexing of a multidimensional array works.) What I would need for this is a one-dimensional view on y which I can apply [ind] = to, to assign to the correct elements.

Question 1: Is there such a one-dimensional view on a multidimensional array?

Alternatively, I could create y as one-dimensional, assign values, and then reshape.

y = np.full(x.size, np.nan)
y[ind] = yu
y = np.reshape(y, x.shape)

This seems to work, but I'm unsure whether I have to account for the storage order of x.

Question 2: Does ind returned by np.unique always follow 'C' order, which is default for np.reshape, or does it depend on the internal structure of x?

2
  • Your title has nothing to do with the actual question. You can remove the entire section about applying the function, and your question will still be the same. Commented Jan 14, 2021 at 23:41
  • I fixed the typos in my answer. Commented Jan 15, 2021 at 21:32

1 Answer 1

1

The indices for np.unique operates on a raveled array. This is documented under the first parameter:

Unless axis is specified, this will be flattened if it is not already 1-D.

Ravelling/flattening always happens in C order, regardless of the memory layout. Flattening is just raveling that guarantees a copy. That means that it creates a copy when your array is not in C order:

>>> x = np.zeros((3, 3), order='F')
>>> x.ravel().base is x
False
>>> y = np.zeros((3, 3))
>>> y.ravel().base is y
True

x.ravel() is equivalent to x.reshape(-1). That means that if you can unravel the result with something like flat_y.reshape(original_x_shape):

xu, ind = np.unique(x, return_inverse=True)
yu = np.zeros_like(xu)
for i in range(len(xu)):
    yu[i] = fn(xu[i])
y_flat = yu[ind]
y = y_flat.reshape(x.shape)

Since you are reshaping a contiguous buffer, y and y_flat share the same memory:

>>> y.base is y_flat
True

Fancy indexing, as in the expression y_flat = yu[ind] will always make a copy, since you can't tell if the data is contiguous or not in the general case.

Part of the reason that linear indexing always works in MATLAB is that it guarantees contiguous arrays, always stored in column-major order. Numpy maintains a length in strides in each dimension, so it supports non-contiguous arrays. That allows numpy to do things like transpose an array, or get simple slices from it, without making a copy of the underlying data.

On a side note, if you want to avoid explicitly calling reshape on y, can call it on ind instead:

xu, ind = np.unique(x, return_inverse=True)
yu = np.zeros_like(xu)
for i in range(len(xu)):
    yu[i] = fn(xu[i])
y = yu[ind.reshape(x.shape)]
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.