Forgive me if something about what I'm about to ask sounds stupid, I've just started with numpy and multi-dimensional arrays in Python :D

That said, I've got a 3D array of [85 x 235 x 327]. Each position holds a discrete value, and, in most cases, NaN.

First thing I'd like to do is iterate over this array and remove the NaN values, building a new array that contains only valid values.

I've tried this:

    for index, value in np.ndenumerate(data):
        print("index value: " + str(index))
        print("value: " + str(value))

But this will only execute one pass...not really sure what ndenumerate does.

Also tried this:

indexOne = waves.shape[0]
indexTwo = waves.shape[1]
indexThree = waves.shape[2]

for i in range(indexOne):
    for j in range(indexTwo):
        for k in range(indexThree):
            a = waves[i,j,k]
            print(a.data)

And while this does iterate, I have 6531825 points, so it is going to take forever. Is there a built-in function to remove values from an existing array without having to iterate over all the elements?

  • What do you mean by "remove"? That makes it sound like you just want a flat array containing only non-nan values. Or do you want the values replaced with something else? Commented Jan 23, 2014 at 12:10
  • Have you read through wiki.scipy.org/Tentative_NumPy_Tutorial or scipy-lectures.github.io/intro/numpy/index.html ? I think you will find it helpful. Commented Jan 23, 2014 at 12:26
  • @senderle I just want to get rid of those values and end up with an array of all the valid values, preserving the shape if possible, otherwise a flat array. Commented Jan 23, 2014 at 13:39
  • @MrE Yeah, I did, thanks for sharing though :) Commented Jan 23, 2014 at 14:07
  • title of question is misleading here... Commented Nov 4, 2018 at 21:35

2 Answers

It depends a little on what you want the final array to look like. Here's something that literally does what you say. However, it doesn't preserve the shape. Setting up the array:

>>> import numpy
>>> a = numpy.linspace(0, 26, 27).reshape(3, 3, 3)
>>> a[1][0] = numpy.nan
>>> a
array([[[  0.,   1.,   2.],
        [  3.,   4.,   5.],
        [  6.,   7.,   8.]],

       [[ nan,  nan,  nan],
        [ 12.,  13.,  14.],
        [ 15.,  16.,  17.]],

       [[ 18.,  19.,  20.],
        [ 21.,  22.,  23.],
        [ 24.,  25.,  26.]]])

Then you can create a mask with isnan:

>>> numpy.isnan(a)
array([[[False, False, False],
        [False, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [False, False, False],
        [False, False, False]],

       [[False, False, False],
        [False, False, False],
        [False, False, False]]], dtype=bool)

And use it to index a:

>>> a[~numpy.isnan(a)]
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,  12.,  13.,
        14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,  23.,  24.,
        25.,  26.])

You could use a similar trick to do many other things with nan values. For example:

>>> a[numpy.isnan(a)] = 0
>>> a
array([[[  0.,   1.,   2.],
        [  3.,   4.,   5.],
        [  6.,   7.,   8.]],

       [[  0.,   0.,   0.],
        [ 12.,  13.,  14.],
        [ 15.,  16.,  17.]],

       [[ 18.,  19.,  20.],
        [ 21.,  22.,  23.],
        [ 24.,  25.,  26.]]])
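
For an array the size of the one in the question, the same masking approach could look roughly like the sketch below. The random data, the seed, and the roughly 90% NaN fraction are assumptions made purely for illustration; only the shape comes from the question.

```python
import numpy as np

# Hypothetical stand-in for the asker's data: same shape, mostly NaN.
rng = np.random.default_rng(0)
data = rng.random((85, 235, 327))
data[rng.random(data.shape) < 0.9] = np.nan  # assumed NaN fraction of ~90%

mask = np.isnan(data)             # boolean array with the same shape as data
valid = data[~mask]               # 1-D copy holding only the non-NaN values

n_nan = np.count_nonzero(mask)    # how many entries were dropped
print(data.size, n_nan, valid.size)
```

No Python-level loop is involved, so this should finish quickly even with about 6.5 million elements. Note that `valid` is a new flat array; `data` itself is left unchanged.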

3 Comments

Ok, thanks for the detailed answer. Is this advisable with an array as big as mine? From my point of view, I don't mind losing the shape: if a column has no values, I don't mind losing that column.
It's true that this produces a second boolean array of the same size as the original; two arrays with 6 million elements isn't too much to handle for most modern computers though. Still, there are a lot of nan-related functions in numpy that might help reduce memory usage if you need it. For example, you can create a "masked array" using numpy.ma.masked_invalid; the masked array itself takes up more memory, but it may save memory or be more efficient in some operations.
Thanks senderle, this effectively works. Any idea how numpy flattens the multidimensional array into a 1D one? Just wondering how much "info" I'm losing in the process :D
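
On the flattening question: boolean-mask indexing returns the selected values in row-major (C) order, so what gets lost is the position of each value, not the values themselves. A minimal sketch of one way to keep the positions, using `np.argwhere` on the small example array from the answer; the variable names `keep`, `values`, `coords`, and `masked` are just illustrative. It also shows the `numpy.ma.masked_invalid` route mentioned in the comment above, which keeps the original shape.

```python
import numpy as np

a = np.arange(27, dtype=float).reshape(3, 3, 3)
a[1, 0] = np.nan

keep = ~np.isnan(a)
values = a[keep]                  # flat array, in row-major (C) order
coords = np.argwhere(keep)        # one (i, j, k) row per kept value, same order

# Each kept value can be traced back to its original position.
i, j, k = coords[9]
print(values[9], a[i, j, k])

# Shape-preserving alternative: a masked array that hides the NaNs.
masked = np.ma.masked_invalid(a)
print(masked.shape, masked.count(), masked.mean())
```

Keeping `coords` alongside `values` costs extra memory (three integers per kept value), so whether it is worth it depends on whether the positions are needed later.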

nan_to_num does exactly what you want:

Replace nan with zero and inf with finite numbers.

Returns an array or scalar replacing Not a Number (NaN) with zero, (positive) infinity with a very large number and negative infinity with a very small (or negative) number.

Use it like:

x = np.nan_to_num(x)
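
A small sketch of what that call does, on a made-up 2x2 array. Note that it replaces values and keeps the shape, so it fits the "preserve the shape" case from the comments and not the "flat array of valid values" case.

```python
import numpy as np

# Made-up input containing NaN and both infinities.
x = np.array([[1.0, np.nan], [np.inf, -np.inf]])

y = np.nan_to_num(x)              # returns a new array by default (copy=True)

print(y.shape)                    # same shape as x
print(y[0, 1])                    # the NaN entry is now 0.0
print(np.isfinite(y).all())       # infinities became large finite numbers
```

Whether zero is a sensible replacement depends on the data: if 0 is also a legitimate measurement, the replaced entries can no longer be told apart from real ones.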
