4

I am very confused when it comes to the logic of the NumPy axis argument. In some cases it affects the row when axis = 0 and in some cases it affects the columns when axis = 0. Example:

import numpy as np

a = np.array([[1,3,6,7,4],[3,2,5,9,1]])
array([[1,3,6,7,4],
       [3,2,5,9,1]])

np.sort(a, axis = 0)   #This sorts the columns
array([[1, 2, 5, 7, 1],  
       [3, 3, 6, 9, 4]])

np.sort(a, axis=1)     #This sorts the rows           
array([[1, 3, 4, 6, 7],
       [1, 2, 3, 5, 9]])

#####################################################################
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

np.delete(arr,obj = 1, axis = 0)        # This deletes the row
array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])

np.delete(arr,obj = 1, axis = 1)        #This deletes the column
array([[ 1,  3,  4],
       [ 5,  7,  8],
       [ 9, 11, 12]])

If there is some logic here that I am missing I would love to learn it.

1
  • 1
    In cases where the 2d terminology is confusing, it may help to first think about the 1d case, or 3d where the action affects one axis different from the other 2. Commented Apr 21, 2021 at 19:52
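Following that suggestion, here is a minimal sketch of the 1-D and 3-D cases. The names `arr3` and `v` are illustrative and not from the question. Deleting an index along axis `k` shortens only entry `k` of the shape, which may be an easier rule to hold on to than "rows" and "columns".

```python
import numpy as np

# Illustrative 3-D array; shape (2, 3, 4) makes each axis easy to tell apart.
arr3 = np.arange(24).reshape(2, 3, 4)

# Deleting one index along axis k shrinks only the k-th entry of the shape.
shapes_after_delete = [np.delete(arr3, 0, axis=k).shape for k in range(arr3.ndim)]
print(shapes_after_delete)  # [(1, 3, 4), (2, 2, 4), (2, 3, 3)]

# In 1-D there is only axis 0, so there is no row/column ambiguity at all.
v = np.array([3, 1, 2])
print(np.delete(v, 1, axis=0))  # [3 2]
print(np.sort(v, axis=0))       # [1 2 3]
```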

2 Answers

6

It's perhaps simplest to remember it as 0=down and 1=across.

This means: use axis=0 to apply a method down each column, or to the row labels (the index). Use axis=1 to apply a method across each row, or to the column labels.

[Image: the parts of a DataFrame that each axis refers to]

It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:

Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]

So, concerning the method in the question, np.sort(a, axis=1) seems to be correctly defined. It sorts the entries horizontally across columns, that is, along each individual row. On the other hand, np.sort(a, axis=0) is an operation acting vertically downwards across rows, that is, within each individual column.

Similarly, np.delete(arr, obj, axis=1) counts positions along the horizontal axis, so obj=1 picks out the second column and removes it. Specifying axis=0 makes the method count positions down the vertical axis and remove a row instead.
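As a rough check of the "0=down, 1=across" rule on the array from the question, the sketch below uses `np.diff` (my choice of helper, not something the question uses) to test in which direction the values end up ordered after each sort.

```python
import numpy as np

a = np.array([[1, 3, 6, 7, 4], [3, 2, 5, 9, 1]])

down = np.sort(a, axis=0)    # order values going down, column by column
across = np.sort(a, axis=1)  # order values going across, row by row

# After axis=0, differences taken down the rows are never negative.
sorted_down = bool((np.diff(down, axis=0) >= 0).all())
# After axis=1, differences taken across the columns are never negative.
sorted_across = bool((np.diff(across, axis=1) >= 0).all())
# The axis=0 result is not ordered across its rows (for example 7 then 1).
down_is_sorted_across = bool((np.diff(down, axis=1) >= 0).all())

print(sorted_down, sorted_across, down_is_sorted_across)  # True True False
```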


2 Comments

Wow, that is some backwards thinking. After thinking about it for a while I think I see what you are saying. By indicating axis=0 it is a downward action, and because in the delete statement we are limiting it to obj=1 which means at the index of 1, go down and delete those values until you reach the end of the array. So even though it looks like it is affecting the row, it is really still following the same rule but just being constrained to obj=1
That's the correct thought process, please upvote the answer if you liked
0
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
# array([[ 1,  2,  3,  4],
#        [ 5,  6,  7,  8],
#        [ 9, 10, 11, 12]])

arr has 2 dimensions, so indexing it takes one position per axis. Using the empty slice : in both positions, arr[:, :], selects everything along the first and second axis. From the documentation of np.delete regarding the second parameter obj:

obj : slice, int or array of ints Indicate indices of sub-arrays to remove along the specified axis.

If we want to delete obj=1 from axis=0 we are effectively removing arr[[1],:] from arr

arr[[1],:] # array([[5, 6, 7, 8]])

With the same intuition, we can remove obj=1 from axis=1

arr[:,[1]] # array([[ 2],
           #        [ 6],
           #        [10]])
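One way to check this intuition is to build the same results with plain indexing, keeping every index except 1 along the chosen axis. The names `keep_rows` and `keep_cols` are hypothetical helpers for this sketch.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# axis=0: drop index 1 along the first axis, i.e. keep rows 0 and 2.
keep_rows = [0, 2]
without_row = arr[keep_rows, :]

# axis=1: drop index 1 along the second axis, i.e. keep columns 0, 2 and 3.
keep_cols = [0, 2, 3]
without_col = arr[:, keep_cols]

print(np.array_equal(np.delete(arr, 1, axis=0), without_row))  # True
print(np.array_equal(np.delete(arr, 1, axis=1), without_col))  # True
```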

When sorting the array arr above along axis=0 we are comparing the following elements:

# array([[1, 2, 3, 4]])
# array([[5, 6, 7, 8]])
# array([[ 9, 10, 11, 12]])

The array is already sorted in this case but the comparison is done between two rows. For example array([[5, 6, 7, 8]]) is compared with array([[ 9, 10, 11, 12]]) by doing an element-wise comparison.

Sorting the array along axis=1, we are comparing the following elements:

# array([[1],    array([[ 2],    array([[ 3],    array([[ 4],
#        [5],           [ 6],           [ 7],           [ 8],
#        [9]])          [10]])          [11]])          [12]])

Notice the difference of axis usage between np.delete and np.sort. np.delete will remove the complete row/column while np.sort will use the complete row/column for comparison.
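To make the sort side of this concrete, the sketch below rebuilds both results by sorting 1-D slices one at a time and stacking them back together. It uses the array `a` from the question; the loop-and-`np.stack` construction is only an illustration, not how NumPy implements sorting internally.

```python
import numpy as np

a = np.array([[1, 3, 6, 7, 4], [3, 2, 5, 9, 1]])

# axis=0: each column a[:, j] is sorted on its own, then placed back as column j.
by_hand_axis0 = np.stack([np.sort(a[:, j]) for j in range(a.shape[1])], axis=1)

# axis=1: each row a[i, :] is sorted on its own, then placed back as row i.
by_hand_axis1 = np.stack([np.sort(a[i, :]) for i in range(a.shape[0])], axis=0)

print(np.array_equal(by_hand_axis0, np.sort(a, axis=0)))  # True
print(np.array_equal(by_hand_axis1, np.sort(a, axis=1)))  # True
```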
