Applying Numpy functions on Pandas data frame

Question

I have a NumPy array as follows:

     array([[1, 2],
            [3, 4],
            [5, 6],
            [7, 8]])

The array is called myArray. I perform two slicing operations on the 2D array and get the following results:

     In[1]: a2 = myArray[1:]
            a2

     Out[1]:array([[3, 4],
                   [5, 6],
                   [7, 8]])


     In[2]: a1 = myArray[:-1]
            a1

     Out[2]:array([[1, 2],
                   [3, 4],
                   [5, 6]])

Now I apply NumPy functions to get the following result:

     In[]: theta = np.arccos((a1*a2).sum(axis= 1)/(np.sqrt((a1**2).sum(axis= 1)*(a2**2).sum(axis= 1))))
           theta
     Out[]: array([ 0.1798535 ,  0.05123717,  0.02409172])

I perform the same sequence of operations on an equivalent data frame:

    In[]: df = pd.DataFrame(data = myArray, columns = ["x", "y"])
          df
    Out[]: 
         x    y
      0  1    2
      1  3    4
      2  5    6
      3  7    8

   In[]: b2 = df[["x", "y"]].iloc[1:]
         b2
   Out[]: 
            x   y
       1    3   4
       2    5   6
       3    7   8

   In[]: b1 = df[["x", "y"]].iloc[:-1]
         b1
   Out[]: 
            x   y
       0    1   2
       1    3   4
       2    5   6

But when I try to compute theta for the data frame, I get only 0s and NaN values:

      In[]: theta2 = np.arccos((b1*b2).sum(axis= 1)/(np.sqrt((b1**2).sum(axis= 1)*(b2**2).sum(axis= 1))))
            theta2
      Out[]: 
            0    NaN
            1    0.0
            2    0.0
            3    NaN
            dtype: float64

Is this the right way to apply NumPy functions to sliced data frames? How can I get the same result for theta from the data frame?

UPDATE

As suggested below, using b1.values and b2.values works. But when I wrap the computation in a function and apply it to df, I keep getting a ValueError:

       def theta(group):
             b2 = df[["x", "y"]].iloc[1:]
             b1 = df[["x", "y"]].iloc[:-1]

             t = np.arccos((b1.values*b2.values).sum(axis= 1)/
              (np.sqrt((b1.values**2).sum(axis= 1)*(b2.values**2).sum(axis= 1))))

             return t

       df2 = df.apply(theta)

This raises:

       ValueError: Shape of passed values is (2, 3), indices imply (2, 4)

Please let me know where I am wrong.

Thanks in advance.

@piRSquared Can you please help me with the UPDATE part here?
– Liza, Commented May 8, 2017 at 14:26

Allen Qin · Accepted Answer · 2017-05-08 05:59:29Z

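The indexes of b1 and b2 are not aligned: b1 is labeled 0 to 2 and b2 is labeled 1 to 3, and arithmetic between data frames matches rows by index label, not by position.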
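To see the misalignment concretely, here is a small sketch that rebuilds the frame from the question and inspects the intermediate product. The names `product` and `cos` are only for illustration and do not appear in the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), columns=["x", "y"])
b1 = df.iloc[:-1]   # index labels 0, 1, 2
b2 = df.iloc[1:]    # index labels 1, 2, 3

# Rows are matched by label, so the result covers the union of labels 0..3.
# Labels 0 and 3 exist in only one operand and become NaN.
product = b1 * b2
print(product)

# Labels 1 and 2 pair each row with itself, so the cosine is 1 there
# and arccos(1) is 0, which matches the 0.0 values in the question.
cos = product.sum(axis=1) / np.sqrt((b1**2).sum(axis=1) * (b2**2).sum(axis=1))
print(cos)
```

In other words, the 0.0 results are angles between a row and itself, and the NaN results come from labels that exist in only one of the two slices.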

If you do:

b2.index=b1.index

np.arccos((b1*b2).sum(axis= 1)/(np.sqrt((b1**2).sum(axis= 1)*(b2**2).sum(axis= 1))))

This should output:

Out[75]: 
0    0.179853
1    0.051237
2    0.024092
dtype: float64

If you don't want to change the index, you can work on the underlying arrays through the .values attribute:

np.arccos((b1.values*b2.values).sum(axis= 1)/(np.sqrt((b1.values**2).sum(axis= 1)*(b2.values**2).sum(axis= 1))))
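For reference, here is a self-contained sketch of both approaches side by side. It assumes a pandas version that provides `to_numpy()`, which does the same job as `.values` here, and it uses `reset_index(drop=True)` as an alternative to assigning `b2.index = b1.index`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), columns=["x", "y"])

# Option 1: drop to plain arrays so that rows pair up by position.
a1 = df.iloc[:-1].to_numpy()
a2 = df.iloc[1:].to_numpy()
cos_np = (a1 * a2).sum(axis=1) / np.sqrt((a1**2).sum(axis=1) * (a2**2).sum(axis=1))
theta_np = np.arccos(cos_np)

# Option 2: stay in pandas, but give both slices the same 0..n-1 index.
b1 = df.iloc[:-1].reset_index(drop=True)
b2 = df.iloc[1:].reset_index(drop=True)
cos_pd = (b1 * b2).sum(axis=1) / np.sqrt((b1**2).sum(axis=1) * (b2**2).sum(axis=1))
theta_pd = np.arccos(cos_pd)  # a Series indexed 0, 1, 2

print(theta_np)
print(theta_pd)
```

Option 1 returns a NumPy array and Option 2 returns a Series. Both should hold the same three angles as the original array computation.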

4 Comments

Liza Over a year ago

Thanks a lot, this is what I was expecting.

Allen Qin Over a year ago

@Liza, can you show what's your expected output with your update?

Liza Over a year ago

I am sorry for the late reply. I expect the same result you helped me get previously, i.e. array([ 0.1798535 , 0.05123717, 0.02409172]). I am applying the same operations, but I have wrapped them in a function theta().

Allen Qin Over a year ago

df.apply applies a function row-wise or column-wise to a data frame. You can simply call theta('') directly, which will give you the same output. By the way, the group parameter is not required.
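A minimal sketch of the function written so that it uses its argument and is called directly on the frame. The name `consecutive_angles`, the `cols` parameter, and the `np.clip` guard against floating-point round-off are assumptions for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd


def consecutive_angles(frame, cols=("x", "y")):
    """Angle in radians between each row vector and the next one."""
    pts = frame[list(cols)].to_numpy(dtype=float)
    v1, v2 = pts[:-1], pts[1:]          # paired by position, not by label
    cos = (v1 * v2).sum(axis=1) / np.sqrt((v1**2).sum(axis=1) * (v2**2).sum(axis=1))
    # Assumed guard: keep round-off from pushing cos slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))


df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), columns=["x", "y"])
angles = consecutive_angles(df)   # call it directly, no df.apply
print(angles)
```

The result has one fewer entry than the frame has rows. That is most likely why `df.apply(theta)` fails: it calls the function once per column, each call returns 3 values, and pandas cannot fit them into a result with the frame's 4 index labels.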
