getting the index of a row in a pandas apply function

Question

I am trying to access the index of a row in a function applied across an entire DataFrame in Pandas. I have something like this:

import pandas
df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df
   a  b  c
0  1  2  3
1  4  5  6

and I'll define a function that accesses elements of a given row:

def rowFunc(row):
    return row['a'] + row['b'] * row['c']

I can apply it like so:

df['d'] = df.apply(rowFunc, axis=1)
>>> df
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

Awesome! Now what if I want to incorporate the index into my function? The index of any given row in this DataFrame before adding d would be Index([u'a', u'b', u'c'], dtype='object'), i.e. the column labels, but I want the 0 and 1. So I can't just access row.index.

I know I could create a temporary column in the table where I store the index, but I'm wondering if it is stored in the row object somewhere.
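For reference, the temporary-column workaround could look like the sketch below. It assumes `reset_index()` is acceptable: it returns a new frame in which the (unnamed) index has been copied into an ordinary column called `index`, so the applied function can read it like any other field. The function name `row_func_with_index` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

def row_func_with_index(row):
    # 'index' is the column that reset_index() created from the unnamed index
    return row['a'] + row['b'] * row['c'] + row['index']

# reset_index() returns a new frame; df itself keeps its original index
tmp = df.reset_index()
# .to_numpy() assigns by position, so this also works if df's index is not 0..n-1
df['d'] = tmp.apply(row_func_with_index, axis=1).to_numpy()
print(df)
```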

Aside: is there a reason you need to use apply? It's much slower than performing vectorized ops on the frame itself. (Sometimes apply is the simplest way to do something, and performance considerations are often exaggerated, but for your particular example it's just as easy not to use it.) – DSM, commented Oct 30, 2014 at 16:26
@DSM In actuality I am calling another object's constructor for each row using different row elements. I just wanted to put a minimal example together to illustrate the question. – Mike, commented Oct 30, 2014 at 17:27
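For the use case Mike describes (one constructor call per row), a minimal sketch might pass the row's index label, available as `row.name` as the answers explain, into the constructor. `Record` and its fields are hypothetical stand-ins for whatever class is really being built.

```python
from dataclasses import dataclass

import pandas as pd

@dataclass
class Record:
    # hypothetical class standing in for the real object being constructed
    label: object
    total: int

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

# with axis=1 each row arrives as a Series; row.name is that row's index label
records = df.apply(lambda row: Record(row.name, row['a'] + row['b'] * row['c']), axis=1)
print(records.tolist())
```

The result is a Series of `Record` objects indexed like `df`.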

EdChum · Accepted Answer · 2018-04-19 08:46:59Z

309

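To access the index in this case, use the name attribute of the row. When apply is called with axis=1, each row is passed to the function as a Series, and that Series' name is the row's index label: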

In [182]:

import pandas as pd
df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
def rowFunc(row):
    return row['a'] + row['b'] * row['c']

def rowIndex(row):
    return row.name
df['d'] = df.apply(rowFunc, axis=1)
df['rowIndex'] = df.apply(rowIndex, axis=1)
df
Out[182]:
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1
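The sketch below shows why `row.index` is not the answer. Inside `apply(axis=1)` the Series' `.index` holds the column labels, and the row's own label lives in `.name`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

# .index of each row Series is the set of column labels...
col_labels = df.apply(lambda row: ','.join(row.index), axis=1)
# ...while .name is that row's label in the DataFrame's index
row_labels = df.apply(lambda row: row.name, axis=1)

print(col_labels.tolist())  # ['a,b,c', 'a,b,c']
print(row_labels.tolist())  # [0, 1]
```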

Note that if this is really what you are trying to do, the following works and is much faster:

In [198]:

df['d'] = df['a'] + df['b'] * df['c']
df
Out[198]:
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

In [199]:

%timeit df['a'] + df['b'] * df['c']
%timeit df.apply(rowIndex, axis=1)
10000 loops, best of 3: 163 µs per loop
1000 loops, best of 3: 286 µs per loop
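The timings above come from a 2-row frame and time `rowIndex` rather than `rowFunc`. To compare the two approaches on something larger, a sketch along these lines could be used. The frame size and the random data are arbitrary assumptions, and the printed numbers will vary by machine, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# a larger frame than the 2-row example so the difference is measurable
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 10, size=(10_000, 3)), columns=['a', 'b', 'c'])

def row_func(row):
    return row['a'] + row['b'] * row['c']

# both approaches should produce identical values
vectorised = big['a'] + big['b'] * big['c']
applied = big.apply(row_func, axis=1)
assert (vectorised == applied).all()

t_vec = timeit.timeit(lambda: big['a'] + big['b'] * big['c'], number=3)
t_app = timeit.timeit(lambda: big.apply(row_func, axis=1), number=3)
print(f"vectorised: {t_vec:.4f}s  apply: {t_app:.4f}s  (3 runs each)")
```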

EDIT

Looking at this question 3+ years later, you could just do:

In[15]:
df['d'],df['rowIndex'] = df['a'] + df['b'] * df['c'], df.index
df

Out[15]: 
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1

But assuming your real rowFunc isn't as trivial as this, you should look for vectorised equivalents of whatever it does, and then combine them with the df index:

In[16]:
df['newCol'] = df['a'] + df['b'] + df['c'] + df.index
df

Out[16]: 
   a  b  c   d  rowIndex  newCol
0  1  2  3   7         0       6
1  4  5  6  34         1      16
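The same idea works when the index is not numeric. One assumed approach is sketched below: `Index.to_series()` turns the index into a Series aligned on itself, so it combines element-wise with columns. The column names `key` and `is_x` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'], index=['x', 'y'])

# the index as a Series whose values are the labels and whose index is itself
idx = df.index.to_series()

# vectorised string concatenation and comparison, no per-row Python function
df['key'] = idx + '_' + df['a'].astype(str)
df['is_x'] = idx.eq('x')
print(df)
```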

edited Apr 19, 2018 at 8:46

answered Oct 30, 2014 at 16:25

EdChum

2 Comments

Konstantin Over a year ago

Would be nice if name were a namedtuple in the case of a MultiIndex, so that a specific index level could be queried by its name.

Tamás Sajti Over a year ago

Only returns the index value of that row if the df has at least one column; otherwise it returns None.

smci · Accepted Answer · 2023-01-13 21:20:03Z

68

Either:

1. with `row.name` inside the `apply(..., axis=1)` call:

import pandas
df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'], index=['x','y'])

   a  b  c
x  1  2  3
y  4  5  6

df.apply(lambda row: row.name, axis=1)

x    x
y    y

2. with `iterrows()` (slower)

DataFrame.iterrows() allows you to iterate over rows and access their index:

for idx, row in df.iterrows():
    ...
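Filled in, the iterrows() loop might look like the sketch below. The dict `results` is just an illustrative container. Note that iterrows() builds a Series for every row, so with mixed column types the values in `row` may be upcast to a common dtype.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'], index=['x', 'y'])

results = {}
for idx, row in df.iterrows():
    # idx is the row's index label; row is a Series of that row's values
    results[idx] = row['a'] + row['b'] * row['c']

print(results)
```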

edited Jan 13, 2023 at 21:20

answered Feb 16, 2018 at 4:04

smci

1 Comment

dpb Over a year ago

and, if speed is a concern, `itertuples` generally performs far better: stackoverflow.com/questions/24870953/…
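With itertuples(), the index label comes back as the first field of each namedtuple, exposed as the `Index` attribute (with the default `index=True`). A minimal sketch of the same loop:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'], index=['x', 'y'])

results = {}
for row in df.itertuples():
    # row.Index is the index label; columns are namedtuple attributes
    results[row.Index] = row.a + row.b * row.c

print(results)
```

Attribute access like `row.a` only works for column names that are valid Python identifiers; other columns are renamed to positional names.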

Freek Wiekmeijer · Accepted Answer · 2019-03-19 10:37:07Z

14

To answer the original question: yes, you can access the index value of a row in apply(). It is available as the row's name attribute (row.name) and requires that you specify axis=1 (because then the lambda receives each row, as a Series of that row's column values, rather than each column).

Working example (pandas 0.23.4):

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df.set_index('a', inplace=True)
>>> df
   b  c
a      
1  2  3
4  5  6
>>> df['index_x10'] = df.apply(lambda row: 10*row.name, axis=1)
>>> df
   b  c  index_x10
a                 
1  2  3         10
4  5  6         40
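To see what axis changes, the sketch below contrasts the two settings on the same frame. With axis=1 the function receives rows, so `.name` is the row label. With the default axis=0 it receives columns, so `.name` is the column label.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c']).set_index('a')

# axis=1: each call gets a row; .name is that row's index label
by_row = df.apply(lambda row: row.name, axis=1)
# axis=0 (default): each call gets a column; .name is the column label
by_col = df.apply(lambda col: col.name)

print(by_row.tolist())  # [1, 4]
print(by_col.tolist())  # ['b', 'c']
```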

answered Mar 19, 2019 at 10:37

Freek Wiekmeijer

1 Comment

Charles Fox Over a year ago

Also works for dataframes with MultiIndex: row.name becomes a tuple.
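A small sketch of the MultiIndex case, which also touches on Konstantin's wish to look up a level by name. Since row.name is a plain tuple, one workaround (an assumption, not a built-in feature) is to zip it with `df.index.names`. The level names `group` and `num` are made up for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([('p', 1), ('q', 2)], names=['group', 'num'])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'], index=idx)

# with a MultiIndex, row.name is a tuple of the level values
is_tuple = df.apply(lambda row: isinstance(row.name, tuple), axis=1)

# pairing the tuple with the level names allows lookup by level name
groups = df.apply(lambda row: dict(zip(df.index.names, row.name))['group'], axis=1)

print(is_tuple.tolist())  # [True, True]
print(groups.tolist())    # ['p', 'q']
```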
