Pandas substring using another column as the index

Question

I'm trying to use one column containing the start index to subselect a string column.

df = pd.DataFrame({'string': ['abcdef', 'bcdefg'], 'start_index': [3, 5]})
expected = pd.Series(['def', 'g'])

I know that you can take a substring from a fixed position with the following:

df['string'].str[3:]

However, in my case, the start index may vary, so I tried:

df['string'].str[df['start_index']:]

But it returns NaNs.

EDIT: What if I don't want to use a loop or list comprehension? A vectorized method is preferred.

EDIT2: In this small test case, the list comprehension appears to be faster than apply.

from itertools import islice
%timeit df.apply(lambda x: ''.join(islice(x.string, x.start_index, None)), 1)
%timeit pd.Series([x[y:] for x, y in zip(df.string, df.start_index)])

631 µs ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
101 µs ± 233 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
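
The `%timeit` lines above only run inside IPython. As a rough sketch of the same comparison in plain Python, the standard `timeit` module can be used instead. The helper names `with_apply` and `with_zip` are made up for illustration and are not part of the original post, and the absolute timings will vary by machine and pandas version.

```python
from itertools import islice
import timeit

import pandas as pd

df = pd.DataFrame({'string': ['abcdef', 'bcdefg'], 'start_index': [3, 5]})


def with_apply(frame):
    # Hypothetical helper: row-wise apply, skipping the first start_index characters.
    return frame.apply(
        lambda x: ''.join(islice(x.string, x.start_index, None)), axis=1
    )


def with_zip(frame):
    # Hypothetical helper: list comprehension over the two columns.
    return pd.Series(
        [x[y:] for x, y in zip(frame.string, frame.start_index)],
        index=frame.index,
    )


apply_result = with_apply(df)
zip_result = with_zip(df)

# Both approaches should produce the same substrings.
same = apply_result.tolist() == zip_result.tolist()

# Time 100 calls of each; the numbers are only indicative.
apply_seconds = timeit.timeit(lambda: with_apply(df), number=100)
zip_seconds = timeit.timeit(lambda: with_zip(df), number=100)
print(f"same={same} apply={apply_seconds:.4f}s zip={zip_seconds:.4f}s")
```

On a two-row frame this mostly measures per-call overhead, so a larger frame would be needed before drawing conclusions about scaling.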

Might take a look here: stackoverflow.com/questions/39042214/…
– rafaelc, Commented Jun 14, 2019 at 22:11

BENY · Accepted Answer · 2019-06-14 21:50:18Z


Use a for loop (here, a list comprehension) over a zip of the two columns. For why a for loop is used here, you can check the link.

[x[y:] for x, y in zip(df.string, df.start_index)]
Out[328]: ['def', 'g']
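
As a minimal, self-contained sketch of the same idea, the list can be wrapped back into a Series aligned with the original index and stored as a new column. The column name `substring` is an assumption chosen for illustration; it does not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({'string': ['abcdef', 'bcdefg'], 'start_index': [3, 5]})

# Slice each string from its own start index, pairing the columns row by row.
substrings = [s[i:] for s, i in zip(df['string'], df['start_index'])]

# 'substring' is a hypothetical column name; reusing df.index keeps rows aligned
# even if the frame does not have a default RangeIndex.
df['substring'] = pd.Series(substrings, index=df.index)

print(df['substring'].tolist())  # ['def', 'g']
```

This is still a Python-level loop, so it is not vectorized in the sense the question asks for.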


2 Comments

Leszek Zarna Over a year ago

This solution is so slow that it's impractical for larger data sets.

BENY Over a year ago

@LeszekZarna stackoverflow.com/questions/54028199/…
