I have a dataframe of text strings which essentially represents one or many journeys per row. I'm trying to split the legs of the journey so I can see them individually. The example input dataframe looks as follows:

UPDATED:

df_input = pd.DataFrame([{'var1':'A/A1', 'var2':'x/y/z', 'var3':'abc1'}, 
                         {'var1':'B', 'var2':'xx/yy', 'var3':'abc2'}, 
                         {'var1':'c', 'var2':'zz', 'var3':'abcd'}])

   var1   var2  var3
0  A/A1  x/y/z  abc1
1     B  xx/yy  abc2
2     c     zz  abcd

The output I'm trying to get should look as follows. For the first row, the journey legs are A to A1, then A1 to x, then x to y, and then y to z. It would also be very helpful if there is a way to add a column indicating the journey leg number (1, 2, 3, etc.). var3 has no importance here; I've included it only to show that there are other columns whose values get repeated when the rows are split.

df_output = pd.DataFrame([{'var1': 'A', 'var2': 'A1', 'var3':'abc1'}, 
                          {'var1': 'A1', 'var2': 'x', 'var3':'abc1'},
                          {'var1': 'x', 'var2': 'y', 'var3':'abc1'},
                          {'var1': 'y', 'var2': 'z', 'var3':'abc1'},
                          {'var1': 'B', 'var2': 'xx', 'var3':'abc2'},
                          {'var1': 'xx', 'var2': 'yy', 'var3':'abc2'},
                          {'var1': 'c', 'var2': 'zz', 'var3':'abcd'}])

  var1 var2  var3
0    A   A1  abc1
1   A1    x  abc1
2    x    y  abc1
3    y    z  abc1
4    B   xx  abc2
5   xx   yy  abc2
6    c   zz  abcd

Can someone please help?

Thanks

2 Answers

Try with explode (the output below is for the original data, where var1 in the first row was just 'A'):

df = df_input.assign(var2=df_input.var2.str.split('/')).explode('var2')
df
  var1 var2  var3
0    A    x  abc1
0    A    y  abc1
0    A    z  abc1
1    B   xx  abc2
1    B   yy  abc2
2    c   zz  abcd

Then groupby + shift

df.var1=df.groupby(level=0).var2.shift().fillna(df.var1)
df
  var1 var2  var3
0    A    x  abc1
0    x    y  abc1
0    y    z  abc1
1    B   xx  abc2
1   xx   yy  abc2
2    c   zz  abcd
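
For reference, here is a minimal self-contained sketch of the two steps above, run on the question's original data (only var2 holds several stops). It also adds the leg number the question asks for. The `journey` and `leg` column names are my own choice, not something from the question, and moving the row label into a regular column is just one way to avoid working with repeated index labels.

```python
import pandas as pd

# Original (pre-update) data from the question.
df_input = pd.DataFrame([{'var1': 'A', 'var2': 'x/y/z', 'var3': 'abc1'},
                         {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                         {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# One row per stop in var2; keep the original row label in a 'journey' column
# (hypothetical name) so the new index is unique.
legs = (df_input.assign(var2=df_input['var2'].str.split('/'))
                .explode('var2')
                .rename_axis('journey')
                .reset_index())

# Within each journey, a leg starts where the previous leg ended;
# the first leg keeps its original var1.
prev_stop = legs.groupby('journey')['var2'].shift()
legs['var1'] = prev_stop.fillna(legs['var1'])

# Leg number (1, 2, 3, ...) within each journey.
legs['leg'] = legs.groupby('journey').cumcount() + 1

print(legs)
```

`groupby(...).cumcount()` counts rows within each group starting at 0, hence the `+ 1`.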

2 Comments

Thanks! Can this also work in the scenario where I have multiple entries in the var1 column? I've updated the original question to show the example.
I tried repeating the code on both columns but it doesn't work that way. Not sure how to handle this scenario when there are multiple / separated entries in both columns

Solution

Try this.

EDIT: Made a change based on the suggestion from @Ben.T.

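# Keep the original var2 as 'var2old' and add one row per stop from the split var2
df = pd.concat([df.rename(columns={'var2': 'var2old'}), 
                df.var2.str.split('/').explode()], 
               axis=1, join='outer')
## CREDIT: @Ben.T
# Where var1 repeats the previous row's value, replace it with the previous stop
df['var1'] = df['var1'].where(df['var1'].ne(df['var1'].shift()), df['var2'].shift())
print(df)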

Output:

  var1 var2old  var3 var2
0    A   x/y/z  abc1    x
0    x   x/y/z  abc1    y
0    y   x/y/z  abc1    z
1    B   xx/yy  abc2   xx
1   xx   xx/yy  abc2   yy
2    c      zz  abcd   zz
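
A hedged variant of the same idea, in case `pd.concat(..., axis=1)` complains about repeated index labels in your pandas version (I have not checked which versions do): `DataFrame.join` aligns on the index and repeats each original row once per stop. The `stops` and `out` names are mine. Also note, as far as I can tell, the `where`/`shift` step relies on consecutive original rows having different var1 values, which holds for this data.

```python
import pandas as pd

# Original data from the question.
df = pd.DataFrame([{'var1': 'A', 'var2': 'x/y/z', 'var3': 'abc1'},
                   {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                   {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# One stop per row; the index repeats the original row label.
stops = df['var2'].str.split('/').explode()

# join matches on the index, so every original row is repeated once per stop.
out = df.rename(columns={'var2': 'var2old'}).join(stops)

# Same step as in the answer: where var1 repeats the previous row's value,
# use the previous stop instead.
out['var1'] = out['var1'].where(out['var1'].ne(out['var1'].shift()),
                                out['var2'].shift())
print(out)
```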

Dummy Data

The data originally posted by the OP (Original Poster of the question).

import pandas as pd

df = pd.DataFrame([{'var1':'A', 'var2':'x/y/z', 'var3':'abc1'}, 
                   {'var1':'B', 'var2':'xx/yy', 'var3':'abc2'}, 
                   {'var1':'c', 'var2':'zz', 'var3':'abcd'}])

5 Comments

@Ben.T Thank you. Good catch! I will get back with the solution later.
@Ben.T added the suggested solution to make the answer complete. Thank you.
Thanks! Can this also work in the scenario where I have multiple entries in the var1 column? I've updated the original question to show the example.
It will not work as is, since your updated data follows a different logic. However, you can try something similar: flatten the var1 column first and then apply this solution to it. I would suggest that you create a new question with your updated data and leave a reference to it here. The solutions posted here work for what you asked initially. Now that you have changed your question, these solutions APPEAR to be incorrect. So, I request that you not drop (at least) your original data from the question. Thank you.
Makes sense, will do that. Thanks!
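
For the updated data, one possible sketch of the "flatten first" idea from the comment above: join var1 and var2 into a single ordered list of stops, explode it, and pair each stop with the next one inside the same row. This is only an illustration, not the method from either answer. The `stop`, `journey` and `leg` names are made up for the example, and it assumes neither var1 nor var2 is missing and that '/' is the only separator.

```python
import pandas as pd

# Updated data from the question: both var1 and var2 can hold several stops.
df_input = pd.DataFrame([{'var1': 'A/A1', 'var2': 'x/y/z', 'var3': 'abc1'},
                         {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                         {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# Full ordered list of stops per row, e.g. 'A/A1' + '/' + 'x/y/z' -> [A, A1, x, y, z].
stops = (df_input['var1'] + '/' + df_input['var2']).str.split('/')

# One row per stop; 'journey' (hypothetical name) remembers the original row.
legs = (df_input.drop(columns=['var1', 'var2'])
                .assign(stop=stops)
                .explode('stop')
                .rename_axis('journey')
                .reset_index())

# Each leg runs from a stop to the next stop of the same journey.
legs['var1'] = legs['stop']
legs['var2'] = legs.groupby('journey')['stop'].shift(-1)

# The last stop of a journey starts no leg, so drop it, then number the legs.
legs = (legs.dropna(subset=['var2'])
            .assign(leg=lambda d: d.groupby('journey').cumcount() + 1)
            [['var1', 'var2', 'var3', 'leg']]
            .reset_index(drop=True))

print(legs)
```

Apart from the extra `leg` column, the rows should match the `df_output` frame in the question.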
