splitting a column into multiple columns with specific name in pandas dataframe

Question

I have following dataframe:

pri    sec
TOM    AB,CD,EF
JACK   XY,YZ
HARRY  FG
NICK   KY,NY,SD,EF,FR

I need following output with column names as following(based on how many , separated fields exists in column 'sec'):

pri    sec             sec0  sec1  sec2  sec3 sec4
TOM    AB,CD,EF        AB    CD    EF    NaN  NaN
JACK   XY,YZ           XY    YZ    NaN   NaN  NaN
HARRY  FG              FG    NaN   NaN   NaN  NaN
NICK   KY,NY,SD,EF,FR  KY    NY    SD    EF   ER

Can I get any suggestions?

jezrael · Accepted Answer · 2018-01-11 12:32:18Z

29

Use join + split + add_prefix:

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print (df)
     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1   JACK           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR

And if need NaNs add fillna:

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print (df)
     pri             sec sec0 sec1 sec2 sec3 sec4
0    TOM        AB,CD,EF   AB   CD   EF  NaN  NaN
1   JACK           XY,YZ   XY   YZ  NaN  NaN  NaN
2  HARRY              FG   FG  NaN  NaN  NaN  NaN
3   NICK  KY,NY,SD,EF,FR   KY   NY   SD   EF   FR

edited Jan 11, 2018 at 12:32

answered Jan 11, 2018 at 12:30

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Zero Over a year ago

df.join(df['sec'].str.split(',', expand=True).add_prefix('sec')) to add prefix.

Kiann Over a year ago

hi, I have a similar problem, but the panda array I have within takes the form of : [1, 3, 5, ...........2 ]. There are 50 values in each cell. I tried using .str.split(',', expand = True) but it doesn't seem to work

jezrael Over a year ago

@Kiann - So what is print (df['column'].apply(type)) ? There are lists or strings?

Kiann Over a year ago

hi @jezrael, it shows as class list. <class 'list'>

Kiann Over a year ago

found a related thread and can use this : df2 = pd.DataFrame(df['column'].values.tolist()) ... but wonder if there is a way to assign the column names such that one can set it as 'state_xx'

rnso · Accepted Answer · 2018-01-11 15:46:43Z

Try following code (explanations as comments). It finds max length of items in "sec" column and creates names accordingly:

maxlen = max(list(map(lambda x: len(x.split(",")) ,df.sec))) # find max length in 'sec' column
cols = ["sec"+str(x)   for x in range(maxlen)]      # create new column names 
datalist = list(map(lambda x: x.split(","), df.sec)) # create list from entries in "sec" 
newdf = pd.DataFrame(data=datalist, columns=cols)   # create dataframe of new columns
newdf = pd.concat([df, newdf], axis=1)              # add it to original dataframe
print(newdf)

Output:

     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1   JACK           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR

Collectives™ on Stack Overflow

splitting a column into multiple columns with specific name in pandas dataframe

2 Answers 2

5 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related