I am trying to use vectorization on a pandas DataFrame to create a new column. The DataFrame is large (millions of records), so I am showing a dummy example here. The non-vectorised version below works but is not very efficient. I would like a vectorised version that still uses a function (the actual function is considerably more complicated than the one shown here).

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df['color'] = np.where(df['Set']=='Z', 'green', 'red')


def test(row):
    if row['color'] =='green':
        value='Green'
    elif row['color'] =='red':
        value=row['Type']
    else: 
        value=row['Set']
    return value

def test1(s,t,c):
    if c =='green':
        value='Green'
    elif c =='red':
        value=t
    else: 
        value=s
    return value

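# Row-wise apply: works, but slow on millions of rows
df['new_color'] = df.apply(test, axis=1)
# Attempted vectorised call: fails, because `c == 'green'` is a whole Series
# and a Series cannot be used directly as the condition of an `if`
#df['new_color'] = test1(df.Set, df.Type, df.color)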
print(df)

   Set Type color  new_color
0   Z    A  green     Green
1   Z    B  green     Green
2   X    B    red         B
3   Y    C    red         C

Any help would be appreciated.

1 Answer

You can do this with np.where:

df['NC']=np.where(df.color=='green','Green',df.Type)

df
Out[1234]: 
  Set Type  color     NC
0   Z    A  green  Green
1   Z    B  green  Green
2   X    B    red      B
3   Y    C    red      C
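
Note that the single np.where above covers only two of the three branches in the question's function (the final `else` that returns `Set` is never reached in the sample data). A minimal sketch of nesting np.where to cover all three follows. The 'blue' value in row 3 is a hypothetical addition, there only to exercise the `else` branch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
df['color'] = np.where(df['Set'] == 'Z', 'green', 'red')
df.loc[3, 'color'] = 'blue'  # hypothetical third colour, not in the original data

# Outer where handles the 'green' branch; inner where handles 'red' vs. everything else
df['NC'] = np.where(
    df['color'] == 'green', 'Green',
    np.where(df['color'] == 'red', df['Type'], df['Set']),
)
print(df)
```

Nesting becomes hard to read beyond two or three levels, so np.select is probably the better fit once there are more conditions.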

1 Comment

That's a good suggestion. However, I need to retain the function because the actual conditions are a bit more complicated than in the dummy example above. It's easier for me to wrap all the conditions in a function and use it. I am trying to avoid apply, which is slow.
