2

I have a dataframe such as:

col-a   col-b
1       None
1       Failed
1       Passed
2       None
2       Passed
3       Inconclusive
3       Passed

and a hierarchy of terms:

Failed > Inconclusive > Passed > None

How can I get something like:

1       Failed
2       Passed
3       Inconclusive

Thanks!

4 Answers

2

You can map each value in col-b to a numeric priority with Series.map, sort by both columns with DataFrame.sort_values, and then keep the first row of each group with DataFrame.drop_duplicates:

d = {'Failed':0,'Inconclusive':1, 'Passed':2, None: 3}
df['new'] = df['col-b'].map(d)
df = df.sort_values(['col-a', 'new']).drop_duplicates('col-a').drop(columns='new')
print (df)
   col-a         col-b
1      1        Failed
4      2        Passed
5      3  Inconclusive

Another idea, using SeriesGroupBy.idxmin on the mapped priorities:

d = {'Failed':0,'Inconclusive':1, 'Passed':2, None: 3}
df =  df.loc[df['col-b'].map(d).groupby(df['col-a']).idxmin()]
print (df)
   col-a         col-b
1      1        Failed
4      2        Passed
5      3  Inconclusive
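
For reference, here is a self-contained sketch of the sort-and-deduplicate idea that can be run as-is. It assumes the None entries are real NoneType values; the names `priority` and `_rank` and the `fillna` step (which gives anything unmapped the lowest priority) are illustrative choices, not part of the snippets above.

```python
import pandas as pd

# Sample data from the question; None is assumed to be a real NoneType value.
df = pd.DataFrame({
    'col-a': [1, 1, 1, 2, 2, 3, 3],
    'col-b': [None, 'Failed', 'Passed', None, 'Passed', 'Inconclusive', 'Passed'],
})

# Lower number = higher priority, following Failed > Inconclusive > Passed > None.
priority = {'Failed': 0, 'Inconclusive': 1, 'Passed': 2}

# Anything not in the dict (None, NaN, or the string 'None') gets the lowest priority.
rank = df['col-b'].map(priority).fillna(len(priority))

# Sort by group, then by rank, and keep the first row of each group.
result = (
    df.assign(_rank=rank)
      .sort_values(['col-a', '_rank'])
      .drop_duplicates('col-a')
      .drop(columns='_rank')
      .reset_index(drop=True)
)
print(result)
```

Filling unmapped values explicitly means the result should not depend on how a None key in the dictionary is matched.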

2
h = {'Failed':1, 'Inconclusive': 2, 'Passed':3, 'None':4}

(
    df.assign(b=df['col-b'].map(h))
    .groupby(by='col-a')
    .apply(lambda x: x.sort_values(by=['b']).head(1))
    .reset_index(drop=True)
    .drop(columns='b')
)

   col-a         col-b
0      1        Failed
1      2        Passed
2      3  Inconclusive


1

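Use drop to remove the 'None' rows, then groupby with first. This relies on the rows in each group already being in hierarchy order, as they are in the sample data.

Example: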

import pandas as pd

df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3],
               'col-b': ['None','Failed','Passed','None','Passed','Inconclusive','Passed']})

df = df.drop(df[df['col-b'] == 'None'].index).groupby('col-a').first().reset_index()
# or
# m = df['col-b'].apply(lambda x: x == 'None')
# df = df[~m].groupby('col-a').first().reset_index()
print(df)

Or, if None is an actual NoneType value rather than the string 'None', build a mask and use groupby:

df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3],
               'col-b': [None,'Failed','Passed',None,'Passed','Inconclusive','Passed']})
m = df['col-b'].apply(lambda x: x is None)
df = df[~m].groupby('col-a').first().reset_index()
print(df)

Output:

   col-a         col-b
0      1        Failed
1      2        Passed
2      3  Inconclusive
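
One thing worth checking with this approach: groupby(...).first() takes the first row of each group in the existing row order and never consults the hierarchy. A minimal sketch of the difference, using a hypothetical reordered copy of the sample data (the reordering and the `priority` dictionary are assumptions for illustration, not part of the question):

```python
import pandas as pd

# Hypothetical reordering: 'Passed' now comes before the higher-priority value
# in groups 1 and 3.
df = pd.DataFrame({
    'col-a': [1, 1, 1, 2, 2, 3, 3],
    'col-b': [None, 'Passed', 'Failed', None, 'Passed', 'Passed', 'Inconclusive'],
})

# Same idea as above: mask out the None rows, then take the first row per group.
m = df['col-b'].isna()
by_first = df[~m].groupby('col-a').first().reset_index()
print(by_first)  # every group comes out as 'Passed'

# Sorting by an explicit priority first makes the result independent of row order.
priority = {'Failed': 0, 'Inconclusive': 1, 'Passed': 2}
by_rank = (
    df[~m]
    .sort_values('col-b', key=lambda s: s.map(priority))
    .groupby('col-a')
    .first()
    .reset_index()
)
print(by_rank)
```

The `key` argument of sort_values requires pandas 1.1 or later.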


0

If I understand the question correctly, you can use map to assign a rank to each value.

import pandas as pd

df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3], 
                   'col-b': [None,'Failed','Passed',None,'Passed','Inconclusive','Passed']})

df['rang'] = df['col-b'].map({'Failed':1, 'Inconclusive':2, 'Passed':3})

df:

   col-a         col-b  rang
0      1          None   NaN
1      1        Failed   1.0
2      1        Passed   3.0
3      2          None   NaN
4      2        Passed   3.0
5      3  Inconclusive   2.0
6      3        Passed   3.0
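
The rank column alone does not yet select one row per group. One possible way to finish, sketched under the assumption that ranks follow the question's hierarchy (Failed, then Inconclusive, then Passed) and that missing values should rank last; the value 4 and the name `best` are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({'col-a': [1, 1, 1, 2, 2, 3, 3],
                   'col-b': [None, 'Failed', 'Passed', None, 'Passed', 'Inconclusive', 'Passed']})

# Ranks follow the question's hierarchy; 4 for missing values is an assumed choice.
df['rang'] = df['col-b'].map({'Failed': 1, 'Inconclusive': 2, 'Passed': 3}).fillna(4)

# idxmin returns, for each group, the index label of the row with the smallest rank.
best = df.loc[df.groupby('col-a')['rang'].idxmin(), ['col-a', 'col-b']]
print(best)
```

Filling the NaN ranks first avoids relying on how idxmin treats a group whose ranks are all missing.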
