
I am having difficulty creating a new column whose value is based on an existing column in the same dataframe. The existing column is numeric, and I'm trying to give the new column a categorical value of high, medium, or low based on something like:

low: < (max-min)/3

med: between (max-min)/3 and 2*(max-min)/3

high: > 2*(max-min)/3

Still learning Pandas, so any help is appreciated. Thanks!

EDIT:

This is what I have attempted:

df_unit_day_hour['Level_Score'] = pd.cut(df_unit_day_hour['Level_Score'], q=3, labels=['low', 'medium', 'high'])

I think it's almost what I need, but I'm getting an error (KeyError). Would it be because df_unit_day_hour['Level_Score'] is a float?

Comment: Please post raw input data, code to reproduce your df, and the desired output. Thanks. (Commented Jun 2, 2015 at 12:32)

1 Answer

Sounds like you want the pandas cut function (pd.cut), which bins numeric values into labelled categories.

Consider this example below:

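import numpy as np
import pandas as pd

# 10 random integers drawn from 0..9
df = pd.DataFrame({'val': np.random.choice(10, 10)})
# Bin edges (-1, 2], (2, 5], (5, 10] map to low, medium, high
df['cat'] = pd.cut(df['val'], [-1, 2, 5, 10], labels=['low', 'medium', 'high'])
df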

   val   cat
0    6  high
1    2   low
2    7  high
3    7  high
4    8  high
5    8  high
6    9  high
7    6  high
8    2   low
9    0   low
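
For the thresholds described in the question, explicit bin edges may not be needed. As far as I know, q is a parameter of pd.qcut rather than pd.cut, so the attempt in the question would need either pd.cut with bins=3 (three equal-width ranges between min and max) or pd.qcut with q=3 (three groups of roughly equal size). The sketch below contrasts the two on made-up data; the values and the column names width_cat and quantile_cat are assumptions for illustration only.

```python
import pandas as pd

labels = ['low', 'medium', 'high']

# Hypothetical skewed data: eight small values and one large outlier.
df = pd.DataFrame({'val': [1, 2, 3, 4, 5, 6, 7, 8, 100]})

# Equal-width bins: the range max - min is split into 3 intervals of equal length.
df['width_cat'] = pd.cut(df['val'], bins=3, labels=labels)

# Quantile bins: edges are chosen so each bin holds about a third of the rows.
df['quantile_cat'] = pd.qcut(df['val'], q=3, labels=labels)

print(df)
```

With this data the equal-width version puts every value except the outlier in low, while the quantile version puts three rows in each category. Separately, a KeyError when reading df_unit_day_hour['Level_Score'] usually suggests that no column with that exact name exists yet, so it may be worth checking which column is meant as the numeric source.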

3 Comments

Thanks for your response. That seemed to put me on the right track, but I'm getting a KeyError. I updated my post to show what I attempted. Thanks again.
@user1624577, I updated my example to explain better how to use the cut/qcut functions.
Very much appreciated! I could have done this in SAS in a couple of minutes, but I'm trying to break away from that platform. Thanks again!
