I'm grouping my DataFrame by the month of a date column and aggregating some columns. The code below works well:

all_together = (df_clean.groupby(df_clean['ContractDate'].dt.strftime('%B'))
                  .agg({'Amount': [np.sum, np.mean, np.min, np.max]})
                  .rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount', 'amin': 'min_amount', 'amax': 'max_amount'}))

But for some reason, when I try to plot the result (with any kind of plot), it doesn't recognize "ContractDate" as a column, nor any of the renamed names such as 'sum_amount'.

Do you have any idea what the issue is, and which rule for plotting the data I'm missing?

I have tried the code below for plotting, and it complains that it doesn't know what "ContractDate" and "sum_amount" are!

all_together.groupby(df_clean['ContractDate'].dt.strftime('%B'))['sum_amount'].nunique().plot(kind='bar')
#or
all_together.plot(kind='bar',x='ContractDate',y='sum_amount')

I really appreciate your time

Cheers, z.A

1 Answer

When you apply groupby to a DataFrame, the grouping column (ContractDate in your case) becomes the index of the result. So you first need to reset the index to turn it back into a column.

import numpy as np
import pandas as pd
df = pd.DataFrame({'month':['jan','feb','jan','feb'],'v2':[23,56,12,59]})
t = df.groupby('month').agg('sum')

Output:

        v2
month
feb    115
jan     35

As you can see, the months end up as the index. When you reset the index:

t.reset_index()

Output:

  month   v2
0   feb  115
1   jan   35
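
As a quick check of the point above, here is a minimal sketch using the same toy data. It shows that 'month' is not among the columns right after the groupby, and that it becomes one after reset_index(). The as_index=False variant is an alternative not used in the original answer; it should give the same flat result in one step.

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59]})

# After groupby + agg, the grouping key lives in the index, not in the columns.
t = df.groupby('month').agg('sum')
key_is_column_before = 'month' in t.columns

# reset_index() moves the index back into a regular column.
flat = t.reset_index()
key_is_column_after = 'month' in flat.columns

# Alternative (not from the original answer): keep the key as a column from the start.
flat_direct = df.groupby('month', as_index=False).agg('sum')

print(key_is_column_before, key_is_column_after)
print(flat_direct)
```

Either form leaves 'month' available as a column name, which is what plot(x='month', ...) looks for.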

Next, when you apply multiple aggregation functions to a single column in the groupby, the result has MultiIndex columns. You need to flatten them to a single level:

t = df.groupby('month').agg({'v2': [np.sum, np.mean, np.min, np.max]}).rename(columns={'sum': 'sum_amount', 'mean': 'avg_amount', 'amin': 'min_amount', 'amax': 'max_amount'})

              v2
      sum_amount avg_amount min_amount max_amount
month
feb          115       57.5         56         59
jan           35       17.5         12         23

This created a MultiIndex. If you check t.columns, you get:

MultiIndex(levels=[['v2'], ['avg_amount', 'max_amount', 'min_amount', 'sum_amount']],
           labels=[[0, 0, 0, 0], [3, 0, 2, 1]])

Now keep only the second level of the column names and reset the index:

t.columns = t.columns.get_level_values(1)
t.reset_index(inplace=True)

You will get a clean dataframe:

    month   sum_amount  avg_amount  min_amount  max_amount
0   feb       115          57.5       56          59
1   jan       35           17.5       12          23
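
If your pandas version supports named aggregation (pandas 0.25 or later, as far as I know), the same clean frame can probably be built without the rename and the column flattening. This is only a sketch of an alternative, not the method used above. It also avoids relying on the 'amin' and 'amax' labels, which come from the NumPy function names and may differ between versions.

```python
import pandas as pd

df = pd.DataFrame({'month': ['jan', 'feb', 'jan', 'feb'],
                   'v2': [23, 56, 12, 59]})

# Named aggregation: each keyword is the output column name,
# each value is (input column, aggregation function).
t = (df.groupby('month')
       .agg(sum_amount=('v2', 'sum'),
            avg_amount=('v2', 'mean'),
            min_amount=('v2', 'min'),
            max_amount=('v2', 'max'))
       .reset_index())   # turn the 'month' index back into a column

print(t)
```

The result has single-level columns, so t.plot(kind='bar', x='month', y='sum_amount') should find both names (plotting needs matplotlib installed).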

Hope this helps for your plotting.


8 Comments

Wow, it worked very well! Many thanks for clarifying the mistake and describing the solution :)
The only issue is that it doesn't show the plot properly: there is just one bar for one month and none for the other months. Do you have any idea why? Or is there a better way than "df.plot" to plot the result?
It is working properly for me when I use t.plot(kind='bar',x='month',y='sum_amount'). What does your aggregated data look like?
I used the same code! It works for max or min, but for the sum amount it only shows one bar, for January, and none for the other months, while the numbers for the other 5 months are quite close to each other or negative!
I am assuming you have a much bigger number for January and very small numbers for the other months. It is plotting all months, but since the axis range is set to accommodate January, the bars for the other months are too small to notice.
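
One way to test that assumption is to compare each month's sum with the largest one. The sketch below uses made-up numbers (the asker's real data is not shown), and the 1% threshold is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical aggregated data: one dominant month, the rest small or negative.
t = pd.DataFrame({'month': ['January', 'February', 'March'],
                  'sum_amount': [1_000_000.0, 1200.0, -800.0]})

# Express every month as a fraction of the largest absolute value.
largest = t['sum_amount'].abs().max()
t['share_of_largest'] = t['sum_amount'] / largest

# Months below 1% of the largest bar will be nearly invisible on a shared axis.
barely_visible = t[t['share_of_largest'].abs() < 0.01]

print(t)
print(barely_visible['month'].tolist())
```

If most months show up in barely_visible, plotting them without the dominant month, for example t[t['month'] != 'January'].plot(kind='bar', x='month', y='sum_amount'), should make their bars readable.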