How do I visualize grouped data from a dataframe using Python?

Question

After grouping two columns of data in my dataframe, I obtained a small table of integers whose image I've attached below.

[Image: table of the grouped counts]

This was the code used for grouping:

count = x_train.groupby(['bool_loc', 'target']).size()

I am trying to visualize this data (type int64) using Python. I thought a histogram might be a good way to do so: two categories, 0 and 1, for the column 'bool_loc', each with two bars for the column 'target', where the bar heights represent frequency. This is what I tried:

# create figure and axis
fig, ax = plt.subplots()
# plot histogram
ax.hist(count)
# set title and labels
ax.set_title('Relation Between Location Data Presence and Disaster Tweets')
ax.set_xlabel('Location Data Presence')
ax.set_ylabel('Frequency of Tweets')

The histogram I obtained:

[Image: the obtained histogram]

It seems that the frequency data has been plotted along the x-axis, in place of the data in 'bool_loc'. The frequencies should be on the y-axis.

marc_s · Accepted Answer · 2020-07-12 17:19:12Z

1

I tried to plot histograms based on the shape of your dataframe. Here is the result: [Image: 2 histograms with 2 bins]

I'm not sure whether this matches your input data, as I simply made a dataframe similar to the one in your post. You probably built yours differently.

The code is below:

import pandas as pd
import matplotlib.pyplot as plt

# make dataframe
arrays = [[0, 0, 1, 1],
          [0, 1, 0, 1]]
data = [1458, 1075, 2884, 2196] 
df = pd.DataFrame(data, index=arrays, columns=['frequency'])

# get data from DF series
y1 = df.loc[0,'frequency'].to_list()
y2 = df.loc[1,'frequency'].to_list()

# get data arrays
arr1 = [0] * y1[0] + [1] * y1[1]
arr2 = [0] * y2[0] + [1] * y2[1]

# set matplotlib plot
fig, ax = plt.subplots()

# plot histogram
num_bins = 2
ax.hist([arr1, arr2], num_bins, density=False, label=['bool_loc 0', 'bool_loc 1'])
plt.legend(loc='upper right')
plt.show()
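
The code above expands the counts back into raw 0/1 observations so that ax.hist can bin them. A possibly simpler route, which is not part of the original answer, is to reshape the grouped counts with unstack() and draw a grouped bar chart directly. The sketch below assumes a hypothetical x_train rebuilt from the four counts used in this answer; with the real x_train only the groupby line onward is needed.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for x_train: one row per tweet, built so that the
# group sizes match the four numbers used in the answer above.
rows = (
    [(0, 0)] * 1458 + [(0, 1)] * 1075 +
    [(1, 0)] * 2884 + [(1, 1)] * 2196
)
x_train = pd.DataFrame(rows, columns=["bool_loc", "target"])

# Same grouping as in the question: a Series with a two-level index.
count = x_train.groupby(["bool_loc", "target"]).size()

# Move the 'target' level into columns: rows are bool_loc, columns are target.
table = count.unstack("target")

# One cluster of bars per bool_loc value, one bar per target value.
ax = table.plot(kind="bar", rot=0)
ax.set_title("Relation Between Location Data Presence and Disaster Tweets")
ax.set_xlabel("Location Data Presence")
ax.set_ylabel("Frequency of Tweets")
ax.legend(title="target")
```

Call plt.show() afterwards to display the figure. Because the bars are drawn from the counts themselves, the frequencies end up on the y-axis and the x-axis carries the 'bool_loc' categories.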

edited Jul 12, 2020 at 17:19

marc_s


answered Jun 28, 2020 at 12:25

Andrey Povalyaev


2 Comments

Tahia Tabassum Over a year ago

Thank you for your answer! Your code created the exact histogram I wanted to create. However there is one small error which I found and corrected: the numbers 1075 and 2884 should be switched in the data list to get the correct representation in the graph. I have one more question - is there a way to remove the numbering in the x axis and replace it with labels for each pair?

Andrey Povalyaev Over a year ago

You can explicitly specify the tick labels on the axes with ax.set_xticklabels(). A very quick and very ugly workaround for your particular case: ax.set_xticklabels(('', '0', '1', '0', '1'))
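
A slightly less fragile variant of this workaround, sketched here as an assumption and not taken from the comment, is to place one tick at the centre of each bin and label it. It reuses the counts from the answer's example. The label strings 'target 0' and 'target 1' are illustrative only.

```python
import matplotlib.pyplot as plt

# Raw 0/1 observations rebuilt from the counts used in the answer's example.
arr1 = [0] * 1458 + [1] * 1075   # bool_loc 0
arr2 = [0] * 2884 + [1] * 2196   # bool_loc 1

fig, ax = plt.subplots()

# ax.hist returns the bar heights, the bin edges, and the drawn patches.
heights, edges, _ = ax.hist([arr1, arr2], bins=2,
                            label=["bool_loc 0", "bool_loc 1"])

# Each pair of bars sits inside one bin, so label the bin centres.
centers = (edges[:-1] + edges[1:]) / 2
ax.set_xticks(centers)
ax.set_xticklabels(["target 0", "target 1"])  # illustrative label text
ax.legend(loc="upper right")
```

Fixing the tick positions before setting the labels keeps each label attached to its pair of bars even if the axis limits change.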

Nimantha · Accepted Answer · 2023-10-31 11:44:17Z

0

Try this

print (data.target.value_counts(normalize=True).plot(kind='bar'))

edited Oct 31, 2023 at 11:44

Nimantha


answered Jun 28, 2020 at 12:03

Israr Awan


1 Comment

Tahia Tabassum Over a year ago

Hello! Thank you so much for your reply. I did try creating the graph myself and have edited the question to include the code I used. I implemented your code as print(count.value_counts(normalize=True).plot(kind='bar')). In this case, however, the four frequencies are plotted as four separate bars in the graph, which isn't what I wanted.
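
A likely reason for the four separate bars is that value_counts() was applied to the already grouped counts, so it counts how often each frequency value occurs, not how often each target occurs. The sketch below shows one way to get normalized shares per 'bool_loc' group from the raw columns. It assumes a hypothetical x_train rebuilt from the counts in the first answer's example.

```python
import pandas as pd

# Hypothetical stand-in for x_train, one row per tweet.
rows = (
    [(0, 0)] * 1458 + [(0, 1)] * 1075 +
    [(1, 0)] * 2884 + [(1, 1)] * 2196
)
x_train = pd.DataFrame(rows, columns=["bool_loc", "target"])
count = x_train.groupby(["bool_loc", "target"]).size()

# value_counts on the grouped counts: four distinct values, each seen once.
flat = count.value_counts(normalize=True)

# value_counts on the raw 'target' column within each bool_loc group,
# reshaped so that rows are bool_loc and columns are target.
shares = (
    x_train.groupby("bool_loc")["target"]
    .value_counts(normalize=True)
    .unstack()
)

# Two clusters (bool_loc 0 and 1), each with one bar per target value.
ax = shares.plot(kind="bar", rot=0)
ax.set_ylabel("Share of tweets within group")
```

Each row of shares sums to 1, so the bars compare the target split inside each group and do not show absolute counts.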
