
After grouping two columns of data in my dataframe, I obtained a small table of integers whose image I've attached below.

[Image: the grouped count table]

This was the code used for grouping:

count = x_train.groupby(['bool_loc', 'target']).size() 
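For readers without the image, here is a small sketch of what `groupby([...]).size()` returns. It assumes a hypothetical toy frame that only has the two columns `bool_loc` and `target`; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame with the same two columns as x_train (values are made up).
x_train = pd.DataFrame({
    'bool_loc': [0, 0, 1, 1, 1],
    'target':   [0, 1, 0, 0, 1],
})

# size() counts the rows in each (bool_loc, target) combination.
count = x_train.groupby(['bool_loc', 'target']).size()
print(count)
```

The result is an int64 Series indexed by a two-level MultiIndex `(bool_loc, target)`, one entry per combination that occurs, which is why it looks like a small table of integers.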

I am trying to visualize this data (dtype int64) in Python. I thought a histogram with two categories, 0 and 1 (for column 'bool_loc'), each having two bars (for column 'target') whose heights represent frequency, would be a good way to do so. I tried this:

import matplotlib.pyplot as plt

# create figure and axis
fig, ax = plt.subplots()
# plot histogram
ax.hist(count)
# set title and labels
ax.set_title('Relation Between Location Data Presence and Disaster Tweets')
ax.set_xlabel('Location Data Presence')
ax.set_ylabel('Frequency of Tweets')
plt.show()

The histogram I obtained:

[Image: the obtained histogram]

It seems that the frequency values have been plotted along the x-axis instead of the 'bool_loc' values; the frequencies should be on the y-axis.
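This behaviour is expected from `ax.hist`: it treats every value it receives as one observation and bins those values, so passing the four counts bins the counts themselves. The sketch below reproduces that with `np.histogram`, assuming the four counts quoted in the first answer (with the swap noted in its comments) and matplotlib's default of 10 bins.

```python
import numpy as np
import pandas as pd

# Stand-in for the grouped counts; the numbers are the ones quoted in the answer,
# with the swap mentioned in its comments applied.
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0), (1, 1)],
                                  names=['bool_loc', 'target'])
count = pd.Series([1458, 2884, 1075, 2196], index=index)

# ax.hist(count) does the equivalent of this: the four counts are treated as
# raw samples and binned by value (matplotlib's default is 10 bins).
hist_counts, bin_edges = np.histogram(count.to_numpy(), bins=10)
print(hist_counts)  # how many of the four counts fall into each bin
print(bin_edges)    # the edges span the range of the counts, not the 0/1 categories
```

So the x-axis shows count values and the bar heights show how many of the four counts landed in each bin, which matches what the question describes.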

2 Answers


I tried to visualize histograms based on the shape of your dataframe. Here is the result: [Image: 2 histograms with 2 bins]

I'm not sure this matches your data input, as I simply built a dataframe similar to the one in your post; yours is probably constructed differently.

The code is below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# make dataframe
arrays = [[0, 0, 1, 1],
          [0, 1, 0, 1]]
data = [1458, 1075, 2884, 2196] 
df = pd.DataFrame(data, index=arrays, columns=['frequency'])

# get the per-target counts for each bool_loc value
y1 = df.loc[0,'frequency'].to_list()
y2 = df.loc[1,'frequency'].to_list()

# expand the counts back into raw 0/1 observations for ax.hist
arr1 = [0] * y1[0] + [1] * y1[1]
arr2 = [0] * y2[0] + [1] * y2[1]

# set matplotlib plot
fig, ax = plt.subplots()

# plot histogram
num_bins = 2
ax.hist([arr1, arr2], num_bins, density=False, label=['bool_loc 0', 'bool_loc 1'])
plt.legend(loc='upper right')
plt.show()
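Expanding the counts into lists with thousands of 0s and 1s works, but `Axes.hist` also accepts a `weights` argument, so each observation can carry its count instead. The sketch below is an alternative to the code above (not what the answer used) and keeps the answer's numbers as they appear in its `data` list.

```python
import matplotlib.pyplot as plt

# Counts per (bool_loc, target) pair, in the same order as the answer's `data` list.
y1 = [1458, 1075]  # bool_loc == 0: target 0, target 1
y2 = [2884, 2196]  # bool_loc == 1: target 0, target 1

fig, ax = plt.subplots()
# Each dataset is just the two target values [0, 1]; the weights supply the bar
# heights, so no long list of repeated 0s and 1s has to be built.
n, bins, patches = ax.hist(
    [[0, 1], [0, 1]],
    bins=2,
    weights=[y1, y2],
    label=['bool_loc 0', 'bool_loc 1'],
)
ax.legend(loc='upper right')
plt.show()
```

The resulting bar heights are identical to the expanded-list version, while memory use no longer grows with the size of the counts.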

2 Comments

Thank you for your answer! Your code created the exact histogram I wanted to create. However, there is one small error, which I found and corrected: the numbers 1075 and 2884 should be switched in the data list to get the correct representation in the graph. I have one more question: is there a way to remove the numbering on the x-axis and replace it with labels for each pair?
You can explicitly specify the tick labels on the axes with ax.set_xticklabels(). A very quick and very ugly workaround for your particular case: ax.set_xticklabels(('', '0', '1', '0', '1'))
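A less fragile variant is to place the ticks explicitly with `ax.set_xticks()` before setting the labels, so each label sits under its pair of bars. This sketch assumes the answer's layout (two bins over 0 and 1, one dataset per `bool_loc` value); the label texts `'target 0'` and `'target 1'` are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same counts as in the answer's code; each bin is one target value.
y1 = [1458, 1075]  # bool_loc == 0
y2 = [2884, 2196]  # bool_loc == 1

# np.repeat expands the counts into raw 0/1 observations, like arr1/arr2 in the answer.
arr1 = np.repeat([0, 1], y1)
arr2 = np.repeat([0, 1], y2)

fig, ax = plt.subplots()
ax.hist([arr1, arr2], bins=2, label=['bool_loc 0', 'bool_loc 1'])

# With 2 bins over the range [0, 1] the bin edges are 0, 0.5 and 1,
# so each pair of bars is centred at 0.25 or 0.75.
ax.set_xticks([0.25, 0.75])
ax.set_xticklabels(['target 0', 'target 1'])
ax.legend(loc='upper right')
plt.show()
```

Fixing the tick positions first means the labels cannot drift onto the default numeric ticks, which is what the empty-string workaround compensates for.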

Try this

print (data.target.value_counts(normalize=True).plot(kind='bar'))

1 Comment

Hello! Thank you so much for your reply. I did try creating the graph myself and have edited the question to include the code I used. I implemented your code as print(count.value_counts(normalize=True).plot(kind='bar')). In this case, however, the four frequencies are plotted as four separate bars, which isn't what I wanted.
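The layout described in the question (two groups for `bool_loc`, two bars each for `target`) is a grouped bar chart, which pandas can draw directly from the grouped counts once the inner index level is moved into columns. This is a sketch not taken from either answer; it assumes `count` has the `(bool_loc, target)` MultiIndex produced by the question's `groupby`, and uses the corrected counts from the comments as stand-in data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for count = x_train.groupby(['bool_loc', 'target']).size();
# the numbers are the corrected ones from the comments on the first answer.
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0), (1, 1)],
                                  names=['bool_loc', 'target'])
count = pd.Series([1458, 2884, 1075, 2196], index=index)

# unstack() turns the inner level ('target') into columns:
# one row per bool_loc value, one column per target value.
table = count.unstack('target')

# DataFrame.plot(kind='bar') draws one group of bars per row (bool_loc 0 and 1)
# and one bar per column (target 0 and 1) inside each group.
ax = table.plot(kind='bar', rot=0)
ax.set_title('Relation Between Location Data Presence and Disaster Tweets')
ax.set_xlabel('Location Data Presence')
ax.set_ylabel('Frequency of Tweets')
plt.show()
```

Because the counts are already aggregated, a bar chart of the table avoids re-expanding them into observations just so a histogram can count them again.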
