
Hi, I am trying to display a large merged dataset as a scatter plot to find the relationship between GDP per capita and number of kids. The DataFrame looks something like this. How do I remove the rows with NaN values and plot the scatter? Or do I just plot the graph straight away, and it will ignore all rows with NaN values? Any help would be great, thanks :) Also, when calculating the mean of the second and third columns, do I show the result in another column?

Country | Number of kids | GDP per capita
A       | 4              | 2345
B       | 2              | 2156
C       | NaN            | 1156
D       | 5              | 958
E       | NaN            | NaN
F       | 8              | NaN
...     | ...            | ...
Z       | 3              | 2
    Have you tried "plotting the graph straight"? Commented Sep 11, 2018 at 7:24

1 Answer


Use pandas' dropna() function to remove the rows that contain NaN, then plot the result with matplotlib's scatter():
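To see which rows dropna() would remove before plotting anything, here is a minimal sketch that rebuilds only the fully shown rows A to F of the question's table. The name `sample_df` is hypothetical; with real data you would run the same two lines on your merged DataFrame.

```python
import numpy as np
import pandas as pd

# Rows A-F copied from the table in the question (hypothetical stand-in for the full data).
sample_df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Number of kids": [4, 2, np.nan, 5, np.nan, 8],
    "GDP per capita": [2345, 2156, 1156, 958, np.nan, np.nan],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# any(axis=1) flags rows with at least one NaN, i.e. the rows a plain dropna() removes.
rows_with_nan = sample_df[sample_df.isna().any(axis=1)]
print(rows_with_nan["Country"].tolist())  # ['C', 'E', 'F']
```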

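import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("merged_data.csv")  # hypothetical file name: load or reuse your merged DataFrame here
plot_df = df.dropna()  # drop every row that has a NaN in any column
plt.scatter(plot_df['Number of kids'], plot_df['GDP per capita'])
plt.xlabel('Number of kids')
plt.ylabel('GDP per capita')
plt.show()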
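Because the data comes from a merge, it may contain other columns with their own NaN values, and a bare dropna() drops a row if any column is missing. A sketch, assuming a hypothetical extra column `Region`, showing how the `subset` argument limits the check to the two plotted columns:

```python
import numpy as np
import pandas as pd

# Hypothetical merged data: "Region" stands in for any extra column the merge added.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Number of kids": [4, 2, np.nan, 5],
    "GDP per capita": [2345, 2156, 1156, 958],
    "Region": ["X", np.nan, "Y", "Z"],
})

cols = ["Number of kids", "GDP per capita"]

# Plain dropna() also removes B, only because its "Region" value is missing.
all_columns = df.dropna()

# subset= checks just the plotted columns, so only C (missing "Number of kids") is removed.
plot_columns = df.dropna(subset=cols)

print(all_columns["Country"].tolist())   # ['A', 'D']
print(plot_columns["Country"].tolist())  # ['A', 'B', 'D']
```

The filtered `plot_columns` can then be passed to plt.scatter() in the same way as `plot_df` above.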

If your dataset is very large, consider using the sample() function to plot a random subset of the rows:

plot_df = plot_df.sample(n=min(1000, len(plot_df)))  # sample() raises ValueError if n exceeds the row count
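On the question about the mean of the second and third columns: DataFrame.mean() returns a Series with one value per column rather than a new column, and it skips NaN by default (skipna=True), so the missing rows need not be dropped first. A sketch using only rows A to F of the question's table; `column_means` and `plotted_means` are hypothetical names. Storing a column mean in a new column would simply repeat the same number on every row.

```python
import numpy as np
import pandas as pd

# Rows A-F from the question's table (hypothetical stand-in for the full data).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Number of kids": [4, 2, np.nan, 5, np.nan, 8],
    "GDP per capita": [2345, 2156, 1156, 958, np.nan, np.nan],
})

cols = ["Number of kids", "GDP per capita"]

# One mean per column; NaN cells are skipped, so each mean uses the 4 non-missing values.
column_means = df[cols].mean()
print(column_means)  # Number of kids 4.75, GDP per capita 1653.75

# Mean over only the rows that end up in the scatter plot (A, B and D here).
plotted_means = df.dropna(subset=cols)[cols].mean()
print(plotted_means)
```

The two results differ because the first uses every available value per column, while the second uses only rows where both values are present.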