To NumPy

Explore how to convert a pandas DataFrame with categorical data into a fully quantitative NumPy matrix suitable for machine learning. Learn to filter data, handle missing values, and transform categorical features into indicator variables using pandas get_dummies. This lesson equips you to prepare real-world datasets for input into machine learning frameworks.

Chapter Goals:

  • Learn how to convert a DataFrame to a NumPy matrix
  • Write code to modify an MLB dataset and convert it to a NumPy matrix

A. Machine learning

The DataFrame object is great for storing a dataset and performing data analysis in Python. However, most machine learning frameworks (e.g. TensorFlow) work directly with NumPy data. Furthermore, the NumPy data used as input to machine learning models must contain only quantitative values.

Therefore, to use a DataFrame's data with a machine learning model, we need to convert the DataFrame to a NumPy matrix of quantitative data. This means that even the categorical features of a DataFrame, such as gender and birthplace, must be converted to quantitative values.
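
As a rough illustration of why this conversion matters, the sketch below builds a small, made-up DataFrame (the column names and values are hypothetical, not taken from the MLB dataset) and compares a direct conversion with one that first replaces the categorical column using `pd.get_dummies`. It is a minimal example under those assumptions, not the lesson's reference solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "height_cm": [180.0, 175.0, 190.0],
    "birthplace": ["USA", "Canada", "USA"],
})

# Converting directly keeps the strings, so the array has dtype object,
# which is not a purely quantitative matrix.
raw = df.to_numpy()

# Replace the categorical column with 0/1 indicator columns, then convert.
quantitative = pd.get_dummies(df, columns=["birthplace"], dtype=float)
matrix = quantitative.to_numpy()

print(raw.dtype)                    # dtype of the direct conversion
print(list(quantitative.columns))   # original numeric column plus indicators
print(matrix)
```

Passing `dtype=float` to `get_dummies` is a choice made here so that the resulting matrix has a single floating-point dtype; other numeric dtypes would work as well.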

B. Indicator features

When converting a DataFrame to a ...