Creating a Dataframe using Excel Files

Last Updated : 10 Dec, 2025

A DataFrame is a two-dimensional labeled data structure with rows and columns, which makes it well suited to structured data from Excel. The Pandas library provides several ways to load Excel files and convert them into DataFrame objects, a tabular data structure similar to an Excel sheet.

Note: The examples use a sample Excel file named SampleWork.xlsx that contains three sheets. Keep the file in the same folder as your Python script. Reading .xlsx files with Pandas also requires the openpyxl package.
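If the sample file is not at hand, a comparable workbook can be built from the rows printed in the outputs of this article. The sketch below is only an approximation of SampleWork.xlsx: the sheet names Sheet1 to Sheet3 are taken from the final output, and the third sheet is assumed to repeat the second with the text "Missing" in one Stream cell.

```python
import pandas as pd

# Rows copied from the outputs printed in this article.
sheet1 = pd.DataFrame({
    "Name": ["Ankit", "Rahul", "Shaurya", "Aishwarya", "Priyanka"],
    "Age": [18, 19, 20, 18, 19],
    "Stream": ["Math", "Science", "Commerce", "Math", "Science"],
    "Percentage": [95, 90, 85, 80, 75],
})
sheet2 = sheet1.assign(Name=["Priya", "shivangi", "Jeet", "Ananya", "Swapnil"])

# Assumed layout: the third sheet repeats the second, with the text
# "Missing" in place of one Stream value.
sheet3 = sheet2.copy()
sheet3.loc[2, "Stream"] = "Missing"

# Writing .xlsx files needs an Excel writer engine such as openpyxl.
with pd.ExcelWriter("SampleWork.xlsx") as writer:
    sheet1.to_excel(writer, sheet_name="Sheet1", index=False)
    sheet2.to_excel(writer, sheet_name="Sheet2", index=False)
    sheet3.to_excel(writer, sheet_name="Sheet3", index=False)
```

Passing index=False keeps the row labels out of the workbook, so the first Excel column is Name rather than an unnamed index column.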

Reading Default Sheet of an Excel File

When you read an Excel file using pd.read_excel() without specifying any sheet name, Pandas automatically loads the first sheet of the workbook. This is the most common method, especially when your file contains only one sheet or the required data is already on the first sheet.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx")
print(df)

Output

        Name  Age    Stream  Percentage
0      Ankit   18      Math          95
1      Rahul   19   Science          90
2    Shaurya   20  Commerce          85
3  Aishwarya   18      Math          80
4   Priyanka   19   Science          75

Explanation:

  • pd.read_excel("SampleWork.xlsx") reads the default (first) sheet
  • df stores the loaded DataFrame

Reading a Specific Sheet Using sheet_name

When an Excel file contains multiple sheets, you can tell Pandas exactly which sheet to load using the sheet_name parameter. You may provide either the sheet’s index (starting from 0) or its name to directly access the data you need.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=1)
print(df)

Output

       Name  Age    Stream  Percentage
0     Priya   18      Math          95
1  shivangi   19   Science          90
2      Jeet   20  Commerce          85
3    Ananya   18      Math          80
4   Swapnil   19   Science          75

Explanation: sheet_name=1 loads the second sheet
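The same sheet can also be selected by name. The sketch below assumes a small two-sheet workbook written to a temporary folder (demo.xlsx is a hypothetical stand-in for SampleWork.xlsx) and uses pd.ExcelFile to list the sheet names first, which helps when the names are not known in advance.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"  # hypothetical stand-in workbook
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"Name": ["Ankit"], "Age": [18]}).to_excel(
            writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"Name": ["Priya"], "Age": [18]}).to_excel(
            writer, sheet_name="Sheet2", index=False)

    # List the sheet names available in the workbook.
    with pd.ExcelFile(path) as workbook:
        names = workbook.sheet_names

    by_index = pd.read_excel(path, sheet_name=1)         # second sheet by position
    by_name = pd.read_excel(path, sheet_name="Sheet2")   # same sheet by name

print(names)
print(by_index.equals(by_name))
```

Selecting by name is usually safer than selecting by position when sheets may be reordered in the workbook.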

Reading Only Specific Columns Using usecols

When you don't need the entire dataset from an Excel sheet, the usecols parameter lets you load only the required columns by specifying their names or index positions.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", usecols=[0, 3])
print(df)

Output

        Name  Percentage
0      Ankit          95
1      Rahul          90
2    Shaurya          85
3  Aishwarya          80
4   Priyanka          75

Explanation: usecols=[0, 3] loads only selected columns and ignores all other data
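Besides index positions, usecols also accepts a list of column names or a string of Excel column letters. The sketch below uses a small temporary workbook (columns.xlsx is a hypothetical file name) to show three equivalent selections.

```python
import tempfile
from pathlib import Path

import pandas as pd

data = pd.DataFrame({
    "Name": ["Ankit", "Rahul"],
    "Age": [18, 19],
    "Stream": ["Math", "Science"],
    "Percentage": [95, 90],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "columns.xlsx"  # hypothetical file for this demo
    data.to_excel(path, index=False)

    by_position = pd.read_excel(path, usecols=[0, 3])                # 0-based positions
    by_name = pd.read_excel(path, usecols=["Name", "Percentage"])    # header names
    by_letter = pd.read_excel(path, usecols="A,D")                   # Excel column letters

print(list(by_position.columns))
print(list(by_name.columns))
print(list(by_letter.columns))
```

Column names tend to be the most robust choice, since they keep working if columns are inserted or reordered in the sheet.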

Handling Missing Values Using na_values

Excel files sometimes contain placeholder text such as "Missing" or "NA" to indicate empty data. With the na_values parameter, Pandas can automatically convert these placeholders into proper NaN values.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=2, na_values="Missing")
print(df)

Output

       Name  Age   Stream  Percentage
0     Priya   18     Math          95
1  shivangi   19  Science          90
2      Jeet   20      NaN          85
3    Ananya   18     Math          80
4   Swapnil   19  Science          75

Explanation: na_values="Missing" converts the word “Missing” into NaN
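When a sheet uses more than one placeholder, na_values can take a list, or a dict that limits placeholders to particular columns. In the sketch below, "Not available" is a hypothetical second placeholder and placeholders.xlsx is a temporary file created only for the demonstration.

```python
import tempfile
from pathlib import Path

import pandas as pd

raw = pd.DataFrame({
    "Name": ["Priya", "Jeet", "Ananya"],
    # "Not available" is a hypothetical second placeholder for this demo.
    "Stream": ["Math", "Missing", "Not available"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "placeholders.xlsx"
    raw.to_excel(path, index=False)

    # A list treats every listed string as missing, in all columns.
    as_list = pd.read_excel(path, na_values=["Missing", "Not available"])
    # A dict applies each placeholder list only to the named column.
    as_dict = pd.read_excel(path, na_values={"Stream": ["Missing"]})

print(as_list["Stream"].isna().sum())  # count of cells converted via the list
print(as_dict["Stream"].isna().sum())  # count of cells converted via the dict
```

Counting missing values with isna().sum() right after loading is a quick way to confirm the placeholders were recognized.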

Skipping Rows Using skiprows

Excel files often contain extra information at the top, such as titles, notes, or empty rows, that you don't want in your DataFrame. The skiprows parameter tells Pandas to ignore these rows and start reading from the actual data.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=1, skiprows=2)
print(df)

Output

   shivangi  19   Science  90
0      Jeet  20  Commerce  85
1    Ananya  18      Math  80
2   Swapnil  19   Science  75

Explanation: skiprows=2 skips the first two rows of the sheet (here the original header row and the first data row), so the next row is used as the header

Setting a Custom Header Row Using header

Not all Excel files have column names in the first row. With the header parameter, you can tell Pandas which row contains the actual column names, ensuring your DataFrame loads with clean and correct headers.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=1, header=2)
print(df)

Output

   shivangi  19   Science  90
0      Jeet  20  Commerce  85
1    Ananya  18      Math  80
2   Swapnil  19   Science  75

Explanation: header=2 treats the third row (index 2) as the header and discards the rows above it
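Both skiprows and header are easier to judge on a sheet that really has title rows above the column names. The sketch below builds such a sheet in a temporary file (report.xlsx and the two title lines are hypothetical) and reads it both ways.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two title rows, then the real header, then the data.
rows = [
    ["Student report", None, None],
    ["Generated for demo", None, None],
    ["Name", "Age", "Stream"],
    ["Ankit", 18, "Math"],
    ["Rahul", 19, "Science"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.xlsx"  # hypothetical file for this demo
    # header=False and index=False write the rows exactly as listed.
    pd.DataFrame(rows).to_excel(path, header=False, index=False)

    with_skiprows = pd.read_excel(path, skiprows=2)  # drop 2 rows, next row is the header
    with_header = pd.read_excel(path, header=2)      # row index 2 is the header

print(list(with_skiprows.columns))
print(list(with_header.columns))
print(with_header)
```

For a layout like this, the two parameters lead to the same column names and data rows, so the choice is mostly a matter of which reads more clearly.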

Reading Multiple Sheets Together

If you want to load more than one sheet at a time, Pandas allows you to pass a list of sheet names or indexes. This is useful when your data is spread across multiple sheets but still related.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=[0, 1], na_values="Missing")
print(df)

Output

{0:         Name  Age    Stream  Percentage
0      Ankit   18      Math          95
1      Rahul   19   Science          90
2    Shaurya   20  Commerce          85
3  Aishwarya   18      Math          80
4   Priyanka   19   Science          75,

1:        Name  Age    Stream  Percentage
0     Priya   18      Math          95
1  shivangi   19   Science          90
2      Jeet   20  Commerce          85
3    Ananya   18      Math          80
4   Swapnil   19   Science          75}

Explanation: It returns a dictionary where each key is a sheet index and each value is a DataFrame.
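Because the result is a dictionary, each sheet is retrieved by the key that was passed in sheet_name, and the values can be stacked with pd.concat when the sheets share the same columns. The sketch below assumes a small two-sheet workbook in a temporary folder (two_sheets.xlsx is a hypothetical name).

```python
import tempfile
from pathlib import Path

import pandas as pd

first = pd.DataFrame({"Name": ["Ankit", "Rahul"], "Age": [18, 19]})
second = pd.DataFrame({"Name": ["Priya", "Jeet"], "Age": [18, 20]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "two_sheets.xlsx"  # hypothetical file for this demo
    with pd.ExcelWriter(path) as writer:
        first.to_excel(writer, sheet_name="Sheet1", index=False)
        second.to_excel(writer, sheet_name="Sheet2", index=False)

    sheets = pd.read_excel(path, sheet_name=[0, 1])

# Keys match what was passed in sheet_name, here the positions 0 and 1.
print(list(sheets.keys()))
print(sheets[1])

# Stack the sheets into one DataFrame with a fresh 0..n-1 index.
combined = pd.concat(sheets.values(), ignore_index=True)
print(combined)
```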

Reading All Sheets Together Using sheet_name=None

When you want to read every sheet in the Excel file at once, simply set sheet_name=None. Pandas will load all sheets and return them in a dictionary keyed by sheet name, where each sheet becomes a separate DataFrame.

Python
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=None, na_values="Missing")
print(df)

Output

{'Sheet1':         Name  Age    Stream  Percentage
0      Ankit   18      Math          95
1      Rahul   19   Science          90
2    Shaurya   20  Commerce          85
3  Aishwarya   18      Math          80
4   Priyanka   19   Science          75,

'Sheet2':        Name  Age    Stream  Percentage
0     Priya   18      Math          95
1  shivangi   19   Science          90
2      Jeet   20  Commerce          85
3    Ananya   18      Math          80
4   Swapnil   19   Science          75,

'Sheet3':        Name  Age   Stream  Percentage
0     Priya   18     Math          95
1  shivangi   19  Science          90
2      Jeet   20      NaN          85
3    Ananya   18     Math          80
4   Swapnil   19  Science          75}

Explanation: sheet_name=None loads all sheets
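A common follow-up is to loop over the loaded sheets and merge them into one DataFrame while recording where each row came from. The sketch below is one possible approach, assuming all sheets share the same columns; the file all_sheets.xlsx and the added Sheet column are hypothetical names chosen for the demo.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "all_sheets.xlsx"  # hypothetical file for this demo
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"Name": ["Ankit"], "Age": [18]}).to_excel(
            writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"Name": ["Priya"], "Age": [18]}).to_excel(
            writer, sheet_name="Sheet2", index=False)

    all_sheets = pd.read_excel(path, sheet_name=None)

# Report the size of every sheet that was loaded.
for name, frame in all_sheets.items():
    print(name, frame.shape)

# Tag each row with its source sheet, then stack everything together.
combined = pd.concat(
    [frame.assign(Sheet=name) for name, frame in all_sheets.items()],
    ignore_index=True,
)
print(combined)
```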

