Creating a DataFrame using Excel Files
A DataFrame is a two-dimensional labeled data structure with rows and columns, much like an Excel sheet, which makes it well suited to structured data from Excel. The Pandas library provides several ways to load Excel files and convert them into DataFrame objects.
Note: We will use a sample Excel file named SampleWork.xlsx, which you can download from here. Make sure to keep the file in the same folder as your Python script.
Reading Default Sheet of an Excel File
When you read an Excel file using pd.read_excel() without specifying any sheet name, Pandas automatically loads the first sheet of the workbook. This is the most common method, especially when your file contains only one sheet or the required data is already on the first sheet.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx")
print(df)
Output
Name Age Stream Percentage
0 Ankit 18 Math 95
1 Rahul 19 Science 90
2 Shaurya 20 Commerce 85
3 Aishwarya 18 Math 80
4 Priyanka 19 Science 75
Explanation:
- pd.read_excel("SampleWork.xlsx") reads the default (first) sheet
- df stores the loaded DataFrame
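If the sample file is not at hand, the same behaviour can be reproduced with a workbook generated on the fly. The sketch below is only an illustration: it assumes the openpyxl package is installed (Pandas relies on it for .xlsx files), rebuilds the first sheet from the output shown above, and writes it to a temporary folder instead of next to your script.

```python
import os
import tempfile

import pandas as pd

# Sample data copied from the output shown in this section.
sheet1 = pd.DataFrame({
    "Name": ["Ankit", "Rahul", "Shaurya", "Aishwarya", "Priyanka"],
    "Age": [18, 19, 20, 18, 19],
    "Stream": ["Math", "Science", "Commerce", "Math", "Science"],
    "Percentage": [95, 90, 85, 80, 75],
})

# Assumption: a throwaway copy in a temporary folder stands in for the download.
path = os.path.join(tempfile.mkdtemp(), "SampleWork.xlsx")
sheet1.to_excel(path, index=False)  # index=False keeps the row labels out of the file

# No sheet_name is given, so the first sheet of the workbook is loaded.
df = pd.read_excel(path)
print(df.shape)
print(df.columns.tolist())
```

The printed shape and column list should match the five rows and four columns of the table above.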
Reading a Specific Sheet Using sheet_name
When an Excel file contains multiple sheets, you can tell Pandas exactly which sheet to load using the sheet_name parameter. You may provide either the sheet’s index (starting from 0) or its name to directly access the data you need.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=1)
print(df)
Output
Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 Commerce 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75
Explanation: sheet_name=1 loads the second sheet
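A sheet can also be selected by its name, which tends to be more robust than an index if sheets are later reordered. The sketch below is a small illustration under assumptions: it builds a two-sheet workbook in a temporary folder (the file name demo.xlsx and the shortened data are made up for the example), lists the sheet names with pd.ExcelFile, and checks that loading by name and by index gives the same result.

```python
import os
import tempfile

import pandas as pd

# Shortened stand-ins for the first two sheets of the sample workbook.
first = pd.DataFrame({"Name": ["Ankit", "Rahul"], "Age": [18, 19]})
second = pd.DataFrame({"Name": ["Priya", "shivangi"], "Age": [18, 19]})

# Hypothetical demo file in a temporary folder (requires openpyxl).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    first.to_excel(writer, sheet_name="Sheet1", index=False)
    second.to_excel(writer, sheet_name="Sheet2", index=False)

# Inspect the available sheet names before choosing one.
with pd.ExcelFile(path) as workbook:
    names = workbook.sheet_names

by_name = pd.read_excel(path, sheet_name="Sheet2")  # select by name
by_index = pd.read_excel(path, sheet_name=1)        # select by position

print(names)
print(by_name.equals(by_index))
```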
Reading Only Specific Columns Using usecols
When we don't need the entire dataset from an Excel sheet, the usecols parameter lets us load only the required columns by specifying their names or index positions.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", usecols=[0, 3])
print(df)
Output
Name Percentage
0 Ankit 95
1 Rahul 90
2 Shaurya 85
3 Aishwarya 80
4 Priyanka 75
Explanation: usecols=[0, 3] loads only selected columns and ignores all other data
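Besides index positions, usecols also accepts column names or Excel-style column letters. The following sketch illustrates both forms on a small workbook written to a temporary folder; the file name demo.xlsx and the shortened data are assumptions made for the example, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({
    "Name": ["Ankit", "Rahul", "Shaurya"],
    "Age": [18, 19, 20],
    "Stream": ["Math", "Science", "Commerce"],
    "Percentage": [95, 90, 85],
})

# Hypothetical demo file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
data.to_excel(path, index=False)

# Select columns by their header names...
by_name = pd.read_excel(path, usecols=["Name", "Percentage"])
# ...or by Excel column letters (A is the first column, D the fourth).
by_letter = pd.read_excel(path, usecols="A,D")

print(by_name.columns.tolist())
print(by_letter.columns.tolist())
```

Both calls should select the same two columns as usecols=[0, 3] does in the example above.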
Handling Missing Values Using na_values
Excel files sometimes contain placeholder text such as "Missing" or "NA" to indicate empty data. With the na_values parameter, Pandas can automatically convert these placeholders into proper NaN values.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=2, na_values="Missing")
print(df)
Output
Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 NaN 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75
Explanation: na_values="Missing" converts the word “Missing” into NaN
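na_values is not limited to a single string. As a rough sketch, a list treats several placeholders as missing in every column, while a dictionary restricts the placeholders to particular columns. The Remark column, the "Not Available" placeholder, and the file name demo.xlsx below are invented for the illustration and are not part of the sample workbook.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({
    "Name": ["Priya", "shivangi", "Jeet"],
    "Stream": ["Math", "Not Available", "Missing"],
    "Remark": ["Missing", "Good", "Good"],  # hypothetical extra column
})

# Hypothetical demo file in a temporary folder (requires openpyxl).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
data.to_excel(path, index=False)

# A list applies to every column of the sheet.
any_column = pd.read_excel(path, na_values=["Missing", "Not Available"])

# A dict applies each list only to the named column.
one_column = pd.read_excel(path, na_values={"Stream": ["Missing", "Not Available"]})

print(any_column.isna().sum())
print(one_column.isna().sum())
```

With the dictionary form, the text "Missing" in the Remark column is kept as ordinary text, which helps when a placeholder is meaningful in one column but not in another.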
Skipping Rows Using skiprows
Excel files often contain extra information at the top, such as titles, notes, or empty rows, that you don’t want in your DataFrame. The skiprows parameter tells Pandas to ignore these rows and start reading from the actual data.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=1, skiprows=2)
print(df)
Output
shivangi 19 Science 90
0 Jeet 20 Commerce 85
1 Ananya 18 Math 80
2 Swapnil 19 Science 75
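Explanation: skiprows=2 skips the first two rows of the sheet. Here those are the original header row and the first data row, so the next row (shivangi's record) is used as the header.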
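The case skiprows is meant for, a sheet with title lines above the real header, can be sketched as follows. The two title rows and the file name demo.xlsx are made up for the illustration; the workbook is written to a temporary folder and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Raw sheet contents: two title rows, then the header, then the data.
raw = pd.DataFrame([
    ["Student report", None, None, None],      # hypothetical title row
    ["Generated for demo", None, None, None],  # hypothetical note row
    ["Name", "Age", "Stream", "Percentage"],
    ["Priya", 18, "Math", 95],
    ["shivangi", 19, "Science", 90],
])

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
raw.to_excel(path, index=False, header=False)  # write the cells exactly as listed

# Skip the two title rows; the third row then becomes the header.
df = pd.read_excel(path, skiprows=2)

# A list of row positions skips exactly those rows and gives the same result here.
df_list = pd.read_excel(path, skiprows=[0, 1])

print(df.columns.tolist())
print(df.equals(df_list))
```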
Setting a Custom Header Row Using header
Not all Excel files have column names in the first row. With the header parameter, you can tell Pandas which row contains the actual column names, ensuring your DataFrame loads with clean and correct headers.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=1, header=2)
print(df)
Output
shivangi 19 Science 90
0 Jeet 20 Commerce 85
1 Ananya 18 Math 80
2 Swapnil 19 Science 75
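Explanation: header=2 treats the third row (row positions start at 0) as the header and discards the rows above it, so on this sheet the result matches the skiprows=2 example.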
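When a sheet has no header row at all, header=None keeps every row as data, and the names parameter can supply the column labels. The sketch below is a minimal illustration that assumes a headerless demo file (demo.xlsx, invented for the example) written to a temporary folder.

```python
import os
import tempfile

import pandas as pd

# Data rows only, with no header row in the sheet.
rows = pd.DataFrame([
    ["Priya", 18, "Math", 95],
    ["shivangi", 19, "Science", 90],
])

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
rows.to_excel(path, index=False, header=False)

# header=None: no row is used as the header, columns are numbered 0, 1, 2, ...
unnamed = pd.read_excel(path, header=None)

# names= supplies the labels that the sheet itself does not contain.
named = pd.read_excel(
    path,
    header=None,
    names=["Name", "Age", "Stream", "Percentage"],
)

print(unnamed.columns.tolist())
print(named.columns.tolist())
```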
Reading Multiple Sheets Together
If you want to load more than one sheet at a time, Pandas allows you to pass a list of sheet names or indexes. This is useful when your data is spread across multiple sheets but still related.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=[0, 1], na_values="Missing")
print(df)
Output
{0: Name Age Stream Percentage
0 Ankit 18 Math 95
1 Rahul 19 Science 90
2 Shaurya 20 Commerce 85
3 Aishwarya 18 Math 80
4 Priyanka 19 Science 75,1: Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 Commerce 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75}
Explanation: It returns a dictionary in which each key is a sheet index (as passed in sheet_name) and each value is the DataFrame for that sheet.
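Since the result is a dictionary rather than a single DataFrame, each sheet is reached through its key. The sketch below shows one way to work with it; the two-sheet demo workbook (demo.xlsx, with shortened data) is an assumption made for the example and is written to a temporary folder.

```python
import os
import tempfile

import pandas as pd

first = pd.DataFrame({"Name": ["Ankit", "Rahul"], "Age": [18, 19]})
second = pd.DataFrame({"Name": ["Priya", "shivangi"], "Age": [18, 19]})

# Hypothetical demo file in a temporary folder (requires openpyxl).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    first.to_excel(writer, sheet_name="Sheet1", index=False)
    second.to_excel(writer, sheet_name="Sheet2", index=False)

sheets = pd.read_excel(path, sheet_name=[0, 1])

# The keys are the values passed in sheet_name, here the indexes 0 and 1.
print(list(sheets.keys()))

# Pick one sheet by its key, or loop over all of them.
second_sheet = sheets[1]
for key, frame in sheets.items():
    print(key, frame.shape)
```

Passing names instead, for example sheet_name=["Sheet1", "Sheet2"], gives a dictionary keyed by those names.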
Reading All Sheets Together Using sheet_name=None
When you want to read every sheet in the Excel file at once, simply set sheet_name=None. Pandas will load all sheets and return them in a dictionary keyed by sheet name, where each sheet becomes a separate DataFrame.
import pandas as pd
df = pd.read_excel("SampleWork.xlsx", sheet_name=None, na_values="Missing")
print(df)
Output
{'Sheet1': Name Age Stream Percentage
0 Ankit 18 Math 95
1 Rahul 19 Science 90
2 Shaurya 20 Commerce 85
3 Aishwarya 18 Math 80
4 Priyanka 19 Science 75,'Sheet2': Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 Commerce 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75,'Sheet3': Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 NaN 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75}
Explanation: sheet_name=None loads all sheets
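When all sheets share the same columns, a common next step is to stack them into one DataFrame. The sketch below shows one possible approach, not the only one: it tags each row with the sheet it came from and combines the pieces with pd.concat. The demo workbook (demo.xlsx, with shortened data) and the added Sheet column are assumptions made for the illustration.

```python
import os
import tempfile

import pandas as pd

first = pd.DataFrame({"Name": ["Ankit", "Rahul"], "Age": [18, 19]})
second = pd.DataFrame({"Name": ["Priya", "shivangi"], "Age": [18, 19]})

# Hypothetical demo file in a temporary folder (requires openpyxl).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    first.to_excel(writer, sheet_name="Sheet1", index=False)
    second.to_excel(writer, sheet_name="Sheet2", index=False)

all_sheets = pd.read_excel(path, sheet_name=None)  # dict: sheet name -> DataFrame

# Add a column recording the source sheet, then stack the sheets vertically.
frames = []
for sheet_name, frame in all_sheets.items():
    frames.append(frame.assign(Sheet=sheet_name))
combined = pd.concat(frames, ignore_index=True)

print(combined)
```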