How to read Excel data by column name in python using xlrd

Question

I am trying to read the data of a large Excel file (almost 100,000 rows). I am using the xlrd module in Python to fetch the data from Excel. I want to fetch data by column name (Cascade, Schedule Name, Market) instead of by column number (0, 1, 2), because the columns in my Excel file are not fixed. I know how to fetch data when the columns are fixed.

Here is the code I use to fetch data from Excel when the columns are fixed:

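import xlrd

# NOTE: xlrd 2.0 and later only read legacy .xls files; reading .xlsx
# requires xlrd < 2.0 (or another library such as openpyxl)
file_location = r"C:\Users\Desktop\Vision.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
print(sheet.ncols, sheet.nrows, sheet.name, sheet.number)

# Visit every cell by fixed row/column position
for i in range(sheet.nrows):
    for j in range(sheet.ncols):
        value = sheet.cell(i, j).value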

If anyone has a solution for this, kindly let me know.

Thanks

Edit your Question and give an example of "by column name instead of column number". – stovfl, Commented Nov 14, 2018 at 21:07

Xukrao · Accepted Answer · 2018-11-14 22:21:18Z

4

Alternatively, you could use pandas, a comprehensive data analysis library with built-in Excel I/O capabilities.

import pandas as pd

file_location = r"C:\Users\esatnir\Desktop\Sprint Vision.xlsx"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)

# Reduce dataframe to target columns (by filtering on column names)
df = df[['Cascade', 'Schedule Name', 'Market']]

A quick view of the resulting dataframe df shows:

In [1]: df
Out[1]:
   Cascade     Schedule Name                Market
0  SF05UB0  DO Macro Upgrade  Upper Central Valley
1  DE03HO0  DO Macro Upgrade                Toledo
2  SF73XC4  DO Macro Upgrade                SF Bay

answered Nov 14, 2018 at 22:21

Xukrao




Sat.N Over a year ago

Thanks Xukrao for your answer, but I do not know how to perform operations on Excel data using pandas, so I am unable to use it.

alexis · Accepted Answer · 2018-11-15 21:51:02Z

2

Your column names are in the first row of the spreadsheet, right? So read the first row and construct a mapping from names to column indices.

column_pos = [ (sheet.cell(0, i).value, i) for i in range(sheet.ncols) ]
colidx = dict(column_pos)

Or as a one-liner:

colidx = dict( (sheet.cell(0, i).value, i) for i in range(sheet.ncols) )

You can then use the mapping to look up columns by name, for example:

print(sheet.cell(5, colidx["Schedule Name"]).value)

To get an entire column, you can use a list comprehension:

schedule = [ sheet.cell(i, colidx["Schedule Name"]).value for i in range(1, sheet.nrows) ]
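To fetch several columns at once (for example the three named in the question), the same mapping can be reused. This is a sketch, not part of the original answer: `columns_by_name` and `StandInSheet` are hypothetical names, and `StandInSheet` is a tiny in-memory stand-in for an xlrd sheet that mimics only `nrows`, `ncols` and `cell(row, col).value`, so the example runs without an Excel file. It also checks that the result does not change when the columns appear in a different order. With a real xlrd sheet, `sheet.col_values(colx, start_rowx=1)` should return the same list as the inner comprehension.

```python
from collections import namedtuple

# Hypothetical stand-in for an xlrd Sheet: it mimics only nrows, ncols and
# cell(rowx, colx).value, which is all the helper below touches.
Cell = namedtuple("Cell", "value")


class StandInSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, rowx, colx):
        return Cell(self._rows[rowx][colx])


def columns_by_name(sheet, names, header_row=0):
    """Return {name: [values below the header row]} for each requested name."""
    # Same mapping as in the answer: header text -> column index
    colidx = {sheet.cell(header_row, i).value: i for i in range(sheet.ncols)}
    missing = [n for n in names if n not in colidx]
    if missing:
        raise KeyError("header(s) not found: {}".format(missing))
    return {
        name: [sheet.cell(r, colidx[name]).value
               for r in range(header_row + 1, sheet.nrows)]
        for name in names
    }


wanted = ["Cascade", "Schedule Name", "Market"]

# The same two data rows, with the columns in two different orders
layout_a = StandInSheet([
    ["Cascade", "Schedule Name", "Market"],
    ["SF05UB0", "DO Macro Upgrade", "Upper Central Valley"],
    ["DE03HO0", "DO Macro Upgrade", "Toledo"],
])
layout_b = StandInSheet([
    ["Market", "Cascade", "Schedule Name"],
    ["Upper Central Valley", "SF05UB0", "DO Macro Upgrade"],
    ["Toledo", "DE03HO0", "DO Macro Upgrade"],
])

print(columns_by_name(layout_a, wanted) == columns_by_name(layout_b, wanted))  # True
print(columns_by_name(layout_b, ["Market"]))  # {'Market': ['Upper Central Valley', 'Toledo']}
```

Raising `KeyError` for a missing header is a design choice in this sketch: a renamed or split header (e.g. "Schedule" and "Name" instead of "Schedule Name") fails loudly instead of silently returning the wrong column.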

If you really wanted to, you could write a wrapper around the cell() method that handles the name lookup. But I think this is simple enough.
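A wrapper of the kind the answer mentions could look roughly like this. It is a sketch, not code from the answer: `ByName` is a hypothetical class name, and `StandInSheet` is a hypothetical in-memory stand-in for an xlrd sheet (providing only `nrows`, `ncols` and `cell(row, col).value`) so the example runs without a workbook. With a real workbook you would pass the object returned by `workbook.sheet_by_index(0)` instead.

```python
from collections import namedtuple

# Hypothetical stand-in for an xlrd Sheet (assumption for this sketch): only
# nrows, ncols and cell(rowx, colx).value are provided, which is all the
# wrapper below uses.
Cell = namedtuple("Cell", "value")


class StandInSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, rowx, colx):
        return Cell(self._rows[rowx][colx])


class ByName:
    """Hypothetical wrapper: address cells by (row, column name)."""

    def __init__(self, sheet, header_row=0):
        self.sheet = sheet
        self.header_row = header_row
        # Map header text to column index, read once from the header row
        self.colidx = {sheet.cell(header_row, i).value: i
                       for i in range(sheet.ncols)}

    def cell(self, rowx, name):
        # Like sheet.cell(...), but the column is given by its header text
        return self.sheet.cell(rowx, self.colidx[name])

    def column(self, name):
        # All values below the header row in the named column
        colx = self.colidx[name]
        return [self.sheet.cell(r, colx).value
                for r in range(self.header_row + 1, self.sheet.nrows)]


sheet = StandInSheet([
    ["Market", "Cascade", "Schedule Name"],
    ["Upper Central Valley", "SF05UB0", "DO Macro Upgrade"],
    ["Toledo", "DE03HO0", "DO Macro Upgrade"],
])
named = ByName(sheet)
print(named.cell(2, "Cascade").value)  # DE03HO0
print(named.column("Market"))          # ['Upper Central Valley', 'Toledo']
```

Reading the header row once in `__init__` means each later access is a plain dictionary lookup, so addressing cells by name should add only a small constant cost per cell, even over ~100,000 rows.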

edited Nov 15, 2018 at 21:51

answered Nov 14, 2018 at 22:23

alexis



Sat.N Over a year ago

Thanks Alexis for your answer. I want to fetch the complete data of 'Schedule Name' instead of an individual value. Can you show me how?

alexis Over a year ago

Done. (I assume row 0 contains the column names, so it's not included in the column values.)

stovfl · Accepted Answer · 2018-11-17 17:00:06Z

1

Comment: still not working when the headers of
fieldnames = ['Cascade', 'Market', 'Schedule', 'Name'] and
Sheet(['Cascade', 'Schedule', 'Name', 'Market']) are equal.

Keeping the order of fieldnames in col_idx was not my initial goal.

Question: I want to fetch data by column name

The following OOP solution will work:

from collections import OrderedDict


class OrderedByName():
    """
    Provides a generator property to iterate columns in column-name order
    Provides subscription to get a column's index by name, using obj[name]
    """
    def __init__(self, sheet, fieldnames, row=0):
        """
        Create an OrderedDict {name: index} from 'fieldnames'
        :param sheet: The worksheet to use
        :param fieldnames: Ordered list of column names
        :param row: Row index of the header row (default 0)
        """
        self.columns = OrderedDict.fromkeys(fieldnames, None)
        for n in range(sheet.ncols):
            name = sheet.cell(row, n).value
            # Record only the requested headers, so iteration follows 'fieldnames'
            if name in self.columns:
                self.columns[name] = n

    @property
    def ncols(self):
        """
        Generator, used like range(sheet.ncols),
          to iterate columns in the order of 'fieldnames'
        :return: yield column index
        """
        for idx in self.columns.values():
            # Skip field names that were not found in the header row
            if idx is not None:
                yield idx

    def __getitem__(self, item):
        """
        Make class object subscriptable
        :param item: Column name
        :return: Column index (None if the name is not in the header row)
        """
        return self.columns[item]

Usage:

# Worksheet data (row 0 is the header row):
#   ['Schedule', 'Cascade', 'Market']
#   ['SF05UB0', 'DO Macro Upgrade', 'Upper Central Valley']
#   ['DE03HO0', 'DO Macro Upgrade', 'Toledo']
#   ['SF73XC4', 'DO Macro Upgrade', 'SF Bay']

# Instantiate with an ordered list of column names
# NOTE the different order of the column names
by_name = OrderedByName(sheet, ['Cascade', 'Market', 'Schedule'])

# Iterate all rows and all columns, ordered as instantiated
for row in range(sheet.nrows):
    for col in by_name.ncols:
        value = sheet.cell(row, col).value
        print("cell({}).value == {}".format((row, col), value))

Output:

cell((0, 1)).value == Cascade
cell((0, 2)).value == Market
cell((0, 0)).value == Schedule
cell((1, 1)).value == DO Macro Upgrade
cell((1, 2)).value == Upper Central Valley
cell((1, 0)).value == SF05UB0
cell((2, 1)).value == DO Macro Upgrade
cell((2, 2)).value == Toledo
cell((2, 0)).value == DE03HO0
cell((3, 1)).value == DO Macro Upgrade
cell((3, 2)).value == SF Bay
cell((3, 0)).value == SF73XC4

Get the index of one column by name:

print("cell{}.value == {}".format((1, by_name['Schedule']),
                                    sheet.cell(1, by_name['Schedule']).value))
#>>> cell(1, 0).value == SF05UB0

Tested with Python: 3.5

edited Nov 17, 2018 at 17:00

answered Nov 14, 2018 at 22:10

stovfl



Sat.N Over a year ago

Thanks stovfl for your answer, but you are only printing the column indices. I want to print all the data of the corresponding column name. Can you show me how I should use these column indices to fetch the corresponding row data?

stovfl Over a year ago

@George.S: From your Question: "I know how to fetch data in case of fixed column." Edit your Question and show a non-fixed data table and how you do this using the col_idx list.

Sat.N Over a year ago

Actually, "I know how to fetch data in case of fixed column", but the problem is that with your code, if I change the column headers in my Excel file, it still prints the column indices in the same order. So tell me how my code would know which header is in which column. I think I have made my point clear.

Sat.N Over a year ago

Thanks for the update, but it is still not working when the headers of fieldnames ['Cascade', 'Market', 'Schedule', 'Name'] and Sheet(['Cascade', 'Schedule', 'Name', 'Market']) are equal. It does not show the exact position of the column header.

jupiterbjy · Accepted Answer · 2020-07-04 15:08:18Z

0

You can use pandas. Below is sample code for identifying the columns and rows in an Excel sheet.

import pandas as pd

file_location = r"Your_Excel_Path"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)

total_rows = len(df.axes[0])
total_cols = len(df.axes[1])

# Print total number of rows in an excel sheet
print("Number of Rows: " + str(total_rows))

# Print total number of columns in an excel sheet
print("Number of Columns: " + str(total_cols))

# Print column names in an excel sheet
print(df.columns.ravel())

Once you know the column names, you can select a column by name (for example df['Schedule Name']) and convert it into a list of values with .tolist().

edited Jul 4, 2020 at 15:08

jupiterbjy


answered Jul 4, 2020 at 13:16

Shreyash Kulkarni



jupiterbjy Over a year ago

As the OP commented on @Xukrao's answer (which also uses pandas), the OP doesn't know how to use pandas.
