2

I am trying to read data from a large Excel file (almost 100,000 rows). I am using the xlrd module in Python to fetch the data. I want to fetch data by column name (Cascade, Schedule Name, Market) instead of by column number (0, 1, 2), because the columns in my Excel file are not fixed. I know how to fetch the data when the columns are fixed.


Here is the code I use to fetch data from the Excel file when the columns are fixed:

import xlrd

file_location = r"C:\Users\Desktop\Vision.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
print(sheet.ncols, sheet.nrows, sheet.name, sheet.number)

# Walk every cell by row and column index
for i in range(sheet.nrows):
    flag = 0
    for j in range(sheet.ncols):
        value = sheet.cell(i, j).value

If anyone has a solution for this, kindly let me know.

Thanks

3
  • Edit your question and give an example of "by column name instead of column number". Commented Nov 14, 2018 at 21:07
  • I have made changes to my question. Commented Nov 14, 2018 at 21:21
  • Can you show me how? Commented Nov 14, 2018 at 21:31

4 Answers

4

Alternatively, you could use pandas, a comprehensive data analysis library with built-in Excel I/O capabilities.

import pandas as pd

file_location =r"C:\Users\esatnir\Desktop\Sprint Vision.xlsx"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)

# Reduce dataframe to target columns (by filtering on column names)
df = df[['Cascade', 'Schedule Name', 'Market']]

where a quick view of the resulting dataframe df will show:

In [1]: df
Out[1]:
   Cascade     Schedule Name                Market
0  SF05UB0  DO Macro Upgrade  Upper Central Valley
1  DE03HO0  DO Macro Upgrade                Toledo
2  SF73XC4  DO Macro Upgrade                SF Bay

1 Comment

Thanks, Xukrao, for your answer, but I do not know how to perform operations on Excel data using pandas, so I am unable to use it.
2

Your column names are in the first row of the spreadsheet, right? So read the first row and construct a mapping from names to column indices.

column_pos = [ (sheet.cell(0, i).value, i) for i in range(sheet.ncols) ]
colidx = dict(column_pos)

Or as a one-liner:

colidx = dict( (sheet.cell(0, i).value, i) for i in range(sheet.ncols) )

You can then use the index to interpret column names, for example:

print(sheet.cell(5, colidx["Schedule Name"]).value)

To get an entire column, you can use a list comprehension:

schedule = [ sheet.cell(i, colidx["Schedule Name"]).value for i in range(1, sheet.nrows) ]

If you really wanted to, you could create a wrapper for the cell function that handles the interpretation. But I think this is simple enough.
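As a rough illustration of such a wrapper, here is a minimal sketch. The names `SheetByName`, `cell_value`, and `column` are hypothetical and not part of xlrd. `FakeSheet` is only a stand-in that mimics the three sheet members used in this thread (`nrows`, `ncols`, `cell(row, col).value`) so the example runs without an Excel file. With a real workbook you would pass the sheet returned by `workbook.sheet_by_index(0)` instead.

```python
from collections import namedtuple

# Stand-in for an xlrd sheet (assumption: only nrows, ncols and
# cell(row, col).value are needed, as in the snippets above).
Cell = namedtuple("Cell", "value")


class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, rowx, colx):
        return Cell(self._rows[rowx][colx])


class SheetByName:
    """Hypothetical helper: look up cells and columns by header name."""

    def __init__(self, sheet, header_row=0):
        self.sheet = sheet
        self.header_row = header_row
        # Map each header text to its column index.
        self.colidx = {
            sheet.cell(header_row, c).value: c for c in range(sheet.ncols)
        }

    def cell_value(self, rowx, name):
        """Value of the cell in row `rowx` under the column called `name`."""
        return self.sheet.cell(rowx, self.colidx[name]).value

    def column(self, name):
        """All values below the header row for the column called `name`."""
        c = self.colidx[name]
        return [
            self.sheet.cell(r, c).value
            for r in range(self.header_row + 1, self.sheet.nrows)
        ]


sheet = FakeSheet([
    ["Cascade", "Schedule Name", "Market"],
    ["SF05UB0", "DO Macro Upgrade", "Upper Central Valley"],
    ["DE03HO0", "DO Macro Upgrade", "Toledo"],
    ["SF73XC4", "DO Macro Upgrade", "SF Bay"],
])

by_name = SheetByName(sheet)
markets = by_name.column("Market")
first_cascade = by_name.cell_value(1, "Cascade")
print(markets)
print(first_cascade)
```

Because the mapping is built from the header row at construction time, the lookups keep working if the columns are reordered in the spreadsheet. A header that is missing from the sheet raises a `KeyError`.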

2 Comments

Thanks, Alexis, for your answer. I want to fetch all the data of 'Schedule Name' instead of an individual value. Can you show me how?
Done. (I assume row 0 contains the column names, so it's not included in the column values.)
1

Comment: still not working when the headers of
fieldnames = ['Cascade', 'Market', 'Schedule', 'Name'] and
Sheet(['Cascade', 'Schedule', 'Name', 'Market']) are equal.

Keeping the order of fieldnames in col_idx was not my initial goal.


Question: I want to fetch data by column name

The following OOP solution will work:

class OrderedByName():
    """
    Provides a generator property to iterate columns in the order of the given names.
    Provides subscription to get a column index by name, using obj[name].
    """
    def __init__(self, sheet, fieldnames, row=0):
        """
        Create an OrderedDict {name: index} from 'fieldnames'
        :param sheet: The Worksheet to use
        :param fieldnames: Ordered List of Column Names
        :param row: Default Row Index for the Header Row
        """
        from collections import OrderedDict
        self.columns = OrderedDict.fromkeys(fieldnames)
        for n in range(sheet.ncols):
            self.columns[sheet.cell(row, n).value] = n

    @property
    def ncols(self):
        """
        Generator, used like range(sheet.ncols),
          to iterate columns in ordered sequence
        :return: yields each column index
        """
        for idx in self.columns.values():
            yield idx

    def __getitem__(self, item):
        """
        Make class object subscriptable
        :param item: Column Name
        :return: Column index
        """
        return self.columns[item]

Usage:

# Worksheet Data
sheet([['Schedule', 'Cascade', 'Market'],
       ['SF05UB0', 'DO Macro Upgrade', 'Upper Cnetral Valley'],
       ['DE03HO0', 'DO Macro Upgrade', 'Toledo'],
       ['SF73XC4', 'DO Macro Upgrade', 'SF Bay']]
      )

# Instantiate with Ordered List of Column Names
# NOTE the different Order of Column Names
by_name = OrderedByName(sheet, ['Cascade', 'Market', 'Schedule'])

# Iterate all Rows and all Columns Ordered as instantiated
for row in range(sheet.nrows):
    for col in by_name.ncols:
        value = sheet.cell(row, col).value
        print("cell({}).value == {}".format((row,col), value))

Output:

cell((0, 1)).value == Cascade
cell((0, 2)).value == Market
cell((0, 0)).value == Schedule
cell((1, 1)).value == DO Macro Upgrade
cell((1, 2)).value == Upper Cnetral Valley
cell((1, 0)).value == SF05UB0
cell((2, 1)).value == DO Macro Upgrade
cell((2, 2)).value == Toledo
cell((2, 0)).value == DE03HO0
cell((3, 1)).value == DO Macro Upgrade
cell((3, 2)).value == SF Bay
cell((3, 0)).value == SF73XC4

Get Index of one Column by Name

print("cell{}.value == {}".format((1, by_name['Schedule']),
                                    sheet.cell(1, by_name['Schedule']).value))
#>>> cell(1, 0).value == SF05UB0

Tested with Python: 3.5

4 Comments

Thanks, stovfl, for your answer, but you are only printing the column index. I want to print all the data for the corresponding column name. Can you show me how to use these column indices to fetch the corresponding row data?
@George.S: From your question: " i know how to fetch data in case of fixed column.". Edit your question and show a non-fixed data table and how you do this using the col_idx list.
Actually, " i know how to fetch data in case of fixed column.", but the problem is that with your code, if I change the column headers in my Excel file, it prints the column indices in the same order. So tell me how my code would know which header resides in which column. I think I have made my point clear.
Thanks for the update, but it is still not working when the headers of fieldnames ['Cascade', 'Market', 'Schedule', 'Name'] and Sheet(['Cascade', 'Schedule', 'Name', 'Market']) are equal. It does not show the exact position of the column header.
0

You can use pandas. Below is sample code for identifying the columns and rows in an Excel sheet.

import pandas as pd

file_location =r"Your_Excel_Path"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)


total_rows=len(df.axes[0])
total_cols=len(df.axes[1])

# Print total number of rows in an excel sheet
print("Number of Rows: "+str(total_rows))

# Print total number of columns in an excel sheet
print("Number of Columns: "+str(total_cols))

# Print column names in an excel sheet
print(list(df.columns))

Now once you have the column data, you can convert it into a list of values.
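A minimal sketch of that conversion follows. The small in-memory DataFrame is an assumption that stands in for the result of `pd.read_excel(file_location)`, so the example runs without an Excel file. The values are taken from the sample output earlier in this thread.

```python
import pandas as pd

# Stand-in for df = pd.read_excel(file_location); the data here is illustrative.
df = pd.DataFrame({
    "Cascade": ["SF05UB0", "DE03HO0", "SF73XC4"],
    "Schedule Name": ["DO Macro Upgrade"] * 3,
    "Market": ["Upper Central Valley", "Toledo", "SF Bay"],
})

# One column, selected by name, as a plain Python list.
schedule = df["Schedule Name"].tolist()

# Several columns, selected by name, as a list of per-row dicts.
records = df[["Cascade", "Market"]].to_dict("records")

print(schedule)
print(records[1])
```

Selecting by name this way does not depend on where the column sits in the sheet, which is the situation described in the question.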

1 Comment

As the OP commented on @Xukrao's answer, which also uses pandas, the OP doesn't know how to use pandas.
