Handling Excel File with Pandas

Question

I'm trying to read an Excel file with Pandas.

I only want to read columns 2 through 4 (B through D) and skip the first 9 rows.
Even with skiprows=8 and parse_col=["B:D"], the data stored in df looks the same as the incoming Excel file: neither the first 9 rows nor the unwanted columns are excluded.

What is wrong with my syntax, and why isn't the data stored in df my input Excel file minus 9 rows and a few columns?

My incoming data below:

Null,Null,Null,Null,Null,Null,Null,Null,Null    
Null,Null,Null,Null,Null,Null,Null,Null,Null    
Null,Null,Null,Null,Null,Null,Null,Null,Null    
Null,Null,Null,Null,Null,Null,Null,Null,Null                
Null,Null,Null,Null,Null,Null,Null,Null,Null                                
Null,Null,Null,Null,Null,Null,Null,Null,Null                    
Null,Null,Null,Null,Null,Null,Null,Null,Null                                
Null,Null,Null,Null,String1,String2,Null,Null,Null  
Null,Phase to Phase Voltage,A - B,210.0,C - A,211.0,B - C,212.0 
Null,Circuit/Breaker,Number,Internal Meter Amps,External Meter Amps,Measured Difference,% Difference,Location Identifier,Total Location Amperage,Comments
Null,Main Phase A,94.1,96.,2.8,3%,Null,Null,Null            
Null,Main Phase B,90.1,92.6,2.5,3%,Null,Null,Null           
Null,Main Phase C,91.9,92.1,0.2,0%,Null,Null,Null       
Null,Neutral,0.0,0.4,0.4,100%,Null,Null,Null            
Null,Ground 0.0,0.1,0.1,100%,Null,Null,Null         
Null,1,10.6,10.2,-0.4,-4%,Null,Null,Null            
Null,2,10.6,10.3,-0.3,-3%,Null,Null,Null                
....

My code is below:

import pandas as pd

df = pd.read_excel('filelocation.xlsx',  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data'], skiprows=8, parse_col=["B:D"], keep_default_na='FALSE', na_values=['NULL'])

I've never heard of this problem. Have you tried reading only one sheet (by sheet name) instead of several? When you read several sheets, read_excel returns a dict of DataFrames keyed by sheet name. Reproducing the problem with a single sheet could help locate it.
– Romain, Commented Mar 8, 2016 at 5:50
I get the same issue when trying to parse just one sheet. My new line looks like this: df = pd.read_excel('C:/Users/Jerry/Documents/panoptics/panopticsMeeting2.28.16/FDC 1301 Data Collection (upTo48BreakerDevice) - original.xlsx', sheetname=['pnl1 Data '], skiprows=8, parse_col=["B:D"], keep_default_na='FALSE', na_values=['NULL'])
– pHorseSpec, Commented Mar 8, 2016 at 6:30

MaxU - stand with Ukraine · Accepted Answer · 2016-03-08 07:15:31Z

You've misspelled the parameter name: use parse_cols instead of parse_col. Besides that, you should specify either a string like "B:D" (or "B,C,D") or a list like ['B','C','D'].
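A misspelled keyword can go unnoticed when a function accepts arbitrary extra keyword arguments, which may be why parse_col raised no error here. The following is a generic Python sketch of that behaviour, not pandas code; read_table is a hypothetical function invented for illustration.

```python
def read_table(path, skiprows=0, parse_cols=None, **kwds):
    """Hypothetical reader that accepts arbitrary extra keywords."""
    # Unknown keywords land in kwds and are never looked at again.
    return {
        "skiprows": skiprows,
        "parse_cols": parse_cols,
        "ignored": sorted(kwds),
    }


# The misspelled keyword is accepted silently, so no column filter is applied.
result = read_table("file.xlsx", skiprows=8, parse_col="B:D")
print(result)
```

In this sketch parse_cols keeps its default of None, and the misspelled parse_col is simply collected among the ignored keywords.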

Try this:

import pandas as pd

df = pd.read_excel('filelocation.xlsx',
        sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data'],
        skiprows=8, parse_cols="B:D", keep_default_na='FALSE', na_values=['NULL'])

P.S. Also check the sheet name 'pnl1 Data ' for its trailing space.
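On current pandas releases the call looks somewhat different: sheetname and parse_cols were deprecated in pandas 0.21 and later removed, in favour of sheet_name and usecols. The following is a minimal sketch under those assumptions, not the code from this answer. It writes a small made-up workbook so that it can run on its own (the sheet names panel1 and panel2 and all cell values are hypothetical), and it assumes openpyxl is installed. It also passes the boolean False to keep_default_na, since the string 'FALSE' is a non-empty string and therefore truthy in Python.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in workbook: 8 filler rows, then a header row and data.
filler = [["Null"] * 5 for _ in range(8)]
header = [["Null", "Circuit/Breaker", "Internal Meter Amps",
           "External Meter Amps", "Comments"]]
data = [
    ["Null", "Main Phase A", 94.1, 96.0, "Null"],
    ["Null", "Main Phase B", 90.1, 92.6, "Null"],
]
sheet = pd.DataFrame(filler + header + data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xlsx")

    # Write the same layout to two sheets (names are made up).
    with pd.ExcelWriter(path) as writer:
        for name in ["panel1", "panel2"]:
            sheet.to_excel(writer, sheet_name=name, index=False, header=False)

    # A list of sheet names returns a dict of DataFrames keyed by sheet name.
    frames = pd.read_excel(
        path,
        sheet_name=["panel1", "panel2"],
        skiprows=8,             # skip the 8 filler rows; row 9 becomes the header
        usecols="B:D",          # Excel-style column range, as a single string
        keep_default_na=False,  # boolean, not the string 'FALSE'
        na_values=["Null"],
    )

df = frames["panel1"]
print(df.columns.tolist())
print(df.shape)
```

Because several sheets are requested, the result is a dict, so each DataFrame has to be pulled out by its sheet name before use.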

2 Comments

pHorseSpec Over a year ago

The sheet name actually has a trailing space in it, so that was intentional. Also, when I changed parse_col to parse_cols, I got the following error: `File "C:\Users\Jerry\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1817, in _next_line raise StopIteration StopIteration`

pHorseSpec Over a year ago

I'm sorry, never mind. When I inserted your code, it worked. Some syntax in my code must have been wrong.
