I'm trying to read an Excel file with pandas.

  1. I only want to read columns 2 through 4.

  2. I want to skip the first 9 rows.

  3. Even with skiprows=8, parse_col=["B:D"], the data stored in df looks the same as the incoming Excel file: neither the first 9 rows nor the unwanted columns are excluded.

What is wrong with my syntax, and why isn't the DataFrame stored in df my input Excel file minus 9 rows and a few columns?

My incoming data below:

Null,Null,Null,Null,Null,Null,Null,Null,Null    
Null,Null,Null,Null,Null,Null,Null,Null,Null    
Null,Null,Null,Null,Null,Null,Null,Null,Null    
Null,Null,Null,Null,Null,Null,Null,Null,Null                
Null,Null,Null,Null,Null,Null,Null,Null,Null                                
Null,Null,Null,Null,Null,Null,Null,Null,Null                    
Null,Null,Null,Null,Null,Null,Null,Null,Null                                
Null,Null,Null,Null,String1,String2,Null,Null,Null  
Null,Phase to Phase Voltage,A - B,210.0,C - A,211.0,B - C,212.0 
Null,Circuit/Breaker,Number,Internal Meter Amps,External Meter Amps,Measured Difference,% Difference,Location Identifier,Total Location Amperage,Comments
Null,Main Phase A,94.1,96.,2.8,3%,Null,Null,Null            
Null,Main Phase B,90.1,92.6,2.5,3%,Null,Null,Null           
Null,Main Phase C,91.9,92.1,0.2,0%,Null,Null,Null       
Null,Neutral,0.0,0.4,0.4,100%,Null,Null,Null            
Null,Ground 0.0,0.1,0.1,100%,Null,Null,Null         
Null,1,10.6,10.2,-0.4,-4%,Null,Null,Null            
Null,2,10.6,10.3,-0.3,-3%,Null,Null,Null                
....

My code is below:

import pandas as pd

df = pd.read_excel('filelocation.xlsx',  sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data'], skiprows=8, parse_col=["B:D"], keep_default_na='FALSE', na_values=['NULL'])
  • I've never heard of this problem. Have you tried reading only one sheet instead of several? When you read several sheets, read_excel returns a dict of DataFrames keyed by sheet name. Reproducing the problem with a single sheet could help locate it. Commented Mar 8, 2016 at 5:50
  • I get the same issues when trying to parse just one sheet. My new line looks like this: df = pd.read_excel('C:/Users/Jerry/Documents/panoptics/panopticsMeeting2.28.16/FDC 1301 Data Collection (upTo48BreakerDevice) - original.xlsx', sheetname=['pnl1 Data '], skiprows=8, parse_col=["B:D"], keep_default_na='FALSE', na_values=['NULL']) Commented Mar 8, 2016 at 6:30

1 Answer

You've misspelled the parameter name: it is parse_cols, not parse_col. Besides that, pass the columns either as a single string such as "B:D" (or "B,C,D") or as a list of zero-based column positions such as [1, 2, 3], not as a list that wraps the range string.
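
To illustrate which columns a range string such as "B:D" refers to, here is a minimal sketch that expands Excel-style column letters into zero-based positions. The helpers column_index and expand_columns are hypothetical names written for this example. They are not part of pandas, which does its own parsing internally.

```python
def column_index(letters):
    """Convert an Excel column label such as 'B' or 'AA' to a zero-based index."""
    label = letters.strip().upper()
    if not label:
        raise ValueError("empty column label")
    index = 0
    for ch in label:
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Column labels are a base-26 number where A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def expand_columns(spec):
    """Expand a spec such as 'B:D' or 'A,C,E:F' into zero-based column indices."""
    indices = []
    for part in spec.split(","):
        if ":" in part:
            first, last = part.split(":")
            # Ranges are inclusive on both ends, as in Excel.
            indices.extend(range(column_index(first), column_index(last) + 1))
        else:
            indices.append(column_index(part))
    return indices


print(expand_columns("B:D"))  # [1, 2, 3]
```

So "B:D" covers the second through fourth columns of the sheet, which matches the "column 2 through column 4" requirement in the question.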

Try this:

import pandas as pd

# keep_default_na expects a boolean; the string 'FALSE' is truthy
df = pd.read_excel('filelocation.xlsx',
        sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data'],
        skiprows=8, parse_cols="B:D", keep_default_na=False, na_values=['NULL'])

P.S. Also check the sheet name 'pnl1 Data ': it has a trailing space.
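
For readers on a newer pandas: since version 0.21 the sheetname and parse_cols arguments are called sheet_name and usecols. The following is a rough sketch of the same call with the newer names, not a tested drop-in replacement. The helpers read_excel_options and load_panels are hypothetical names made up for this example, and reading .xlsx files is assumed to require an Excel engine such as openpyxl to be installed.

```python
def read_excel_options(sheets, rows_to_skip=8, column_range="B:D"):
    """Build keyword arguments for pandas.read_excel using the pandas >= 0.21 names."""
    return {
        "sheet_name": list(sheets),    # a list makes read_excel return a dict of DataFrames
        "skiprows": rows_to_skip,      # number of leading rows to skip
        "usecols": column_range,       # Excel-style letters, e.g. "B:D"
        "keep_default_na": False,      # a real boolean; the string 'FALSE' would be truthy
        "na_values": ["NULL"],
    }


def load_panels(path, sheets):
    """Read the given sheets and return a dict mapping sheet name to DataFrame."""
    import pandas as pd  # imported here so the options helper has no dependencies

    return pd.read_excel(path, **read_excel_options(sheets))
```

Because several sheets are requested, the result is a dict, so a single sheet would be accessed as load_panels('filelocation.xlsx', ['pnl1 Data ', 'pnl2 Data'])['pnl1 Data '], trailing space included.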

2 Comments

The sheet name actually has a trailing space in it, so that was put in on purpose. Also, when I change parse_col to parse_cols, I get the following error: ` File "C:\Users\Jerry\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1817, in _next_line raise StopIteration StopIteration`
I'm sorry, never mind. When I inserted your code, it worked. Some syntax in my code must have been wrong.
