0

I have a CSV dataset like the one below and I need to remove any empty rows from it. I already tried the following code, but it still fails when it reads an empty row and returns:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 11, saw 8

    # Assumes pd (pandas), conn and Error (e.g. sqlite3.Error) are defined elsewhere
    def feed_db():
        try:
            # Pass only one of sep/delimiter; the file is comma-separated.
            # The first row holds the column names, so keep the default header.
            employees = pd.read_csv(
                'employee.csv', sep=',', encoding="utf-8",
                skipinitialspace=True, skip_blank_lines=True)
            employees.columns = [c.strip().lower().replace(' ', '_')
                                 for c in employees.columns]
            employees.to_sql('employees', conn, if_exists='replace', index=False)

        except Error as exc:
            raise Error('Database initialization failed') from exc

Sample dataset:

Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0001, Aditi Musunur, Male, 1990-03-24, 2011-07-05, 1500, Sri Lanka
EN_0002, Advitiya Sujeet, Male, 1986-07-28, 2010-03-24, 1600, Sri Lanka
EN_0003, Alagesan Poduri, Male, 1982-05-25, 2016-06-24, 1800, Sri Lanka
EN_0004, Amrish Ilyas, Female, 1987-10-24, 2013-12-17, 2000, India
EN_0005, Aprativirya Seshan, Female, 1981-12-16, 2012-03-14, 1750, India
EN_0006, Asvathama Ponnada, Male, 1986-01-09, 2014-06-18, 2300, Pakistan
EN_0007, Avantas Ghosal, Female, 1981-10-05, 2016-08-26, 4200, Pakistan
EN_0008, Avidosa Vaisakhi, Male, 1980-08-09, 2018-03-05, 3100, Bangladesh
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh

How can I remove those empty rows using pandas?

  • Could you post line 11 of the CSV, please? Commented Mar 10, 2021 at 13:44
  • There is a problem in your CSV: line 11 has 8 fields instead of 7 (the 6th is empty). Correct your CSV :) Commented Mar 10, 2021 at 13:54
  • So, is it possible to remove it through code? Commented Mar 10, 2021 at 13:55
  • That is outside the scope of pandas. You have to fix your CSV first by other means. Commented Mar 10, 2021 at 13:57
  • :-P If the dataset contains 10,000 records, how should this be done? Commented Mar 10, 2021 at 14:00

2 Answers

1

As mentioned in the comments, line 11 contains a redundant empty cell. You can read the file line by line, remove the empty cell, write the cleaned lines to an in-memory StringIO buffer, and then load that buffer with pd.read_csv:

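import io

import pandas as pd

s = io.StringIO()

with open('employee.csv') as file:
    for line in file:
        # Collapse the empty field (", ,") so the row has 7 fields again
        s.write(line.replace(", ,", ","))
s.seek(0)  # rewind the buffer before reading it back

# skipinitialspace drops the space that follows each comma
df = pd.read_csv(s, skipinitialspace=True)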

Result:

    Employee Id  Full Name              Gender  Date of Birth  Joined Date  Salary (USD)  Branch
0   EN_0001      Aditi Musunur          Male    1990-03-24     2011-07-05   1500          Sri Lanka
1   EN_0002      Advitiya Sujeet        Male    1986-07-28     2010-03-24   1600          Sri Lanka
2   EN_0003      Alagesan Poduri        Male    1982-05-25     2016-06-24   1800          Sri Lanka
3   EN_0004      Amrish Ilyas           Female  1987-10-24     2013-12-17   2000          India
4   EN_0005      Aprativirya Seshan     Female  1981-12-16     2012-03-14   1750          India
5   EN_0006      Asvathama Ponnada      Male    1986-01-09     2014-06-18   2300          Pakistan
6   EN_0007      Avantas Ghosal         Female  1981-10-05     2016-08-26   4200          Pakistan
7   EN_0008      Avidosa Vaisakhi       Male    1980-08-09     2018-03-05   3100          Bangladesh
8   EN_0009      Barsati Sandipa        Male    1988-04-09     2011-05-03   2925          Bangladesh
9   EN_0010      Debasis Sundhararajan  Female  1990-03-26     2015-05-18   2800          Bangladesh
10  EN_0011      Debas Sundhar          Female  1990-03-26     2015-05-18   2800          Bangladesh
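
A blanket replace of ", ," would also collapse a legitimately empty field in a row that is otherwise well formed. A more targeted variant is to repair only the rows that have more fields than the header. Below is a minimal sketch using the standard csv module; the function name repair_rows and the SAMPLE_CSV string are made up for illustration, and it assumes that the surplus fields in an over-long row are always the empty ones.

```python
import csv
import io

# Illustrative excerpt of the dataset, including a blank line and the broken row.
SAMPLE_CSV = """Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh

EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh
"""


def repair_rows(text):
    """Return a StringIO buffer with blank rows and surplus empty fields removed."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        # Skip rows that are blank or contain only whitespace.
        if not any(field.strip() for field in row):
            continue
        # Only touch rows that are longer than the header.
        if len(row) > len(header):
            row = [field for field in row if field.strip()]
        writer.writerow(row)
    out.seek(0)  # rewind so the caller can read from the start
    return out


cleaned = repair_rows(SAMPLE_CSV)
print(cleaned.getvalue())
```

The returned buffer can be handed to pd.read_csv in place of a file name. For the real file, pass the contents of employee.csv instead of SAMPLE_CSV.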

0

Your CSV file has an extra comma on this line, which adds an empty sixth field, so the row has 8 fields instead of 7:

EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
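
If editing the file is not practical, read_csv can also deal with such over-long rows itself through its on_bad_lines parameter. The following is a hedged sketch, assuming pandas 1.4 or newer (the string options such as "skip" arrived in 1.3; the callable form arrived in 1.4 and only works with engine="python"). The helper drop_empty_fields and the inline SAMPLE string are illustrative names, not part of pandas.

```python
import io

import pandas as pd

# Illustrative excerpt of the dataset; the second data row has the extra comma.
SAMPLE = """Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh
"""


def drop_empty_fields(bad_line):
    """Repair an over-long row by discarding its empty fields.

    pandas passes the row as a list of strings; the returned list is used
    as the row, and returning None would skip the row instead.
    """
    return [field for field in bad_line if field.strip()]


# Option 1: silently drop every row that has too many fields.
skipped = pd.read_csv(
    io.StringIO(SAMPLE), skipinitialspace=True, on_bad_lines="skip")

# Option 2: keep the row, but repair it first (requires the python engine).
repaired = pd.read_csv(
    io.StringIO(SAMPLE), skipinitialspace=True, engine="python",
    on_bad_lines=drop_empty_fields)

print(len(skipped), len(repaired))
```

Skipping loses the EN_0010 record entirely, while the callable keeps it, so the second option is usually the better fit when the surplus field is known to be empty.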
