How to remove empty rows in csv using pandas?

Question

I have a CSV dataset like the one below, and I need to remove any empty rows from it. I already tried the following code, but it still fails when it reads the empty row and raises:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 11, saw 8

    def feed_db():
        try:
            employees = pd.read_csv(
                'employee.csv', delimiter=',',  sep='\t', encoding="utf-8", header=None, skipinitialspace=True, skip_blank_lines=True)
            employees.columns = [c.strip().lower().replace(' ', '_')
                                 for c in employees.columns]
            employees.to_sql('employees', conn, if_exists='replace', index=False)
    
        except Error as exc:
            raise Error('Database initialization failed', exc)

Sample dataset:

Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0001, Aditi Musunur, Male, 1990-03-24, 2011-07-05, 1500, Sri Lanka
EN_0002, Advitiya Sujeet, Male, 1986-07-28, 2010-03-24, 1600, Sri Lanka
EN_0003, Alagesan Poduri, Male, 1982-05-25, 2016-06-24, 1800, Sri Lanka
EN_0004, Amrish Ilyas, Female, 1987-10-24, 2013-12-17, 2000, India
EN_0005, Aprativirya Seshan, Female, 1981-12-16, 2012-03-14, 1750, India
EN_0006, Asvathama Ponnada, Male, 1986-01-09, 2014-06-18, 2300, Pakistan
EN_0007, Avantas Ghosal, Female, 1981-10-05, 2016-08-26, 4200, Pakistan
EN_0008, Avidosa Vaisakhi, Male, 1980-08-09, 2018-03-05, 3100, Bangladesh
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh

How can I remove those empty rows using pandas?

There is a problem in your CSV: line 11 really does have 8 fields, not 7 (the 6th is empty). Correct your CSV :) – MarcoP, Commented Mar 10, 2021 at 13:54
That is outside the scope of pandas. You have to fix your CSV some other way first. – MarcoP, Commented Mar 10, 2021 at 13:57
:-P If the dataset contains 10000 records, how should this be done? – pl-jay, Commented Mar 10, 2021 at 14:00

RJ Adriaansen · Accepted Answer · 2021-03-10 14:14:28Z

As mentioned, there is a redundant empty cell in line 11. You can open the CSV, fix each line, write the result to a StringIO buffer, and then read that buffer with pd.read_csv:

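    import io

    import pandas as pd

    s = io.StringIO()

    # Copy the file into an in-memory buffer, collapsing the empty field ", ," into ","
    with open('employee.csv') as file:
        for line in file:
            s.write(line.replace(", ,", ","))
    s.seek(0)  # rewind so read_csv starts at the beginning of the buffer

    # skipinitialspace drops the space that follows each comma
    df = pd.read_csv(s, skipinitialspace=True)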

result:

	Employee Id	Full Name	Gender	Date of Birth	Joined Date	Salary (USD)	Branch
0	EN_0001	Aditi Musunur	Male	1990-03-24	2011-07-05	1500	Sri Lanka
1	EN_0002	Advitiya Sujeet	Male	1986-07-28	2010-03-24	1600	Sri Lanka
2	EN_0003	Alagesan Poduri	Male	1982-05-25	2016-06-24	1800	Sri Lanka
3	EN_0004	Amrish Ilyas	Female	1987-10-24	2013-12-17	2000	India
4	EN_0005	Aprativirya Seshan	Female	1981-12-16	2012-03-14	1750	India
5	EN_0006	Asvathama Ponnada	Male	1986-01-09	2014-06-18	2300	Pakistan
6	EN_0007	Avantas Ghosal	Female	1981-10-05	2016-08-26	4200	Pakistan
7	EN_0008	Avidosa Vaisakhi	Male	1980-08-09	2018-03-05	3100	Bangladesh
8	EN_0009	Barsati Sandipa	Male	1988-04-09	2011-05-03	2925	Bangladesh
9	EN_0010	Debasis Sundhararajan	Female	1990-03-26	2015-05-18	2800	Bangladesh
10	EN_0011	Debas Sundhar	Female	1990-03-26	2015-05-18	2800	Bangladesh
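
If editing the text by hand is not practical, for example with thousands of records, the repair can also be attempted during parsing. The following is only a sketch, assuming pandas 1.4 or newer, where `on_bad_lines` accepts a callable when `engine="python"` is used. The helper name `drop_empty_fields`, the constant `EXPECTED_FIELDS`, and the shortened inline `sample` are made up for illustration; the callable is only invoked for rows that have more fields than the header.

```python
import io

import pandas as pd

EXPECTED_FIELDS = 7  # assumption: number of columns in the header of the sample dataset


def drop_empty_fields(bad_line):
    """Repair a row with too many fields by discarding its empty fields."""
    fixed = [field for field in bad_line if field.strip()]
    # Returning None tells pandas to skip the row instead of keeping it.
    return fixed if len(fixed) == EXPECTED_FIELDS else None


# Shortened copy of the sample dataset; the second data row has the stray empty field.
sample = """Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh
"""

df = pd.read_csv(
    io.StringIO(sample),
    skipinitialspace=True,           # drop the space after each comma
    engine="python",                 # a callable on_bad_lines needs the python engine
    on_bad_lines=drop_empty_fields,  # called only for rows with too many fields
)
print(df)
```

Note that this discards every empty field in an over-long row, so it is only safe when a row that is too long cannot also contain a legitimately empty value. With a real file, `io.StringIO(sample)` would be replaced by the path `'employee.csv'`.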

lennertcl · Accepted Answer · 2021-03-10 13:53:39Z

Your CSV file has an extra comma on this line:

EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
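
In a larger file, such lines can be located without reading through it by eye. Below is a minimal sketch using only the standard library `csv` module; the function name `find_bad_lines` and the shortened inline `sample` are made up for illustration. It reports every line whose field count differs from the header's.

```python
import csv
import io


def find_bad_lines(lines):
    """Return (expected_field_count, [(line_number, field_count), ...])."""
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    expected = len(header)
    bad = []
    for row in reader:
        # csv.reader yields an empty list for a blank line; ignore those.
        if row and len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return expected, bad


# Shortened copy of the sample dataset; line 3 carries the extra comma.
sample = """Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh
"""

expected, bad = find_bad_lines(io.StringIO(sample))
for line_number, count in bad:
    print(f"line {line_number}: expected {expected} fields, saw {count}")
```

For a file on disk, an open file object can be passed instead of the `StringIO` buffer, e.g. `with open('employee.csv', newline='') as f: find_bad_lines(f)`.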
