I'm trying to remove every row that contains an empty cell from multiple .csv files. For example:

Data 1, Data 2, Data 3, Data 4
Value 1, Value 2, Value 3, Value 4
<empty cell>, Value 2, Value 3, Value 4      #Trying to remove this whole row
<empty cell>, Value 2, Value 3, Value 4      #Trying to remove this whole row
Value 1, Value 2, Value 3, Value 4

This is what I have so far:

import os
import csv
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True)
ap.add_argument("-o", "--output", required=True)
args = vars(ap.parse_args())

for file in os.listdir(args["input"]):
    if file.endswith(".csv"):
        with open(os.path.join(args["input"], file), 'r') as infile, open(os.path.join(args["output"], file), 'w') as outfile:
            csv_reader = csv.reader(infile)
            for line in csv_reader:  # This is where I get stuck
                with open(os.path.join(args["output"], file), 'a') as outfile:
                    pass  # not sure what goes here

        outfile.close()

Any ideas? Thanks.

  • You can read your data into a pandas.DataFrame and then use pandas.DataFrame.dropna(). Commented Dec 19, 2019 at 9:20
  • TL;DR, one hint first: don't open a second with block inside your for loop. Instead, write each line to the output file only if it passes a check, i.e. inside an if statement. E.g. check whether the line starts with ',' or whatever condition you need... Commented Dec 19, 2019 at 9:22
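A minimal sketch of the second comment's suggestion, assuming an empty cell only ever needs to be detected in the first column: keep one pair of open files and decide per line whether to write it. The function name copy_rows_without_leading_empty_cell is hypothetical, io.StringIO stands in for real files, and a plain startswith(',') check does not understand quoted fields, which is why the answers below use the csv module or pandas instead.

```python
import io


def copy_rows_without_leading_empty_cell(infile, outfile):
    """Write each line unless its first cell is empty (i.e. the line starts with ',')."""
    # No second `with` block is needed: check each line, then write it or skip it.
    for line in infile:
        if not line.startswith(','):
            outfile.write(line)


# Example with in-memory files instead of real paths
src = io.StringIO("Data 1,Data 2\nValue 1,Value 2\n,Value 2\nValue 1,Value 2\n")
dst = io.StringIO()
copy_rows_without_leading_empty_cell(src, dst)
print(dst.getvalue())
```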

3 Answers

The csv reader represents empty cells as empty strings. Empty strings are falsy in Python, so you can use the built-in function all() to test whether a row contains any empty cells, and therefore whether it should be written to the output.

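import csv
import os

# `args` comes from the argparse setup in the question
for file in os.listdir(args["input"]):
    if file.endswith(".csv"):
        # newline='' lets the csv module handle line endings (avoids blank rows on Windows)
        with open(os.path.join(args["input"], file), 'r', newline='') as infile, \
                open(os.path.join(args["output"], file), 'w', newline='') as outfile:
            csv_reader = csv.reader(infile)
            csv_writer = csv.writer(outfile)
            for line in csv_reader:
                # Empty cells are read as '' (falsy), so all() is False if any cell is empty
                if all(line):
                    csv_writer.writerow(line)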
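Two edge cases the all(line) check does not cover, shown in a hedged sketch (the helper names has_no_empty_cells and filter_rows are hypothetical, and io.StringIO stands in for real files): a cell containing only spaces, such as '  ', is a non-empty string and therefore truthy, and a completely blank line is read as an empty list, for which all([]) is True, so it would be copied to the output as an empty row.

```python
import csv
import io


def has_no_empty_cells(row):
    """Return True only for non-blank rows whose cells all contain non-whitespace text."""
    # A blank line is read as [], and all([]) is True, so reject it explicitly.
    # cell.strip() turns whitespace-only cells such as '  ' into '' (falsy).
    return bool(row) and all(cell.strip() for cell in row)


def filter_rows(infile, outfile):
    """Copy rows from infile to outfile, skipping any row with an empty cell."""
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        if has_no_empty_cells(row):
            writer.writerow(row)


# Example with in-memory files instead of real paths
src = io.StringIO("a,b,c\n1,2,3\n,2,3\n  ,2,3\n\n4,5,6\n")
dst = io.StringIO()
filter_rows(src, dst)
print(dst.getvalue())
```

Stripping whitespace is a judgment call: if a cell holding only spaces should count as data in your files, use plain all(row) together with the bool(row) check.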
1 Comment

Just a typo: writer.writerow(line) should be csv_writer.writerow(line). But it works!
You can drop every row that contains an empty cell directly when reading the file:

import pandas as pd
df = pd.read_csv(myfile, sep=',').dropna()  # empty cells are read as NaN; dropna() removes those rows
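To see what dropna() does, here is a small self-contained sketch using made-up data in an io.StringIO instead of myfile: read_csv treats empty fields as NaN by default, and dropna() with its default how='any' drops every row that has at least one NaN. Passing index=False to to_csv keeps pandas from writing its row index as an extra column when saving the result.

```python
import io

import pandas as pd

# Made-up example data: row 1 has an empty A, row 2 has an empty B
csv_text = "A,B,C\n1,2,3\n,5,6\n7,,9\n10,11,12\n"
df = pd.read_csv(io.StringIO(csv_text)).dropna()  # how='any' is the default
print(df)

# Write the cleaned data without pandas' row index
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

Columns that contained a NaN before dropping (A and B here) stay float, so the written file shows values like 1.0 rather than 1.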

You can use the Python library pandas to load your CSV as a DataFrame and keep only the rows where a given column is not empty.

Input file 'test_file.csv' (shown as the DataFrame pandas reads from it):

     A  B  C   D
0  1.0  3  6   9
1  NaN  4  7  10
2  2.0  5  8  11

Then:

import pandas as pd

f = 'test_file.csv'
df = pd.read_csv(f, sep=";")

# Boolean mask: True where column A is not NaN (only column A is checked here)
vector_not_null = df['A'].notnull()
df_not_null = df[vector_not_null]

df_not_null.to_csv('test_file_without_null_rows.csv', index=False, header=True, sep=';', encoding='utf-8-sig')

Output file 'test_file_without_null_rows.csv' (again shown as a DataFrame):

     A  B  C   D
0  1.0  3  6   9
1  2.0  5  8  11
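Note that the approach above only checks column A. A hedged sketch comparing it with DataFrame.dropna, assuming the semicolon-separated file text shown in the code (reconstructed from the printed DataFrame, not taken from the original file): dropna(subset=['A']) keeps the same rows as the notnull() mask, while a plain dropna() checks every column, which is what the question asks for.

```python
import io

import pandas as pd

# Assumed text of 'test_file.csv', reconstructed from the DataFrame printed above
csv_text = "A;B;C;D\n1;3;6;9\n;4;7;10\n2;5;8;11\n"
df = pd.read_csv(io.StringIO(csv_text), sep=";")

by_mask = df[df["A"].notnull()]       # the answer's approach: only column A is checked
by_subset = df.dropna(subset=["A"])   # the same filter written with dropna
any_column = df.dropna()              # drop rows with an empty cell in any column

print(by_mask.equals(by_subset))
print(any_column)
```

With this data only column A has a missing value, so all three give the same rows; they differ as soon as another column has an empty cell.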
