Split up CSV column contents into multiple columns

Question

I am trying to tie another piece into my existing Python program. I am new to Python and can't seem to figure it out, even with all of the help out there. My existing program is listed below; I would just like to add one more piece to it to perform another task.

The current program opens "initial.csv" and looks in the first column for any key words. If a line matches one, the program writes it to "listname_rejects.csv"; lines that don't match are written to "listname.csv". It sounds backwards, but for what I'm doing, it's correct. I've used it a thousand times.

Now I would like to add the ability to look at column 2 (full of addresses) and split each address into separate columns. For example, this -

Name,Address,Phonenumber,ID
John,"123 Any Street, New York, NY 00010",999-999-9999,321654

Turns into this -

Name,Street,City,State,Zipcode,Phonenumber,ID
John,123 Any Street, New York, NY, 00010,999-999-9999,321654

Basically, I need to be able to explode the second column into separate columns. Rather than having the entire address in column 2, I need to split it up between say column 2, 3, 4, & 5.

I have found things close to this on Stack Overflow, but again, I'm new to Python and can't figure out how to piece them into my current code.

key_words = [ 
'Suzy', 
'Billy', 
'Cody',
 ]

listname = raw_input ("Enter List Name:")
listname_accept = (listname) + '.csv'
listname_rejects = (listname) + '_rejected.csv'

with open('initial.csv', 'r') as oldfile, open(listname_accept, 'w') as cleaned, open(listname_rejects, 'w') as matched:
    for line in oldfile:
        if not any(key_word in line.split(",", 1)[0] for key_word in key_words):
            cleaned.write(line)      
        else:
            matched.write(line)

I notice two things: your code doesn't implement CSV. Use the csv module. It is easier and avoids many bugs. Second: parsing addresses is far from trivial if you want to handle addresses correctly. Do some searching to find some of the various crazy ways valid streets and address numbers occur in the US.
– dsh, Commented Dec 14, 2016 at 22:20
Thank you. I will implement csv in my code. The addresses in my csv file all have the same, common trend, so I figured I could implement something.
– codyfraley, Commented Dec 14, 2016 at 22:23
@CFraley can you always guarantee that address follows this format: "<street>,<city>,<state zip>"?
– LMc, Commented Dec 14, 2016 at 22:26
In that case, your problem is much much easier. Use csv to read the rows. Then take the second column of each row and split on commas. Write the result using the csv module to the destination file.
– dsh, Commented Dec 14, 2016 at 22:38

LMc · Accepted Answer · 2016-12-14 23:05:25Z

1

Let me know if this works. I may have mixed up your output csv names, but you can adjust those based on your logic:

import csv

key_words = [ 
'Suzy', 
'Billy', 
'Cody',
 ]

listname = raw_input ("Enter List Name:")
listname_accept = (listname) + '.csv'
listname_rejects = (listname) + '_rejected.csv'

with open('initial.csv') as oldfile, open(listname_accept,'w') as cleaned, open(listname_rejects,'w') as matched:
    accept_writer=csv.writer(cleaned) # create one csv writer object
    reject_writer=csv.writer(matched) # create second csv writer object
    initial_reader=csv.reader(oldfile)
    for c,row in enumerate(initial_reader): # read through input csv
        if c==0:                            # first row is the header
            header=row[:]
            del header[1]       # delete 'address'
            header[1:1]=['Street','City','State','Zipcode'] # insert these column names
            accept_writer.writerow(header)                  # write column names to csv
            reject_writer.writerow(header)                  # write column names to csv
        else:                                               # for all other input rows, except the first
            address_list=[i.strip() for i in row[1].split(',')] # split the address by comma
            all_address=address_list[:-1]+address_list[-1].split() # split the state and zip by space
            del row[1]                                             # delete original string address from row
            row[1:1]=all_address                                   # insert new address
            if row[0] not in key_words:                            # test if name in key_words
                accept_writer.writerow(row)
            else:
                reject_writer.writerow(row)

I've inserted comments to help you understand what's going on.
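For anyone on Python 3, the address-splitting step can also be pulled out into a small helper. This is only a sketch, assuming every address follows the "<street>, <city>, <state zip>" shape confirmed in the comments; `split_address` and `expand_rows` are names made up for this example, not part of the csv module.

```python
import csv
import io


def split_address(address):
    """Split '<street>, <city>, <state> <zip>' into four fields.

    Hypothetical helper: assumes at least two commas and a
    space-separated state and zip in the last part.
    """
    # Split from the right so a comma inside the street part
    # does not shift the city and state/zip positions.
    street, city, state_zip = [part.strip() for part in address.rsplit(",", 2)]
    state, zipcode = state_zip.split(None, 1)
    return [street, city, state, zipcode]


def expand_rows(rows):
    """Yield each row with column 1 (the address) replaced by four columns."""
    for row in rows:
        yield row[:1] + split_address(row[1]) + row[2:]


sample = 'John,"123 Any Street, New York, NY 00010",999-999-9999,321654\n'
rows = list(expand_rows(csv.reader(io.StringIO(sample))))
print(rows[0])
```

An address with fewer than two commas raises a ValueError at the unpacking line, so malformed rows fail loudly instead of being written with shifted columns.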


5 Comments

codyfraley Over a year ago

It's extremely close. I've messed with it to get it to work and keep breaking it. I see exactly what you're doing... However, in my original code, I was looking at column 0 to see if it INCLUDED any keywords. I believe the way yours is written, is if it IS a keyword. That's why I had the line.split in there. Could you advise on where I might add this in? Everything else appears to be working.

codyfraley Over a year ago

Actually, I figured it out... I changed if row[0] not in key_words: to if not any(key_word in row[0].split(",", 1)[0] for key_word in key_words):

LMc Over a year ago

@CFraley I am a bit confused by this. In this one-row example, you are testing if 'John' is in 'Suzy', 'Billy', or 'Cody', which is False. Then the not flips the statement to True and the row is written to accept_writer. But a more compact way to check for this membership is to write 'John' not in key_words (i.e. row[0] not in key_words), which is True and gives you the same result, but is more compact and quicker. Did I miss something?

codyfraley Over a year ago

Sorry, I should have been a little clearer. Obviously, the example csv isn't exactly what I'm doing. I'm actually cleaning a list of business names and need to see if any of them include any of the key_words. You are correct, the current way you wrote it works, for the example I provided. I should have given a better example so you knew what I was trying to fully accomplish. It's 100% working now. I really appreciate your help!

LMc Over a year ago

@CFraley ok, great. Just wanted to make sure you were getting the help you needed.
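The difference discussed in these comments, exact membership versus "the name contains a key word", can be illustrated with a short sketch. The business names below are invented for the example, and `contains_key_word` is a hypothetical helper name.

```python
key_words = ["Suzy", "Billy", "Cody"]


def contains_key_word(name, key_words):
    """Return True if any key word appears anywhere inside name."""
    return any(key_word in name for key_word in key_words)


name = "Cody's Plumbing LLC"  # made-up business name

# Exact membership: the whole string must equal one of the key words.
print(name in key_words)

# Substring test: one key word only has to occur inside the string.
print(contains_key_word(name, key_words))
```

Once csv.reader has parsed the row, row[0] is already the first field on its own, so the extra .split(",", 1)[0] is probably not needed; it would also cut off a quoted name that contains a comma. Both checks are case-sensitive as written.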

Angel F · Accepted Answer · 2016-12-15 02:23:34Z

I hope the following code helps you. I have used my own CSV file names, but you can change them. The main idea is that you build each output row with the columns you want in your file and split the address string into its parts.

Regards

import csv

to_validate = ["name1", "name2"]

"""
file_to_read.csv has
Name,Address,Phonenumber,ID
John,"123 Any Street, New York, NY 00010",999-999-9999,321654
"""

fieldnames = ["Name", "Street", "City", "State", "Zipcode", "Phonenumber", "ID"]

# Open the output file once, before the loop. Reopening it with 'w+' for
# every row truncates the file, so only the last row would survive.
# (On Python 3, also pass newline='' to open(); on Python 2, use mode 'wb'.)
with open("file_to_read.csv", 'r') as infile, open("example_file.csv", 'w') as csvfile:
    file_to_read = csv.DictReader(infile, delimiter=',', quotechar='"')
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=",")
    writer.writeheader()

    for row in file_to_read:
        if row["Name"] in to_validate:
            # do some stuff
            pass
        else:
            # "<street>, <city>, <state> <zip>" -> three comma-separated parts
            street, city, state_zip = [p.strip() for p in row["Address"].split(",")]
            state, zipcode = state_zip.split()
            writer.writerow({
                "Name": row["Name"],
                "Street": street,
                "City": city,
                "State": state,
                "Zipcode": zipcode,
                "Phonenumber": row["Phonenumber"],
                "ID": row["ID"]
            })

quapka · Accepted Answer · 2016-12-15 20:33:32Z

0

Although the question has already been answered, I feel you should broaden your knowledge with the pandas module. I've implemented only the part that splits the Address column. If you want, I can show you the rest as well. pandas is not always straightforward, but once you get used to it, it is the easiest way to handle many CSV problems (not to mention other great features, such as working with databases). The code is visible on my GitHub page. Have a look!
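Since the linked code is not reproduced here, the following is only a rough sketch of what the Address split could look like in pandas, not the code from the GitHub page. It assumes the same "<street>, <city>, <state zip>" format as the question and reads the sample from an in-memory string instead of a real file.

```python
import io

import pandas as pd

sample = io.StringIO(
    'Name,Address,Phonenumber,ID\n'
    'John,"123 Any Street, New York, NY 00010",999-999-9999,321654\n'
)

# dtype=str keeps zip codes and IDs as text, so leading zeros survive.
df = pd.read_csv(sample, dtype=str)

# Split from the right into street, city and "state zip".
parts = df["Address"].str.rsplit(",", n=2, expand=True)
# Split "state zip" on whitespace into two columns.
state_zip = parts[2].str.strip().str.split(n=1, expand=True)

address_cols = pd.DataFrame({
    "Street": parts[0].str.strip(),
    "City": parts[1].str.strip(),
    "State": state_zip[0],
    "Zipcode": state_zip[1],
})

# Put the new columns where the Address column used to be.
result = pd.concat([df[["Name"]], address_cols, df[["Phonenumber", "ID"]]], axis=1)
print(result.to_csv(index=False))
```

To work on real files, replace the StringIO object with the path to the input file and pass an output path to to_csv.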
