
I am trying to tie another piece into my existing Python program. I am new to Python and can't seem to figure it out, even with all of the help out there. My existing program is listed below; I would just like to add another piece to it to perform another task.

The current program opens "initial.csv" and looks in the first column for any key words. If a line matches one, it is written to "<listname>_rejected.csv"; any line that doesn't match is written to "<listname>.csv". It sounds backwards, but for what I'm doing, it's correct. I've used it a thousand times.

Now, what I would like to add is the ability to look at column 2 (full of addresses) and split each address into separate columns. For example, this -

Name,Address,Phonenumber,ID
John,"123 Any Street, New York, NY 00010",999-999-9999,321654

Turns into this -

Name,Street,City,State,Zipcode,Phonenumber,ID
John,123 Any Street, New York, NY, 00010,999-999-9999,321654

Basically, I need to be able to explode the second column into separate columns. Rather than having the entire address in column 2, I need to split it across columns 2, 3, 4, and 5.

I have found things close to this on Stack Overflow, but again, I'm new to Python and can't figure out how to piece them into my current code.

key_words = [ 
'Suzy', 
'Billy', 
'Cody',
 ]

listname = raw_input ("Enter List Name:")
listname_accept = (listname) + '.csv'
listname_rejects = (listname) + '_rejected.csv'

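# The rejects file must be opened too; `matched` was otherwise undefined
with open('initial.csv', 'r') as oldfile, open(listname_accept, 'w') as cleaned, open(listname_rejects, 'w') as matched:
    for line in oldfile:
        if not any(key_word in line.split(",", 1)[0] for key_word in key_words):
            cleaned.write(line)
        else:
            matched.write(line)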
  • I notice two things: your code doesn't implement CSV. Use the csv module. It is easier and avoids many bugs. Second: parsing addresses is far from trivial if you want to handle addresses correctly. Do some searching to find some of the various crazy ways valid streets and address numbers occur in the US. Commented Dec 14, 2016 at 22:20
  • Thank you. I will implement csv in my code. The addresses in my csv file all have the same, common trend, so I figured I could implement something. Commented Dec 14, 2016 at 22:23
  • @CFraley can you always guarantee that address follows this format: "<street>,<city>,<state zip>"? Commented Dec 14, 2016 at 22:26
  • @LMc Yes, every single row, every single time. Commented Dec 14, 2016 at 22:27
  • In that case, your problem is much much easier. Use csv to read the rows. Then take the second column of each row and split on commas. Write the result using the csv module to the destination file. Commented Dec 14, 2016 at 22:38
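
The steps described in the last comment can be sketched roughly as follows. This is only an illustration, written for Python 3 and assuming every address really does follow "<street>, <city>, <state> <zip>" as stated above. The helper names `split_address` and `explode_address_column` are made up for this sketch, and an in-memory string stands in for "initial.csv".

```python
import csv
import io


def split_address(address):
    """Split '<street>, <city>, <state> <zip>' into four fields.

    Assumes exactly two commas and a space between state and zip.
    """
    street, city, state_zip = [part.strip() for part in address.split(",")]
    state, zipcode = state_zip.split()
    return [street, city, state, zipcode]


def explode_address_column(rows):
    """Yield each row with column 1 (the address) replaced by four columns."""
    for row in rows:
        yield row[:1] + split_address(row[1]) + row[2:]


# In-memory stand-in for one data row of the input file
sample = 'John,"123 Any Street, New York, NY 00010",999-999-9999,321654\n'

reader = csv.reader(io.StringIO(sample))   # handles the quoted address field
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(explode_address_column(reader))

result = out.getvalue()
print(result)
```

Reading with `csv.reader` matters here because the address is quoted and contains commas; splitting the raw line on "," would cut it in the wrong places.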

3 Answers

Let me know if this works. I may have mixed up your output CSV names, but you can adjust those based on your logic:

import csv

key_words = [ 
'Suzy', 
'Billy', 
'Cody',
 ]

listname = raw_input ("Enter List Name:")
listname_accept = (listname) + '.csv'
listname_rejects = (listname) + '_rejected.csv'

with open('initial.csv') as oldfile, open(listname_accept,'w') as cleaned, open(listname_rejects,'w') as matched:
    accept_writer=csv.writer(cleaned) # create one csv writer object
    reject_writer=csv.writer(matched) # create second csv writer object
    initial_reader=csv.reader(oldfile)
    for c,row in enumerate(initial_reader): # read through input csv
        if c==0:                            # first row is the header
            header=row[:]
            del header[1]       # delete 'address'
            header[1:1]=['Street','City','State','Zipcode'] # insert these column names
            accept_writer.writerow(header)                  # write column names to csv
            reject_writer.writerow(header)                  # write column names to csv
        else:                                               # for all other input rows, except the first
            address_list=[i.strip() for i in row[1].split(',')] # split the address by comma
            all_address=address_list[:-1]+address_list[-1].split() # split the state and zip by space
            del row[1]                                             # delete original string address from row
            row[1:1]=all_address                                   # insert new address
            if row[0] not in key_words:                            # test if name in key_words
                accept_writer.writerow(row)
            else:
                reject_writer.writerow(row)

I've inserted comments to help you understand what's going on.


It's extremely close. I've messed with it to get it to work and keep breaking it. I see exactly what you're doing... However, in my original code, I was looking at column 0 to see if it INCLUDED any keywords. I believe the way yours is written, is if it IS a keyword. That's why I had the line.split in there. Could you advise on where I might add this in? Everything else appears to be working.
Actually, I figured it out... I changed if row[0] not in key_words: to if not any(key_word in row[0].split(",", 1)[0] for key_word in key_words):
@CFraley I am a bit confused by this. In this one-row example, you are testing whether 'John' contains 'Suzy', 'Billy', or 'Cody', which is False. Then the not flips the statement to True and the row is written to accept_writer. But a more compact way to check for this membership is to write 'John' not in key_words (i.e. row[0] not in key_words), which is True and yields the same thing, but is more compact and quicker. Did I miss something?
Sorry, I should have been a little clearer. Obviously, the example csv isn't exactly what I'm doing. I'm actually cleaning a list of business names and need to see if any of them include any of the key_words. You are correct, the current way you wrote it works, for the example I provided. I should have given a better example so you knew what I was trying to fully accomplish. It's 100% working now. I really appreciate your help!
@CFraley ok, great. Just wanted to make sure you were getting the help you needed.
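
The difference discussed in these comments, exact membership versus "the name contains a key word", can be illustrated with a short sketch. The business name and the helper `contains_key_word` are hypothetical and only serve the example.

```python
key_words = ["Suzy", "Billy", "Cody"]


def contains_key_word(name, key_words):
    """Return True if any key word appears anywhere inside the name."""
    return any(key_word in name for key_word in key_words)


name = "Suzy's Bakery LLC"  # hypothetical business name

exact_match = name in key_words                      # whole string must equal a key word
partial_match = contains_key_word(name, key_words)   # substring test

print(exact_match, partial_match)
```

With `row[0] not in key_words`, a row like this one would be accepted, while the `any(...)` form rejects it, which is the behavior the original program had.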

I hope the following code helps. I have used my own CSV file names, but you can change them. The main idea is that you build each output row with exactly the columns you want and split the address string into its parts.

Regards

import csv

to_validate = ["name1", "name2"]

"""
file_to_read.csv has
Name,Address,Phonenumber,ID
John,"123 Any Street, New York, NY 00010",999-999-9999,321654
"""

fieldnames = ["Name", "Street", "City", "State", "Zipcode", "Phonenumber", "ID"]

# Open both files once: reopening the output with 'w+' inside the loop
# would truncate it on every row and keep only the last one.
with open("file_to_read.csv", 'r') as infile, open("example_file.csv", 'w') as csvfile:
    file_to_read = csv.DictReader(infile, delimiter=',', quotechar='"')
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=",")
    writer.writeheader()

    for row in file_to_read:
        if row["Name"] in to_validate:
            # do some stuff
            pass
        else:
            # Address looks like "<street>, <city>, <state> <zip>"
            street, city, state_zip = [part.strip() for part in row["Address"].split(",")]
            state, zipcode = state_zip.split()
            writer.writerow({
                "Name": row["Name"],
                "Street": street,
                "City": city,
                "State": state,
                "Zipcode": zipcode,
                "Phonenumber": row["Phonenumber"],
                "ID": row["ID"]
            })


Although the question is already answered, I feel you should broaden your knowledge with the pandas module. I've implemented only the part that splits the Address column. If you want, I can also show you the rest. pandas is not always straightforward, but once you get used to it, it is the easiest way to handle many CSV problems (not to mention other great features, such as working with databases). The code is visible on my GitHub page. Have a look!
