Using the split function in Python

Question

I am working with the csv module and writing a simple program that takes the names of several authors listed in a file and formats them like this: john.doe

So far, I've achieved the results I want, but I am having trouble getting the code to exclude titles such as "Mr.", "Mrs.", etc. I've been thinking about using the split() function, but I am not sure whether this would be a good use for it.

Any suggestions? Thanks in advance!

Here's my code so far:

import csv

with open("books.csv", newline="") as f:
    books = csv.reader(f)
    for row in books:
        # Join column 1 and column 0, lowercased, with a dot (e.g. john.doe)
        print('.'.join([item.lower() for item in [row[index] for index in (1, 0)]]))

Take a look at the filter() function: docs.python.org/library/functions.html#filter
– Hunter McMillen, Commented Dec 14, 2011 at 1:26
If you can think of a way to do what you want using split(), then it is a fine use of it. If you show us your code and state exactly what you are asking, it will be easier to answer this question.
– Daniel Nill, Commented Dec 14, 2011 at 1:27
Could you please be a little more specific about exactly what you have and what you want? (A couple of examples are welcome.)
– redShadow, Commented Dec 14, 2011 at 1:27
row[index] for index in (1, 0) can be written as: row[1::-1]
– Bora Caglayan, Commented Dec 14, 2011 at 2:14
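Since the question asks whether split() is a good fit: one way to use it is to split the first-name column on whitespace and drop the first word when it is a known title. This is a minimal sketch, assuming the title sits at the start of column 1 (the first name) and that the title set below covers the data; TITLES, strip_title and the sample row are hypothetical. It also uses the row[1::-1] slice from the last comment.

```python
# Hypothetical set of titles, compared case-insensitively and without a trailing dot
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def strip_title(name):
    """Drop a leading title such as 'Mr.' from a name, using str.split()."""
    words = name.split()  # split on any run of whitespace
    if words and words[0].rstrip(".").lower() in TITLES:
        words = words[1:]
    return " ".join(words)


# Hypothetical row: column 0 is the last name, column 1 the first name
row = ["Doe", "Mr. John"]
first, last = row[1::-1]  # same as [row[1], row[0]]
print(".".join([strip_title(first).lower(), last.lower()]))  # john.doe
```

Unlike the regex answer below, this only recognizes a title that is separated from the name by whitespace, so "Mr.John" is left unchanged.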

redShadow · Accepted Answer · 2011-12-14 01:57:42Z

3

It depends on how messy the strings are; in the worst case, this regex-based solution should do the job:

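import re
# Leading title (mr, mrs, ms, miss) in any case, plus the dots/whitespace after it
x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
text = "Mrs. Jane Doe"  # sample input
print(x.sub("", text))  # prints "Jane Doe"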

(I'm using re.compile() here because in Python 2.6, re.sub() doesn't accept the flags= keyword argument; it was only added in Python 2.7.)
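To plug the pattern into the code from the question, it can be applied to the first-name column before joining. This is a sketch in Python 3 syntax, assuming (as Tom Neyland's answer guesses) that the title is part of column 1; an in-memory io.StringIO with made-up rows stands in for books.csv so the example runs on its own.

```python
import csv
import io
import re

# Leading title (mr, mrs, ms, miss) in any case, plus the dots/whitespace after it
TITLE_RE = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)

# Stand-in for books.csv: last name in column 0, first name in column 1 (hypothetical data)
sample = io.StringIO("Doe,Mr. John\nRoe,MRS Jane\nSmith,Anna\n")

results = []
for row in csv.reader(sample):
    first = TITLE_RE.sub("", row[1])  # remove the title, if there is one
    results.append(".".join([first.strip().lower(), row[0].strip().lower()]))

print("\n".join(results))
```

With the sample data this prints john.doe, jane.roe and anna.smith, one per line; names without a title pass through unchanged.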

UPDATE: I wrote some code to test this. Although I couldn't figure out a way to automate checking the results, it appears to work correctly. This is the test code:

import re
x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
# Build every combination of leading junk, title, separator and name
names = ["".join([a, b, c, d]) for a in ['', ' ', '   ', '..', 'X']
         for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms']
         for c in ['', '.', '. ', ' ']
         for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]
print("\n".join([" => ".join((n, x.sub('', n))) for n in names]))
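One way to automate the results check is to pair a handful of representative inputs with their expected outputs and report any mismatch, instead of reading through every combination by eye. This is a sketch; the cases are chosen for illustration and the expected values come from tracing the pattern by hand.

```python
import re

x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)

# (input, expected output) pairs
cases = [
    ("Mr. Aaaa Bbbb", "Aaaa Bbbb"),
    ("  mrs.Aaaaa", "Aaaaa"),
    ("MISS Aaa Bbb Ccc", "Aaa Bbb Ccc"),
    ("Ms  aa ", "aa "),
    ("mrAaaaa", "mrAaaaa"),          # no dot or space after the title: left alone
    ("..Mr. Aaaaa", "..Mr. Aaaaa"),  # ^\s* does not skip leading dots
    ("XMr. Aaaaa", "XMr. Aaaaa"),    # title not at the start: left alone
]

failures = [(n, x.sub("", n), want) for n, want in cases if x.sub("", n) != want]
for name, got, want in failures:
    print("FAIL: %r => %r (expected %r)" % (name, got, want))
print("%d of %d cases passed" % (len(cases) - len(failures), len(cases)))
```

For these cases it reports 7 of 7 passing; a regression in the pattern would show up as FAIL lines.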

edited Dec 14, 2011 at 1:57

answered Dec 14, 2011 at 1:43

redShadow



1 Comment

redShadow Over a year ago

Actually, the test code fits on one line (after import re):

print("\n".join([" => ".join((n, re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE).sub('', n))) for n in ["".join([a, b, c, d]) for a in ['', ' ', '   ', '..', 'X'] for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms'] for c in ['', '.', '. ', ' '] for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]]))

Tom Neyland · Accepted Answer · 2011-12-14 01:42:07Z

Depending on the complexity of your data and the scope of your needs, you may be able to get away with something as simple as stripping titles from the lines of the CSV with replace() as you iterate over them.

Something along the lines of:

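titles = ["Mr.", "Mrs.", "Ms.", "Dr."]  # and so on
lines = ["Doe,Mr. John", "Roe,Dr. Jane"]  # hypothetical sample lines

for line in lines:
    line_data = line
    for title in titles:
        # Remove every occurrence of this title from the line
        line_data = line_data.replace(title, "")
    # your code for processing the line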

This may not be the most efficient method, but depending on your needs, it may be a good fit.
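One thing to watch with replace(): it removes the text wherever it occurs, not only at the start of the name, so a title listed without its dot (for example "Dr") can also match inside an ordinary name. The sketch below demonstrates this with a made-up name, then shows a variant that only strips a title at the very start, using str.startswith() and requiring a space or dot after the title; strip_leading_title is a hypothetical helper, not part of the answer.

```python
titles = ["Mr.", "Mrs.", "Ms", "Dr"]  # note: "Ms" and "Dr" listed without dots

name = "Drew Smith"  # hypothetical name that merely starts with "Dr"
cleaned = name
for title in titles:
    cleaned = cleaned.replace(title, "")
print(repr(cleaned))  # 'ew Smith': "Dr" was removed from inside the first name


def strip_leading_title(name, titles):
    """Remove a title only when it starts the name as a whole word."""
    for title in titles:
        rest = name[len(title):]
        # A title ending in "." is already delimited; otherwise require " " or "."
        if name.startswith(title) and (title.endswith(".") or rest[:1] in (" ", ".")):
            return rest.lstrip(". ").strip()
    return name.strip()


print(strip_leading_title("Drew Smith", titles))      # Drew Smith
print(strip_leading_title("Mr. John Smith", titles))  # John Smith
print(strip_leading_title("Dr. Jane Roe", titles))    # Jane Roe
```

The order of the list still matters for titles that are prefixes of each other without a dot (for example "Ms" and "Mss"), so longer titles should be listed first in that case.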

Here is how this could work with the code you posted (I am guessing the Mr./Mrs. is part of column 1, the first name):

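import csv

titles = ["Mr.", "Mrs.", "Ms.", "Dr."]  # and so on

with open("books.csv", newline="") as f:
    books = csv.reader(f)
    for row in books:
        first_name = row[1]
        last_name = row[0]
        for title in titles:
            first_name = first_name.replace(title, "")
        # strip() removes the space left behind by the title
        print('.'.join((first_name.strip(), last_name.strip())).lower())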
