
I am working with the csv module, and I am writing a simple program that takes the names of several authors listed in a file and formats them like this: john.doe

So far, I've achieved the results that I want, but I am having trouble getting the code to exclude titles such as "Mr.", "Mrs.", etc. I've been thinking about using the split() method, but I am not sure whether it is a good fit here.

Any suggestions? Thanks in advance!

Here's my code so far:

import csv

books = csv.reader(open("books.csv", "rU"))

for row in books:
    # Lowercase column 1 and column 0 and join them with a dot, e.g. john.doe
    print '.'.join([item.lower() for item in [row[index] for index in (1, 0)]])
  • Take a look at the filter() function: docs.python.org/library/functions.html#filter Commented Dec 14, 2011 at 1:26
  • If you can think of a way to do what you want using split(), then it is a fine use of it. If you show us your code and state exactly what you are asking, then it will be easier to answer this question. Commented Dec 14, 2011 at 1:27
  • Could you please be a little more specific about exactly what you have and what you want? (A couple of examples are welcome.) Commented Dec 14, 2011 at 1:27
  • row[index] for index in (1, 0) can be written as: row[1::-1] Commented Dec 14, 2011 at 2:14
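The split() and filter() suggestions from the comments, plus the row[1::-1] slice, could be combined roughly like this. This is a minimal sketch written for Python 3, assuming each row is laid out as [last name, first name, ...] and that a title, if present, is a separate whitespace-delimited word at the start of the first-name column; `TITLES`, `is_not_title` and `format_author` are hypothetical names.

```python
# Hypothetical set of titles to drop: lowercase, without trailing dots
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def is_not_title(word):
    """Hypothetical helper: True if word is not a title (ignores case and a trailing dot)."""
    return word.lower().rstrip(".") not in TITLES


def format_author(row):
    """Hypothetical helper: turn a [last, first, ...] row into 'first.last', dropping titles."""
    # row[1::-1] yields [row[1], row[0]], i.e. first name then last name
    first, last = row[1::-1]
    # split() breaks the first-name column into words; filter() keeps the non-titles
    first = " ".join(filter(is_not_title, first.split()))
    return ".".join([first.lower(), last.strip().lower()])


print(format_author(["Doe", "Mr. John"]))  # john.doe
print(format_author(["Smith", "Jane"]))    # jane.smith
```

Splitting on whitespace means a title is only removed when it is a whole word, so a first name that merely starts with the same letters is left alone.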

2 Answers


It depends on how messy the strings are; in the worst case, this regex-based solution should do the job:

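import re

# Optional leading whitespace, a title, then a run of dots/whitespace after it
x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
x.sub("", text)  # text is the name string to clean, e.g. "Mr. John"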

(I'm using re.compile() here because in Python 2.6, re.sub() doesn't accept the flags= keyword argument; it was added in Python 2.7.)
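Wrapping the same pattern in a small function makes it easy to apply per CSV field. A minimal sketch for Python 3 (where re.sub() accepts flags= directly, so both forms below work); `strip_title` and `strip_title_uncompiled` are hypothetical names.

```python
import re

# Same pattern as above: optional leading whitespace, a title, then dots/spaces
TITLE_RE = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)


def strip_title(name):
    """Hypothetical helper: remove one leading title such as 'Mr.' or 'mrs '."""
    return TITLE_RE.sub("", name)


def strip_title_uncompiled(name):
    """Hypothetical equivalent without precompiling (needs flags= support)."""
    return re.sub(r"^\s*(mr|mrs|ms|miss)[\.\s]+", "", name, flags=re.IGNORECASE)


for raw in ["Mr. John", "mrs Jane", "  MISS Ann", "Msomething"]:
    print(repr(strip_title(raw)))
```

The last input is left untouched because the pattern requires a dot or whitespace right after the title, so a name that merely begins with "Ms" is not cut.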

UPDATE: I wrote some code to test it and, although I couldn't figure out a way to check the results automatically, it appears to work fine. This is the test code:

import re

x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
# Build every combination of prefix + title + separator + name
names = ["".join([a, b, c, d])
         for a in ['', ' ', '   ', '..', 'X']
         for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms']
         for c in ['', '.', '. ', ' ']
         for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]
# Print each name next to its cleaned version for manual inspection
print "\n".join([" => ".join((n, x.sub('', n))) for n in names])
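One way to automate the result check is to compute, for each generated name, what the output should be and collect mismatches. A minimal sketch for Python 3, assuming the intended behavior is: strip the title only when nothing but whitespace precedes it and at least one dot or space follows it; `expected` and `failures` are hypothetical names.

```python
import re
from itertools import product

x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)

prefixes = ['', ' ', '   ', '..', 'X']
titles = ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms']
separators = ['', '.', '. ', ' ']
rests = ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']


def expected(a, b, c, d):
    """Hypothetical oracle: what cleaning a + b + c + d should produce."""
    tail = c + d
    # Strip only if the prefix is whitespace and a dot/space follows the title;
    # the greedy [\.\s]+ then consumes every leading dot/space of the tail.
    if a.strip() == '' and tail[:1] in ('.', ' '):
        return tail.lstrip('. ')
    return a + b + c + d


failures = []
for a, b, c, d in product(prefixes, titles, separators, rests):
    name = a + b + c + d
    got = x.sub('', name)
    if got != expected(a, b, c, d):
        failures.append((name, got))

print(len(failures), "failures")
```

Listing only the mismatches keeps the output readable even though the test generates several hundred names.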

1 Comment

Actually, the test code fits in a one-liner: print "\n".join([" => ".join((n,re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE).sub('',n))) for n in ["".join([a,b,c,d]) for a in ['', ' ', '   ', '..', 'X'] for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms'] for c in ['', '.', '. ', ' '] for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]])

Depending on the complexity of your data and the scope of your needs, you may be able to get away with something as simple as stripping titles from the lines of the CSV file with replace() as you iterate over them.

Something along the lines of:

titles = ["Mr.", "Mrs.", "Ms", "Dr"]  # and so on

for line in lines:
    line_data = line
    for title in titles:
        line_data = line_data.replace(title, "")
    # your code for processing the line

This may not be the most efficient method, but depending on your needs it may be a good fit.
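One caveat with plain str.replace(): it removes the title text anywhere in the string, so a short entry such as "Dr" also cuts into a name like "Drew". A minimal sketch for Python 3 comparing that with removing a title only when it is the first word; `naive_strip` and `remove_leading_title` are hypothetical names, and matching against `titles` is exact and case-sensitive.

```python
titles = ["Mr.", "Mrs.", "Ms", "Dr"]  # and so on


def naive_strip(name):
    """Hypothetical helper using the replace() approach: removes titles anywhere."""
    for title in titles:
        name = name.replace(title, "")
    return name.strip()


def remove_leading_title(name):
    """Hypothetical helper: drop a title only if it is the first whitespace-separated word."""
    words = name.split()
    if words and words[0] in titles:
        words = words[1:]
    return " ".join(words)


print(naive_strip("Mr. John"))           # John
print(naive_strip("Drew"))               # ew
print(remove_leading_title("Mr. John"))  # John
print(remove_leading_title("Drew"))      # Drew
```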

Here is how this could work with the code you posted (I am guessing the Mr./Mrs. is part of column 1, the first name):

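import csv

titles = ["Mr.", "Mrs.", "Ms", "Dr"]  # and so on

books = csv.reader(open("books.csv", "rU"))

for row in books:
    first_name = row[1]
    last_name = row[0]
    for title in titles:
        first_name = first_name.replace(title, "")
    # strip() drops the space left behind by the removed title
    print '.'.join([first_name.strip().lower(), last_name.strip().lower()])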
