I am following along with the book Foundations for Analytics with Python by Clinton W. Brownley (O'Reilly Media Inc.).

For Chapter 2, "Read and Write a CSV File (Part 2)", base Python with the csv module,

the script is as follows:

#!/usr/bin/env python3
import sys
import csv

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, 'r', newline='') as csv_input_file:
    with open(output_file, 'w', newline='') as csv_output_file:

        filereader = csv.reader(csv_input_file, delimiter=',')
        filewriter = csv.writer(csv_output_file, delimiter=',')

        for row_list in filereader:
            print(row_list)
            filewriter.writerow(row_list)

The input file has fields containing commas (the dollar amounts in the last two lines):

Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier X,001-1001,2341,$500.00,1/20/14
Supplier X,001-1001,2341,$500.00,1/20/14
Supplier X,001-1001,5467,$750.00,1/20/14
Supplier X,001-1001,5467,$750.00,1/20/14
Supplier Y,50-9501,7009,$250.00,1/30/14
Supplier Y,50-9501,7009,$250.00,1/30/14
Supplier Y,50-9505,6650,$125.00,2/3/14
Supplier Y,50-9505,6650,$125.00,2/3/14
Supplier Z,920-4803,3321,$615.00,2/3/14
Supplier Z,920-4804,3321,$615.00,2/10/14
Supplier Z,920-4805,3321,$6,015.00,2/17/14
Supplier Z,920-4806,3321,$1,006,015.00,2/24/14

Running the script produces the following output in the terminal:

['Supplier Name', 'Invoice Number', 'Part Number', 'Cost', 'Purchase Date']
['Supplier X', '001-1001', '2341', '$500.00', '1/20/14']
['Supplier X', '001-1001', '2341', '$500.00', '1/20/14']
['Supplier X', '001-1001', '5467', '$750.00', '1/20/14']
['Supplier X', '001-1001', '5467', '$750.00', '1/20/14']
['Supplier Y', '50-9501', '7009', '$250.00', '1/30/14']
['Supplier Y', '50-9501', '7009', '$250.00', '1/30/14']
['Supplier Y', '50-9505', '6650', '$125.00', '2/3/14']
['Supplier Y', '50-9505', '6650', '$125.00', '2/3/14']
['Supplier Z', '920-4803', '3321', '$615.00', '2/3/14']
['Supplier Z', '920-4805', '3321', '$615.00', '2/17/14']
['Supplier Z', '920-4804', '3321', '$6', '015.00', '2/10/14']
['Supplier Z', '920-4806', '3321', '$1', '006', '015.00', '2/24/14']

but the book shows the expected output like this:

[screenshot from the book; image not reproduced]

What am I doing wrong?

  • You're not doing anything wrong. CSV uses the comma as a separator, so a comma can't appear anywhere else in an unquoted field, including inside numbers. Technically, an unquoted 1,006,015.00 is not a single valid value in CSV format. Commented Aug 20, 2017 at 0:58
  • It's quite possible the book's example is just wrong. This doesn't appear to be listed in the confirmed errata though. Are you sure the input file looks like that? Is it actually a plain text file? Commented Aug 20, 2017 at 1:06
  • I just double-checked: the screenshot in Figure 2-7 shows the Excel interface. If you modify the CSV file in an application like Excel or Numbers and then export it as CSV, the cells containing commas are enclosed in double quotes. Commented Aug 20, 2017 at 15:08
  • I just submitted an erratum on the O'Reilly site. Commented Aug 20, 2017 at 15:15

2 Answers

You have three ways to correct your output:

  1. Remove the commas from the money amounts.
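  2. Use quoting: wrap the money amount in double quotes. For example, in the first row $500.00 becomes "$500.00", and $6,015.00 becomes "$6,015.00". Quoting is a popular technique. The double quote is already the default quotechar for csv.reader, so the original script reads a quoted file correctly as is; to state it explicitly, change your read statement to this: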

    filereader = csv.reader(csv_input_file, delimiter=',', quotechar='"')

  3. Use a different delimiter. You don't have to use a comma as the delimiter. To use this method, change the delimiters in your input file to another delimiter. I like pipe-delimited files because pipes are rarely used as text.

    filereader = csv.reader(csv_input_file, delimiter='|')
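
As a rough sketch of options 2 and 3, the snippet below parses one row in each format from an in-memory string instead of a file. The sample rows and the variable names (quoted_text, piped_text, and so on) are assumptions made up for illustration, adapted from the last line of the question's input; only csv.reader and io.StringIO come from the standard library.

```python
import csv
import io

# Hypothetical sample rows, adapted from the last line of the question's input.
quoted_text = 'Supplier Z,920-4806,3321,"$1,006,015.00",2/24/14\n'
piped_text = 'Supplier Z|920-4806|3321|$1,006,015.00|2/24/14\n'

# Option 2: the amount is wrapped in double quotes, so the commas inside it
# are treated as text rather than as field separators.
quoted_rows = list(csv.reader(io.StringIO(quoted_text), delimiter=',', quotechar='"'))

# Option 3: the fields are separated by pipes, so commas need no quoting.
piped_rows = list(csv.reader(io.StringIO(piped_text), delimiter='|'))

print(quoted_rows[0])
print(piped_rows[0])
```

Both print calls should show the same five-element list, with '$1,006,015.00' kept as a single field.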

1 Comment

I disagree with the author on this. The comma is used to identify where one field begins and another ends. If you include commas within one of your fields as text, the interpreter can't distinguish them from the end of field delimiter which is what you saw in your output. You have to give it a way to distinguish between a field delimiter and text within the field. Try the quoting solution and see for yourself.

I just double-checked: the screenshot in Figure 2-7 shows the Excel interface.

If you modify the CSV file in an application like Excel or Numbers and then export it as CSV, the cells containing commas are enclosed in double quotes.
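
Python's csv.writer applies similar quoting on output, which may help when regenerating the input file. This is a minimal sketch, assuming the default quoting mode (csv.QUOTE_MINIMAL) and an in-memory buffer in place of a real file; the row is taken from the question's input and the variable names are made up for illustration.

```python
import csv
import io

# One row from the question's input, with the amount held as a single field.
row = ['Supplier Z', '920-4805', '3321', '$6,015.00', '2/17/14']

# newline='' leaves the writer's own line terminator untouched.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(row)  # default quoting is csv.QUOTE_MINIMAL
written = buffer.getvalue()
print(repr(written))  # only the field containing commas is quoted

# Reading the text back recovers the original five fields.
buffer.seek(0)
round_trip = next(csv.reader(buffer))
print(round_trip)
```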

Thank you all for the detailed explanations!

1 Comment

You're welcome. The best way to thank us on Stack Overflow is to accept the answer if it helped you.
