The Python documentation has the following code example for writing Unicode to a CSV file. I think it mentions that this is the way to do it, since the csv module can't handle Unicode strings.

import codecs
import cStringIO
import csv

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

I am writing more than one file, so to keep it simple I have only included the section of my code that shows how I use the above class:

def write(self):
    """
    Outputs the dataset to a csv.
    """
    f = codecs.open(self.filename, 'a')
    writer = UnicodeWriter(f)
    #with open(self.filename, 'a', encoding='utf-8') as f:
    if self.headers and not self.written:
        writer.writerow(self.headers)
        self.written = True
    for record in self.records[self.last_written:]:
        print record
        writer.writerow(record)
    self.last_written = len(self.records)
    f.close()

This is a method inside a class called dataset, which prepares the dataset prior to writing it to CSV. Previously I was using writer = csv.writer(f), but due to codec errors I changed my code to use the UnicodeWriter class.

But my problem is that when I open the csv file, I get the following:

some_header
B,r,ë,k,ò,w,n,i,k,_,b,s
B,r,ë,k,ò,w,n,i,k,_,c,s
B,r,ë,k,ò,w,n,i,k,_,c,s,b
B,r,ë,k,ò,w,n,i,k,_,d,e
B,r,ë,k,ò,w,n,i,k,_,d,e,-,1
B,r,ë,k,ò,w,n,i,k,_,d,e,-,2
B,r,ë,k,ò,w,n,i,k,_,d,e,-,3
B,r,ë,k,ò,w,n,i,k,_,d,e,-,4
B,r,ë,k,ò,w,n,i,k,_,d,e,-,5
B,r,ë,k,ò,w,n,i,k,_,d,e,-,M
B,r,ë,k,ò,w,n,i,k,_,e,n
B,r,ë,k,ò,w,n,i,k,_,e,n,-,1
B,r,ë,k,ò,w,n,i,k,_,e,n,-,2

Whereas these rows should actually be something like Brëkòwnik_de-1. I am not really sure what's happening.

To give a basic idea of how the data is generated, here is the relevant line: title = unicode(row_page_title['page_title'], 'utf-8')

  • Looks like isinstance(row, basestring) == True (should be an array or tuple). Commented Apr 11, 2013 at 22:41
  • I added an assert and yes, it is a basestring. How can I fix this? Commented Apr 11, 2013 at 22:47

1 Answer

This symptom points to something like feeding a string into a function/method that is expecting a list or tuple.

The writerows method is expecting a list of lists, and writerow expects a list (or tuple) containing the field values. Since you are feeding it a string, and a string can mimic a list of characters when you iterate over it, you get a CSV with one character in each column.
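The effect is easy to reproduce in isolation. The sketch below is written for Python 3 (where the csv module accepts Unicode text directly, so no UnicodeWriter is needed) and uses a hypothetical helper, csv_line, that is not part of the question's code:

```python
import csv
import io

def csv_line(row):
    """Return the CSV text produced by passing `row` to writerow."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()

# A bare string is iterated character by character: one column per character.
as_string = csv_line("Brëkòwnik_de-1")

# A one-element list is written as a single field.
as_list = csv_line(["Brëkòwnik_de-1"])

print(as_string, end="")
print(as_list, end="")
```

The first call yields the same comma-separated characters seen in the question's output, while the second yields the title as one field.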

If your CSV has just one column, you should use writer.writerow([data]) instead of writer.writerow(data). Some may question if you really need the csv module if you have only one column, but the csv module will handle things like a record containing funny stuff (CR/LF and others), so yes, it is a good idea.
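If records can arrive either as bare strings or as sequences, one option is to normalize them just before writing. This is a minimal sketch in Python 3, assuming a hypothetical helper as_row that is not part of the question's code; on Python 2 the check would be against basestring rather than str. The last record also shows the csv module quoting a field that contains a line break:

```python
import csv
import io

def as_row(record):
    """Wrap a bare string in a list; pass lists and tuples through as a list."""
    if isinstance(record, str):
        return [record]
    return list(record)

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")

# Mixed input: a bare string, a tuple, and a string containing a newline.
for record in ["Brëkòwnik_de-1", ("a", "b"), "line1\nline2"]:
    writer.writerow(as_row(record))

output = buf.getvalue()
print(output, end="")
```

Fixing the code that builds the records so it always produces lists is arguably cleaner; a guard like this only papers over the inconsistency at the point of writing.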

1 Comment

Yes, that's why I thought of using the csv module. In fact, I am not sending the data directly; I construct a dataset in another class and then send that to csv to make sure the data is clean. Thanks.
