0

I'm using tweepy to capture some tweets in Portuguese and saving them in a csv file. All of the tweet text was saved with escaped special characters, and now I can't convert it back to the correct format.

My code for capturing the tweets is:

csvFile = open('ua.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.user_timeline,id=usuario,count=10,
                           lang="en",
                           since="2018-12-01").items():
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

I'm reading the results like this:

test = pd.read_csv('ua.csv', header=None)
test.columns = ["date", "text"]
result = test['text'][0]
print(result)
'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'

The result I need should be this:

print(result)
'Aproveita essa promoção aqui!'

I tried this code to convert:

print(result.decode('utf-8'))

and got this error message:

AttributeError: 'str' object has no attribute 'decode'

What am I doing wrong?

6
  • You should specify the encoding when reading the data as well; otherwise the bytes on the hard drive are interpreted the wrong way. Also, I think the csv writer needs a str, not bytes, and you should probably specify the encoding for the CSV writer as well. Commented Dec 30, 2018 at 1:06
  • 1
    This is Python 3?? Commented Dec 30, 2018 at 1:07
  • 1
    @davedwards that's the encoding of the Python source code file Commented Dec 30, 2018 at 1:08
  • 2
    Because it's irrelevant to the question, and for Python 3, it defaults to utf8 anyway Commented Dec 30, 2018 at 1:09
  • 1
    Well, I suspect the problem is that you are writing the string representation of the bytes object to the file. When you write the tweets, don't use tweet.text.encode('utf-8'), i.e. don't use .encode. Commented Dec 30, 2018 at 1:12

3 Answers

1

The problem is that calling .encode on the tweet text creates a bytes object. You don't need to do this.

A csv.writer object coerces whatever you pass to it to a string, so the bytes object is written as its repr (b'...') rather than as the text itself.

Note:

In [1]: import csv

In [2]: s = 'Aproveita essa promoção aqui!'

In [3]: print(s)
Aproveita essa promoção aqui!

In [4]: print(s.encode())
b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'

In [5]: with open('test.txt', 'a') as f:
   ...:     writer = csv.writer(f)
   ...:     writer.writerow([1, 3.4, 'Aproveita essa promoção aqui!'.encode()])
   ...:

In [6]: !cat test.txt
1,3.4,b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'

So just use:

csvWriter.writerow([tweet.created_at, tweet.text])
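
For rows that have already been saved in the b'...' form, the text can probably be recovered without capturing the tweets again. The following is a minimal sketch, assuming each stored cell is exactly the repr of a UTF-8 encoded bytes object, as in the cat output above. The helper name recover_text is hypothetical and not part of csv, pandas, or tweepy.

```python
import ast


def recover_text(cell: str) -> str:
    """Turn a cell holding the repr of a bytes object back into text.

    Assumes the bytes were produced with .encode('utf-8').
    Cells that do not look like a bytes literal are returned unchanged.
    """
    if cell.startswith(("b'", 'b"')):
        # literal_eval safely parses the bytes literal; decode gives the str
        return ast.literal_eval(cell).decode("utf-8")
    return cell


# The doubled backslashes mean the string holds literal \xc3 sequences,
# which is what ends up in the csv file.
stored = "b'Aproveita essa promo\\xc3\\xa7\\xc3\\xa3o aqui!'"
print(recover_text(stored))
```

With pandas, the same helper could be applied to the whole column, for example test['text'].apply(recover_text), assuming the column holds strings in that form.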

1 Comment

for tweet in tweepy.Cursor(api.user_timeline,id=usuario,count=10, lang="en", since="2018-07-01").items():
    csvWriter.writerow([tweet.created_at, tweet.text])

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f614' in position 72: character maps to <undefined>
0

Open the file with the encoding to be used. Don't encode it manually (Zen of Python: Explicit is better than implicit):

# newline='' per csv documentation
# encoding='utf-8-sig' if you plan on using Excel to read the csv, else 'utf8' is fine.
with open('ua.csv','a',encoding='utf-8-sig',newline='') as csvFile:
    csvWriter = csv.writer(csvFile)
    for tweet in tweepy.Cursor(api.user_timeline,id=usuario,count=10,
                               lang="en",
                               since="2018-12-01").items():
        csvWriter.writerow([tweet.created_at, tweet.text])

Here's a working example:

import csv
import pandas as pd

with open('ua.csv','w',encoding='utf-8-sig',newline='') as csvFile:
    csvWriter = csv.writer(csvFile)
    csvWriter.writerow(['timestamp','Aproveita essa promoção aqui!'])

test = pd.read_csv('ua.csv', encoding='utf-8-sig', header=None)
print(test)

Output:

           0                              1
0  timestamp  Aproveita essa promoção aqui!
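
An explicit encoding also matters for tweets that contain emoji. A 'charmap' UnicodeEncodeError on write is consistent with the file having been opened with a legacy default encoding; cp1252 is used below as an assumed example of such a default. The sketch uses only the standard library, and the helper name round_trip is hypothetical.

```python
import csv
import os
import tempfile


def round_trip(rows, encoding="utf-8"):
    """Write rows to a temporary csv file and read them back.

    The same encoding is used for writing and reading.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline="") as f:
            csv.writer(f).writerows(rows)
        with open(path, "r", encoding=encoding, newline="") as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


# '\U0001f614' is an emoji code point, chosen here for illustration
rows = [["2018-12-30", "Aproveita essa promoção aqui! \U0001f614"]]

# UTF-8 can represent every character, so the rows come back unchanged
print(round_trip(rows) == rows)

# cp1252 has no mapping for the emoji, so writing fails
try:
    round_trip(rows, encoding="cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 failed:", exc.reason)
```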

0

pandas' read_csv has an encoding parameter, so you can pass encoding='utf-8' when reading the file. From the documentation:

Encoding to use for UTF when reading/writing (ex. ‘utf-8’).
