
I am trying to save strings that contain emojis to a .txt file, but I always get an error when running the code.

Code:


I set the .txt file up to use UTF-8 encoding.


subject_proper = subject.text.strip()
subject_proper = subject_proper.decode('utf-8')

Error:

subject_proper = subject_proper.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'

Edit:

If I drop the .decode, I get the following error:

UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 65-65: Non-BMP character not supported in Tk

Edit 2:

Example text: Christmas treats for the triathletes ⛄

I have scraped the strings from https://milled.com/wiggle-co-uk

This method has worked before, but I don't know why it does not work with this code. I have tried to find the answer elsewhere, but unfortunately without success.

I hope someone has an idea :)

  • This might just be the difference between Python 2 and Python 3. Commented Dec 13, 2019 at 19:57
  • Does this answer your question? 'str' object has no attribute 'decode'. Python 3 error? Commented Dec 13, 2019 at 19:57
  • Decode works on bytes. b'some text'.decode('utf-8') will work but 'some text'.decode('utf-8') will not. Commented Dec 13, 2019 at 19:58
  • Please consider adding some of the text you're trying to parse / decode to the question. Commented Dec 13, 2019 at 20:04
  • I figured out what the problem was. The code runs in PyCharm without issues, but not in IDLE. Removing the print output to the console fixed the issue. It now writes to the .txt file without issues. Commented Dec 13, 2019 at 20:13
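
A minimal sketch of what the last comment describes: write the string to the file with an explicit encoding and skip printing it to the IDLE console. The file name subjects.txt, the temp directory, and the sample string are assumptions for illustration, not taken from the original code.

```python
import os
import tempfile

# Hypothetical sample text; U+1F384 (Christmas tree) lies outside the BMP.
subject_proper = "Christmas treats for the triathletes \U0001F384".strip()

# Hypothetical output path; the temp directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.gettempdir(), "subjects.txt")

# Passing encoding explicitly avoids depending on the platform default.
with open(out_path, "w", encoding="utf-8") as fh:
    fh.write(subject_proper + "\n")
```

No print call touches the string here, so the Tk-based console never has to render the non-BMP character.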

1 Answer


You're trying to decode a string that has already been decoded. In Python 3, a str is already Unicode text and has no .decode method; only bytes objects do. If your file is set to UTF-8 but only contains ASCII characters, I don't think the encoding matters, since ASCII is a subset of UTF-8.

Once you have a str, there's no need to decode it anymore. If you drop .decode('utf-8'), the AttributeError will go away.

If the value might sometimes be bytes rather than str, you can wrap the .decode call in a try-except block that catches AttributeError, and then handle each case accordingly.
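
As a rough sketch of that idea, the helper below decodes only when it is handed bytes and passes a str through untouched. It uses an isinstance check rather than try-except, which is a swapped-in variant of the suggestion above; the name to_text and the sample values are hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return a str, decoding only when the input is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already a str in Python 3, nothing to decode


# Hypothetical inputs: the same subject line as text and as UTF-8 bytes.
as_str = "Christmas treats for the triathletes \u26c4"
as_bytes = as_str.encode("utf-8")

subject_proper = to_text(as_str.strip())
same_subject = to_text(as_bytes)
```

Both calls end up with the same str, so the rest of the code does not need to care which type the scraper returned.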


2 Comments

Unfortunately, that does not work. I get the following error: UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 65-65: Non-BMP character not supported in Tk
Possibly consider subject_proper = subject_proper.encode('unicode-escape').decode('utf-8')? I'm not sure what characters you're trying to parse, but Python doesn't seem to like them. Consider checking this question out, too.
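
If the string still has to be shown in a Tk-based console, one possible variation on the unicode-escape idea is to escape only the characters above U+FFFF, which are the ones the error message complains about, and leave everything else readable. This is a sketch under that assumption; bmp_safe and the sample string are hypothetical names, not part of the original code.

```python
def bmp_safe(text):
    """Replace characters above U+FFFF with \\UXXXXXXXX escapes for display."""
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U{:08X}".format(ord(ch))
        for ch in text
    )


# Hypothetical sample: U+1F384 is non-BMP, U+26C4 (snowman) is inside the BMP.
sample = "Sale now on \U0001F384 \u26c4"

# Only the non-BMP character is escaped; the snowman is kept as-is.
display_text = bmp_safe(sample)
```

Use the escaped copy only for console output and keep writing the original string to the UTF-8 file, so the emoji are preserved on disk.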
