Unable to decode string to utf-8 in python

Question

I am trying to save strings that contain emojis to a .txt file, but I always get an error when running the code.

Code:


I set the .txt file up to use UTF-8 encoding.


subject_proper = subject.text.strip()
subject_proper = subject_proper.decode('utf-8')

Error:

subject_proper = subject_proper.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'

Edit:

If I drop the .decode, I get the following error:

UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 65-65: Non-BMP character not supported in Tk

Edit 2:

Example text: Christmas treats for the triathletes ⛄

I have scraped the strings from https://milled.com/wiggle-co-uk

This method has worked before, but I don't know why it does not work with this code. I have tried to find the answer elsewhere, but unfortunately without success.

I hope someone has an idea :)

This might just be the difference between Python 2 and Python 3.
– Mark Ransom, Commented Dec 13, 2019 at 19:57
Does this answer your question? 'str' object has no attribute 'decode'. Python 3 error?
– Juan C, Commented Dec 13, 2019 at 19:57
Decode works on bytes. b'some text'.decode('utf-8') will work but 'some text'.decode('utf-8') will not.
– WGriffing, Commented Dec 13, 2019 at 19:58
Please consider adding some of the text you're trying to parse / decode to the question.
– Nick Reed, Commented Dec 13, 2019 at 20:04
I figured out what the problem was. The code runs in PyCharm without issues, but not in IDLE. Removing the print output to the console fixed the issue. It is now writing to the .txt file without issues.
– HansDampf, Commented Dec 13, 2019 at 20:13
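
A minimal sketch of what that last comment describes: write the text straight to a file opened with an explicit UTF-8 encoding and skip printing it to the IDLE console. The sample string and the file name `subjects.txt` are made up for illustration and are not from the original code.

```python
import os
import tempfile

# Hypothetical sample; "\U0001F600" is an emoji outside the Basic Multilingual Plane.
subject_proper = "Christmas treats for the triathletes \U0001F600"

# Hypothetical output path, placed in a temp directory so the sketch is safe to run.
out_path = os.path.join(tempfile.mkdtemp(), "subjects.txt")

# Pass the encoding explicitly; the platform default is not always UTF-8.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(subject_proper + "\n")

# Read the file back to confirm the emoji survived the round trip.
with open(out_path, encoding="utf-8") as f:
    round_tripped = f.read().rstrip("\n")
```

No `print()` call is involved, so the Tk-based console never has to render the character.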

Nick Reed · Accepted Answer · 2019-12-13 20:01:25Z

1

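You're trying to decode a string that has already been decoded. In Python 3, a str is already Unicode text, and .decode() exists only on bytes, which is why you get the AttributeError. If your file is set to utf-8 but only has ASCII characters in it, the encoding makes no difference, because ASCII text is byte-for-byte identical in UTF-8.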

Once you have a str, there's no need to decode it anymore. If you drop .decode('utf-8'), the error will likely go away.
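
To illustrate the difference, here is a small sketch, assuming Python 3; the variable names are only for illustration.

```python
text = "snow \u26c4"              # a str: already Unicode text in Python 3
data = text.encode("utf-8")       # str -> bytes
restored = data.decode("utf-8")   # bytes -> str

# In Python 3 only bytes objects have a .decode() method.
has_decode_on_str = hasattr(text, "decode")
has_decode_on_bytes = hasattr(data, "decode")
```

Calling `text.decode('utf-8')` here would raise the same AttributeError as in the question.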

If the value might sometimes arrive as UTF-8 encoded bytes rather than as a str, you can wrap the .decode() call in a try-except block that catches AttributeError, and then act on it accordingly.
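
One way this could look, as a rough sketch: `to_text` is a hypothetical helper name, not something from the original code.

```python
def to_text(value, encoding="utf-8"):
    """Decode value if it has a .decode() method, otherwise return it unchanged."""
    try:
        return value.decode(encoding)
    except AttributeError:
        # str has no .decode() in Python 3, so it is already text.
        return value


from_bytes = to_text(b"caf\xc3\xa9")  # bytes get decoded
from_str = to_text("caf\u00e9")       # str passes through untouched
```

An explicit `isinstance(value, bytes)` check would work just as well and may read more clearly.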

edited Dec 13, 2019 at 20:01

answered Dec 13, 2019 at 19:57


2 Comments

HansDampf Over a year ago

Unfortunately that does not work. I get the following error: UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 65-65: Non-BMP character not supported in Tk

Nick Reed Over a year ago

Possibly consider subject_proper = subject_proper.encode('unicode-escape').decode('utf-8')? I'm not sure what characters you're trying to parse, but Python doesn't seem to like them. Consider checking this question out, too.
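
For reference, a short sketch of what that expression produces. The sample string is made up. The second variant is an alternative that was not proposed in this thread: it replaces only the characters above U+FFFF, which are the ones the Tk error message complains about.

```python
subject_proper = "treats \U0001F600"

# The expression from the comment: non-ASCII characters become backslash escapes,
# so the result is plain ASCII and the emoji is no longer present as a character.
escaped = subject_proper.encode("unicode-escape").decode("utf-8")

# Alternative sketch (not from the thread): swap only non-BMP characters
# for U+FFFD so the rest of the text stays readable when printed.
bmp_only = "".join(ch if ord(ch) <= 0xFFFF else "\ufffd" for ch in subject_proper)
```

Either string is only suitable for display; the original `subject_proper` is still the one to write to the file if the emoji should be preserved.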
