When does Python raise UnicodeDecodeError when searching in a string?

Question

I have a piece of code that reads from a gzip stream and checks whether each line contains some pattern. What I have is:

if pattern in line:
    do_something()

Some lines contain non-ASCII characters, and when my code reaches those lines I get a UnicodeDecodeError. However, I am unable to reproduce this error in manual testing. When I copy the repr of the line that causes the UnicodeDecodeError, assign it to a variable line, and evaluate pattern in line, I get False instead of an error. I am confused by this inconsistency. Why does it behave differently for the same string?

Because repr(some_string) is not the same as some_string; it is a representation of it. For some types, repr aims to give a representation that can be used to construct a new instance of the type when given as input to an interpreter or used as code, but not for all types. See stackoverflow.com/questions/7784148/….
– Ilja Everilä, Commented Jun 9, 2016 at 19:24
Aside: if you're a beginner, you should probably be using Python 3 instead of Python 2. Unicode in particular is handled much better, and while there's still some stuff you have to learn, at least what you'll be learning makes sense.
– DSM, Commented Jun 9, 2016 at 19:56

user274602 · Accepted Answer · 2016-06-09 19:32:15Z

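I found the root cause of my problem. In my actual code, pattern has type unicode instead of str, whereas in manual testing my pattern is just a str that I type in. Mixing a unicode pattern with a str line makes Python 2 implicitly decode the line as ASCII, which fails on non-ASCII bytes. This explains the different behavior I observed.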
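A minimal sketch of the difference, written for Python 3 so it runs on a current interpreter. It assumes the cause is Python 2's implicit ASCII decoding of a str when it is mixed with a unicode object. The helper py2_style_contains is hypothetical and only emulates that coercion, and the sample lines and the pattern "needle" are made up for illustration.

```python
import gzip
import io


def py2_style_contains(pattern, line):
    """Emulate Python 2's `pattern in line` (hypothetical helper).

    In Python 2, mixing a unicode pattern with a byte-string line makes
    the interpreter decode the line with the ASCII codec first.
    """
    if isinstance(pattern, str) and isinstance(line, bytes):
        # Raises UnicodeDecodeError if the line holds non-ASCII bytes.
        line = line.decode("ascii")
    return pattern in line


# Build a small gzip stream in memory (made-up sample data).
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write("plain line\ncaf\u00e9 line\n".encode("utf-8"))
buf.seek(0)

# Text pattern against byte lines: the non-ASCII line fails to decode.
results = []
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    for line in f:
        try:
            results.append(py2_style_contains("needle", line))
        except UnicodeDecodeError:
            results.append("UnicodeDecodeError")

non_ascii_line = "caf\u00e9 line\n".encode("utf-8")

# Byte pattern against a byte line: no decoding happens, so no error.
bytes_result = b"needle" in non_ascii_line

# One possible fix: decode each line explicitly, then search with text.
fixed_result = "caf\u00e9" in non_ascii_line.decode("utf-8")

print(results, bytes_result, fixed_result)
```

Decoding explicitly with the stream's real encoding (UTF-8 is assumed here) keeps both operands the same type, so the search never relies on an implicit ASCII decode.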
