I have this piece of code that reads from a gunzip stream and checks if each line contains some pattern. What I have is

if pattern in line:
    do_something()

Some lines contain non-ASCII characters, and when my code reaches those lines I get a UnicodeDecodeError. However, I am unable to reproduce this error in manual testing. When I copy the repr of the line that causes the UnicodeDecodeError, assign it to the variable line, and evaluate pattern in line, I get False instead of an error. I am confused by this inconsistency. Why does it behave differently for the same string?

  • Because repr(some_string) is not the same as some_string, it is a representation of it. For some types it aspires to give a representation that can be used to construct a new instance of the type if given as input to an interpreter or as code, but not for all. See stackoverflow.com/questions/7784148/…. Commented Jun 9, 2016 at 19:24
  • Aside: if you're a beginner, you should probably be using Python 3 instead of Python 2. Unicode in particular is handled much better, and while there's still some stuff you have to learn at least what you'll be learning makes sense. Commented Jun 9, 2016 at 19:56

1 Answer

I found the root cause of my problem. Somehow, in my actual code pattern has type unicode instead of str, but in manual testing my pattern was just a str that I typed in. This caused the different behavior I observed.
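
For anyone hitting the same thing, here is a minimal sketch of what seems to be going on. In Python 2, testing a unicode pattern against a str line makes the interpreter decode the line with the ASCII codec first, and that decode fails on any byte above 0x7F. A str pattern against a str line is a plain byte comparison, so it just returns False. The sample bytes, the pattern, and the assumption that the stream is UTF-8 are all made up for illustration. The snippet performs the decode explicitly so that it runs the same way on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical line as raw bytes from the gzip stream (UTF-8 encoded "café latte").
line = b"caf\xc3\xa9 latte"
# Hypothetical pattern that is unicode rather than a byte string.
pattern = u"latte"

# Python 2 evaluates `pattern in line` (unicode in str) by decoding `line`
# as ASCII first. Doing that decode explicitly reproduces the failure.
try:
    line.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Decoding with the stream's real encoding (assumed to be UTF-8 here)
# before the membership test avoids the error.
found = pattern in line.decode("utf-8")
```

Decoding each line once, right after reading it, keeps every later comparison unicode against unicode. Encoding the pattern to bytes instead would also work, as long as both sides end up the same type.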
