
This is not about decoding using UTF-8. This is about reading a bytes object as a literal and needing it as a bytes object without reinventing the parsing process. If there is an answer to my question out there, it is hiding behind a lot of answers to questions about decoding.

Here is what I need:

x = "bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')"
y = ???(x, ???)
z = bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')
if y == z:
    print("Yes!")

Any suggestions for how to replace those question marks?

Thanks!

                           -- Dave
    It might be easier to fix the code that produced the bytearray literal to produce something more friendly instead. Commented Jul 13, 2016 at 19:29

1 Answer

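One approach is to strip the wrapper text from x (the leading bytearray(b' and the trailing ')), then convert each remaining character to its byte value and wrap the result in a bytearray object. This relies on x being a regular (non-raw) string literal, so that each \xNN escape has already become a single character with a code point below 256.

x = "bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')"
y = bytearray(ord(c) for c in x[12:-2])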
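
As a rough check of why the per-character conversion works: for code points 0 through 255, ord(c) is the same value that the Latin-1 encoding produces, so the two spellings below should agree. This is only a sketch under the assumption that x is a regular (non-raw) literal; the names inner, via_ord and via_latin1 are made up for illustration.

```python
# Regular (non-raw) literal: each \xNN escape is already one character.
x = "bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')"

# Drop the 12-character prefix bytearray(b' and the 2-character suffix ').
inner = x[12:-2]

# Convert character by character, as in the answer.
via_ord = bytearray(ord(c) for c in inner)

# Latin-1 maps code points 0..255 to the same byte values.
via_latin1 = bytearray(inner.encode("latin-1"))

z = bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')
print(via_ord == z, via_latin1 == z)
```

Either spelling fails if the text contains a character with a code point above 255, which is arguably the desired behavior for data that is supposed to be bytes.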

The second approach is not limited to bytearray. Because eval executes arbitrary code, use it only when you are sure the content is trusted and in the expected format:

x = r"bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')"
y = eval(x)
z = bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')

Here x is written as a raw string literal (r"...") so that Python does not interpret the backslash escapes when x is defined; eval then sees the same text as the literal on the z line. The prefix only matters for string literals in source code: text read from standard input or from a file already contains literal backslashes and needs no prefix.

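You can also use ast.literal_eval(x[10:-1]), as suggested by kindall. It parses only the b'...' literal and does not execute arbitrary code; it returns a bytes object, which you can then pass to bytearray().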
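
A minimal sketch of the ast.literal_eval route, assuming the input always has the form bytearray(b'...'). The helper name parse_bytearray_repr is hypothetical and not part of any library; the checks around it are one possible way to reject unexpected input.

```python
import ast


def parse_bytearray_repr(text):
    """Turn text like "bytearray(b'...')" into a bytearray (hypothetical helper)."""
    prefix, suffix = "bytearray(", ")"
    text = text.strip()
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("expected text of the form bytearray(b'...')")
    # Keep only the b'...' literal between the wrapper parts.
    inner = text[len(prefix):-len(suffix)]
    # literal_eval parses Python literals only; it does not run arbitrary code.
    value = ast.literal_eval(inner)
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal inside bytearray(...)")
    return bytearray(value)


# Raw literal, so x holds literal backslashes, as text from a file would.
x = r"bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')"
y = parse_bytearray_repr(x)
z = bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')
print(y == z)
```

Checking the prefix and suffix instead of slicing by fixed offsets is a design choice for this sketch: it fails loudly on unexpected input rather than handing a mangled string to the parser.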


3 Comments

You can also have Python treat backslashes as literal backslashes (instead of escapes) by prefixing the whole string with r, as in x = r"bytearray(...)" (r stands for "raw" in this context).
If you chop it down to b'...' you can then use ast.literal_eval() on it to get the bytes object, then just call bytearray() on that.
The r is only useful if the string is coming from an actual Python string literal, and if it's coming from an actual string literal, you might as well take off the quotes and skip the whole eval thing entirely. If the string is coming from standard input, or a file, or a network request, or pretty much anywhere else, the r is unnecessary, and there'd be nowhere to put the r anyway.
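
The point in the last comment can be checked with a small sketch. It assumes io.StringIO as a stand-in for a file or standard input, and the variable names are made up for illustration. eval appears here only because the text is trusted in this example.

```python
import io

original = bytearray(b'abc\xd8\xa2\xd8\xa8\xd8\xa7xyz')

# io.StringIO stands in for a file or standard input (an assumption of this sketch).
fake_file = io.StringIO(repr(original) + "\n")
line = fake_file.readline().strip()

# The text holds a literal backslash followed by "xd8", not the character U+00D8,
# so there is nothing for an r prefix to do.
has_literal_backslashes = "\\xd8" in line

# eval executes arbitrary code; acceptable here only because the input is trusted.
restored = eval(line)
print(has_literal_backslashes, restored == original)
```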
