
Is there a built-in way to "convert" a bytestring to a Unicode string? I don't want to decode it; I want the string I see when I print it, without the "b".

E.g., input:

b'\xb5\xb5\xb5\xb5\r\n1'

Output:

'\xb5\xb5\xb5\xb5\r\n1'

I've tried iterating over the byte string, but that gives me a list of integers:

my_bytestring = b'%PDF-1.4\n%\x93\x8c\x8b\x9e'

my_string = ""
my_list = []
for char in my_bytestring:
    my_list.append(char)
    my_string += str(char)
print(my_list)   # -> list of ints
print(my_string) # -> string of converted ints

I get:

[37, 80, 68, 70, 45, 49, 46, 52, 10, 37, 147, 140, 139, 158]

I want:

['%', 'P', 'D', 'F', '-', '1', '.', '4', '\\', 'n', '%', '\\', 'x', '9', '3', '\\', 'x', '8', 'c', '\\', 'x', '8', 'b', '\\', 'x', '9', 'e']
  • But they're both technically the same string... cf. stackoverflow.com/questions/7262828/… Commented Apr 26, 2018 at 10:10
  • None of the answers there do what I want, though. They all decode or start from a Unicode string. I amended the question to show what I get vs. what I need. Commented Apr 26, 2018 at 10:17
  • Where is the bytestring coming from? I.e., why can't you just use r'...' instead of b'...'? Commented Apr 26, 2018 at 10:23
  • Do you want the result to contain a literal \, x, b, etc.? Commented Apr 26, 2018 at 10:24
  • You're asking two different questions here. The first string is treated like a normal string (i.e. b'\xb5' becomes '\xb5'), while the 2nd string is treated like a raw string (i.e. b'\xb5' becomes r'\xb5'). Commented Apr 26, 2018 at 10:41

3 Answers


Use the [Python]: chr(i) function:

>>> b = b"\xb5\xb5\xb5\xb5\r\n1"
>>> s = "".join([chr(i) for i in b])
>>> s
'µµµµ\r\n1'
>>> len(b), len(s)
(7, 7)
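
A quick sanity check (my own addition, not part of the original answer): the chr() loop maps each byte value 0 to 255 to the code point U+0000 to U+00FF, which is exactly what the built-in Latin-1 codec does, so the two give the same string for any input.

```python
b = b"\xb5\xb5\xb5\xb5\r\n1"

# chr(i) turns each byte value (0-255) into the code point with the same number,
# which is the same mapping the Latin-1 codec uses.
via_chr = "".join(chr(i) for i in b)
via_latin1 = b.decode("latin-1")
print(via_chr == via_latin1)  # True

# The equivalence holds for every possible byte value, not just this sample.
all_bytes = bytes(range(256))
print("".join(chr(i) for i in all_bytes) == all_bytes.decode("latin-1"))  # True
```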

As @hop mentioned, it would be better to use this method:

>>> s0 = b.decode(encoding="unicode_escape")
>>> s0
'µµµµ\r\n1'
>>> len(s0)
7
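
Note that unicode_escape is not a drop-in replacement for the chr() loop when the data itself contains backslashes: it interprets escape sequences in the bytes, and it raises on malformed ones. A small illustration (the sample bytes are made up for this example):

```python
b = b"abc\\n\xb5"  # the bytes a, b, c, backslash, n, 0xb5

via_chr = "".join(chr(i) for i in b)     # every byte kept as one character
via_escape = b.decode("unicode_escape")  # backslash + n becomes a newline

print(len(via_chr), len(via_escape))  # 6 5
print(via_escape == "abc\n\xb5")      # True

# A malformed escape sequence in the data makes unicode_escape fail outright,
# which can happen with arbitrary binary data such as a PDF.
failed = False
try:
    b"\\xZZ".decode("unicode_escape")
except UnicodeDecodeError as exc:
    failed = True
    print("UnicodeDecodeError:", exc.reason)
```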

However, looking at your 2nd example, it seems you need [Python]: repr(object):

>>> my_bytestring = b'%PDF-1.4\n%\x93\x8c\x8b\x9e'
>>> l = list(repr(my_bytestring))[2:-1]
>>> l
['%', 'P', 'D', 'F', '-', '1', '.', '4', '\\', 'n', '%', '\\', 'x', '9', '3', '\\', 'x', '8', 'c', '\\', 'x', '8', 'b', '\\', 'x', '9', 'e']
>>> len(my_bytestring), len(l)
(14, 27)
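
If you want the result as a single string rather than a list of characters, the same repr() trick can be wrapped in a small helper. The name bytes_to_literal_text is hypothetical; the sketch assumes repr() of a bytes object always starts with b plus one quote character and ends with the matching quote (it may use double quotes when the data contains a single quote, which the slice still handles).

```python
def bytes_to_literal_text(data: bytes) -> str:
    """Hypothetical helper: the text Python shows for a bytes literal, minus b'...'."""
    text = repr(data)  # e.g. "b'%PDF-1.4\\n%\\x93...'"
    # Drop the leading b and opening quote, and the closing quote.
    return text[2:-1]


my_bytestring = b'%PDF-1.4\n%\x93\x8c\x8b\x9e'
s = bytes_to_literal_text(my_bytestring)
print(s)       # %PDF-1.4\n%\x93\x8c\x8b\x9e
print(len(s))  # 27
```

list(s) then gives the list of characters from the question.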

5 Comments

Don't invent your own .decode(); use the unicode_escape encoding.
Thanks. Just changing the str(char) to chr(char) in my code did the job!
Thank you @hop, I'll add it.
Your links look like they've been inserted by some kind of script. May I ask where I can find this script? It looks useful.
@Aran-Fey: Unfortunately I didn't have the time to automate it yet, so it's all manual (monkey work) :) .

Technically you cannot get from bytes to strings without decoding, but there is a codec that does what you want:

>>> b = b'\xb5\xb5\xb5\xb5\r\n1'
>>> s = b.decode('unicode_escape')
>>> s
'µµµµ\r\n1'
>>> print(s)
µµµµ
1

There is also raw_unicode_escape. You can read about the differences in the documentation.
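
As a rough illustration of that difference (sample bytes chosen for this example): raw_unicode_escape only interprets \uXXXX and \UXXXXXXXX sequences and treats every other byte as Latin-1, while unicode_escape also interprets escapes such as \n.

```python
data = b"\\n \\u00b5 \xb5"  # backslash-n, backslash-u00b5, and a raw 0xb5 byte

escaped = data.decode("unicode_escape")   # \n and \u00b5 are both interpreted
raw = data.decode("raw_unicode_escape")   # only \u00b5 is interpreted

print(escaped == "\n \xb5 \xb5")  # True
print(raw == "\\n \xb5 \xb5")     # True: the backslash-n stays as two characters
```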

I very much doubt that there is a use case for having binary data in a Unicode string.

2 Comments

Doesn't work for the second string. Gives: UnicodeEncodeError: 'charmap' codec can't encode characters in position 11-14: character maps to <undefined>
@Yobmod I can't reproduce that. b'%PDF-1.4\n%\x93\x8c\x8b\x9e'.decode('unicode_escape') returns '%PDF-1.4\n%\x93\x8c\x8b\x9e'.
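
One plausible explanation for the error in the first comment (an assumption, not confirmed in the thread): the decode succeeds, and the UnicodeEncodeError is raised later when the result is printed to a console using a legacy code page such as cp1252, which has no byte for the C1 control characters U+0093 to U+009E. A sketch that reproduces that failure without depending on the console:

```python
s = b'%PDF-1.4\n%\x93\x8c\x8b\x9e'.decode('unicode_escape')  # decoding succeeds
print(ascii(s))  # '%PDF-1.4\n%\x93\x8c\x8b\x9e' (safe to print on any console)

# Assumption: the console encodes output as cp1252. Encoding explicitly shows
# the same "'charmap' codec can't encode" failure.
failed = False
try:
    s.encode("cp1252")
except UnicodeEncodeError as exc:
    failed = True
    print("UnicodeEncodeError:", exc.reason)
```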

The PDF payload obviously isn't UTF-8 encoded, or encoded with any other text encoding. It is raw binary data, not text of any kind.

BUT there is an encoding that maps every byte value from 0 to 255 to exactly one character:

data = data.decode("latin1")

This changes the data type from bytes to str.
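
A short check of why this is safe for arbitrary binary data (a sketch using the sample bytes from the question): Latin-1 maps bytes 0 to 255 one-to-one onto code points U+0000 to U+00FF, so the string has one character per byte and encoding it again restores the original bytes exactly.

```python
data = b'%PDF-1.4\n%\x93\x8c\x8b\x9e'

text = data.decode("latin1")  # bytes -> str, one character per byte
print(type(text).__name__, len(data), len(text))  # str 14 14

# The mapping is one-to-one, so the round trip is lossless.
print(text.encode("latin1") == data)  # True

# This holds for every possible byte value.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin1").encode("latin1") == all_bytes)  # True
```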

It isn't a brilliant solution, because it costs CPU time and memory to create a new object, but it is the only one.

It is a nuisance that there isn't an instruction in Python to just change the data type from bytes to str without processing.
