0

I'm new to Python 3 and am trying to extract a message from a bytes object that contains both plain text and an embedded bytes value.

I'm unable to recover the embedded bytes message once the outer bytes object has been decoded.

  1. First, I decode the bytes object.
  2. Then I split the decoded text.
  3. The split gives me string values.

I tried bytes(v) for v in rest.split() to get the bytes back and then decode them, but that didn't work.

# The message chunk:
chunk = b"1568077849\n522\nb'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'\n"

# I split the chunk into sub categories for further processing:
_, size, rest = (chunk.decode("utf-8")).split('\n', 2)

# _ contains "1568077849"
# size contains "522" 
# rest contains "b'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'"

I'm supposed to be able to decode the rest variable (rest.decode("utf-8")), but since it is stored as a string, I can't figure out how to convert it to bytes and then decode the value.

The expected result: l5:d4:auth53:ÙìH£ei6eli1eee

  • How did you get this string? It seems someone created it the wrong way. Commented Sep 10, 2019 at 2:29
  • You could use slicing: rest = rest[2:-2] Commented Sep 10, 2019 at 2:32
  • It's coming in from a server: request = reader._build_request(chunk_meta); chunk = urllib.request.urlopen(request).read() Commented Sep 10, 2019 at 2:33
  • As @nathancy mentioned, you have to slice it, and then you should have the correct string l5:d4:auth53:ÙìH£ei6eli1eee Commented Sep 10, 2019 at 2:38
  • @nathancy I need to get the value as bytes to be able to decode it correctly. Currently, your solution still gives rest as a string: print(isinstance(rest, bytes)) -> False Commented Sep 10, 2019 at 2:38

2 Answers

2

This will print your final result:

chunk = b"1568077849\n522\nb'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'\n"

l1 = chunk.decode('utf-8').split('\n', 2)[2].strip()  # Initial decode; split on newlines only
# Slice off the embedded byte-string wrapper: the leading b' and the trailing '
l1_string = l1[2:-1]
l1_bytes = l1_string.encode('utf-8')
l1_final = l1_bytes.decode('utf-8')

print('Results')
print(f'l1_string is {l1_string!r}')
print(f'l1_bytes is {l1_bytes}')
print(f'l1_final is {l1_final!r}')
Results
l1_string is 'l5:d4:auth53:Ùì\x1fH£ei6eli1eee'
l1_bytes is b'l5:d4:auth53:\xc3\x99\xc3\xac\x1fH\xc2\xa3ei6eli1eee'
l1_final is 'l5:d4:auth53:Ùì\x1fH£ei6eli1eee'
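
A detail worth knowing when splitting the decoded chunk: str.split() with no argument splits on every whitespace character, and Python counts \x1f (which appears in this payload) as whitespace. A small sketch with a shortened, made-up sample (the names text, whitespace_parts and newline_parts are only for illustration):

```python
# Shortened, made-up sample in the same shape as the chunk above
text = "522\nb'auth53:\x1fH'\n"

# \x1f (unit separator) is treated as whitespace by Python
print("\x1f".isspace())

whitespace_parts = text.split()    # splits on \n and also on \x1f
newline_parts = text.split("\n")   # splits on \n only

print(whitespace_parts)
print(newline_parts)
```

Splitting on "\n" explicitly keeps the control character inside the payload, so nothing is lost before the b'...' wrapper is removed.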

2 Comments

Thanks, Dave! It took some time to decode the answer :)
Dave, I just looked at the real data and it's coming in as: chunk = b"1568077849\n522\nb'l5:d4:auth53:\\xc3\\x99\\xc3\\xac\\x1fH\\xc2\\xa3ei6eli1eee'\n". The backslashes are doubled in the input, so the above solution is not working as expected.
0

I was able to get the expected output this way:

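 _, size, rest = (chunk.decode("utf-8")).split('\n', 2)
 rest = bytes(rest.strip().replace("b'", "").replace("'", ""), "utf-8").decode("unicode_escape")
 # unicode_escape maps each \xNN to the code point NN, so round-trip through Latin-1 to read the bytes as UTF-8
 rest = rest.encode("latin-1").decode("utf-8")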

Got the clue from this post: Process escape sequences in a string in Python
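
Since rest holds the text of a Python bytes literal (b'...'), another option may be ast.literal_eval from the standard library, which parses literals without executing arbitrary code. A minimal sketch, assuming the payload arrives with literal backslash escapes as in the comment on the first answer; the names payload and text are only for illustration:

```python
import ast

# Sample from the comment above: the backslashes are literal characters here
chunk = b"1568077849\n522\nb'l5:d4:auth53:\\xc3\\x99\\xc3\\xac\\x1fH\\xc2\\xa3ei6eli1eee'\n"

_, size, rest = chunk.decode("utf-8").split("\n", 2)

# rest is the *text* of a bytes literal; literal_eval turns it into real bytes
payload = ast.literal_eval(rest.strip())

# Now the value can be decoded like any other bytes object
text = payload.decode("utf-8")

print(isinstance(payload, bytes))
print(repr(text))
```

This only works when the quoted part is a valid bytes literal. If the payload contains raw non-ASCII characters instead of \xNN escapes, literal_eval raises an error, so that case would still need a different approach.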
