URL-Encoding a Byte String?

Question

I am writing a BitTorrent client. One of the steps requires the program to send an HTTP GET request to the tracker containing a SHA1 hash of part of the torrent file. I have used Fiddler2 to intercept the request sent by Azureus to the tracker.

The hash that Azureus sends is URL-Encoded and looks like this: %D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR

The hash should look like this before it's URL-Encoded: d90c3ce39418f0c5d98358e03349262b608cbf52

I notice that it is not as simple as placing a '%' symbol every two characters, so how would I go about encoding this byte string to get the same result as Azureus?

Thanks in advance.

R. Martinho Fernandes · Accepted Answer · 2012-07-18 08:44:08Z

2

Actually, you can just place a % symbol before every pair of hex digits. Azureus doesn't do that for every byte because, for example, R is a safe character in a URL, and 52 is the hexadecimal representation of R, so it doesn't need to percent-encode it. Using %52 instead is equivalent.
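As a minimal sketch of this idea in Python (the function name `percent_encode_all` is hypothetical, not something from the question or from Azureus), every byte of the hash can be written as %XX, and the standard library can confirm that this decodes to the same bytes as the string Azureus sent:

```python
from urllib.parse import unquote_to_bytes


def percent_encode_all(hex_digest: str) -> str:
    """Percent-encode every byte of a hex digest as %XX (hypothetical helper)."""
    raw = bytes.fromhex(hex_digest)
    # Two uppercase hex digits per byte, each pair prefixed with '%'
    return "".join(f"%{b:02X}" for b in raw)


hex_digest = "d90c3ce39418f0c5d98358e03349262b608cbf52"
azureus_form = "%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR"

fully_encoded = percent_encode_all(hex_digest)

# The two spellings differ as text but decode to the same 20 bytes.
same_bytes = unquote_to_bytes(fully_encoded) == unquote_to_bytes(azureus_form)
```

Here `fully_encoded` ends in `%BF%52` where the Azureus string ends in `%BFR`; a tracker that decodes percent-encoding correctly should treat both the same way.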


David Schwartz · Accepted Answer · 2012-07-18 08:44:16Z

1

Go through the string from left to right. If you encounter a %, output the next two characters, converting upper-case to lower-case. If you encounter anything else, output the ASCII code for that character in hex using lower-case letters.

%D9 %0C %3C %E3 %94 %18 %F0 %C5 %D9 %83 X %E0 3 I %26 %2B %60 %8C %BF R

The ASCII code for X is 0x58, so that becomes 58. The ASCII code for 3 is 0x33.
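The left-to-right procedure above could be sketched in Python roughly as follows. The function name `percent_decode_to_hex` is an assumption made for illustration, and the sketch assumes well-formed input (every % is followed by two hex digits, and all other characters are ASCII):

```python
def percent_decode_to_hex(escaped: str) -> str:
    """Turn a URL-encoded byte string back into lower-case hex (hypothetical helper)."""
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            # Copy the two hex digits after '%', lower-cased
            out.append(escaped[i + 1:i + 3].lower())
            i += 3
        else:
            # Any other character stands for itself: emit its ASCII code in hex
            out.append(f"{ord(escaped[i]):02x}")
            i += 1
    return "".join(out)


escaped = "%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR"
decoded_hex = percent_decode_to_hex(escaped)
```

For the string from the question, `decoded_hex` comes out as the original 40-character hex digest.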

(I'm kind of puzzled why you had to ask though. Your question clearly shows that you recognized this as URL-Encoded.)


3 Comments

R. Martinho Fernandes Over a year ago

I think the question is about encoding, not decoding.

David Schwartz Over a year ago

No difference. The process is clearly reversible. (And the question points out that it's just URL encoded!)

brnby Over a year ago

I hadn't realised that ascii characters could be used to represent the bytes, thank you for clearing that up! :)

KiriSakow · Accepted Answer · 2022-08-24 17:10:17Z

I know the original question was about C++, but it can sometimes be useful to see alternative solutions. So, for what it's worth (10 years later), here is

An alternative solution implemented in Python 3.6+

import binascii
import urllib.parse

def hex_str_to_esc_str(hex_str: str, *, encoding: str='Windows-1252') -> str:
    # decode the raw bytes of the hex string using the given encoding
    decoded_str = binascii.unhexlify(hex_str).decode(encoding)
    # escape string and return
    return urllib.parse.quote(decoded_str, encoding=encoding)

def esc_str_to_hex_str(esc_str: str, *, encoding: str='Windows-1252') -> str:
    # unescape the escaped string using the given encoding
    decoded_str = urllib.parse.unquote(esc_str, encoding=encoding)
    # encode string back to bytes, hexlify, and return
    return decoded_str.encode(encoding).hex()

Two elementary tests:

esc_str = '%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR'
hex_str = 'd90c3ce39418f0c5d98358e03349262b608cbf52'

print(hex_str_to_esc_str(hex_str) == esc_str) # True
print(esc_str_to_hex_str(esc_str) == hex_str) # True

Note

Windows-1252 (aka cp1252) emerged as the default encoding as a result of the following test:

import binascii
import chardet

esc_str = '%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR'
hex_str = 'd90c3ce39418f0c5d98358e03349262b608cbf52'

print(
    chardet.detect(
        binascii.unhexlify(hex_str)
    )
)

...which gave a pretty strong clue:

{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}
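A SHA1 digest is arbitrary binary data rather than text, so one possible variation is to skip the text codec altogether and work on the bytes directly. This matters because Python's Windows-1252 codec leaves a few byte values (0x81, for example) undefined, so decoding an arbitrary hash through it can raise UnicodeDecodeError even though it works for the hash in the question. The sketch below uses `urllib.parse.quote_from_bytes` and `urllib.parse.unquote_to_bytes` from the standard library; the function names `hex_to_escaped` and `escaped_to_hex` are hypothetical.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def hex_to_escaped(hex_digest: str) -> str:
    """Percent-encode the raw bytes of a hex digest (hypothetical helper)."""
    # safe="" so that only letters, digits and "_.-~" are left unescaped
    return quote_from_bytes(bytes.fromhex(hex_digest), safe="")


def escaped_to_hex(escaped: str) -> str:
    """Decode a percent-encoded byte string back to lower-case hex."""
    return unquote_to_bytes(escaped).hex()


esc_str = "%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR"
hex_str = "d90c3ce39418f0c5d98358e03349262b608cbf52"

round_trip_ok = (
    hex_to_escaped(hex_str) == esc_str
    and escaped_to_hex(esc_str) == hex_str
)
```

With this approach no encoding has to be guessed, and a byte such as 0x81 is simply emitted as %81.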
