
I am writing a BitTorrent client. One of the steps requires the program to send an HTTP GET request to the tracker containing an SHA1 hash of part of the torrent file. I used Fiddler2 to intercept the request that Azureus sends to the tracker.

The hash that Azureus sends is URL-Encoded and looks like this: %D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR

The hash should look like this before it's URL-Encoded: d90c3ce39418f0c5d98358e03349262b608cbf52

I notice that it is not as simple as placing a '%' symbol every two characters, so how would I go about encoding this byte string to get the same result as Azureus?

Thanks in advance.


Actually, you can just place a % symbol before every two hex characters. Azureus doesn't do that for every byte because some bytes correspond to characters that are safe in a URL. For example, 0x52 is the ASCII code for R, which is a safe character, so it doesn't need to be percent-encoded. Using %52 instead is equivalent.
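To illustrate the equivalence, here is a minimal Python sketch (the function name `percent_encode_all` is made up for this example). It percent-encodes every byte of the hash and checks that the result decodes to the same 20 bytes as the string Azureus sent:

```python
from urllib.parse import unquote_to_bytes

def percent_encode_all(hex_digest: str) -> str:
    """Percent-encode every byte of a hex digest, including URL-safe ones."""
    return "".join(f"%{byte:02X}" for byte in bytes.fromhex(hex_digest))

hex_digest = "d90c3ce39418f0c5d98358e03349262b608cbf52"
azureus_form = "%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR"

fully_encoded = percent_encode_all(hex_digest)

# Both forms should decode to the same raw bytes.
same_bytes = unquote_to_bytes(fully_encoded) == unquote_to_bytes(azureus_form)
print(same_bytes)
```

A tracker that decodes the query string correctly should treat both forms the same way; whether a particular tracker does is something to verify against that tracker.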


Go through the string from left to right. If you encounter a %, output the next two characters, converting upper-case to lower-case. If you encounter anything else, output the ASCII code for that character in hex using lower-case letters.

%D9 %0C %3C %E3 %94 %18 %F0 %C5 %D9 %83 X %E0 3 I %26 %2B %60 %8C %BF R

The ASCII code for X is 0x58, so that becomes 58. The ASCII code for 3 is 0x33.
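The left-to-right procedure described above could be sketched in Python as follows. The function name `escaped_to_hex` is chosen for this example, and the sketch assumes well-formed input in which every % is followed by two hex digits:

```python
def escaped_to_hex(escaped: str) -> str:
    """Convert a URL-encoded byte string into lower-case hex digits."""
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            # Copy the two hex digits after '%', lower-cased.
            out.append(escaped[i + 1:i + 3].lower())
            i += 3
        else:
            # Any other character stands for itself: emit its ASCII code in hex.
            out.append(f"{ord(escaped[i]):02x}")
            i += 1
    return "".join(out)

escaped = "%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR"
print(escaped_to_hex(escaped) == "d90c3ce39418f0c5d98358e03349262b608cbf52")
```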

(I'm kind of puzzled why you had to ask though. Your question clearly shows that you recognized this as URL-Encoded.)

3 Comments

I think the question is about encoding, not decoding.
No difference. The process is clearly reversible. (And the question points out that it's just URL encoded!)
I hadn't realised that ASCII characters could be used to represent the bytes, thank you for clearing that up! :)

Although the original question was about C++, it can still be useful to see an alternative solution. So, for what it's worth (10 years later), here's

An alternative solution implemented in Python 3.6+

import binascii
import urllib.parse

def hex_str_to_esc_str(hex_str: str, *, encoding: str='Windows-1252') -> str:
    # convert the hex digits to bytes, then decode them using `encoding`
    decoded_str = binascii.unhexlify(hex_str).decode(encoding)
    # escape string and return
    return urllib.parse.quote(decoded_str, encoding=encoding)

def esc_str_to_hex_str(esc_str: str, *, encoding: str='Windows-1252') -> str:
    # unescape the escaped string using `encoding`
    decoded_str = urllib.parse.unquote(esc_str, encoding=encoding)
    # encode string back to bytes, hexlify, and return
    return decoded_str.encode(encoding).hex()

Two elementary tests:

esc_str = '%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR'
hex_str = 'd90c3ce39418f0c5d98358e03349262b608cbf52'

print(hex_str_to_esc_str(hex_str) == esc_str) # True
print(esc_str_to_hex_str(esc_str) == hex_str) # True

Note

Windows-1252 (aka cp1252) emerged as the default encoding as a result of the following test:

import binascii
import chardet

esc_str = '%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR'
hex_str = 'd90c3ce39418f0c5d98358e03349262b608cbf52'

print(
    chardet.detect(
        binascii.unhexlify(hex_str)
    )
)

...which gave a pretty strong clue:

{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}
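One caveat: a SHA1 digest is arbitrary binary data rather than text, so charset detection is only a heuristic here. Python's Windows-1252 codec leaves a few byte values (0x81, for example) undefined, so decoding can fail for other hashes. A sketch that avoids text encodings altogether, using the bytes-oriented functions in `urllib.parse` (the wrapper names `hex_to_escaped` and `escaped_to_hex` are made up for this example):

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

def hex_to_escaped(hex_digest: str) -> str:
    """Percent-encode the raw bytes of a hex digest, leaving unreserved characters as-is."""
    # safe="" so that '/' (byte 0x2F) is escaped as well.
    return quote_from_bytes(bytes.fromhex(hex_digest), safe="")

def escaped_to_hex(escaped: str) -> str:
    """Decode a percent-encoded string to raw bytes and return them as lower-case hex."""
    return unquote_to_bytes(escaped).hex()

esc_str = '%D9%0C%3C%E3%94%18%F0%C5%D9%83X%E03I%26%2B%60%8C%BFR'
hex_str = 'd90c3ce39418f0c5d98358e03349262b608cbf52'

print(hex_to_escaped(hex_str) == esc_str)
print(escaped_to_hex(esc_str) == hex_str)
```

Because no text codec is involved, this version should also handle byte values such as 0x81.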
