1

I use scapy to sniff packets, and some of the HTTP response packets I get are bytes that I cannot parse. For example:

  b'HTTP/1.1 200 OK\r\nDate: Thu, 07 Dec 2017 02:44:18 GMT\r\nServer:Apache/2.4.18 (Ubuntu)\r\nLast-Modified: Tue, 14 Nov 2017 05:51:36 GMT\r\nETag: "2c39-55deafadf0ac0-gzip"\r\nAccept-Ranges: bytes\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\nContent-Length: 3186\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n\x1f\x8b'

Is there a way to get the content part of this byte array so I can use the gzip library to decode it? I don't want to use requests to fetch the HTTP response, because I only want to process the raw packet I already have.

2 Answers

4

There's no built-in way to parse a raw HTTP response like this and handle compression properly. I would use urllib3:

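import urllib3

from io import BytesIO
from http.client import HTTPResponse

class BytesIOSocket:
    """Minimal socket stand-in: HTTPResponse only needs makefile()."""
    def __init__(self, content):
        self.handle = BytesIO(content)

    def makefile(self, mode):
        return self.handle

def response_from_bytes(data):
    sock = BytesIOSocket(data)

    response = HTTPResponse(sock)
    response.begin()  # parse the status line and headers

    # from_httplib (available in urllib3 1.x) wraps the parsed response
    # so that urllib3 decodes the compressed body
    return urllib3.HTTPResponse.from_httplib(response)

if __name__ == '__main__':
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('httpbin.org', 80))
    # Connection: close makes the server end the stream after one response
    sock.sendall(b'GET /gzip HTTP/1.1\r\nHost: httpbin.org\r\nConnection: close\r\n\r\n')

    # a single recv() may return only part of the response, so read until EOF
    chunks = []
    while True:
        chunk = sock.recv(8192)
        if not chunk:
            break
        chunks.append(chunk)
    raw_response = b''.join(chunks)

    response = response_from_bytes(raw_response)
    print(response.headers)
    print(response.data)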
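
If adding urllib3 is not an option, the same idea can be sketched with the standard library alone, at the cost of doing the decompression yourself. This is only a sketch under some assumptions: `FakeSocket` and `parse_response` are hypothetical names, only gzip is handled, and `raw` must hold the complete response (a single sniffed packet that carries just the start of the body will make `read()` raise `http.client.IncompleteRead`).

```python
import gzip
from http.client import HTTPResponse
from io import BytesIO


class FakeSocket:
    """Hypothetical helper: gives HTTPResponse the makefile() it expects."""

    def __init__(self, data):
        self._file = BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_response(raw):
    """Return (status, headers, decoded_body) for one complete raw response."""
    response = HTTPResponse(FakeSocket(raw))
    response.begin()        # parses the status line and headers
    body = response.read()  # honours Content-Length and chunked encoding

    # Assumption: gzip is the only content encoding we need to undo.
    if response.getheader("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return response.status, dict(response.getheaders()), body


# Build a small gzip-compressed response to stand in for captured bytes.
payload = gzip.compress(b"<html>hello</html>")
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
    b"Connection: close\r\n"
    b"\r\n"
) + payload

status, headers, body = parse_response(raw)
print(status)
print(headers)
print(body)
```

With scapy you would first have to reassemble the TCP payloads of the stream into one bytes object before passing it to a function like this.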

5 Comments

Thank you very much! This is exactly what I need!
Hi, I still have a question, though: how do I parse the raw bytes of an HTTP request?
@user6456568: What do you mean? In my example code, raw_response is the raw HTTP response with a gzip-compressed body.
I have some raw bytes that are either an HTTP request or a response, and I want to parse both.
@user6456568: parsing HTTP requests is a different problem: stackoverflow.com/questions/39090366/…
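
For the request side, one commonly used approach is to borrow the parser from `http.server`. The sketch below is an illustration, not necessarily what the linked question recommends: `RawHTTPRequest` is a hypothetical name, and the class deliberately skips the normal handler setup so that it only parses bytes and never touches a real socket.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class RawHTTPRequest(BaseHTTPRequestHandler):
    """Hypothetical wrapper that reuses http.server's request parser."""

    def __init__(self, raw_bytes):
        # Skip the usual socket setup; parse_request() only needs rfile
        # and raw_requestline to be present.
        self.rfile = BytesIO(raw_bytes)
        self.raw_requestline = self.rfile.readline()
        self.error_code = None
        self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record parse errors instead of writing a reply to a client.
        self.error_code = code
        self.error_message = message


request = RawHTTPRequest(b"GET /gzip HTTP/1.1\r\nHost: httpbin.org\r\n\r\n")
print(request.error_code)
print(request.command, request.path, request.request_version)
print(request.headers["Host"])
```

Any request body is left unread in `request.rfile`, so it can be read from there according to the `Content-Length` header.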
1

You can extract the value portion of the bytes with

response_bytes.decode('utf-8')

Then you can parse the returned information with Beautiful Soup for whatever part of it you want.

3 Comments

Thanks. Why do I get an error? UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 302: invalid start byte
@user6456568 - Sorry, but I'm not the best person to help with decode issues. My apologies...
@user6456568 Because you're dealing with a gzipped response. The body of the response is compressed, so you can't just turn it into a UTF-8 string without first decompressing the body.
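
To make that concrete, here is a small sketch that splits the bytes at the blank line, decodes only the header part as text, and decompresses the body before decoding it. The sample bytes are made up for illustration, and the sketch assumes the whole gzip body is present and that the decompressed page is UTF-8.

```python
import gzip

# Made-up stand-in for captured bytes: text headers plus a gzip body.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Encoding: gzip\r\n"
    b"\r\n"
) + gzip.compress("<p>héllo</p>".encode("utf-8"))

# Decoding everything at once fails on the gzip magic bytes (0x1f 0x8b).
try:
    raw.decode("utf-8")
    decoded_whole = True
except UnicodeDecodeError:
    decoded_whole = False

# Split at the first blank line: headers before it, body after it.
head, _, body = raw.partition(b"\r\n\r\n")
header_text = head.decode("iso-8859-1")     # header bytes are safe to decode
html = gzip.decompress(body).decode("utf-8")  # decompress first, then decode

print(decoded_whole)
print(header_text.splitlines()[0])
print(html)
```

Only after this step does it make sense to hand `html` to something like Beautiful Soup.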
