How to parse raw HTTP bytes and get the HTTP content in Python?

Question

I use scapy to sniff packets, and some of the HTTP response packets I capture are bytes that I cannot parse. For example:

  b'HTTP/1.1 200 OK\r\nDate: Thu, 07 Dec 2017 02:44:18 GMT\r\nServer:Apache/2.4.18 (Ubuntu)\r\nLast-Modified: Tue, 14 Nov 2017 05:51:36 GMT\r\nETag: "2c39-55deafadf0ac0-gzip"\r\nAccept-Ranges: bytes\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\nContent-Length: 3186\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n\x1f\x8b'

Is there a way to get the content part of this byte string so that I can decode it with the gzip library? I don't want to use requests to fetch the HTTP response again, because I only want to process the raw packet I already have.

Blender · Accepted Answer · 2017-12-07 03:52:02Z

4

The standard library has no single call that parses a raw HTTP response like this and also handles compression properly. I would use urllib3:

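from io import BytesIO
from http.client import HTTPResponse

import urllib3


class BytesIOSocket:
    """Minimal socket stand-in: http.client only calls makefile() on it."""

    def __init__(self, content):
        self.handle = BytesIO(content)

    def makefile(self, mode):
        return self.handle


def response_from_bytes(data):
    sock = BytesIOSocket(data)

    # Parse the status line and headers; the body is read later.
    response = HTTPResponse(sock)
    response.begin()

    # Wrap the parsed response so urllib3 decodes Content-Encoding (e.g. gzip)
    # when .data is read. from_httplib is available in urllib3 1.x; newer
    # major versions may not provide it.
    return urllib3.HTTPResponse.from_httplib(response)


if __name__ == '__main__':
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('httpbin.org', 80))
    # "Connection: close" makes the server end the stream after the response,
    # so we can keep reading until recv() returns b''.
    sock.sendall(b'GET /gzip HTTP/1.1\r\nHost: httpbin.org\r\nConnection: close\r\n\r\n')

    # A single recv() may return only part of the response, so collect chunks.
    chunks = []
    while True:
        chunk = sock.recv(8192)
        if not chunk:
            break
        chunks.append(chunk)
    sock.close()
    raw_response = b''.join(chunks)

    response = response_from_bytes(raw_response)
    print(response.headers)
    print(response.data)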
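
If urllib3 isn't available, the same idea can be sketched with the standard library alone: http.client parses the status line and headers, and the Content-Encoding is then handled by hand. This is a minimal sketch rather than a drop-in replacement. FakeSocket and parse_response are hypothetical names, only gzip is handled, and the raw bytes are assumed to hold one complete response (a single sniffed packet may contain only part of the body).

```python
import gzip
from http.client import HTTPResponse
from io import BytesIO


class FakeSocket:
    """Hypothetical helper: exposes makefile() so http.client can read from bytes."""

    def __init__(self, data):
        self._file = BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_response(raw):
    """Return (status, headers, decoded_body) for one complete raw response."""
    response = HTTPResponse(FakeSocket(raw))
    response.begin()        # parses the status line and headers
    body = response.read()  # follows Content-Length or chunked framing
    encoding = response.getheader('Content-Encoding', '').lower()
    if encoding == 'gzip':
        body = gzip.decompress(body)
    return response.status, dict(response.getheaders()), body


# Build a small gzip-encoded response to exercise the parser.
payload = gzip.compress(b'<html>hello</html>')
raw = (
    b'HTTP/1.1 200 OK\r\n'
    b'Content-Encoding: gzip\r\n'
    b'Content-Length: ' + str(len(payload)).encode() + b'\r\n'
    b'Content-Type: text/html\r\n'
    b'\r\n' + payload
)
status, headers, body = parse_response(raw)
print(status, headers['Content-Type'], body)
```

Unlike the urllib3 version, this leaves other encodings (such as deflate) untouched, so the caller gets the body exactly as it was sent in those cases.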

answered Dec 7, 2017 at 3:52


5 Comments

user6456568 Over a year ago

Thank you very much! This is exactly what I need!

user6456568 Over a year ago

Hi, I still have a question, though: how do I parse the raw bytes of an HTTP request?

Blender Over a year ago

@user6456568: What do you mean? In my example code, raw_response is the raw HTTP response with a gzip-compressed body.

user6456568 Over a year ago

I have some raw bytes that are either an HTTP request or a response, and I want to parse both.

Blender Over a year ago

@user6456568: parsing HTTP requests is a different problem: stackoverflow.com/questions/39090366/…
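
For readers with the same follow-up question, one commonly used way to parse raw request bytes is to reuse the standard library's BaseHTTPRequestHandler without a real socket. The sketch below is an illustration under assumptions, not necessarily what the linked question recommends: ParsedRequest and looks_like_response are hypothetical names, and the request/response check relies only on the fact that a response starts with its status line.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class ParsedRequest(BaseHTTPRequestHandler):
    """Hypothetical helper: parses raw request bytes without a real socket."""

    def __init__(self, raw):
        # Deliberately skip the parent __init__, which expects a live connection.
        self.rfile = BytesIO(raw)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record parse failures instead of writing an error reply to a socket.
        self.error_code = code
        self.error_message = message


def looks_like_response(raw):
    # A response starts with its status line ("HTTP/1.1 200 OK");
    # a request starts with a method ("GET / HTTP/1.1").
    return raw.startswith(b'HTTP/')


raw = b'GET /gzip HTTP/1.1\r\nHost: httpbin.org\r\nAccept-Encoding: gzip\r\n\r\n'
if not looks_like_response(raw):
    request = ParsedRequest(raw)
    print(request.command, request.path, request.headers['Host'])
```

If error_code is not None after construction, the bytes did not form a valid request line or header block.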

GaryMBloom · Accepted Answer · 2017-12-07 03:03:15Z

1

You can extract the value portion of the bytes with

response_bytes.decode('utf-8')

Then you can parse the returned information with Beautiful Soup for whatever part of it you want.

answered Dec 7, 2017 at 3:03


3 Comments

user6456568 Over a year ago

Thanks. Why do I get an error? UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 302: invalid start byte

GaryMBloom Over a year ago

@user6456568 - Sorry, but I'm not the best person to help with decode issues. My apologies...

hangonstack Over a year ago

@user6456568 Because you're dealing with a gzipped response. The body of the response is compressed, so you can't just turn it into a UTF-8 string without first decompressing the body.
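
To make the cause of that error concrete, here is a small sketch that splits the headers from the body at the first blank line, shows that decoding the compressed body as UTF-8 fails, and then decompresses before decoding. The sample bytes are made up for illustration. The sketch assumes the full body is present, that it is not sent with chunked transfer encoding, and that the page itself is UTF-8.

```python
import gzip

# Stand-in for the sniffed bytes: text headers followed by a gzip body.
compressed = gzip.compress('<p>héllo</p>'.encode('utf-8'))
response_bytes = (
    b'HTTP/1.1 200 OK\r\n'
    b'Content-Encoding: gzip\r\n'
    b'Content-Type: text/html\r\n'
    b'\r\n' + compressed
)

# Headers end at the first blank line; everything after it is the body.
head, _, body = response_bytes.partition(b'\r\n\r\n')

# The header block is plain text, so it decodes cleanly.
header_text = head.decode('iso-8859-1')

# The body starts with the gzip magic bytes 0x1f 0x8b; 0x8b is not a valid
# UTF-8 start byte, which is what the UnicodeDecodeError reports.
try:
    body.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decompress first, then decode the resulting text.
html = gzip.decompress(body).decode('utf-8')
print(header_text.splitlines()[0], decode_failed, html)
```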
