Hello StackOverflow community!
I started learning Node.js recently and decided to implement a reverse HTTP proxy as an exercise. There were a couple of rough spots that I managed to get through on my own, but now I'm a bit stuck and need your help. I have already handled redirects and relative URLs, and it was while implementing relative URL support that I ran into the problem described below.
You can find my code at http://pastebin.com/vZfEfk8r. It's not very big, but it still doesn't fit nicely on this page.
Now to the problems (there are two of them). I'm using http.request to forward the client's request to the target server, then waiting for the response and sending it back to the client. It works for some requests, but not for others. This is the first problem: on the website I'm using to test the proxy (http://ixbt.com, a cool Russian website about tech) I can always get the main page /index.html, but when the browser starts to fetch other files referenced from that page (css, img, etc.), most of the requests end with a ParseError ({"bytesParsed":0}).
While debugging with Wireshark, I noticed that some of the requests (if not all) fail with this error when the following HTTP exchange between the proxy and the target server occurs:
Request:
GET articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1
Host: www.ixbt.com
Connection: keep-alive
Response:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
It looks like the server sends neither a status line nor any headers. So the question is: can this be the reason for the failure (ParseError)?
My other concern is that when I fetch the same file with a standalone request, I have no problems. Just look:
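To make the "no status line" observation concrete, here is a minimal sketch that checks whether the raw bytes of a response begin with an HTTP status line. The helper name `startsWithStatusLine` and the two sample strings are hypothetical, written only to mirror the traces in this post; this is not how Node's parser is implemented.

```javascript
// Hypothetical helper (not part of Node's API): returns true when the raw
// response text begins with a status line such as "HTTP/1.1 200 OK".
function startsWithStatusLine(raw) {
  return /^HTTP\/\d\.\d \d{3}( |\r?\n|$)/.test(raw);
}

// Shortened samples modelled on the two responses captured in Wireshark.
const bareBody = '<html>\r\n<head><title>400 Bad Request</title></head>\r\n';
const fullResponse = 'HTTP/1.1 200 OK\r\nServer: nginx\r\n\r\n';

console.log(startsWithStatusLine(bareBody));     // false
console.log(startsWithStatusLine(fullResponse)); // true
```

A response that fails this kind of check gives an HTTP/1.x client nothing to parse a status code or headers from, which would be consistent with a parse error reported at byte 0.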
Request:
GET /articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1
Host: www.ixbt.com
Connection: keep-alive
Response:
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 25 Jun 2012 17:09:51 GMT
Content-Type: image/jpeg
Content-Length: 3046
Last-Modified: Fri, 22 Jun 2012 00:06:27 GMT
Connection: keep-alive
Expires: Wed, 25 Jul 2012 17:09:51 GMT
Cache-Control: max-age=2592000
Accept-Ranges: bytes
... and here goes the body ...
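One visible difference between the two traces is the request-target: the failing request line reads `GET articles/...`, while the working one reads `GET /articles/...`. The sketch below only compares the two request lines copied from the traces; the helper names `requestTarget` and `isOriginForm` are hypothetical and exist purely for illustration.

```javascript
// Hypothetical helpers for comparing the two request lines from the traces.

// Extracts the request-target (the second token) from a request line.
function requestTarget(requestLine) {
  return requestLine.split(' ')[1];
}

// An origin-form request-target is an absolute path, so it starts with "/".
function isOriginForm(target) {
  return target.startsWith('/');
}

const failing = 'GET articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1';
const working = 'GET /articles/pics2/201206/coolermaster-computex2012_70x70.jpg HTTP/1.1';

console.log(isOriginForm(requestTarget(failing))); // false
console.log(isOriginForm(requestTarget(working))); // true
```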
So at the end of the day there may be some mistake in how I make the proxy requests. Maybe it's because I make a lot of them when the main page is loaded, since it has many images, etc.?
I hope I was clear enough, but please ask for details if I missed something. The full source code is available (again, at http://pastebin.com/vZfEfk8r), so if somebody would try it, that would be just great. :)
Many thanks in advance!
P.S. As I said, I'm just learning, so if you see any bad practices in my code (even unrelated to the question), it would be nice to know about them.
UPDATE: As was mentioned in a comment, I wasn't proxying the original request's headers, which in theory could lead to problems with subsequent requests. I changed that but, unfortunately, the behavior remained the same. Here's an example of the new request and response:
Request:
GET css/main_fixed.css HTTP/1.1
Host: www.ixbt.com
connection: keep-alive
cache-control: no-cache
pragma: no-cache
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.56 Safari/536.5
accept: text/css,*/*;q=0.1
accept-encoding: gzip,deflate,sdch
accept-language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
accept-charset: windows-1251,utf-8;q=0.7,*;q=0.3
referer: http://www.ixbt.com/
Response:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
I had to craft the 'referer' header by hand, since the browser sends it with the reverse proxy's URL. Still, the behavior is the same, as you can see. Any other ideas?
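For reference, the header handling described in the update could look roughly like the sketch below. It assumes the proxy is reached at `http://localhost:8080` and that the target URL is appended after the first slash (as in `/http://www.ixbt.com/`); both the address and the helper name `buildUpstreamHeaders` are assumptions made for illustration, not the code from the pastebin.

```javascript
// Hypothetical helper: copies the client's headers for the upstream request.
// proxyOrigin is an assumed address the browser uses to reach the proxy.
function buildUpstreamHeaders(clientHeaders, targetHost, proxyOrigin) {
  const headers = { ...clientHeaders };

  // The target server must see its own host name, not the proxy's.
  headers.host = targetHost;

  // Strip the proxy prefix so the referer points at the target site again.
  const prefix = proxyOrigin + '/';
  if (typeof headers.referer === 'string' && headers.referer.startsWith(prefix)) {
    headers.referer = headers.referer.slice(prefix.length);
  }
  return headers;
}

const upstreamHeaders = buildUpstreamHeaders(
  {
    host: 'localhost:8080',
    accept: 'text/css,*/*;q=0.1',
    referer: 'http://localhost:8080/http://www.ixbt.com/',
  },
  'www.ixbt.com',
  'http://localhost:8080'
);

console.log(upstreamHeaders.host);    // www.ixbt.com
console.log(upstreamHeaders.referer); // http://www.ixbt.com/
```

Node exposes incoming header names in lower case, which is why the sketch reads `referer` and `host` directly.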
Possibly a 400 because of the lack of a specific header. Can you compare both requests in full, i.e. with headers?
Why is url.parse(clientRequest.url.substring(1)); used? Shouldn't the HTTP Request-URI begin with a / string? I mean, what happens if you leave just clientRequest.url here?
serverRequest.url == '/http://ixbt.com'. This leading slash has to be removed so that the url can be parsed to have a host component, not just a path. But actually you spotted this right, as I just happened to find the answer, and it is related to your question. I'll post my own answer soon.
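The parsing step discussed in these comments can be sketched as follows. This is only an illustration under stated assumptions: it uses the WHATWG `URL` class rather than the `url.parse` call from the original code, it handles only absolute targets of the form `/http://host/path`, and the helper name `toUpstreamOptions` is hypothetical.

```javascript
// Hypothetical helper: turns a proxy-style URL such as '/http://ixbt.com'
// into options suitable for http.request.
function toUpstreamOptions(proxyRequestUrl) {
  // Drop the leading slash so the remainder parses with a host component.
  const target = new URL(proxyRequestUrl.slice(1));
  return {
    hostname: target.hostname,
    // URL.port is an empty string when the default port is used.
    port: target.port || 80,
    // pathname always starts with "/" for http URLs, so the request line
    // sent upstream gets a well-formed path.
    path: target.pathname + target.search,
  };
}

console.log(toUpstreamOptions('/http://ixbt.com').path);
// /
console.log(toUpstreamOptions('/http://www.ixbt.com/css/main_fixed.css').path);
// /css/main_fixed.css
```

Building the path from the parsed URL, rather than from a hand-sliced string, is one way to keep the leading slash that the target server expects.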