Transparent (redirecting) proxy with BaseHTTPServer

paul koelle

Hi list,

My ultimate goal is to have a small HTTP proxy that can show a
message specific to the client's name/IP/status and then handle the original
request normally, either by redirecting the client or by acting as a proxy.

I started with a modified[1] version of TinyHTTPProxy, posted to this list
by Suzuki Hisao sometime in 2003, and tried to extend it to my needs.
It works quite well if I configure my client to use it, but using the
iptables REDIRECT feature to point clients transparently at the
proxy caused some issues.

Precisely, the "self.path" member variable of BaseHTTPRequestHandler is
missing the <command> and the host (e.g. www.python.org) part of the
request line for REDIRECTed connections:

without iptables REDIRECT:
self.path -> GET http://www.python.org/ftp/python/contrib/ HTTP/1.1

with REDIRECT:
self.path -> GET /ftp/python/contrib/ HTTP/1.1

I asked about this on the squid mailing list and was told this is normal
and that I have to reconstruct the request line from the real destination IP,
the URL path and the Host header (if any). If the Host header is sent
it's an (unsafe) no-brainer, but I cannot for the life of me figure out
where to get the "real destination IP". Any ideas?

thanks
Paul

[1] HTTP Debugging Proxy
Modified by Xavier Defrang (http://defrang.com/)
 
aurora

If you actually want the IP, resolving the Host header would give you that.

In the redirect case you should get a Host header like

Host: www.python.org

From that you can reconstruct the original URL as
http://www.python.org/ftp/python/contrib/. With that you can open it using
urllib and proxy the data to the client.

The second form of HTTP request, without the host part, exists for
compatibility with the pre-HTTP/1.1 standard. All modern web browsers should
send the Host header.
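The reconstruction described above can be sketched roughly as follows. This is only an illustration, not code from TinyHTTPProxy: the function name rebuild_url and the default "http" scheme are assumptions, and it is written for Python 3 (in Python 2 the same function lives in the urlparse module).

```python
from urllib.parse import urlsplit


def rebuild_url(path, host_header, scheme="http"):
    """Return an absolute URL for a request seen by the proxy.

    path        -- request target, e.g. "/ftp/python/contrib/"
    host_header -- value of the Host header, or None if it was not sent
    scheme      -- assumed scheme; a REDIRECT rule on port 80 implies "http"
    """
    parts = urlsplit(path)
    # A client that is configured to use the proxy already sends an
    # absolute URL, so there is nothing to rebuild.
    if parts.scheme and parts.netloc:
        return path
    if not host_header:
        raise ValueError("no Host header; cannot rebuild the URL")
    return "%s://%s%s" % (scheme, host_header.strip(), path)


if __name__ == "__main__":
    print(rebuild_url("/ftp/python/contrib/", "www.python.org"))
```

Inside a request handler the two inputs would typically be self.path and self.headers.get("Host"). Note that the Host value is supplied by the client and should be treated as untrusted.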
 
paul koelle

Thanks, aurora ;),

> If you actually want the IP, resolving the Host header would give you that.

I'm only interested in the hostname.

> The second form of HTTP request, without the host part, exists for
> compatibility with the pre-HTTP/1.1 standard. All modern web browsers should
> send the Host header.

How safe is the assumption that the Host header will be there? Is it part
of the HTTP/1.1 spec? And does it mean all "pre 1.1" clients will fail?
Hmm, maybe I should look on the wire to see what's really happening...

thanks again
Paul
 
aurora

It should be very safe to count on the Host header. Maybe some really,
really old browsers would not support it, but they probably won't work on
today's WWW anyway. The majority of today's web sites are likely to be
virtually hosted; one Apache may be hosting 50 web addresses. If a client
strips the host name and does not send the Host header either, the web server
wouldn't know which address it is really looking for. If you catch a request
that doesn't have a Host header, it is a good idea to redirect it to a
browser upgrade page.
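For requests that do arrive without a Host header, the "real destination IP" asked about in the first post can, as far as I know, be read from the accepted socket on Linux through the netfilter SO_ORIGINAL_DST socket option. The sketch below assumes IPv4, Linux, and a REDIRECT rule on the same machine as the proxy. The constant 80 is taken from <linux/netfilter_ipv4.h> because the socket module does not export it, and the helper names are made up for illustration.

```python
import socket
import struct

# Assumption: value from <linux/netfilter_ipv4.h>; Linux/IPv4 only.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw):
    """Decode a 16-byte struct sockaddr_in into an (ip, port) tuple."""
    # Layout: 2 bytes family, 2 bytes port (network order), 4 bytes address,
    # 8 bytes padding.
    port = struct.unpack("!H", raw[2:4])[0]
    ip = socket.inet_ntoa(raw[4:8])
    return ip, port


def original_destination(conn):
    """Return the (ip, port) the client was trying to reach before REDIRECT."""
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

In a BaseHTTPRequestHandler subclass the accepted socket is available as self.connection, so original_destination(self.connection) would be the call to try. This yields only an IP address and port, not a hostname, so it is a fallback rather than a replacement for the Host header.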
 
