Making a socket connection via a proxy server


Fuzzyman

In a nutshell - the question I'm asking is, how do I make a socket
connection go via a proxy server?
All our internet traffic has to go through a proxy-server at location
'dav-serv:8080' and I need to make a socket connection through it.

The reason (with code example) is as follows :

I am hacking "Tiny HTTP Proxy" by SUZUKI Hisao to make an http proxy
that modifies URLs. I haven't got very far - having started from zero
knowledge of the 'hyper text transfer protocol'.

It looks like the Tiny HTTP Proxy (using BaseHTTPServer as its
foundation) intercepts all requests to local addresses and then
re-implements the request (whether it is CONNECT, GET, PUT or
whatever). It logs everything that goes through it - I will simply
edit it to amend the URL that is being asked for.

It looks like the CONNECT and GET requests are just implemented using
simple socket commands. (I say simple because there isn't a lot of
code - I'm not familiar with the actual behaviour of sockets, but it
doesn't look too complicated).

What I need to do is rewrite the soc.connect(host_port) line in the
following example so that it connects *via* my proxy-server. (which it
doesn't by default).

I think the current format of host_port is a tuple : (host_domain,
port_no)

Below is a summary of the GET command (I've inlined all the method
calls - this example starts from the do_GET method) :

soc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
soc.connect(host_port)
soc.send("%s %s %s\r\n" % (
    self.command,
    urlparse.urlunparse(('', '', path, params, query, '')),
    self.request_version))
self.headers['Connection'] = 'close'
del self.headers['Proxy-Connection']
for key_val in self.headers.items():
    soc.send("%s: %s\r\n" % key_val)
soc.send("\r\n")

max_idling = 20  # this is really part of a self._read_write method
iw = [self.connection, soc]
ow = []
count = 0
while 1:
    count += 1
    (ins, _, exs) = select.select(iw, ow, iw, 3)
    if exs: break
    if ins:
        for i in ins:
            if i is soc:
                out = self.connection
            else:
                out = soc
            data = i.recv(8192)
            if data:
                out.send(data)
                count = 0
    else:
        print "\t" "idle", count
    if count == max_idling: break

print "\t" "bye"
soc.close()
self.connection.close()

Regards,


Fuzzy

http://www.voidspace.org.uk/atlantibots/pythonutils.html
 

Diez B. Roggisch

Fuzzyman said:
In a nutshell - the question I'm asking is, how do I make a socket
connection go via a proxy server?
All our internet traffic has to go through a proxy-server at location
'dav-serv:8080' and I need to make a socket connection through it.

Short answer: It's not possible.

The long answer: The proxy (dav-serv) isn't transparent - it is implemented
using HTTP. That means that the browser knows there is a proxy in the
middle, and rewrites its request accordingly. An example:

We want to connect to http://foo.bar.com/foo.html

Usually, an HTTP GET looks like this:

GET /foo.html HTTP/1.1

It is made by opening a connection to foo.bar.com on port 80.

Now if you add a proxy, things look like this:

GET http://foo.bar.com/foo.html HTTP/1.1

And it is sent to your proxy.

Notice the difference? Now the fully qualified URL is sent, so the proxy
can make the request itself.
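
To make the difference concrete, here is a minimal sketch. It is written for Python 3, where `urlsplit` lives in `urllib.parse` (in the Python 2 used elsewhere in this thread it is in the `urlparse` module). The helper name `request_line` is made up for illustration; it only builds the first line of the request and does not open any connection.

```python
from urllib.parse import urlsplit


def request_line(url, via_proxy, method="GET", version="HTTP/1.1"):
    """Build the first line of an HTTP request for `url`.

    Direct connection: only the path (and query) is sent.
    Through an HTTP proxy: the fully qualified URL is sent.
    """
    if via_proxy:
        target = url
    else:
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return "%s %s %s\r\n" % (method, target, version)


# The two request lines from the example above.
print(repr(request_line("http://foo.bar.com/foo.html", via_proxy=False)))
print(repr(request_line("http://foo.bar.com/foo.html", via_proxy=True)))
```

In both cases the rest of the request (headers, blank line) stays the same; only the first line and the machine you connect to change.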

So because of this, there is no such thing as a socket connection through
an HTTP proxy.

If you still need a solution, you might be able to alter the tiny http proxy
to use http itself, instead of direct socket connections. Thus you might be
able to contact a proxy yourself.
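
One possible reading of "use http itself" is to let the standard library talk to the proxy instead of hand-written socket code. The following is only a sketch, in Python 3, using `urllib.request.ProxyHandler` (the Python 2 counterpart is in `urllib2`). The proxy address is the one from the question, and `make_proxy_opener` is an illustrative name, not part of Tiny HTTP Proxy.

```python
import urllib.request


def make_proxy_opener(proxy_url="http://dav-serv:8080"):
    """Return an opener that sends plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener()
# opener.open("http://www.python.org/") would now go via dav-serv:8080.
# The call is left commented out because it needs that proxy to be reachable.
```

The trade-off is that the library then decides how the request is written, so a proxy that wants to relay the client's headers unchanged may still prefer the socket approach.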
 

Alan Kennedy

[Fuzzyman]
In a nutshell - the question I'm asking is, how do I make a socket
connection go via a proxy server?
All our internet traffic has to go through a proxy-server at location
'dav-serv:8080' and I need to make a socket connection through it.
I am hacking "Tiny HTTP Proxy" by SUZUKI Hisao to make an http proxy
that modifies URLs. I haven't got very far - having started from zero
knowledge of the 'hyper text transfer protocol'.

It looks like the Tiny HTTP Proxy (using BaseHTTPServer as its
foundation) intercepts all requests to local addresses and then
re-implements the request (whether it is CONNECT, GET, PUT or
whatever). It logs everything that goes through it - I will simply
edit it to amend the URL that is being asked for.

Yes, that is exactly what the proxy should do. It relays requests
between client and server. However, there is one vital detail you're
probably missing that is preventing you from chaining client + proxy*N
+ server together.

When sending a HTTP GET request to a server, a client sends a request
line containing a URI without a server component. This is because the
socket connection to the server is already formed, therefore the
server connection details do not need to be repeated. So a standard
GET will look like this

GET /index.html HTTP/1.1

However, it's different when a client connects to a proxy, because the
socket no longer connects directly to the server, but to the proxy
instead. The proxy still needs to know to which server it should send
the request. So the correct format for sending requests to a proxy is
to use the "absoluteURI" form, which includes the server details, e.g.

GET http://www.python.org:80/index.html HTTP/1.1

Any proxy that receives such a request now knows that the server to
forward to is "www.python.org:80". It will open a connection to
www.python.org:80, and send it a GET request for the URI.
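
Seen from the proxy's side, that step can be sketched as follows (Python 3; the function name `split_proxy_request` is hypothetical and not taken from Tiny HTTP Proxy). It takes the request line a proxy receives and works out where to connect and what to ask for.

```python
from urllib.parse import urlsplit


def split_proxy_request(request_line):
    """Split 'GET http://host:port/path HTTP/1.1' into its forwarding parts.

    Returns (method, (host, port), origin_form_target, version), where
    origin_form_target is the URI without the server component.
    """
    method, target, version = request_line.split()
    parts = urlsplit(target)
    host = parts.hostname
    port = parts.port or 80  # default HTTP port when none is given
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return method, (host, port), path, version


print(split_proxy_request("GET http://www.python.org:80/index.html HTTP/1.1"))
```

A proxy that is the last hop connects to the returned `(host, port)` and sends the short form; a proxy that forwards to another proxy keeps the absoluteURI form instead.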

Since you want your proxy to forward to another proxy, i.e. your proxy
is a client from your external-access-proxy's point of view, you
should also use the absoluteURI form when making requests from your
python proxy to your external proxy.

It looks like the CONNECT and GET requests are just implemented using
simple socket commands. (I say simple because there isn't a lot of
code - I'm not familiar with the actual behaviour of sockets, but it
doesn't look too complicated).

What I need to do is rewrite the soc.connect(host_port) line in the
following example so that it connects *via* my proxy-server. (which it
doesn't by default).

I think the current format of host_port is a tuple : (host_domain,
port_no)

Below is a summary of the GET command (I've inlined all the method
calls - this example starts from the do_GET method) :

soc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
soc.connect(host_port)

What is the value of host_port at this point? It *should* be the
address of your external access proxy, i.e. dav-serv:8080

soc.send("%s %s %s\r\n" % (
    self.command,
    urlparse.urlunparse(('', '', path, params, query, '')),
    self.request_version))

And you're not sending an absoluteURI: this should be amended to
contain the server details of the server that is finally going to
service the request. For the python.org example above, this code would be

soc.send("%s %s %s\r\n" % (
    self.command,
    urlparse.urlunparse(('http', 'www.python.org:80', path, params,
                         query, '')),
    self.request_version))

though of course, these values should be made available to you by
TinyHTTPProxy. Taking a brief look at the code, these values should be
available through the variables "scm" and "netloc". So your outgoing
connection code from TinyHTTPProxy should look something like this

soc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
soc.connect(('dav-serv', 8080))
soc.send("%s %s %s\r\n" % (
    self.command,
    urlparse.urlunparse((scm, netloc, path, params, query, '')),
    self.request_version))
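
For readers on Python 3, the same idea might look roughly like the sketch below. The helper names `build_proxy_request` and `send_via_proxy` are invented for this example, the proxy address is the one from the question, and the header handling is deliberately simplified compared with what TinyHTTPProxy does.

```python
import socket
from urllib.parse import urlunparse

PROXY_ADDR = ("dav-serv", 8080)  # upstream proxy from the question


def build_proxy_request(command, scm, netloc, path, params, query,
                        version, headers):
    """Return the bytes to send to the upstream proxy for one request."""
    # absoluteURI form: scheme and server are part of the request line.
    url = urlunparse((scm, netloc, path, params, query, ""))
    lines = ["%s %s %s" % (command, url, version)]
    lines += ["%s: %s" % item for item in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


def send_via_proxy(request_bytes, proxy_addr=PROXY_ADDR, timeout=10):
    """Connect to the proxy (not to the origin server) and send the request.

    The caller reads the response from the returned socket and closes it.
    """
    soc = socket.create_connection(proxy_addr, timeout=timeout)
    soc.sendall(request_bytes)
    return soc


request = build_proxy_request(
    "GET", "http", "www.python.org:80", "/index.html", "", "",
    "HTTP/1.1", {"Connection": "close"})
print(request)
```

`sendall` is used here instead of `send` because `send` may transmit only part of the data and leaves the retry to the caller.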

HTH,
 

Heiko Wundram

Am Freitag, 30. Juli 2004 18:51 schrieb Diez B. Roggisch:
Short answer: It's not possible.

Longer answer: it is possible if you use DNAT on some router between the
computer which opens the request and the destination machine. Check out the
squid transparent proxy howtos you can find on the net. The client will need
to speak HTTP/1.1 for this, though.

Small example, which clarifies why this is possible:

Computer 1 opens http (port 80) connection to computer 2.

Router 1 sits in the middle, sees a port 80 connection is opened to some
computer 2, and rewrites the incoming packet to have a new destination
address/port (DNAT), namely proxy 1 with port 3128 (standard http-proxy port,
at least for squid), and a new source address/port (SNAT), namely router 1
with some port.

Proxy 1 gets the following (from router 1):

GET /foo.html HTTP/1.1
Host: www.foo.com:80
<other headers>

Proxy 1 opens the connection to www.foo.com port 80 (this time the router sees
that the connection comes from the proxy, so it must not rewrite the address),
gets the result, and stores it locally.

Proxy 1 then sends the packets back to router 1 (because the proxied request
seems to have come from the router; if you leave out SNAT in the rewriting
step, it'll seem to have come from the actual computer, and this is fine too,
but then you have to make sure that the return packets also go through the
router). Router 1 then reverses the DNAT and SNAT to return the packets to
computer 1, which will see a source address of computer 2 and port 80 on them.

Computer 1 sees the result, and thinks it came from the outside machine,
although through some SNAT/DNAT the packets actually originated from the
proxy.

This is basically it.

If you want to implement this, as I said, read up on transparent proxy howtos
for squid. Pretty much every proxy can be made to support this, as with
HTTP/1.1 the Host: header is a required header, and thus the proxy can always
extract the host which was queried from the request, even when it isn't
passed as the others have suggested.
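
As a rough illustration of that last point, a transparent proxy can recover the destination from the Host: header alone. This is a simplified Python 3 sketch with a made-up function name; it does not handle IPv6 literals such as `[::1]:80` or folded header lines.

```python
def target_from_host_header(raw_request, default_port=80):
    """Return (host, port) named by the Host: header of a raw HTTP request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line itself
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host, _, port = value.strip().partition(":")
            return host, (int(port) if port else default_port)
    raise ValueError("request has no Host header")


raw = b"GET /foo.html HTTP/1.1\r\nHost: www.foo.com:80\r\n\r\n"
print(target_from_host_header(raw))
```

An HTTP/1.0 client that sends no Host: header ends up in the `ValueError` branch, which is why HTTP/1.1 is needed for this setup.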

On another note: I assumed you wanted to transparently relay/rewrite HTTP
through the proxy. If you need to open some form of socket connection to the
proxy which is not HTTP, the proxy protocol supports the method CONNECT,
which will simply open up a socket connection which is relayed by the proxy.
But: This cannot be made transparent, except by some deeper magic in the
router.
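
For completeness, a CONNECT exchange can be sketched like this (Python 3; both function names are illustrative, and `example.org:443` is just a placeholder target). The client sends the request to the proxy, reads the first response line, and if the status is 2xx it treats the same socket as a raw connection to the target.

```python
def connect_request(host, port, version="HTTP/1.1"):
    """Bytes asking an HTTP proxy to open a raw tunnel to host:port."""
    target = "%s:%d" % (host, port)
    return ("CONNECT %s %s\r\nHost: %s\r\n\r\n"
            % (target, version, target)).encode("ascii")


def tunnel_established(status_line):
    """True if the proxy's first response line carries a 2xx status."""
    parts = status_line.decode("latin-1").split(None, 2)
    return len(parts) >= 2 and parts[1].isdigit() and parts[1].startswith("2")


print(connect_request("example.org", 443))
print(tunnel_established(b"HTTP/1.1 200 Connection established\r\n"))
```

Whether a given proxy allows CONNECT to an arbitrary port is a matter of its configuration, so a refusal from dav-serv would not be surprising.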

HTH!

Heiko.
 

Fuzzyman

[snip..]
What is the value of host_port at this point? It *should* be the
address of your external access proxy, i.e. dav-serv:8080


And you're not sending an absoluteURI: this should be amended to
contain the server details of the server that is finally going to
service the request. For the python.org example above, this code would be

soc.send("%s %s %s\r\n" % (
    self.command,
    urlparse.urlunparse(('http', 'www.python.org:80', path, params,
                         query, '')),
    self.request_version))

though of course, these values should be made available to you by
TinyHTTPProxy. Taking a brief look at the code, these values should be
available through the variables "scm" and "netloc". So your outgoing
connection code from TinyHTTPProxy should look something like this

soc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
soc.connect(('dav-serv', 8080))
soc.send("%s %s %s\r\n" % (
    self.command,
    urlparse.urlunparse((scm, netloc, path, params, query, '')),
    self.request_version))

HTH,


Thanks to all of you who replied.
I think I understand enough to have a go - I need to make the
connection to the proxy and the request for the absolute URI. That at
least gives me something to go at and it shouldn't be too hard.

Many Thanks for your help.

Fuzzyman

http://www.voidspace.org.uk/atlantibots/pythonutils.html
 

Fuzzyman

[snip..]
Yes, that is exactly what the proxy should do. It relays requests
between client and server. However, there is one vital detail you're
probably missing that is preventing you from chaining client + proxy*N
+ server together.

When sending a HTTP GET request to a server, a client sends a request
line containing a URI without a server component. This is because the
socket connection to the server is already formed, therefore the
server connection details do not need to be repeated. So a standard
GET will look like this

GET /index.html HTTP/1.1

However, it's different when a client connects to a proxy, because the
socket no longer connects directly to the server, but to the proxy
instead. The proxy still needs to know to which server it should send
the request. So the correct format for sending requests to a proxy is
to use the "absoluteURI" form, which includes the server details, e.g.

GET http://www.python.org:80/index.html HTTP/1.1

Any proxy that receives such a request now knows that the server to
forward to is "www.python.org:80". It will open a connection to
www.python.org:80, and send it a GET request for the URI.

Since you want your proxy to forward to another proxy, i.e. your proxy
is a client from your external-access-proxy's point of view, you
should also use the absoluteURI form when making requests from your
python proxy to your external proxy.

Well the two minor changes you suggested worked straight away for
normal HTML pages - great.
It's not fetching images and a couple of other problems (possibly
because that proxy server can only handle HTTP/1.0 - but I have a more
advanced one called TcpWatch from Zope that I might hack around).

But there's more than enough for me to go on and get it working.

MANY THANKS

Regards,

Fuzzy

http://www.voidspace.org.uk/atlantibots/pythonutils.html
 

Fuzzyman

[snip..]
On another note: I assumed you wanted to transparently relay/rewrite HTTP
through the proxy. If you need to open some form of socket connection to the
proxy which is not HTTP, the proxy protocol supports the method CONNECT,
which will simply open up a socket connection which is relayed by the proxy.
But: This cannot be made transparent, except by some deeper magic in the
router.

HTH!

Heiko.

Thanks for your help.
It's only http that I'll be relaying and I only need it to be
transparent to the user - I'm not using this for anonymity.

I don't yet understand the detail of what you've said, but I am
following the resources you've suggested and now have enough to get to
the next stage of my work.

Thanks

Fuzzyman

http://www.voidspace.org.uk/atlantibots/pythonutils.html
 
