Need Perl teacher/school: Network programming

Irving Kimura

For months (actually years) I have been trying to learn how to
write a Perl proxy that can do the following: intercept *all* HTTP
and HTTPS traffic to and from my browser, and write all of it to
a log file, decoding and encoding the HTTPS stuff as necessary so
that all the logged text is intelligible. The proxy must be able
to handle redirection, pop-ups, frames, and any other tricks the
browser and server may be up to.

I have looked everywhere for help with this; I have consulted every
Perl book I know, I have asked every Perl programmer I know, and
posted in every Perl forum I know. I have found that Perl programmers
fall in two sets: the ones that don't know how to do this (I am in
this set), and those for whom it is so obvious and trivial that
asking them how to do it is like asking someone to explain to you
how to swallow. "What do you mean how to swallow? Just...
swallow!!!"

I have to conclude that the problem is difficult enough that its
solution cannot be readily explained to someone who doesn't already
know how to solve it. Hence I need a teacher: someone I will *pay*
to explain to me how to do this. Or better yet, someone who will
write the proxy for me, with enough comments that I will be able
to understand it completely.

Where? Who? How much?

Thanks!

-Irv
 
J. Gleixner

The following should help you do what you want:

http://www.stonehenge.com/merlyn/WebTechniques/col11.html
 
A. Sinan Unur

For months (actually years) I have been trying to learn how to
write a Perl proxy that can do the following: intercept *all* HTTP
and HTTPS traffic to and from my browser, and write all of it to
a log file, decoding and encoding the HTTPS stuff as necessary so
that all the logged text is intelligible. The proxy must be able
to handle redirection, pop-ups, frames, and any other tricks the
browser and server may be up to.

The article "Web Scraping Proxy" by Howard P. Katseff in Dr. Dobbs' June 03
issue might be of some help.

http://www.ddj.com/articles/2003/0306/
 
Jacqui Caren

I appreciate the pointer, but I'm familiar with that script, and
I can tell you that 1) it can't handle anything other than the
simplest HTTP; most importantly, it doesn't listen to the HTTPS
port; and 2) some of the classes it uses have changed since 1997,
and as a result the script is broken.

Have you considered taking the Apache proxy code (under apache2)
and putting the logging code into it?

The Perl proxy module can and does allow you to do some wonderful
things - not just content lagging.

I and a work colleague did a web proxy that converted certain
occurrences of Microsoft specific terminology to Borgism's
such as drone, collective, assimilation :)

We then pointed it at a number of web sites - the results were
VERY scary...

The object of the exercise was to get across the idea of contextual
parsing of content. Our exercise was funny but did get the very
important point across.

Not sure if anyone has gotten any of the proxy modules
working with SSL. I'd be very interested if they have, though,
although it does seem rather silly to app-relay what should
be a single SSL encrypted connection and then store the contents
in plain text ;_)

Jacqui
 
Alan J. Flavell

The Perl proxy module can and does allow you to do some wonderful
things - not just content lagging.

It's a bit warm for that just now ;-) SCNR

Not sure if anyone has gotten any of the proxy modules
working with SSL. I'd be very interested if they have, though,
although it does seem rather silly to app-relay what should
be a single SSL encrypted connection and then store the contents
in plain text ;_)

I'm a bit confused as to what you have in mind here.

If that's possible at all, I mean other than deliberate co-operation
between the server and the proxy, then it represents a complete
security failure. The client and server are supposed to negotiate an
end to end encrypted path precisely in order to prevent any
intermediate from overhearing what goes on. If the proxy succeeds in
masquerading as the target server, then that whole purpose is
defeated, and the crypto folk would surely be working on overtime to
solve the problem, no? But you know this already, so you must have
had something else in mind, I trow.

Sure, if you had the "proxy" (in the informal sense, rather than in
the strict HTTP sense) acting as an "httpd accelerator" (as the squid
folks used to call it, anyway), then from the point of view of the
client the "proxy" _is_ the end server, and what goes on behind the
scenes is something else. OK, so I guess that might have been what
you had in mind...?

cheers
 
nobull

Alan J. Flavell said:
I'm a bit confused as to what you have in mind here.

If that's possible at all, I mean other than deliberate co-operation
between the server and the proxy, then it represents a complete
security failure.

Actually this is a feature I've considered adding to Apache to help me
reverse engineer some forms on https site for use with LWP.

It does not require the co-operation of the server. It only requires
the co-operation of _either_ the client _or_ the server.

So long as the proxy holds a CA private key that's trusted by the
client you can get away with it.

The client and server are supposed to negotiate an
end to end encrypted path precisely in order to prevent any
intermediate from overhearing what goes on. If the proxy succeeds in
masquerading as the target server, then that whole purpose is
defeated, and the crypto folk would surely be working on overtime to
solve the problem, no?

There is no technological solution to the human problem of choosing
whom to trust. If I can trick you into trusting my CA then I can
intercept your https traffic.
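
As a rough illustration of the client-side co-operation described above, here is a minimal sketch. It is in Python rather than Perl, purely for illustration, and uses only the standard ssl module. The function name make_client_context and the parameter extra_ca_file (a path to the proxy's CA certificate in PEM form) are hypothetical. The point it shows is that the client decides whom to trust: once the proxy's CA is loaded, certificates that CA signs for any hostname would pass verification.

```python
import ssl

def make_client_context(extra_ca_file=None):
    """Build a TLS client context, optionally trusting one extra CA.

    extra_ca_file is a hypothetical path to the intercepting proxy's
    CA certificate in PEM format.
    """
    # Default context: system trust store, certificate and hostname checks on.
    ctx = ssl.create_default_context()
    if extra_ca_file is not None:
        # This is the "co-operation of the client": after this call,
        # certificates signed by the proxy's CA are accepted as valid.
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx

# Without the extra CA, verification stays strict, so a proxy presenting
# a certificate from an unknown CA would be rejected.
ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Nothing on the server side changes in this scheme, which matches the point that only one end needs to co-operate.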
 
Alan J. Flavell

If that's possible at all, I mean other than deliberate co-operation
between the server and the proxy, then it represents a complete
security failure.
[..]
It does not require the co-operation of the server. It only requires
the co-operation of _either_ the client _or_ the server.

Good point, thanks.

So long as the proxy holds a CA private key that's trusted by the
client you can get away with it. [..]
There is no technological solution to the human problem of choosing
whom to trust. If I can trick you into trusting my CA then I can
intercept your https traffic.

Can't argue with that.

all the best
 
Jacqui Caren

Sorry for the delay.

I was explaining to someone at work what I like to call the three classes
of proxy. I will ignore caching...

a proxy
-------
You point your browser at this and it gets pages for you. :)

often used by companies as part of access control and content
filtering.

very easy to add filters to remove or change content.

Often used to strip javascript and advertising images and links.
- such as dropping all "doubleclick" ads - sorry Tim :)

I use one to strip "nasty" bits of *script in web pages I do not
want my browser to see. I now strip the macromedia stuff from the
reg because it screws up the current mozilla beta when run without
the flash plugin - the reg has an oad.macormedia.com link instead of
a load.macromedia.com link and this causes a tight window-open loop
in moz.
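
The kind of filtering described above can be sketched roughly as follows. This is Python rather than Perl, for illustration only, and it is not the filter used in the post. The blocklist entry, the regular expression and the function names is_blocked and strip_scripts are all assumptions. A regex is a crude way to handle HTML, so treat this as a sketch of the idea, not a robust filter.

```python
import re
from urllib.parse import urlsplit

# Assumption: an example blocklist of host suffixes to refuse outright.
BLOCKED_HOST_SUFFIXES = ("doubleclick.net",)

# Crude pattern for a whole <script>...</script> element.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)

def is_blocked(url):
    """Return True if the request URL points at a blocked host."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s)
               for s in BLOCKED_HOST_SUFFIXES)

def strip_scripts(html):
    """Remove script elements from a response body before the browser sees it."""
    return SCRIPT_RE.sub("", html)

page = '<p>hi</p><script type="text/javascript">alert(1)</script><p>bye</p>'
print(is_blocked("http://ad.doubleclick.net/x.gif"))
print(strip_scripts(page))
```

In a real proxy, is_blocked would be consulted on each request and strip_scripts applied to each HTML response body.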

a reverse proxy
---------------
It appears as the real web site but forwards requests to the real web
server that does the work.

Often installed within the network of the WSP.

Usually the RP is in the DMZ and the web server is behind a median
firewall.

used by companies so that DoS attacks that take out the RP do not
impact the backend web server. As an RP is cheap and light,
you can very cheaply build a large pool with a much smaller number
of backend web servers.

a virtual reverse proxy
-----------------------
This accepts requests as one (www.foo.com) and forwards them to
a master web site (www.foo.co.uk) for processing.
The result is parsed to ensure links etc do not point to foo.co.uk etc.
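
A minimal sketch of the two rewriting steps described above, again in Python for illustration only. The host names are the ones from the post. The function names to_backend and to_public and the regular expression are assumptions, and a real implementation would parse the HTML rather than pattern-match on it.

```python
import re

PUBLIC_HOST = "www.foo.com"      # the name the visitor's browser uses
BACKEND_HOST = "www.foo.co.uk"   # the master site that does the work

def to_backend(headers):
    """Return a copy of the request headers with Host set to the backend."""
    out = dict(headers)
    out["Host"] = BACKEND_HOST
    return out

# Match the backend host only when it is not part of a longer name.
_BACKEND_RE = re.compile(
    r"(?<![\w.-])" + re.escape(BACKEND_HOST) + r"(?![\w-])",
    re.IGNORECASE,
)

def to_public(body):
    """Rewrite backend host names in a response body to the public name."""
    return _BACKEND_RE.sub(PUBLIC_HOST, body)

print(to_backend({"Host": "www.foo.com", "Accept": "*/*"}))
print(to_public('<a href="http://www.foo.co.uk/a">x</a>'))
```

The same substitution would also need to be applied to Location headers on redirects, otherwise the browser would be sent straight to the master site.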

Jacqui
 
