HTTP request


Peder Ydalus

I'm trying to write a program that will dynamically download pictures
from a website. The problem, however, seems to be that when I use
getstore() or type the (e.g.) ".../images/01.jpg" address in manually,
the server redirects the request to an ad page. I guess it's checking
that the requester has $REMOTE_ADDR, $REMOTE_HOST, $HTTP_REFERER or
something similar set in the request, so that the pictures can only be
reached through the site itself.

If I need to construct such a request manually, how should I go about
it?

Thanks!

- Peder -
 

Richard Gration

Hi,

This is how I would go (have gone) about this:

1. Use a packet sniffer (e.g. Ethereal) to find the headers sent in a
successful request.
2. See if you can duplicate this successful request from a Perl script by
setting the relevant [1] headers correctly. Setting headers is explained
in the docs for the LWP library. If yes, you're done. If not ...
3. Set up a cookie jar (also explained in the docs) in your Perl script
and see if this improves matters.
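For illustration, steps 2 and 3 might look something like this using
Python's standard library (urllib.request and http.cookiejar) in place of
LWP; in Perl the rough equivalents are LWP::UserAgent, HTTP::Request and
HTTP::Cookies. This is only a minimal sketch: the URLs and the User-Agent
string are hypothetical placeholders, and whether Referer, User-Agent and
cookies are enough depends entirely on what the site actually checks.

```python
import http.cookiejar
import urllib.request

# Hypothetical URLs: substitute the gallery page and image you actually want.
PAGE_URL = "http://www.example.com/gallery/index.html"
IMAGE_URL = "http://www.example.com/gallery/images/01.jpg"


def build_opener_with_cookies():
    """Step 3: an opener that stores cookies and resends them automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_image_request(image_url, referer, user_agent="Mozilla/5.0"):
    """Step 2: copy the headers a browser sent (as seen in the sniffer).

    The default user_agent is a hypothetical placeholder string.
    """
    return urllib.request.Request(
        image_url,
        headers={
            "Referer": referer,        # the page that links to the image
            "User-Agent": user_agent,  # some sites vary their response on this
        },
    )


def fetch_image(path):
    """Visit the page first (collecting cookies), then fetch the image."""
    opener, _jar = build_opener_with_cookies()
    opener.open(PAGE_URL).read()  # any cookies the page sets land in the jar
    req = build_image_request(IMAGE_URL, referer=PAGE_URL)
    with opener.open(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling fetch_image("01.jpg") would perform both requests through the
same opener, so cookies set by the gallery page are sent along with the
image request.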

If none of this works, post with your results.

Might I also suggest looking into wget, a utility for downloading web
pages in bulk.

HTH
Rick

[1] Referer: is a good candidate for a relevant header. There may be
others. Also, some web sites react differently based on the User-Agent:
string.
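To show why Referer is such a good candidate, here is a hypothetical
sketch of the kind of rule a server could apply to its images. The host
name and ad-page URL are made up, and real sites may check cookies,
User-Agent or something else entirely. A request with no Referer at all,
or one naming a foreign site, gets redirected; only requests that appear
to come from the site's own pages are served.

```python
from urllib.parse import urlsplit

# Hypothetical values for the site serving the pictures.
ALLOWED_HOST = "www.example.com"
AD_PAGE = "http://www.example.com/ad.html"


def check_image_request(headers):
    """Return (status, location) the way a simple anti-hotlinking rule might.

    headers: dict mapping header names (spelled "Referer") to values.
    """
    referer = headers.get("Referer", "")
    if urlsplit(referer).hostname == ALLOWED_HOST:
        return 200, None  # linked from one of the site's own pages: serve it
    return 302, AD_PAGE   # typed in directly or linked from elsewhere: redirect
```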
 

Iain Chalmers

Even easier is to use the Web Scraping Proxy from:

http://www.research.att.com/~hpk/wsp/

"Web Scraping Proxy

Programmers often need to use information on Web pages as input to
other programs. This is done by Web Scraping, writing a program to
simulate a person viewing a Web site with a browser. It is often hard
to write these programs because it is difficult to determine the Web
requests necessary to do the simulation.

The Web Scraping Proxy (WSP) solves this problem by monitoring the flow
of information between the browser and the Web site and emitting Perl
LWP code fragments that can be used to write the Web Scraping program.
A developer would use the WSP by browsing the site once with a browser
that accesses the WSP as a proxy server. He then uses the emitted code
as a template to build a Perl program that accesses the site."

cheers,
big
 
