HTTP request

Discussion in 'Perl Misc' started by Peder Ydalus, Jan 14, 2004.

  1. Peder Ydalus

    Peder Ydalus Guest

    I'm trying to write a program that will dynamically let me download
    pictures from a website. The problem seems to be, however, that when I
    use getstore() or type in the (e.g.) ".../images/01.jpg" address
    manually, the server redirects the request to some ad page. I guess
    it's checking that the only way to get to these pics is if the
    requester has $REMOTE_ADDR, $REMOTE_HOST or $HTTP_REFERER or something
    set in the request.

    If I need to manually construct such a request, what is the way to go
    about this?

    Thanks!

    - Peder -
     
    Peder Ydalus, Jan 14, 2004
    #1

  2. In article <bu3h72$dgh$>, "Peder Ydalus"
    <> wrote:


    > I'm trying to write a program that will dynamically let me download
    > pictures from a website. The problem seems to be, however, that when I
    > use getstore() or type in the (e.g.) ".../images/01.jpg" address
    > manually, the server redirects the request to some ad page. I guess
    > it's checking that the only way to get to these pics is if the
    > requester has $REMOTE_ADDR, $REMOTE_HOST or $HTTP_REFERER or something
    > set in the request.
    > If I need to manually construct such a request, what is the way to go
    > about this?
    > Thanks!
    > - Peder -
    >


    Hi,

    This is how I would go (have gone) about this:

    1. Use a packet sniffer (e.g. Ethereal) to find the headers from a
    successful request.
    2. See if you can duplicate this successful request from a Perl script by
    setting the relevant [1] headers correctly. Setting headers is explained
    in the docs for the LWP library. If yes, you're done. If not ...
    3. Set up a cookie jar (also explained in the docs) in your Perl script
    and see if this improves matters.
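    Steps 2 and 3 might look something like this with LWP. This is only a
    sketch: the URLs, output filename and User-Agent string below are
    placeholders, not the real site's values.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;

# Build a GET request carrying the headers a real browser would send.
# Referer: is the usual check for image hotlinking; User-Agent: is
# another header some sites inspect.
sub make_browser_request {
    my ($url, $referer) = @_;
    my $req = HTTP::Request->new(GET => $url);
    $req->header(Referer      => $referer);
    $req->header('User-Agent' => 'Mozilla/4.0 (compatible; MSIE 6.0)');
    return $req;
}

my $ua = LWP::UserAgent->new;

# Step 3: a cookie jar, in case the site sets a session cookie on the
# page that links to the images
$ua->cookie_jar(HTTP::Cookies->new(file => 'cookies.txt', autosave => 1));

# Step 2: placeholder URLs -- substitute the real image and the page
# that links to it
my $req = make_browser_request(
    'http://www.example.com/images/01.jpg',
    'http://www.example.com/gallery.html',
);

my $res = $ua->request($req);
if ($res->is_success) {
    open my $fh, '>', '01.jpg' or die "open 01.jpg: $!";
    binmode $fh;
    print {$fh} $res->content;
    close $fh;
} else {
    warn "Request failed: ", $res->status_line, "\n";
}
```

    If the sniffed request contains other headers (Accept:, Accept-Language:,
    etc.), add them with further $req->header(...) calls until the responses
    match.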

    If none of this works, post with your results.

    Might I also suggest you look into wget, a utility for bulk download of
    web pages.
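    wget can send the same headers from the command line. Again, the URLs
    here are placeholders for illustration:

```shell
# Fetch one image while claiming to come from the gallery page.
# Substitute the real referring page and image URL.
wget --referer='http://www.example.com/gallery.html' \
     --user-agent='Mozilla/4.0 (compatible; MSIE 6.0)' \
     'http://www.example.com/images/01.jpg'
```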

    HTH
    Rick

    [1] Referer: is a good candidate for a relevant header. There may be
    others. Also, some web sites react differently based on the User-Agent:
    string.
     
    Richard Gration, Jan 14, 2004
    #2

  3. In article <bu3kpm$6fs$2surf.net>,
    "Richard Gration" <> wrote:

    > In article <bu3h72$dgh$>, "Peder Ydalus"
    > <> wrote:
    >
    >
    > > I'm trying to write a program that will dynamically let me download
    > > pictures from a website. The problem seems to be, however, that when I
    > > use getstore() or type in the (e.g.) ".../images/01.jpg" address
    > > manually, the server redirects the request to some ad page. I guess
    > > it's checking that the only way to get to these pics is if the
    > > requester has $REMOTE_ADDR, $REMOTE_HOST or $HTTP_REFERER or something
    > > set in the request.
    > > If I need to manually construct such a request, what is the way to go
    > > about this?
    > > Thanks!
    > > - Peder -
    > >

    >
    > Hi,
    >
    > This is how I would go (have gone) about this:
    >
    > 1. Use a packet sniffer (e.g. Ethereal) to find the headers from a
    > successful request.
    > 2. See if you can duplicate this successful request from a Perl script by
    > setting the relevant [1] headers correctly. Setting headers is explained
    > in the docs for the LWP library. If yes, you're done. If not ...
    > 3. Set up a cookie jar (also explained in the docs) in your Perl script
    > and see if this improves matters.


    Even easier is to use Web Scraping Proxy from:

    http://www.research.att.com/~hpk/wsp/

    "Web Scraping Proxy

    Programmers often need to use information on Web pages as input to
    other programs. This is done by Web Scraping, writing a program to
    simulate a person viewing a Web site with a browser. It is often hard
    to write these programs because it is difficult to determine the Web
    requests necessary to do the simulation.

    The Web Scraping Proxy (WSP) solves this problem by monitoring the flow
    of information between the browser and the Web site and emitting Perl
    LWP code fragments that can be used to write the Web Scraping program.
    A developer would use the WSP by browsing the site once with a browser
    that accesses the WSP as a proxy server. He then uses the emitted code
    as a template to build a Perl program that accesses the site. "

    cheers,
    big

    --
    'When I first met Katho, she had a meat cleaver in one hand and
    half a sheep in the other. "Come in", she says, "Hammo's not here.
    I hope you like meat.' Sharkey in aus.moto
     
    Iain Chalmers, Jan 15, 2004
    #3
