Caching robots.txt in LWP::RobotUA

Discussion in 'Perl Misc' started by pwaring@gmail.com, Mar 15, 2010.

  1. Guest

    I'm using LWP::RobotUA to download a series of pages with the
    following code:

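    #!/usr/bin/perl -w

    use strict;
    use LWP::RobotUA;

    # 'delay' is in minutes, so 10/60 waits 10 seconds between requests.
    # LWP::RobotUA requires a contact e-mail address in 'from'; the one
    # below is a placeholder.
    my %options = ('agent' => 'crawler', 'show_progress' => 1,
                   'delay' => 10/60, 'from' => 'you@example.com');

    my $ua = LWP::RobotUA->new(%options);

    my @all_urls = ();    # array of links populated from elsewhere

    foreach my $url (@all_urls)
    {
        # A URL contains '/' and ':', so turn it into a safe local file
        # name before appending the extension.
        (my $filename = $url) =~ s{[^A-Za-z0-9._-]+}{_}g;
        $ua->mirror($url, "$filename.html");
    }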

    The problem is that LWP::RobotUA seems to make a GET request for the
    robots.txt file each time I call the mirror() method, even though all
    of the URLs are on the same domain. I'd expect the module to cache the
    file, either in memory or on disk, because it's highly unlikely to
    change between requests, but it doesn't seem to do so.
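Purely as an illustration of the behaviour expected here (one robots.txt fetch per site, reused until it goes stale), a hand-rolled cache could look like the core-Perl sketch below. `RobotsCache`, `robots_for`, the `fetch` callback and the one-hour `ttl` default are hypothetical names and values chosen for this sketch; this is not LWP::RobotUA's API or a description of how it works internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): remembers each site's robots.txt
# until it expires, so it is fetched at most once per site per lifetime.
package RobotsCache;

sub new {
    my ($class, %args) = @_;
    return bless {
        fetch   => $args{fetch},         # assumed callback: URL -> robots.txt body
        ttl     => $args{ttl} // 3600,   # assumed lifetime in seconds
        entries => {},                   # origin => { body, expires }
    }, $class;
}

sub robots_for {
    my ($self, $url) = @_;

    # The cache key is the origin, e.g. "http://example.com" or
    # "http://example.com:8080" (simplified: no default-port normalisation).
    my ($origin) = $url =~ m{^(https?://[^/?#]+)}i
        or die "Unsupported URL: $url\n";
    $origin = lc $origin;

    my $entry = $self->{entries}{$origin};
    if (!$entry || $entry->{expires} <= time) {
        # Missing or stale entry: fetch once and remember the result.
        my $body = $self->{fetch}->("$origin/robots.txt");
        $entry = $self->{entries}{$origin} = {
            body    => defined $body ? $body : '',
            expires => time + $self->{ttl},
        };
    }
    return $entry->{body};
}

package main;

my $fetches = 0;
my $cache   = RobotsCache->new(
    # Stand-in for a real HTTP fetch; it only counts how often it is called.
    fetch => sub { $fetches++; return "User-agent: *\nDisallow: /private\n" },
);

$cache->robots_for($_)
    for qw(http://example.com/a http://example.com/b http://example.com/c);

print "robots.txt fetched $fetches time(s)\n";    # prints "robots.txt fetched 1 time(s)"
```

Keying on the origin rather than the full URL is what lets three pages on one site share a single fetch; a second site gets its own entry.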

    Do I need to write my own cache module, or tack on an existing one
    from CPAN? I was hoping that calling mirror() would Just Work.
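On the CPAN side, WWW::RobotRules::AnyDBM_File (shipped alongside WWW::RobotRules) keeps parsed robots.txt rules in a DBM file, and LWP::RobotUA accepts such an object through its `rules` option. A minimal sketch, assuming that module is installed; the file name `robots-rules` and the `from` address are placeholders. It targets the on-disk case (rules that survive between runs) and is not a diagnosis of why robots.txt is being refetched within a single run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::RobotUA;
use WWW::RobotRules::AnyDBM_File;

# Store parsed robots.txt rules in a DBM file so they survive between runs.
# 'robots-rules' is an arbitrary file name chosen for this sketch.
my $rules = WWW::RobotRules::AnyDBM_File->new('crawler', 'robots-rules');

my $ua = LWP::RobotUA->new(
    agent         => 'crawler',
    from          => 'you@example.com',   # placeholder contact address
    rules         => $rules,              # use the DBM-backed rules store
    delay         => 10/60,               # minutes, i.e. 10 seconds
    show_progress => 1,
);

# ...then call $ua->mirror($url, $filename) in the same loop as before.
```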

    Thanks in advance!
    Mar 15, 2010 (#1)

  2. Guest

    On 15 Mar, 20:15, "" <> wrote:
    > foreach my $url (@all_urls)
    > {
    >     $filename = "$url.html";
    >     $ua->mirror($url, $filename);
    >   }
    >
    > }


    That second closing brace shouldn't be there; I forgot to remove it
    when trimming the code down to just the relevant parts.
    Mar 15, 2010 (#2)

Similar Threads
  1. OT: Opinions on Robots.txt
     Frankie, Oct 9, 2005, in forum: ASP .Net (1 reply, 997 views; last post by S. Justin Gengo, Oct 10, 2005)
  2. Daniel Vesma
     15 replies, 1,502 views; last post by Jacqui or (maybe) Pete, Jul 2, 2003
  3. Re: robots.txt
     Neil White, Aug 8, 2003, in forum: HTML (0 replies, 390 views; last post by Neil White, Aug 8, 2003)
  4. 0 replies, 79 views
  5. meta robots and robots txt
     Tim w, May 22, 2014, in forum: HTML (1 reply, 98 views)