Caching robots.txt in LWP::RobotUA

pwaring

I'm using LWP::RobotUA to download a series of pages with the
following code:

#!/usr/bin/perl -w

use strict;
use LWP::RobotUA;

my %options = (
    'agent'         => 'crawler',
    'show_progress' => 1,
    'delay'         => 10/60,
    'from'          => '(e-mail address removed)',
);

my $ua = LWP::RobotUA->new(%options);

my @all_urls = (array of links populated from elsewhere);

foreach my $url (@all_urls)
{
    my $filename = "$url.html";
    $ua->mirror($url, $filename);
}
}

The problem is that LWP::RobotUA seems to make a GET request for the
robots.txt file each time I call the mirror() method, even though all
of the URLs are on the same domain. I'd expect the module to cache the
file, either in memory or on disk, because it's highly unlikely to
change between requests, but it doesn't seem to do so.

Do I need to write my own cache module, or tack on an existing one
from CPAN? I was hoping that calling mirror() would Just Work.
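One thing I've been looking at, but haven't tested, is passing a persistent rules object when constructing the agent. As far as I can tell from the docs, LWP::RobotUA accepts a 'rules' option, and WWW::RobotRules::AnyDBM_File stores the parsed robots.txt rules in a DBM file, so something like this sketch might avoid re-fetching (the cache filename here is just made up):

#!/usr/bin/perl -w

use strict;
use LWP::RobotUA;
use WWW::RobotRules::AnyDBM_File;

# Keep the parsed robots.txt rules in a DBM file so they persist
# between runs ('robot_rules.db' is only an example filename).
my $rules = WWW::RobotRules::AnyDBM_File->new('crawler', 'robot_rules.db');

my $ua = LWP::RobotUA->new(
    'agent'         => 'crawler',
    'show_progress' => 1,
    'delay'         => 10/60,
    'from'          => '(e-mail address removed)',
    'rules'         => $rules,
);

I haven't confirmed whether that actually stops the repeated GETs, though, so any pointers would be welcome.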

Thanks in advance!
 

pwaring

foreach my $url (@all_urls)
{
    my $filename = "$url.html";
    $ua->mirror($url, $filename);
}

}

That second closing bracket shouldn't be there - I forgot to remove it when snipping the code down to just the relevant parts.
 
