LWP::Simple get (why would it stop working along with wget when fetch still works?)

rockerd

Hi Perl People,
Something recently changed on a site that I was fetching and parsing
with LWP::Simple.
Here is the thing: for the longest time I was using get() to grab an
http: site and store it in a scalar, which I parsed later. Suddenly I
get an empty but defined scalar with: $html = get($url);

More: when I use fetch on a FreeBSD system it pulls the page to text
without any problems, but when I use wget on a Linux system I get a
blank file. Everything used to work. I tried changing my user-agent
headers and have had no luck. The only thing I can see is that the
file has an unknown length, but I don't know what to do.

Thanks for the advice,
Rocker
 
Gunnar Hjalmarsson

> Something recently changed on a site that I was fetching and parsing
> with LWP::Simple.
> Here is the thing: for the longest time I was using get() to grab an
> http: site and store it in a scalar, which I parsed later. Suddenly I
> get an empty but defined scalar with: $html = get($url);

Maybe the web server doesn't like requests that are generated by Perl.
:( You may want to try without sending a client identifier:

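use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent(''); # <- This line may make a difference
my $response = $ua->get('http://www.perl.org/');
# Report the HTTP status instead of silently printing nothing
die $response->status_line, "\n" unless $response->is_success;
print $response->content;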
 
Peter J. Holzer

> Something recently changed on a site that I was fetching and parsing
> with LWP::Simple.
> Here is the thing: for the longest time I was using get() to grab an
> http: site and store it in a scalar, which I parsed later. Suddenly I
> get an empty but defined scalar with: $html = get($url);

Use LWP::Simple only if you are absolutely sure that you never need the
return code or headers. LWP::UserAgent is almost always the better
choice, especially if you have to handle errors or strange behaviour.
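
To illustrate the point, here is a minimal sketch of the kind of information a response object exposes and LWP::Simple's get() hides. The helper name describe_response is hypothetical, and the two responses are built by hand (an assumption made so the sketch runs without network access); they are not what the site in question actually returned.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: summarise what the server actually sent back.
sub describe_response {
    my ($response) = @_;
    my $length = $response->header('Content-Length');
    return sprintf "%s | Content-Length: %s | body bytes: %d",
        $response->status_line,
        defined $length ? $length : 'unknown',
        length $response->content;
}

# Hand-built responses (illustrative only): a "successful" reply with an
# empty body and no Content-Length, and an explicit refusal.
my $empty_ok = HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/html'], '');
my $blocked  = HTTP::Response->new(403, 'Forbidden', ['Content-Length' => 0], '');

print describe_response($_), "\n" for $empty_ok, $blocked;
```

Against a live site, the same helper could be applied to the object returned by LWP::UserAgent->new->get($url). An empty body with a 200 status and an empty body with a 403 status point to quite different causes, which is why the status line is worth having.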

> More: when I use fetch on a FreeBSD system it pulls the page to text
> without any problems, but when I use wget on a Linux system I get a
> blank file. Everything used to work. I tried changing my user-agent
> headers and have had no luck.

Is "a Linux system" the system where the script normally runs and "a
FreeBSD system" a different system? It might be that the owner of the
site noticed that you are automatically retrieving data and is blocking
your IP address.

hp
 
