How can I keep LWP::UserAgent from adding the http-equiv strings from the Head section of the page?

CronJob

How can I keep LWP::UserAgent from adding the http-equiv strings from
the Head section of the page? When I run the program below, the
$headers variable contains three Content-Type: listings, including one
from the actual HTTP header and one from the meta tag in the web page.

#!/usr/bin/perl -w

use LWP::UserAgent;
use HTML::parse;
use HTML::Element;
use HTTP::Response;
use HTTP::Request;
use HTTP::Status;
use URI::URL;

my ($code, $desc, $headers, $body)=&makeRequest('GET', 'http://www.google.com');
print "The headers:\n$headers\n";
print "The body:\n$body\n";

sub makeRequest( ) {
($method, $path) = @_;
# create a user agent object
my $ua = new LWP::UserAgent;
$ua->agent("Mozilla/4.0");

# request a url
my $request = new HTTP::Request($method, $path);
# set values in response object HTTP::Response
my $response = $ua->request($request);

# get the details if there is an error
# otherwise parse the response object
my $body=$response->content;
my $code=$response->code;
my $desc=HTTP::Status::status_message($code);
my $headers=$response->headers_as_string;
$body = $response->error_as_HTML if ($response->is_error);
return ($code, $desc, $headers, $body);
}
 
CronJob

See the ->parse_head method of LWP::UserAgent.

You might want to try reading the docs of the modules you are using.

Ben

Yes, I agree with you. Unfortunately for me, I find the form that is
used in the Perl documentation to be abstruse. I learn by working with
example code, not by reading abstract discussions of how code works
that do not contain working examples. Hopefully it will come to me
over time. I had the same issue with man pages years ago, but now it's
second nature. I appreciate your response and I will look through the
documentation carefully.
 
CronJob

Thank you Ben.

I ran 'perldoc LWP' and found:

The class name for the user agent is "LWP::UserAgent".
<snip>
· The parse_head specifies whether we should initialize
response headers from the <head> section of HTML documents.

Running 'perldoc LWP::UserAgent' I see that:

$ua = LWP::UserAgent->new( %options )
This method constructs a new "LWP::UserAgent" object and
returns it. Key/value pair arguments may be provided to set
up the initial state. The following options correspond to
attribute methods described below:

KEY          DEFAULT
-----------  --------------------
parse_head   1


I now realize that the 1 is implicitly a boolean value, and hence that
0 should do the trick for me.
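
Since the constructor table lists parse_head as an option, the attribute can presumably also be switched off when the object is created. A minimal sketch, assuming a reasonably recent libwww-perl; it makes no network request and only checks the setting:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Pass parse_head as a constructor option instead of calling
# $ua->parse_head(0) on the object afterwards.
my $ua = LWP::UserAgent->new( parse_head => 0 );

# Called without arguments, parse_head reports the current setting.
print "parse_head is ", ( $ua->parse_head ? "on" : "off" ), "\n";
```

Either form should have the same effect; the constructor option just keeps the configuration in one place.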

Working code:

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTML::parse;
use HTML::Element;
use HTTP::Response;
use HTTP::Request;
use HTTP::Status;
use URI::URL;

my $ie7UAString = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)';
my ($code, $desc, $headers,$body) = &LWPUserAgentRequest('GET','http://www.google.com');
print "The headers:\n$headers\n";
print "The body:\n$body\n";

sub LWPUserAgentRequest {
my ($method, $path) = @_;
my $ua = new LWP::UserAgent;
$ua->agent($ie7UAString);
$ua->parse_head(0);
my $request = new HTTP::Request($method, $path);
my $response = $ua->request($request);
my $body = $response->content;
$body = $response->error_as_HTML if ($response->is_error);
my $code = $response->code;
my $desc = HTTP::Status::status_message($code);
my $headers = $response->headers_as_string;
return ($code, $desc, $headers, $body);
}
 
J. Gleixner

CronJob wrote:
[...]
Working code:

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTML::parse;
use HTML::Element;
use HTTP::Response;
use HTTP::Request;
use HTTP::Status;
use URI::URL;

Some minor tweaks..


Do you really need all of those?
my $ie7UAString = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)';
my ($code, $desc, $headers,$body) = &LWPUserAgentRequest('GET','http://www.google.com');

Remove the '&' in front of LWPUserAgentRequest.

If you add a '/' to the end of the URL, then the Web server doesn't
have to do it for you.
print "The headers:\n$headers\n";
print "The body:\n$body\n";

You can call print once, with a list:

print "The headers:\n$headers\n",
"The body:\n$body\n";
sub LWPUserAgentRequest {
my ($method, $path) = @_;

Usually, it's nice to have a blank line after initializing
the input parameters.
my $ua = new LWP::UserAgent;

my $ua = LWP::UserAgent->new();
$ua->agent($ie7UAString);
$ua->parse_head(0);
my $request = new HTTP::Request($method, $path);

my $request = HTTP::Request->new( $method, $path );
my $response = $ua->request($request);
my $body = $response->content;
$body = $response->error_as_HTML if ($response->is_error);

my $body = ( $response->is_error )
? $response->error_as_HTML
: $response->content;
my $code = $response->code;
my $desc = HTTP::Status::status_message($code);
my $headers = $response->headers_as_string;

Ya don't really need $headers, you could just return
$response->headers_as_string, instead of $headers, below.
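
Putting the tweaks above together, the part of the sub that unpacks the response might look like the sketch below. The helper name response_details and the hand-built HTTP::Response objects are assumptions made here so the example runs without a network connection; in the real script the response would come from $ua->request($request).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;
use HTTP::Status qw(status_message);

# Hypothetical helper: turn a response into (code, description, headers, body).
sub response_details {
    my ($response) = @_;

    my $code = $response->code;
    my $body = $response->is_error
        ? $response->error_as_HTML
        : $response->content;

    # No temporary $headers variable; return the string directly.
    return ( $code, status_message($code), $response->headers_as_string, $body );
}

# Hand-built responses stand in for what $ua->request() would return.
my $ok = HTTP::Response->new( 200, 'OK', [ 'Content-Type' => 'text/html' ], '<html></html>' );
my $missing = HTTP::Response->new( 404, 'Not Found' );

for my $response ( $ok, $missing ) {
    my ( $code, $desc, $headers, $body ) = response_details($response);
    print "$code $desc\n", "The headers:\n$headers\n", "The body:\n$body\n";
}
```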
 
Tad J McClellan

J. Gleixner said:
CronJob wrote:


my $ua = LWP::UserAgent->new();

my $request = HTTP::Request->new( $method, $path );


Just in case you're wondering why this suggested change is
a Really Good Idea, see the "Indirect Object Syntax" section in:

perldoc perlobj
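
As a rough illustration of the difference, here is a small sketch. The Greeter class is made up for this example and has nothing to do with LWP:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Greeter;

sub new {
    my ( $class, %args ) = @_;
    return bless { name => $args{name} }, $class;
}

sub greeting {
    my ($self) = @_;
    return "Hello, $self->{name}";
}

package main;

# Indirect object syntax: perl has to guess whether "new" is a method
# name or a function in the current package.
#   my $g = new Greeter( name => 'world' );

# Arrow syntax: always parsed as a method call on the Greeter class.
my $g = Greeter->new( name => 'world' );
print $g->greeting, "\n";
```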
 
Eric Pozharski

CronJob wrote: *SKIP*

You can call print once, with a list:

print "The headers:\n$headers\n",
"The body:\n$body\n";

With such an outrageous number of newlines, I would suggest

print <<"EOT";
The headers:
$headers
The body:
$body
EOT

*CUT*
 
