Newbie LWP question - simulate browser?


philthym

Hi

As the title suggests, I am a Perl newbie. I am trying to monitor a
remote site and would like to time how long it takes to return all the
objects on a page, i.e. the HTML and all the associated GIFs, bits of
JavaScript, Java applets and so on.

Here is the code as it stands today:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use Time::HiRes qw(gettimeofday);

my $URL = "http://www.xyz.com/index.html";

my $usec1   = gettimeofday;
my $timenow = localtime();

# get() returns the page body, or undef on failure
my $HomePage = get($URL);

# /String/ stands in for some text expected on the page
if (defined $HomePage and $HomePage =~ /String/) {
    my $usec2   = gettimeofday;
    my $elapsed = $usec2 - $usec1;
    print "$timenow Page retrieved in $elapsed seconds\n";
}
else {
    print "$timenow Page not retrieved\n";
}

I'm not sure I understand the whole LWP get() thing! What I'm wondering
is: does this request cause the web server to return all the objects, or
just the HTML itself? If it's returning everything, does the second timer
fire only after all the objects have been returned? In other words, does
this code do what I want it to? If not, any ideas how I would achieve my
aim, please?

Any help would be greatly appreciated.

Thanks

Phil
 

Sherm Pendley

philthym said:
I'm not sure I understand the whole LWP get() thing! What I'm wondering
is: does this request cause the web server to return all the objects, or
just the HTML itself?

It does *exactly* what you ask it to, no more - it fetches index.html.
Parsing the HTML, extracting the <img ...> elements from it, and making
additional requests to the server to fetch the images they point to, will
require additional code.

Have a look at HTML::Parser - it's a good place to start.
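As a rough, untested sketch of the idea: fetch the page with
LWP::UserAgent, pull out the src attribute of each <img> tag with
HTML::Parser, fetch each image, and report the total elapsed time.
(www.xyz.com is just your placeholder URL.)

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Parser;
use URI;
use Time::HiRes qw(gettimeofday tv_interval);

my $url = "http://www.xyz.com/index.html";
my $ua  = LWP::UserAgent->new;

my $t0   = [gettimeofday];
my $resp = $ua->get($url);
die "Page not retrieved: ", $resp->status_line, "\n" unless $resp->is_success;

# Collect the src attribute of every <img> tag on the page
my @images;
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tag, $attr) = @_;
            push @images, $attr->{src} if $tag eq 'img' and $attr->{src};
        },
        'tagname, attr',
    ],
);
$parser->parse($resp->decoded_content);
$parser->eof;

# Fetch each image, resolving relative URLs against the page URL,
# then report how long the whole lot took
$ua->get( URI->new_abs($_, $url) ) for @images;

my $elapsed = tv_interval($t0);
printf "Page plus %d images retrieved in %.3f seconds\n",
    scalar @images, $elapsed;

Bear in mind this fetches the images one after another, while a real
browser fetches several in parallel - so the timings won't match a
browser exactly.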

sherm--
 

Joe Smith

philthym said:
As the title suggests, I am a Perl newbie. I am trying to monitor a
remote site and would like to time how long it takes to return all the
objects on a page, i.e. the HTML and all the associated GIFs, bits of
JavaScript, Java applets and so on.

It's one thing to fetch a JavaScript file. It is quite another to fetch
the things that would be requested had the JavaScript been executed.
For that, you need a proxy that logs the requests from a real browser.

There are several, including the "Web Scraping Proxy"
http://www.research.att.com/~hpk/wsp/
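If you'd rather stay in Perl, here's a rough, untested sketch of the
same idea using the CPAN HTTP::Proxy module. Point your browser at
localhost port 8080 (an arbitrary choice) and it logs every request
the browser actually makes:

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

# Log the URL of every request that passes through the proxy
my $proxy = HTTP::Proxy->new( port => 8080 );
$proxy->push_filter(
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $request ) = @_;
            print STDERR scalar(localtime), " ", $request->uri, "\n";
        }
    ),
);
$proxy->start;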

-Joe
 

philthym

Sherm Pendley said:
It does *exactly* what you ask it to, no more - it fetches index.html.
Parsing the HTML, extracting the <img ...> elements from it, and making
additional requests to the server to fetch the images they point to, will
require additional code.

Have a look at HTML::Parser - it's a good place to start.

sherm--

Thanks Sherm - I thought it would be something like that. I'll check out HTML::Parser.

Regards

Phil
 
