Perl web automation question ref javascript:dt_pop

phil court

Hi all,

I am trying to write a script to retrieve a web page. The script is given
below. My problem is as follows.

The script can successfully obtain web pages such as http://news.bbc.co.uk
and http://www.dreamteamfc.com

However, it fails on the following URL:
http://www.dreamteamfc.com/dtfc04/servlet/PostPlayerList?catidx=1&title=GOALKEEPERS&gameid=167

The returned web page (saved in myOUT.txt) contains:
<HTML><HEAD><SCRIPT LANGUAGE="JAVASCRIPT">location.replace("http://www.dreamteamfc.com");</SCRIPT></HEAD></HTML>

The above URL is valid: I have pasted it into my browser and it displays OK.
It is linked from the http://www.dreamteamfc.com page and is opened via a
javascript:dt_pop call (whatever that is).

Anyway, here is the script. Any ideas? Thanks

#!/usr/bin/perl -w

use URI;
use LWP::Simple;
use LWP::UserAgent;

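my $ua = LWP::UserAgent->new();

# Note: this proxy setting applies only to requests made through $ua.
# The LWP::Simple get() calls below do not go through $ua.
$ua->proxy('http', 'http://128.87.251.250:8080');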

#my $content = get("http://news.bbc.co.uk");
my $content = get("http://www.dreamteamfc.com/dtfc04/servlet/PostPlayerList?catidx=1&title=GOALKEEPERS&gameid=167");
#my $content = get("http://www.dreamteamfc.com");

my $script = "myOUT.txt";
unlink $script;
open (OUT,">>$script") || die "cannot open $script for writing: $!";


if (defined $content)
{
#$content will contain the html associated with the url mentioned above.
print OUT $content ;
}
else
{
#If an error occurs then $content will not be defined.
print "Error: Get stuffed";
}
close OUT;
 
Brian Helterline

phil court said:
Hi all,

I am trying to write a script to retrieve a web page. The script is given
below. My problem is as follows.

The script can successfully obtain web pages such as http://news.bbc.co.uk
and http://www.dreamteamfc.com

However, it fails on the following URL:
http://www.dreamteamfc.com/dtfc04/servlet/PostPlayerList?catidx=1&title=GOALKEEPERS&gameid=167

It would appear that this website uses cookies and a little JavaScript. Your
browser handles both of these, but LWP handles neither by default. LWP can be
set up to handle cookies (by giving the user agent a cookie jar), but it
cannot run JavaScript.
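
For what it's worth, here is a minimal sketch of the cookie side of this. It assumes the site sets its cookies when the front page is loaded, which nothing in this thread confirms. The sub names fetch_with_cookies and js_redirect_target are made up for illustration. The proxy is taken from the environment rather than hard-coded. If the site sets its cookies from JavaScript, this will still not be enough.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Return the target of a JavaScript location.replace("...") call,
# or undef if the page does not contain one.
sub js_redirect_target {
    my ($html) = @_;
    return $html =~ /location\.replace\(\s*["']([^"']+)["']\s*\)/i ? $1 : undef;
}

# Fetch $front_page first so the server can set its cookies, then fetch
# $target with the same user agent so those cookies are sent back.
sub fetch_with_cookies {
    my ($front_page, $target) = @_;

    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar(HTTP::Cookies->new);   # in-memory jar; LWP keeps no cookies by default
    $ua->env_proxy;                        # pick up proxy settings from the environment, if any

    my $first = $ua->get($front_page);
    die "Cannot fetch $front_page: ", $first->status_line, "\n"
        unless $first->is_success;

    my $response = $ua->get($target);
    die "Cannot fetch $target: ", $response->status_line, "\n"
        unless $response->is_success;

    my $html = $response->decoded_content;
    if (defined(my $redirect = js_redirect_target($html))) {
        # Cookies alone were not enough; the server still sent the redirect stub.
        warn "Server answered with a JavaScript redirect to $redirect\n";
    }
    return $html;
}
```

To try it, call fetch_with_cookies("http://www.dreamteamfc.com", $url), with $url set to the PostPlayerList URL from the question, and write the result to myOUT.txt as before. If the warning still appears, the server wants something more than cookies from the front page.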

If you really want to see what's going on, check out the web scraping proxy
(wsp) project at http://www.research.att.com/~hpk/wsp/
It will log *all* traffic generated by your browser.
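
To compare the browser's traffic with what the script sends, it can also help to make LWP print its own requests and responses. This is a different technique from wsp, shown only as a rough sketch. The sub name make_logging_agent is made up for illustration. The add_handler phases and the dump method are standard LWP::UserAgent and HTTP::Message features.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that prints every request it sends and every
# response it receives.
sub make_logging_agent {
    my $ua = LWP::UserAgent->new;

    # request_send runs just before a request goes out. Returning
    # nothing lets the request carry on as normal.
    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        $request->dump;      # in void context, dump prints the message
        return;
    });

    # response_done runs once a response has been fully received.
    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        $response->dump;
        return;
    });

    return $ua;
}
```

Any get() made through the returned agent then prints the request headers and the start of each response, which can be set side by side with the wsp log from the browser.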

There is another nifty hack at http://hacks.oreilly.com/pub/h/955
that will turn the output of wsp into a Perl script for you.

Enjoy!

-brian
 
