Perl web automation question ref javascript:dt_pop

phil court · Sep 9, 2004

Hi all,

I am trying to write a script to retrieve a web page. The script is detailed
below. My problem is as follows.

The script can successfully obtain web pages such as http://news.bbc.co.uk
and http://www.dreamteamfc.com

However it fails on the following URL
http://www.dreamteamfc.com/dtfc04/servlet/PostPlayerList?catidx=1&title=GOALKEEPERS&gameid=167

The returned web page (saved in myOUT.txt) contains
<HTML><HEAD><SCRIPT LANGUAGE="JAVASCRIPT">location.replace("http://www.dreamteamfc.com");</SCRIPT></HEAD></HTML>

The above URL is valid: I pasted it into my browser and it displays OK.
It is part of the http://www.dreamteamfc.com page and is obtained via a
javascript:dt_pop call (whatever that is).
Anyway, here is the script. Any ideas? Thanks.

#!/usr/bin/perl -w

use URI;
use LWP::Simple;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();

$ua->proxy('http', 'http://128.87.251.250:8080');

#my $content = get("http://news.bbc.co.uk");
my $content = get("http://www.dreamteamfc.com/dtfc04/servlet/PostPlayerList?catidx=1&title=GOALKEEPERS&gameid=167");
#my $content = get("http://www.dreamteamfc.com");

$script = "myOUT.txt";
unlink $script;
open (OUT,">>$script") || die "cannot open $script for writing: $!";

if (defined $content)
{
#$content will contain the html associated with the url mentioned above.
print OUT $content ;
}
else
{
#If an error occurs then $content will not be defined.
print "Error: Get stuffed";
}
close OUT;

Brian Helterline · Sep 9, 2004

phil court said:
Hi all,

I am trying to write a script to retrieve a web page. The script is detailed
below. My problem is as follows.

The script can successfully obtain web pages such as http://news.bbc.co.uk
and http://www.dreamteamfc.com

However it fails on the following URL
http://www.dreamteamfc.com/dtfc04/servlet/PostPlayerList?catidx=1&title=GOALKEEPERS&gameid=167

It would appear that this website uses cookies and a little JavaScript. Both
of these are handled by your browser but not by LWP by default. LWP can
handle cookies, but not JavaScript.
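
As a rough sketch of the cookie part (untested against this site, and it will not help if the page depends on JavaScript actually running): use a single LWP::UserAgent with a cookie jar, request the front page first so that any session cookie gets set, then request the servlet URL with the same agent. Note that get() from LWP::Simple does not use a $ua you create yourself, so the proxy set on $ua in the script above is not applied to that call; $ua->get is used here instead. The helper names build_agent and fetch_with_session, and the idea of sending a Referer header, are my own assumptions rather than anything the site documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent that remembers cookies between requests.
# build_agent is a hypothetical helper name, not part of LWP.
sub build_agent {
    my (%opt) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar(HTTP::Cookies->new);             # in-memory cookie jar
    $ua->proxy('http', $opt{proxy}) if $opt{proxy};  # optional HTTP proxy
    return $ua;
}

# Fetch $first to pick up any cookies, then fetch $target with the same
# agent. Sending a Referer header is an assumption about what the site
# might check; drop it if it makes no difference.
sub fetch_with_session {
    my ($ua, $first, $target) = @_;

    my $warmup = $ua->get($first);
    die "Warm-up request failed: ", $warmup->status_line, "\n"
        unless $warmup->is_success;

    my $res = $ua->get($target, Referer => $first);
    die "Target request failed: ", $res->status_line, "\n"
        unless $res->is_success;

    return $res->decoded_content;
}

# Run as: perl fetch.pl FIRST_URL TARGET_URL [PROXY_URL]
if (@ARGV >= 2) {
    my ($first, $target, $proxy) = @ARGV;
    my $ua = build_agent(proxy => $proxy);
    print fetch_with_session($ua, $first, $target);
}
```

To try it, pass http://www.dreamteamfc.com as the first URL and the PostPlayerList URL (quoted, because of the & characters) as the second, with your proxy as an optional third argument. If the output is still the location.replace page, the site is probably relying on something the JavaScript does, and logging the browser traffic is the next step.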

If you really want to see what's going on, check out the web scraping proxy
(wsp) project: http://www.research.att.com/~hpk/wsp/
It will log *all* traffic generated by your browser.

There is another nifty hack at http://hacks.oreilly.com/pub/h/955
that will turn the output of wsp into a Perl script for you.

Enjoy!

-brian
