LWP user agent grabs the intermediate wait page after POST instead of the actual result page


bhabs

Hi,

I wrote a small LWP-based Perl program to search for air fares on a
travel website using POST.

#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use LWP;

my $web_browser = LWP::UserAgent->new();
# Also follow redirects that are issued in response to a POST.
push @{ $web_browser->requests_redirectable }, 'POST';
$web_browser->timeout(300);

my $web_response = $web_browser->post(
    'http://blabla.com/travel/InitialSearch.do',
    [
        'fromCity' => 'SFO',
        'toCIty'   => 'CVG',
        # .... the rest of the fields occur here
    ],
);

die "Error: ", $web_response->status_line()
    unless $web_response->is_success;

my $content = $web_response->content;
print $content;

When I print the content, I see the "intermediate" wait page (the one
that displays a progress bar using JavaScript; I matched the content
against "view source" in Internet Explorer).
I am unable to capture the final air fare page. It takes time for the
website to do the search and then display the air fare result page.
How do I make my program wait for the actual result instead of
grabbing the intermediate response?

Could anyone please help me on this?

Regards,
bhabs
 
Ben Morrow

Christian Winter said:
You have to simulate what the browser does, and from your
description, this is most likely a repeated Ajax request
to the server. Analyze the behaviour of the JavaScript
and see how it fetches the progress state and what it
does once the result is calculated, then craft those
actions yourself. Your best chance to see exactly what is going
on in the background is with a network sniffer like Wireshark,
or a browser plugin like Firefox's Live HTTP Headers.

Or http://www.research.att.com/sw/tools/wsp/ , which will write a Perl
script to make the appropriate requests for you.

Ben
 
Tad J McClellan

Christian Winter said:
You have to simulate what the browser does, and from your
description, this is most likely a repeated Ajax request
to the server. Analyze the behaviour of the JavaScript
and see how it fetches the progress state and what it
does once the result is calculated, then craft those
actions yourself. Your best chance to see exactly what is going
on in the background is with a network sniffer like Wireshark,


I like the Web Scraping Proxy for this; it logs the traffic in
the form of LWP Perl code:

http://www.research.att.com/sw/tools/wsp/
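
A minimal sketch of the polling approach described above, not a definitive implementation. It assumes the site keeps serving the wait page until the search finishes, and that re-requesting some URL within the same session eventually returns the fares. The helper names is_wait_page and poll_for_result, the retry count and the delay are made up for illustration; the real URL to poll and the text that identifies the wait page have to come from the sniffer or proxy log.

```perl
use strict;
use warnings;

# True while the page still looks like the intermediate wait page.
# $marker is a regex for text that appears only on the wait page
# (an assumption: take it from the page source you captured).
sub is_wait_page {
    my ($html, $marker) = @_;
    return $html =~ $marker ? 1 : 0;
}

# Re-request $url with the same user agent until the wait marker is gone.
# $ua is an LWP::UserAgent, $response the HTTP::Response from the POST.
# Dies on an HTTP error or after $max_tries polls.
sub poll_for_result {
    my ($ua, $response, $url, $marker, $max_tries, $delay) = @_;
    my $tries = 0;
    while (1) {
        my $html = $response->decoded_content;
        $html = $response->content unless defined $html;
        last unless is_wait_page($html, $marker);

        die "Gave up after $max_tries polls\n" if $tries++ >= $max_tries;
        sleep $delay;    # give the server time to finish the search

        $response = $ua->get($url);
        die "Error: ", $response->status_line, "\n"
            unless $response->is_success;
    }
    return $response;
}
```

In the original script this would be called right after the post, for example $web_response = poll_for_result($web_browser, $web_response, $status_url, $wait_marker, 20, 3), where $status_url and $wait_marker are placeholders for whatever the captured traffic shows. If the site tracks the search through a session cookie, the user agent also needs a cookie jar, e.g. $web_browser->cookie_jar({}) before the first request.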
 
