Form manipulation through WWW::Mechanize (Perl)

Antwerp

Hi,

I'm trying to create a script that automatically logs in to a website, and
then parses the index (which is unavailable without first logging on). I'm
having difficulties, but I'm not exactly sure where they might lie.

I know I need to enable cookies to login, and I believe I have done so. I
also recognize that I need to find and submit the appropriate form data - I
*think* I am doing this correctly too. However, once I submit the form data, I am
unable to print, load, or view the "secure" page. I would appreciate any help.

-------

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->agent_alias( 'Windows IE 6' );
$mech->cookie_jar( HTTP::Cookies->new(
    file     => "cookies.txt",
    autosave => 1,
) );
### The above should allow for the storage and sending of cookies ###

$mech->get( $url );      # Getting the desired page
                         # (automatically redirects to the login page)

print $mech->content();  # Printing the content of the page
                         # (which, because of the redirect, is the login
                         # screen; I pipe the output to an output.html
                         # file for verification)

# This should be setting the login field value
$mech->field( $login_field_name, $usna );
# This should be setting the password field value
$mech->field( $password_field_name, $pawo );

$mech->submit();         # This should submit the form, causing a redirect
                         # to the proper index for logged-in users

print $mech->content();  # Question:
                         # Shouldn't this show the new content of the page -
                         # that is, the member index? Why does it not do so?

-------

I appreciate any help or suggestions,

Thanks,

AntWerp
 
J. Gleixner

Without knowing your values of $url, $login_field_name, and
$password_field_name, and what your final print displays, who knows.
The code looks accurate. Is anything written to cookies.txt? If there
is more than one form on the page, then you may want to look at the
submit_form() method.
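
To check whether the page has more than one form, and which inputs each form carries, something like the sketch below may help. It is only an illustration: it parses a made-up page with HTML::Form (the class WWW::Mechanize uses for its form objects) so that it runs without a network connection. The field names are the ones mentioned in this thread, while searchForm, both action URLs, the example.com base URI, and the helper name describe_forms are assumptions.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical page: field names are taken from this thread; the second
# form, the action URLs and the markup itself are assumptions.
my $html = <<'HTML';
<form name="searchForm" method="get" action="/pe/search.jsp">
  <input type="text" name="q">
</form>
<form name="loginForm" method="post" action="/pe/login.jsp">
  <input type="text"     name="login_name">
  <input type="password" name="password">
  <input type="hidden"   name="userRequested" value="">
  <input type="hidden"   name="uri" value="/pe/index.jsp">
  <input type="submit"   name="loginSubmit" value="Login">
</form>
HTML

# Return one "form name: input type input name" line per input on the page.
sub describe_forms {
    my ($page, $base_uri) = @_;
    my @lines;
    for my $form ( HTML::Form->parse( $page, $base_uri ) ) {
        my $form_name = $form->attr('name') // '(unnamed)';
        for my $input ( $form->inputs ) {
            push @lines, sprintf '%s: %s %s',
                $form_name, $input->type, $input->name // '(no name)';
        }
    }
    return @lines;
}

print "$_\n" for describe_forms( $html, 'http://www.example.com/pe/' );
```

With a live script, the same loop could run over $mech->forms after the get(), since that method returns HTML::Form objects as well.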
 
Antwerp

Hello,


I really appreciate your help. I've become really frustrated with this - all
the resources I find seem to confirm what I've already done and suggest it
should work.

The first part seems to work well - I am indeed getting the 2 necessary
cookies written into my cookies file (which I've included below). When first
fetching the page, $url, it is redirected to the login page. As a result, when I
display the content, I see the login page and its associated login form
(which is what should happen :) ). I then attempt to use WWW::Mechanize to fill
out and submit the form.

This is where things break down - something isn't working. What should
happen is that once the form is submitted with the correct login information, I
am redirected back to the original $url, but this time, because I have logged
in, I should be able to browse the site. Thus, if I print $mech->content, I
should get the source I would get if I were logged in - that is, I should be able
to parse the logged-in content of the site. What actually happens is that I get
the same code I got initially - as if nothing had happened between when I arrived
at the site and now. Even if I include false data in the username and password
fields (which should result in some sort of error message being returned), it
looks the same.


As per your indications for more information, I've included a more detailed
view into the source and have added the outputs below.



-----Start Code-----

use strict;
use warnings;

use Data::Dumper;
use HTTP::Cookies;
use WWW::Mechanize;

use POSIX;

#########################
#Vars
#########################

### URI
my $url = "http://www.memberplushq.com/pe/index.jsp";

### DATA: Required Form parameters
my $usna = "censored";
my $pawo = "censored";

### FORM: Form Field and Name Structure
my $target_form_name = "loginForm";
my $login_field_name = "login_name";
my $password_field_name = "password";
my $submit_button_name = "loginSubmit";


#########################
#UserAgent Config
#########################

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->env_proxy;
$mech->agent_alias( 'Windows IE 6' );
$mech->cookie_jar( HTTP::Cookies->new(
    file           => "cookies.txt",
    autosave       => 1,
    ignore_discard => 1,
) );

$mech->get( $url );
$mech->success or die "Critical Failure (Site Retrieval): ",
    $mech->response->status_line;

#########################
# Debug::Visual Confirmation of arrived login page.
#########################

print $mech->content;

#This outputs the HTML of the login page,
#Which works nicely.

#########################
#Form Submittal
#########################

$mech->submit_form(
    form_name => $target_form_name,
    fields    => {
        $login_field_name    => $usna,
        $password_field_name => $pawo,
    },
    button => $submit_button_name,
);

$mech->success or die "Critical Failure (Form Submission): ",
    $mech->response->status_line;

#########################
# Debug::Visual Confirmation of logged-in index page
#########################

print $mech->content;

# I want this to output the html I would otherwise see if I had
# submitted the form and been redirected back to the initial
# url ($url). This time though, because I have, supposedly,
# logged in, I should be able to see the logged in version.


-----End Code-----


Thank you very much for all your help,

AntWerp



 
Antwerp

Hi,

Thank you for your help, but I figured it out. It turns out I missed the
hidden values of the form - Live HTTP Headers, a Mozilla plug-in, alerted me to
this, and now everything works.

Yay!

AntWerp

 
Antwerp

Hi,

: It appears you have all of the other variables covered, except for the
: "value" of the submit button. You provided the correct name, loginSubmit;
: but you didn't provide a value. From the response to a submittal of the
: previously mentioned Col43 script,
:
: "loginSubmit" => "\xa0\xa0Login\xa0\xa0", # submit
:
: If the back-end processing is checking for the correct value, it would
: behove you to provide it. ;-)

You are absolutely correct - I had mistakenly thought that I would not have
to submit hidden fields, and thus, although I did see them (as per the previous
posts / suggestions made), I did not think it necessary to include them. When,
out of desperation, I tried them, everything worked.
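
For anyone who hits the same problem, a rough way to see which name/value pairs a form submission would carry is to build the request without sending it. The sketch below does that offline with HTML::Form. The form markup, the action URL, the example.com base URI, the credentials, and the helper name build_login_request are assumptions made for illustration; the real button value on the site also contained non-breaking spaces, which are left out here.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical copy of the login form. Field names come from this thread;
# the action URL and the plain "Login" button value are assumptions.
my $html = <<'HTML';
<form name="loginForm" method="post" action="/pe/login.jsp">
  <input type="text"     name="login_name">
  <input type="password" name="password">
  <input type="hidden"   name="userRequested" value="">
  <input type="hidden"   name="uri" value="/pe/index.jsp">
  <input type="submit"   name="loginSubmit" value="Login">
</form>
HTML

# Fill in the visible fields and return the HTTP::Request that clicking
# the loginSubmit button would produce. Nothing is sent over the network.
sub build_login_request {
    my ($page, $base_uri, $user, $pass) = @_;
    my ($form) = grep { ( $_->attr('name') // '' ) eq 'loginForm' }
                 HTML::Form->parse( $page, $base_uri );
    die "loginForm not found\n" unless $form;
    $form->value( login_name => $user );
    $form->value( password   => $pass );
    return $form->click('loginSubmit');
}

my $request = build_login_request(
    $html, 'http://www.example.com/pe/', 'someuser', 'secret' );

# The body lists every field the server would receive, including the
# hidden inputs and the name and value of the clicked button.
print $request->method,  "\n";
print $request->uri,     "\n";
print $request->content, "\n";
```

Comparing that body with what the browser sends (as Live HTTP Headers shows it) should make any missing field stand out. If a field is absent from the fetched HTML or is filled in by JavaScript, it would have to be set by hand before submitting.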

I thank you very much for your time and assistance,

AntWerp


: : <snip>
: > In addition, it appears you have missed at least one hidden variable that
: > should be included in your submittal
: >
: > "userRequested" => "", # hidden
:
: Another hidden variable:
:
: "uri" => "/pe/index.jsp", # hidden
:
: It appears you have all of the other variables covered, except for the
: "value" of the submit button. You provided the correct name, loginSubmit;
: but you didn't provide a value. From the response to a submittal of the
: previously mentioned Col43 script,
:
: "loginSubmit" => "\xa0\xa0Login\xa0\xa0", # submit
:
: If the back-end processing is checking for the correct value, it would
: behove you to provide it. ;-)
:
: Cheers.
: --
: Bill Segraves
:
:
 
