Form manipulation through WWW::Mechanize (Perl)

Discussion in 'Perl Misc' started by Antwerp, Feb 18, 2005.

  1. Antwerp

    Antwerp Guest

    Hi,

    I'm trying to create a script that automatically logs in to a website, and
    then parses the index (which is unavailable without first logging on). I'm
    having difficulties, but I'm not exactly sure where they might lie.

    I know I need to enable cookies to log in, and I believe I have done so. I
    also recognize that I need to find and submit the appropriate form data - I
    *think* I am doing this correctly too. However, once I submit the form data, I am
    unable to print, load, or view the "secure" page. I would appreciate any help.

    -------

    use WWW::Mechanize;
    use HTTP::Cookies;

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->agent_alias( 'Windows IE 6' );
    $mech->cookie_jar( HTTP::Cookies->new(
        file     => "cookies.txt",
        autosave => 1,
    ) );
    ### The above should allow for the storage and sending of cookies ###

    $mech->get( $url );      # Getting the desired page
                             # (automatically redirects to the login page)

    print $mech->content();  # Printing the content of the page
                             # (which, because of the redirect, is the login
                             # screen; I pipe the output to an output.html
                             # file for verification)

    # This should be setting the login field value
    $mech->field( $login_field_name, $usna );
    # This should be setting the password field value
    $mech->field( $password_field_name, $pawo );

    $mech->submit();         # This should submit the form, causing a redirect
                             # to the proper index for logged-in users

    print $mech->content();  # Question:
                             # Shouldn't this show the new content of the page -
                             # that is, the member index? Why does it not do so?

    -------

    I appreciate any help or suggestions,

    Thanks,

    AntWerp
    Antwerp, Feb 18, 2005
    #1

  2. Antwerp

    J. Gleixner Guest

    Antwerp wrote:
    > Hi,
    >
    > I'm trying to create a script that automatically logs in to a website, and
    > then parses the index (which is unavailable without first logging on). I'm
    > having difficulties, but I'm not exactly sure where they might lie.
    >
    > I know I need to enable cookies to login, and I believe I have done so. I
    > also recognize that I need to find and submit the appropriate form data - I
    > *think* am doing this correctly too. However, once I submit the form data, I am
    > unable to print, load, or view the "secure" page. I would appreciate any help;
    >
    > -------
    >
    > my $mech = WWW::Mechanize->new( autocheck => 1 );
    > $mech->agent_alias( 'Windows IE 6' );
    > $mech->cookie_jar(HTTP::Cookies->new
    > (
    > file => "cookies.txt",
    > autosave => 1,
    > ) );
    > $mech->get( $url ); #Getting the desired page
    > print $mech->content(); #Printing the content of the page
    > $mech->field( $login_field_name, $usna ); #this should be setting the login
    > field values
    > $mech->field( $password_field_name, $pawo ); #this should be setting the
    > password field values
    >
    > $mech->submit(); #this should submit the form, causing
    > #a redirect to the proper index for logged in users
    >
    > print $mech->content(); #Question:
    > #Shouldn't this show the new content of the page -
    > #that is, the member index? Why does it not do so?


    Without knowing your values of $url, $login_field_name, and
    $password_field_name, and what your final print displays, who knows.
    The code looks accurate. Is anything written to cookies.txt? If there
    is more than one form on the page, then you may want to look at the
    submit_form() method.
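
    To illustrate the multiple-forms point, here is a minimal sketch that lists
    the forms on a page and picks one by name. It parses a made-up HTML string
    with HTML::Form (the module WWW::Mechanize uses for forms) so it runs without
    a network connection. The page content, the searchForm form, the /pe/login
    action, and the example.com base URL are all assumptions for illustration,
    not taken from the real site.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical page with two forms; only the second one is the login form.
my $html = <<'HTML';
<html><body>
<form name="searchForm" action="/search" method="get">
  <input type="text" name="q">
</form>
<form name="loginForm" action="/pe/login" method="post">
  <input type="text" name="login_name">
  <input type="password" name="password">
  <input type="submit" name="loginSubmit" value="Login">
</form>
</body></html>
HTML

# In list context, parse() returns one HTML::Form object per <form> element.
my @forms = HTML::Form->parse( $html, 'http://www.example.com/' );

# Print each form's name, method, and action to see what is available.
for my $form (@forms) {
    my $name = $form->attr('name');
    printf "%s: %s %s\n",
        defined $name ? $name : '(unnamed)', $form->method, $form->action;
}

# Select the form by its name attribute instead of relying on page order.
my ($login) = grep { ( $_->attr('name') || '' ) eq 'loginForm' } @forms;
```

    With a live WWW::Mechanize object, $mech->forms should give the same kind of
    list for the current page, and the form_name argument to submit_form() does
    the selection by name.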
    J. Gleixner, Feb 21, 2005
    #2

  3. Antwerp

    Antwerp Guest

    Hello,


    I really appreciate your help. I've become really frustrated with this: all
    the resources I find seem to confirm what I've already done, and suggest it
    should work.

    The first part seems to work well - I am indeed getting the 2 necessary
    cookies written into my cookies file (which I've included below). When first
    fetching the page, $url, it is redirected to the login page. As a result, when I
    display the content, I see the login page and its associated login form
    (which is what should happen :) ). I then attempt to use WWW::Mechanize to fill
    out and submit the form.

    This is where things break down - something isn't working. What should
    happen is that once the form is submitted with the correct login information, I
    am redirected back to the original $url, but this time, because I have logged
    in, I should be able to browse the site. Thus, if I print $mech->content, I
    should get the source I would get if I were logged in - that is, I should be able
    to parse the logged-in content of the site. What actually happens is that I get the
    same code I got initially - as if nothing had happened between when I arrived at the
    site and now. Even if I put false data in the username and password fields
    (which should result in some sort of error message being returned), it looks the
    same.


    As per your indications for more information, I've included a more detailed
    view into the source and have added the outputs below.



    -----Start Code-----

    use strict;
    use warnings;

    use Data::Dumper;
    use HTTP::Cookies;
    use WWW::Mechanize;

    use POSIX;

    #########################
    #Vars
    #########################

    ### URI
    my $url = "http://www.memberplushq.com/pe/index.jsp";

    ### DATA: Required Form parameters
    my $usna = "censored";
    my $pawo = "censored";

    ### FORM: Form Field and Name Structure
    my $target_form_name = "loginForm";
    my $login_field_name = "login_name";
    my $password_field_name = "password";
    my $submit_button_name = "loginSubmit";


    #########################
    #UserAgent Config
    #########################

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->env_proxy;
    $mech->agent_alias( 'Windows IE 6' );
    $mech->cookie_jar( HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    ) );

    $mech->get( $url );
    $mech->success or die "Critical Failure (Site Retrieval): ",
        $mech->response->status_line;

    #########################
    # Debug::Visual Confirmation of arrived login page.
    #########################

    print $mech->content;

    #This outputs the HTML of the login page,
    #Which works nicely.

    #########################
    #Form Submittal
    #########################

    $mech->submit_form(
        form_name => $target_form_name,
        fields    => {
            $login_field_name    => $usna,
            $password_field_name => $pawo,
        },
        button    => $submit_button_name,
    );

    $mech->success or die "Critical Failure (Form Submission): ",
    $mech->response->status_line;

    #########################
    # Debug::Visual Confirmation of logged-in index page
    #########################

    print $mech->content;

    # I want this to output the html I would otherwise see if I had
    # submitted the form and been redirected back to the initial
    # url ($url). This time though, because I have, supposedly,
    # logged in, I should be able to see the logged in version.


    -----End Code-----


    Thank you very much for all your help,

    AntWerp

    Antwerp, Feb 22, 2005
    #3
  4. Antwerp

    Antwerp Guest

    Hi,

    Thank you for your help, but I figured it out. It turns out I missed the
    hidden values of the form - Live HTTP Headers, a Mozilla plug-in, alerted me to
    this, and now everything works.

    Yeay!

    AntWerp
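
    For anyone who hits the same problem, here is a rough sketch of one way to
    spot hidden fields without a browser plug-in: parse the login page with
    HTML::Form and list the inputs whose type is "hidden". The HTML below is a
    made-up stand-in for the real login page. The hidden field names
    userRequested and uri are the ones named elsewhere in this thread; the
    /pe/login.jsp action and the example.com base URL are assumptions.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical login form modelled on the fields mentioned in this thread.
my $html = <<'HTML';
<form name="loginForm" action="/pe/login.jsp" method="post">
  <input type="text" name="login_name">
  <input type="password" name="password">
  <input type="hidden" name="userRequested" value="">
  <input type="hidden" name="uri" value="/pe/index.jsp">
  <input type="submit" name="loginSubmit" value="Login">
</form>
HTML

# In scalar context, parse() returns the first form found in the document.
my $form = HTML::Form->parse( $html, 'http://www.example.com/pe/index.jsp' );

# Keep only the inputs declared with type="hidden".
my @hidden = grep { $_->type eq 'hidden' } $form->inputs;

# Show each hidden field and the default value the page supplied for it.
for my $input (@hidden) {
    my $value = $input->value;
    printf "hidden: %s = '%s'\n", $input->name, defined $value ? $value : '';
}
```

    As far as I know, hidden inputs that are present in the fetched HTML are
    normally sent along with their default values when the form object is
    submitted. If a site fills them in with JavaScript, though, the values have
    to be set by hand, for example through the fields argument of submit_form().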
    Antwerp, Feb 22, 2005
    #4
  5. Antwerp

    Antwerp Guest

    Hi,

    : It appears you have all of the other variables covered, except for the
    : "value" of the submit button. You provided the correct name, loginSubmit;
    : but you didn't provide a value. From the response to a submittal of the
    : previously mentioned Col43 script,
    :
    : "loginSubmit" => "\xa0\xa0Login\xa0\xa0", # submit
    :
    : If the back-end processing is checking for the correct value, it would
    : behove you to provide it. ;-)

    You are absolutely correct - I had mistakenly thought that I would not have
    to submit hidden fields, and thus, although I did see them (as per the previous
    posts / suggestions made), I did not think it necessary to include them. When,
    out of desperation, I tried them, everything worked.
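
    To make the submit-button point concrete, here is a small sketch that builds
    the POST request twice with HTML::Form: once without clicking anything, and
    once by clicking the button named loginSubmit. The HTML is a made-up
    stand-in for the real login page, the /pe/login.jsp action and the
    example.com base URL are assumptions, and the credentials are placeholders.
    Nothing is sent over the network; the request objects are only inspected.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical login form; the button value is padded with non-breaking spaces.
my $html = <<'HTML';
<form name="loginForm" action="/pe/login.jsp" method="post">
  <input type="text" name="login_name">
  <input type="password" name="password">
  <input type="hidden" name="userRequested" value="">
  <input type="hidden" name="uri" value="/pe/index.jsp">
  <input type="submit" name="loginSubmit" value="&nbsp;&nbsp;Login&nbsp;&nbsp;">
</form>
HTML

my $form = HTML::Form->parse( $html, 'http://www.example.com/pe/index.jsp' );

# Fill in the visible fields with placeholder credentials.
$form->value( login_name => 'someuser' );
$form->value( password   => 'secret' );

# make_request() builds the request without treating any button as clicked.
my $without_click = $form->make_request->content;

# click() builds the request as if the named submit button had been pressed,
# so the button's own name=value pair is added to the form data.
my $with_click = $form->click('loginSubmit')->content;

print "without click: $without_click\n";
print "with click:    $with_click\n";
```

    Comparing the two printed bodies shows whether the loginSubmit pair is
    present. With WWW::Mechanize, passing button => 'loginSubmit' to
    submit_form() is meant to have the same effect as the click() call here.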

    I thank you very much for your time and assistance,

    AntWerp


    "Bill Segraves" <> wrote in message
    news:gOKSd.3910$...
    : "Bill Segraves" <> wrote in message
    : news:peKSd.8706$...
    : <snip>
    : > In addition, it appears you have missed at least one hidden variable that
    : > should be included in your submittal
    : >
    : > "userRequested" => "", # hidden
    :
    : Another hidden variable:
    :
    : "uri" => "/pe/index.jsp", # hidden
    :
    : It appears you have all of the other variables covered, except for the
    : "value" of the submit button. You provided the correct name, loginSubmit;
    : but you didn't provide a value. From the response to a submittal of the
    : previously mentioned Col43 script,
    :
    : "loginSubmit" => "\xa0\xa0Login\xa0\xa0", # submit
    :
    : If the back-end processing is checking for the correct value, it would
    : behove you to provide it. ;-)
    :
    : Cheers.
    : --
    : Bill Segraves
    :
    :
    Antwerp, Feb 23, 2005
    #5