LWP::UserAgent and redirected page responses

Discussion in 'Perl Misc' started by Bill, Oct 21, 2005.

  1. Bill

    Bill Guest

    Hello. This concerns LWP::UserAgent. If a request is sent to a certain
    web site, the response in the browser comes back as a completely
    different domain and site due to redirection. How do I find out, from
    the UserAgent module, what the redirected url is? A uri() method as in
    WWW::Mechanize seems like a good candidate but when checked the uri()
    method seems to return the original request uri, not the redirected
    one. I need to know exactly what would be in the url box of a web
    browser after the redirected response happens. Anyone know how to do
    this? I will post code if requested.
     
    Bill, Oct 21, 2005
    #1

  2. Bill

    J. Gleixner Guest

    Bill wrote:
    > Hello. This concerns LWP::UserAgent. If a request is sent to a certain
    > web site, the response in the browser comes back as a completely
    > different domain and site due to redirection. How do I find out, from
    > the UserAgent module, what the redirected url is? A uri() method as in
    > WWW::Mechanize seems like a good candidate but when checked the uri()
    > method seems to return the original request uri, not the redirected
    > one. I need to know exactly what would be in the url box of a web
    > browser after the redirected response happens. Anyone know how to do
    > this? I will post code if requested.
    >


    Check the requests_redirectable method in "perldoc LWP::UserAgent". By
    default, it doesn't redirect on a POST.
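
A minimal sketch of that setting, assuming a stock LWP::UserAgent with no subclassing. The `push` idiom is the one shown in the LWP::UserAgent documentation; the variable name `$ua` is only illustrative.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# requests_redirectable returns an array ref of the HTTP methods
# whose redirect responses the agent will follow automatically.
print "Before: @{ $ua->requests_redirectable }\n";

# Allow redirects to be followed after a POST as well.
push @{ $ua->requests_redirectable }, 'POST';
print "After:  @{ $ua->requests_redirectable }\n";
```

With POST added, a redirect returned in answer to a form submission is followed instead of being handed back as the final response.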
     
    J. Gleixner, Oct 21, 2005
    #2

  3. Bill

    Paul Lalli Guest

    Bill wrote:
    > Hello. This concerns LWP::UserAgent. If a request is sent to a certain
    > web site, the response in the browser comes back as a completely
    > different domain and site due to redirection. How do I find out, from
    > the UserAgent module, what the redirected url is? A uri() method as in
    > WWW::Mechanize seems like a good candidate but when checked the uri()
    > method seems to return the original request uri, not the redirected
    > one. I need to know exactly what would be in the url box of a web
    > browser after the redirected response happens. Anyone know how to do
    > this? I will post code if requested.


    [Disclaimer: All of the below is gleaned from reading the relevant
    docs. I have not tried any LWP code myself ]

    The LWP::UserAgent object sends a request to a server by means of the
    post() method. The return value of the post() method is an object of
    HTTP::Response. The HTTP::Response man page shows that one of its
    methods is request(), which is defined as follows:
    $r->request
    $r->request( $request )

    This is used to get/set the request attribute. The request attribute
    is a reference to the request that caused this response. It does not
    have to be the same request passed to the $ua->request() method,
    because there might have been redirects and authorization retries in
    between.

    To find out what we can get from that object, we look to HTTP::Request,
    which has this method:
    $r->uri
    $r->uri( $val )

    This is used to get/set the uri attribute. The $val can be a
    reference to a URI object or a plain string. If a string is given,
    then it should be parseable as an absolute URI.

    Putting it all together then:

    my $ua = LWP::UserAgent->new;
    my $response = $ua->post($url);
    my $request = $response->request();
    my $found_url = $request->uri();
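
The same lookup can be tried without a network connection. This is only a sketch: the two response objects are built by hand to mimic what LWP returns after following one redirect, the example.com and example.org URLs are made up, and `final_uri` is a hypothetical helper name. Note also that a real post() is not redirected unless POST has been added to requests_redirectable.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: the URI of the request that produced this response.
sub final_uri {
    my ($response) = @_;
    return $response->request->uri;
}

# Made-up URLs standing in for the original and the redirected address.
my $first_req  = HTTP::Request->new(GET => 'http://example.com/start');
my $second_req = HTTP::Request->new(GET => 'http://example.org/landing?sid=abc123');

# The 302 answer to the first request.
my $redirect = HTTP::Response->new(302, 'Found',
    [ Location => 'http://example.org/landing?sid=abc123' ]);
$redirect->request($first_req);

# The final 200, linked back to the 302 through previous().
my $final = HTTP::Response->new(200, 'OK');
$final->request($second_req);
$final->previous($redirect);

print "Final URL: ", final_uri($final), "\n";

# redirects() walks the previous() chain, oldest response first.
for my $hop ($final->redirects) {
    print "Redirected from: ", $hop->request->uri, " (", $hop->code, ")\n";
}
```

The point of the sketch is that the URI has to be read from the request attached to the last response, not from the request object that was originally passed in.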

    Hope this helps,
    Paul Lalli
     
    Paul Lalli, Oct 21, 2005
    #3
  4. Paul Lalli wrote:

    > To find out what we can get from that object, we look to HTTP::Request,
    > which has this method:
    > $r->uri
    > $r->uri( $val )
    >
    > This is used to get/set the uri attribute. The $val can be a
    > reference to a URI object or a plain string. If a string is given,
    > then it should be parseable as an absolute URI.
    >
    > Putting it all together then:
    >
    > my $ua = LWP::UserAgent->new;
    > my $response = $ua->post($url);
    > my $request = $response->request();
    > my $found_url = $request->uri();
    >
    > Hope this helps,
    > Paul Lalli
    >


    Sorry that I did not post code the last time.
    Here's an excerpt from the method in question:

    -----------------------------------------
    # Log in to my.tmobile.com (T-Mobile USA) and
    # return a hashref holding total charged (not free) minutes and
    # total charged SMS messages. Keys are 'calls' and 'messages'.
    sub get_billing {
        my ($self) = @_;
        $self->{start_page} = $base_uri;
        $self->{agent} = WWW::Mechanize->new(
            agent => "Mozilla/4.0 (compatible; MSIE 7.0b; Perl $])",
        );
        $self->{agent}->get($base_uri);
        $self->{agent}->form_name("Form1") or croak $self->content;

        # Even though WWW::Mechanize does most of the work, we have to
        # manually clear readonly on hidden fields. Annoying.
        my $input = $self->{agent}->current_form->find_input('__EVENTTARGET')
            or $self->_err("Cannot find hidden field for signin in Form1");
        no warnings;
        $input->readonly(0);
        use warnings;
        $self->{agent}->set_fields(
            'txtMSISDN'     => $self->{user_number},
            'txtPassword'   => $self->{password},
            '__EVENTTARGET' => 'signin',
        );
        $self->{agent}->submit
            or $self->_err("Could not submit form1 successfully");
        $self->{agent}->get("https://my.t-mobile.com/Billing/")
            or $self->_err("Cannot get Billing page: ");
        print "Line uri: ", $self->{agent}->uri, "\n";

        # cut for brevity here....
    }

    The problem is that the second uri printed is NOT the same as the uri
    displayed in the url line of the browser doing the same tasks, even
    though the CONTENT of the FIRST request's response text is correct. As a
    result, the user agent fails to correctly submit the next click, since
    the base URL is now incorrect. I cannot just plug in a fixed url there,
    since the redirected URL contains some cookie-like values needed by the
    host.

    Ideas?
     
    William Herrera, Oct 22, 2005
    #4
  5. J. Gleixner wrote:
    > Check the requests_redirectable method in "perldoc LWP::UserAgent". By
    > default, it doesn't redirect on a POST.


    Thanks, I'll try that.
     
    William Herrera, Oct 22, 2005
    #5
  6. Bill

    John Bokma Guest

    William Herrera wrote:

    > The problem is that the second uri printed is NOT the same as the uri
    > displayed in the url line of the browser doing the same tasks, even
    > though the CONTENT of the FIRST request's response text is correct. As
    > a result, the user agent fails to correctly submit the next click,
    > since the base URL is now incorrect. I cannot just plug in a fixed url
    > there, since the redirected URL contains some cookie-like values
    > needed by the host.
    >
    > Ideas?


    Might be the UserAgent or any other header that triggers this behaviour. If
    the uri you get back contains the "cookie-like" values, you can tweak them
    into the URL you know.

    --
    John
    Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    I ploink googlegroups.com :)
     
    John Bokma, Oct 22, 2005
    #6
  7. John Bokma wrote:
    >>The problem is that the second uri printed is NOT the same as the uri
    >>displayed in the url line of the browser doing the same tasks, even
    >>though the CONTENT of the FIRST request's response text is correct. As
    >>a result, the user agent fails to correctly submit the next click,
    >>since the base URL is now incorrect. I cannot just plug in a fixed url
    >>there, since the redirected URL contains some cookie-like values
    >>needed by the host.
    >>
    >>Ideas?

    >
    >
    > Might be the UserAgent or any other header that triggers this behaviour. If
    > the uri you get back contains the "cookie-like" values, you can tweak them
    > into the URL you know.


    Yes, and the "cookie-like" values seem to be a per-session ID that
    changes. So I need to know what the uri is that I get back, which LWP
    does not seem to keep anywhere. It keeps the original, non-redirected
    uri instead?
     
    William Herrera, Oct 22, 2005
    #7
  8. John Bokma wrote:

    > If there is a redirect, LWP stores this info. IIRC in debug mode you can
    > see what is happening. Another trick is to set the redirect level to 0, to
    > 1, etc.
    >


    Thanks. Using LWP::DebugFile shows that LWP correctly GETS the URL which
    the browser displays, yet the uri() method returns the initial URL, not
    the finally redirected one. Weird. I suppose I could check the tail of
    the LWP::DebugFile as the program progresses, but that seems so clumsy.
    There ought to be a method or value inside UserAgent that I can use?
     
    William Herrera, Oct 22, 2005
    #8

  9. John Bokma wrote:

    >
    > If there is a redirect, LWP stores this info. IIRC in debug mode you can
    > see what is happening. Another trick is to set the redirect level to 0, to
    > 1, etc.
    >
    > I am sure there are little (Perl) proxy programs available that show you
    > exactly what is being sent out, and what comes back.
    >
    > Also, try with a browser with JavaScript off, since that is what LWP is
    > doing.
    >


    I wrote an itty bitty module to fix the problem (currently calling it
    LWP::LastURI). So now things work okay. Thanks for the suggestion to
    look at the LWP debug output.
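
The thread does not show the contents of LWP::LastURI. As a rough sketch of one way such a helper could work, assuming a subclass that overrides the documented redirect_ok() hook: the package name `My::TrackingUA`, the `last_redirect_uri` accessor, and the example URLs below are all hypothetical, and redirect_ok() is called by hand here only so that the sketch runs without a network connection.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

{
    package My::TrackingUA;    # hypothetical name, not the module from the thread
    use parent 'LWP::UserAgent';

    # LWP calls redirect_ok() before following each redirect.
    # Remember the target whenever the redirect is allowed.
    sub redirect_ok {
        my ($self, $prospective_request, $response) = @_;
        my $ok = $self->SUPER::redirect_ok($prospective_request, $response);
        $self->{my_last_redirect_uri} = $prospective_request->uri->clone if $ok;
        return $ok;
    }

    # Hypothetical accessor for the most recent redirect target.
    sub last_redirect_uri { return $_[0]{my_last_redirect_uri} }
}

my $ua = My::TrackingUA->new;

# Hand-built objects standing in for a real 302 exchange (made-up URLs).
my $orig = HTTP::Request->new(GET => 'http://example.com/start');
my $resp = HTTP::Response->new(302, 'Found',
    [ Location => 'http://example.org/landing?sid=abc123' ]);
$resp->request($orig);
my $next = HTTP::Request->new(GET => 'http://example.org/landing?sid=abc123');

# In normal use LWP makes this call itself while processing a redirect.
$ua->redirect_ok($next, $resp);
print "Last redirect target: ", $ua->last_redirect_uri, "\n";
```

Recording the target inside the hook avoids reading the debug log, at the cost of keeping extra state on the agent object.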

    --Bill
     
    William Herrera, Oct 23, 2005
    #9