LWP user agent query

Discussion in 'Perl Misc' started by P.R.Brady, Aug 26, 2005.

  1. P.R.Brady

    P.R.Brady Guest

    I tried my web crawler/link checker on a neighbour's site and found
    problems with the button top right entitled 'cymraeg' in this page (and
    the same button on others):
    http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

    I think I need to extract the url:
    http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
    for the get as in the following code but I am getting 404 not found
    returned.

    Internet Explorer seems very happy with the button and returns the Welsh
    version, but Netscape 7 is not entirely happy with it either.

    Where is the problem? My hand extraction of the target url, the code
    below or an issue in the host?

    Regards
    Phil



    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Response;
    use HTML::TokeParser;

    my $referer =
      'http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm';
    my $url =
      'http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg';

    # open the browser
    my $browser = LWP::UserAgent->new;
    $browser->timeout(30);
    # max_redirect is an attribute of the agent, not a request header
    $browser->max_redirect(70);

    my $response = $browser->get(
        $url,
        'Referer'         => $referer,
        'User-Agent'      => 'Mozilla/7. [en] (Win98; U)',
        # one header value, concatenated so it contains no line break
        'Accept'          => 'text/html, image/gif, image/x-xbitmap, '
                           . 'image/jpeg, image/pjpeg, image/png, */*',
        'Accept-Charset'  => 'ISO-8859-1, *, utf-8',
        'Accept-Language' => 'cy, en, en-GB',
    );

    my $status = $response->status_line;
    print "Status=$status\n";

    my $base = $response->base;
    print "Base=$base\n";

    if ($response->is_success) {
        print "Show data?";
        my $answer = <STDIN>;
        if (defined $answer && $answer =~ /y/i) {
            my $doc = $response->content;
            print "$doc\n";
        }
    }
    exit;
     
    P.R.Brady, Aug 26, 2005
    #1

  2. "P.R.Brady" wrote:

    > I tried my web crawler/link checker on a neighbour's site and found
    > problems with the button top right entitled 'cymraeg' in this page
    > (and the same button on others):
    > http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm
    >
    > I think I need to extract the url:
    > http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
    > for the get as in the following code but I am getting 404 not found
    > returned.
    >
    > Internet Explorer seems very happy with the button and returns the
    > Welsh version, but Netscape 7 is not entirely happy with it either.


    Clicking on the link in Firefox re-directs me to http://www.cos.com/

    I am inclined to think this is a case of either bad HTML or bad ASP
    programming, and thus off-topic here.

    Sinan
    --
    A. Sinan Unur

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Aug 26, 2005
    #2

  3. On Fri, 26 Aug 2005, P.R.Brady wrote:

    > I tried my web crawler/link checker on a neighbour's site and found problems
    > with the button top right entitled 'cymraeg' in this page (and the same button
    > on others):
    > http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm


    As soon as I click it, my browser throws an alert telling me that
    the site wants to set a cookie.
    However, even if I respond by allowing session cookies, I get an
    error alert, telling me that "community could not be found".

    > Internet Explorer seems very happy with the button and returns the Welsh
    > version, but Netscape 7 is not entirely happy with it either.


    That sounds ominously like the all too prevalent situation of a web
    page that's been designed to work only with the operating system
    component that thinks it's a browser, but not with a www-compatible
    client agent.

    > I think I need to extract the url:
    > http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
    > for the get as in the following code but I am getting 404 not found
    > returned.


    You've worked that out from the 'form method="GET" ...' which is used
    to implement this switch, right?

    Here's how their server seems to respond to that URL:


    HTTP/1.1 302 Object moved
    Connection: close
    Date: Fri, 26 Aug 2005 14:48:58 GMT
    Server: Microsoft-IIS/6.0
    MicrosoftOfficeWebServer: 5.0_Pub
    X-Powered-By: ASP.NET
    Location: //
    Content-Length: 123
    Content-Type: text/html
    Set-Cookie: ASPSESSIONIDSCTBSRDA=HDKPDDIDBPOGDPJLBCCGGGOL; path=/
    Cache-control: private


    That "Location:" looks meaningless to me. The HTTP specification
    demands an absolute URL to be returned on a Location: header, and that
    most certainly ain't one. Whatever a client agent would do in
    response to it would seem to be in the nature of an error fixup, and
    there's no reason to suppose clients would perform the same fix as
    each other.

    You might consider running LWP without automatically resolving
    redirections, so that you get control back as soon as this code 302
    response is returned, and try to fix this up yourself, if MSIE has
    given you some clue about where it's supposed to go. You'll need to
    have cookie handling enabled, too, of course. Sorry, I haven't tried
    this at all - it's just a suggestion.
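
    A minimal sketch of the suggestion above, assuming a reasonably recent
    LWP::UserAgent. The helper names looks_absolute and fetch_no_redirect
    are hypothetical, chosen only for this illustration. The agent is
    created with max_redirect set to 0 so the 302 response is handed back
    instead of being followed, and with an in-memory cookie jar so the
    session cookie is retained for any later request. The URL to fetch is
    taken from the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: true only if the Location value is an absolute
# http(s) URL that names a host. A bare "//" fails this check.
sub looks_absolute {
    my ($location) = @_;
    return ( defined($location) && $location =~ m{^https?://[^/\s]+}i ) ? 1 : 0;
}

# Hypothetical helper: fetch one URL without following redirects.
# Returns the agent (so its cookies can be reused) and the response.
sub fetch_no_redirect {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(
        timeout      => 30,
        max_redirect => 0,     # hand 3xx responses back to the caller
        cookie_jar   => {},    # in-memory jar keeps the session cookie
    );
    return ( $ua, $ua->get($url) );
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my ( $ua, $response ) = fetch_no_redirect( $ARGV[0] );
    print "Status=", $response->status_line, "\n";

    if ( $response->is_redirect ) {
        my $location = $response->header('Location');
        print "Location=", ( defined $location ? $location : '(none)' ), "\n";

        if ( looks_absolute($location) ) {
            # Same agent, so cookies set by the first response are sent.
            my $next = $ua->get($location);
            print "Followed: ", $next->status_line, "\n";
        }
        else {
            print "Location is not an absolute URL; ",
                  "the real target has to be supplied by hand.\n";
        }
    }
}
```

    Run against the change_language.asp URL, this would be expected to stop
    at the 302 and report the unusable Location value rather than guess at
    a fixup. Where to go next remains something only the site, or a
    browser that happens to cope with it, can reveal.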


    <rant>
    It's bad enough that the source of the above web page has a DOCTYPE
    that makes it look like HTML/2.0, which it clearly is not: but there's
    a META that says it was extruded by Microsoft FrontPage 5.0, so the
    likelihood of it working with anything that's WWW-compatible does not
    seem too high...
    </>
     
    Alan J. Flavell, Aug 26, 2005
    #3
  4. P.R.Brady

    P.R.Brady Guest

    P.R.Brady wrote:
    > I tried my web crawler/link checker on a neighbour's site ..


    Many thanks, both. That has set my mind at rest!

    Phil
     
    P.R.Brady, Aug 26, 2005
    #4
  5. P.R.Brady

    P.R.Brady Guest

    Alan J. Flavell wrote:
    > On Fri, 26 Aug 2005, P.R.Brady wrote:
    >
    >
    >>I tried my web crawler/link checker on a neighbour's site and found problems
    >>with the button top right entitled 'cymraeg' in this page (and the same button
    >>on others):
    >>http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

    >
    >
    > As soon as I click it, my browser throws an alert telling me that
    > the site wants to set a cookie.
    > However, even if I respond by allowing session cookies, I get an
    > error alert, telling me that "community could not be found".
    >
    >
    >>Internet Explorer seems very happy with the button and returns the Welsh
    >>version, but Netscape 7 is not entirely happy with it either.

    >
    >
    > That sounds ominously like the all too prevalent situation of a web
    > page that's been designed to work only with the operating system
    > component that thinks it's a browser, but not with a www-compatible
    > client agent.
    >
    >
    >>I think I need to extract the url:
    >>http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
    >>for the get as in the following code but I am getting 404 not found
    >>returned.

    >
    >
    > You've worked that out from the 'form method="GET" ...' which is used
    > to implement this switch, right?
    >



    That's right, but IE shows
    http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg&x=26&y=11
    in its URL bar after successfully fetching the Welsh page. Adding
    the x and y parameters doesn't help the Perl reader.

    We're no fans of IE and MS web products here either.

    Phil
     
    P.R.Brady, Aug 26, 2005
    #5
  6. Brian Wakem

    Brian Wakem Guest

    P.R.Brady wrote:

    > I tried my web crawler/link checker on a neighbour's site and found
    > problems with the button top right entitled 'cymraeg' in this page (and
    > the same button on others):
    > http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm
    >
    > I think I need to extract the url:
    > http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
    > for the get as in the following code but I am getting 404 not found
    > returned.
    >
    > Internet Explorer seems very happy with the button and returns the Welsh
    > version, but Netscape 7 is not entirely happy with it either.
    >
    > Where is the problem? My hand extraction of the target url, the code
    > below or an issue in the host?
    >
    > Regards



    All UK government websites are poorly written by 10-a-penny FrontPage
    monkeys. I had the misfortune of automating some processes through one
    particular government website. They told me before I started that the site
    would only work in IE. Well it didn't work very well in IE and produced
    random errors all over the place. Eventually I gave up and told them to
    fix their site before I would try again.



    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
     
    Brian Wakem, Aug 26, 2005
    #6