LWP::Simple get() refined problem

Discussion in 'Perl Misc' started by Hon Guin Lee - Web Producer - SMI Marketing, Sep 29, 2003.

  1. Hi all,

    The get() function from LWP::Simple retrieves content from local web servers, i.e. URLs without the www prefix, and my web browser (Mozilla 1.1) displays it.
    However, for URLs that begin with www, get() just returns undef (see the get_url subroutine), so the web browser has no content to display.

    To narrow the problem down, I tried some of the other functions, such as getstore(url, file) and mirror(url, file), replacing url with shift and specifying a filename, but LWP::Debug just prints:

    --------------------------------------------------------------------------

    LWP::UserAgent::new: ()
    LWP::UserAgent::request: ()
    LWP::UserAgent::send_request: GET http://sunweb.central.sun.com
    LWP::UserAgent::_need_proxy: Not proxied
    LWP::protocol::http::request: ()
    LWP::UserAgent::request: Simple response: Found
    LWP::UserAgent::request: ()
    LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/redirect.jsp
    LWP::UserAgent::_need_proxy: Not proxied
    LWP::protocol::http::request: ()
    LWP::protocol::collect: read 57 bytes
    LWP::UserAgent::request: Simple response: Found
    LWP::UserAgent::request: ()
    LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/location.jsp
    LWP::UserAgent::_need_proxy: Not proxied
    LWP::protocol::http::request: ()
    LWP::protocol::collect: read 19 bytes
    LWP::UserAgent::request: Simple response: Found
    LWP::UserAgent::request: ()
    LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/redirect.jsp?location=Non-US
    LWP::UserAgent::_need_proxy: Not proxied
    LWP::protocol::http::request: ()
    LWP::protocol::collect: read 57 bytes
    LWP::UserAgent::request: Simple response: Found
    LWP::UserAgent::request: ()
    LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/cachedir/cachedtab_Non-US_NEWS.html
    LWP::UserAgent::_need_proxy: Not proxied
    LWP::protocol::http::request: ()
    LWP::UserAgent::request: Simple response: Internal Server Error 500

    --This is for a URL specified for the local web server, which requests some form of proxy.

    --------------------------------------------------------------------------

    LWP::UserAgent::new: ()
    LWP::UserAgent::request: ()
    LWP::UserAgent::send_request: GET http://www.sun.com
    LWP::UserAgent::_need_proxy: Not proxied
    LWP::protocol::http::request: ()
    LWP::UserAgent::request: Simple response: Internal Server Error 500

    --This is for a URL specified for a www web document.

    --------------------------------------------------------------------------

    Any solutions, or reasons why get() cannot retrieve non-local web content?

    Here is the script: -

    --------------------------------------------------------------------------

    #!/usr/local/perl5.6/bin/perl -wT

    # Perl script to get remote
    # URLs and strip them and
    # upload them to teamsite

    use LWP::Simple qw(!head);
    use LWP::Debug '+';
    use CGI qw(:standard); # so that only CGI.pm defines a head()
    use strict;

    print "Content-type: text/html\n\n";

    my $old_handle;

    $|++; # sets $| for STDOUT
    $old_handle = select( STDERR ); # change to STDERR
    $|++; # sets $| for STDERR
    select( $old_handle ); # change back to STDOUT

    my $url;  # filled in by process_form()
    my $lang;

    process_form();
    get_url($url);

    # Reads the form parameters sent by the browser
    # into the script's variables.

    sub process_form {
        $url = param('url');
        $url = "http://$url";
        $lang = param('lang');
    }

    # Retrieves the contents of the
    # specified URL.

    sub get_url {
        my $page = get(shift);

        unless (defined $page) {
            print "Couldn't retrieve $url";
        }
        else {
            print "$page\n";
        }
    }
    --------------------------------------------------------------------------
     
    Hon Guin Lee - Web Producer - SMI Marketing, Sep 29, 2003
    #1

  2. Hon Guin Lee - Web Producer - SMI Marketing wrote:

    > [snip]


    Do you use a proxy to display web content from outside when using your
    browser?

    perldoc LWP says:
    ENVIRONMENT
    The following environment variables are used by LWP:
    <snip>
    http_proxy
    ftp_proxy
    xxx_proxy
    no_proxy
    These environment variables can be set to enable communication
    through a proxy server. See the description of the "env_proxy"
    method in LWP::UserAgent.

    It /might/ help to specify a proxy server.
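
    As a rough sketch of what that could look like: the env_proxy() method of LWP::UserAgent reads these variables, so a script can pick up whatever proxy its environment defines. The proxy host below (proxy.example.com:8080) is a placeholder and not a value from this thread, and the variables are set inside the script only to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder values: substitute your site's real proxy and exclusions.
# Normally these are set in the shell or web server environment, not here.
$ENV{http_proxy} = 'http://proxy.example.com:8080/';
$ENV{no_proxy}   = 'localhost';

my $ua = LWP::UserAgent->new;
$ua->env_proxy;    # load proxy settings from the *_proxy variables

# Called with only a scheme, proxy() returns the proxy currently set for it.
my $http_proxy = $ua->proxy('http');
print "http requests go via: $http_proxy\n";
```

    Printing the value back is just a quick way to confirm that the variable was actually seen by the script.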

    Good luck,
    Dominik
     
    Dominik Seelow, Sep 29, 2003
    #2

  3. Dominik Seelow wrote:
    >
    > Hon Guin Lee - Web Producer - SMI Marketing wrote:
    >
    > > [snip]

    >
    > Do you use a proxy to display web content from outside when using your
    > browser?
    >
    > perldoc LWP says:
    > ENVIRONMENT
    > The following environment variables are used by LWP:
    > <snip>
    > http_proxy
    > ftp_proxy
    > xxx_proxy
    > no_proxy
    > These environment variables can be set to enable communication
    > through a proxy server. See the description of the "env_proxy"
    > method in LWP::UserAgent.
    >
    > It /might/ help to specify a proxy server.
    >
    > Good luck,
    > Dominik


    I have used a proxy server, using the LWP::UserAgent module, in the new script shown below: -

    ------------------------------------------------------------------------

    #!/usr/local/perl5.6/bin/perl -wT

    use LWP::UserAgent;
    use LWP::Debug '+';
    use CGI ':standard';
    use strict;

    print "Content-type: text/html\n\n";

    my $content;
    my $ua = new LWP::UserAgent;
    my $old_handle;
    my $url = param('url');
    my $lang = param('lang');

    $|++; #sets $| for STDOUT
    $old_handle = select( STDERR ); #change to STDERR
    $|++; #sets $| for STDERR
    select( $old_handle ); #change back to STDOUT

    $ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy
    $ua->env_proxy(); # load proxy info from environment variables

    $ua->agent("Mozilla/1.1");

    my $req = new HTTP::Request GET => $url;
    my $res = $ua->request($req);

    if ($res->is_success) {
        $content = $res->content;
    }
    else {
        die "Could not get content";
    }

    ----------------------------------------------------------------------

    The following error messages were shown: -

    LWP::UserAgent::proxy: ARRAY(0x2d5e60) file:///usr/dist/share/proxy_config/uk.pac
    LWP::UserAgent::proxy: http file:///usr/dist/share/proxy_config/uk.pac
    LWP::UserAgent::proxy: https file:///usr/dist/share/proxy_config/uk.pac
    LWP::UserAgent::proxy: ftp file:///usr/dist/share/proxy_config/uk.pac
    LWP::UserAgent::request: ()
    LWP::UserAgent::request: Simple response: Bad Request
    Could not get content at /export/home/sdltool/www/cgi-bin/automation1-2.cgi line 40.

    ----------------------------------------------------------------------
     
    Hon Guin Lee - Web Producer - SMI Marketing, Sep 29, 2003
    #3
  4. ko

    Hon Guin Lee - Web Producer - SMI Marketing wrote:
    [snip]
    >
    >
    > I have used a proxy server, using the LWP::UserAgent module, in the new script shown below: -
    >
    > ------------------------------------------------------------------------
    >
    > #!/usr/local/perl5.6/bin/perl -wT
    >
    > use LWP::UserAgent;
    > use LWP::Debug '+';
    > use CGI ':standard';
    > use strict;
    >
    > print "Content-type: text/html\n\n";
    >
    > my $content;
    > my $ua = new LWP::UserAgent;
    > my $old_handle;
    > my $url = param('url');
    > my $lang = param('lang');
    >
    > $|++; #sets $| for STDOUT
    > $old_handle = select( STDERR ); #change to STDERR
    > $|++; #sets $| for STDERR
    > select( $old_handle ); #change back to STDOUT
    >
    > $ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy


    I think that you need to pass a URL to proxy(), not a filename - the
    examples in the docs use URLs. In fact, if you print out the server
    response message with the status_line() method you get (at least I did
    when trying to use a filename):

    400 You can not proxy through the filesystem
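
    A minimal sketch of that check, assuming the same file:// proxy setting as in the quoted line; the target URL http://www.example.com/ is a placeholder. Printing status_line() instead of a fixed die message shows why the request failed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Same kind of proxy setting as in the quoted script: a file:// URL
# pointing at a .pac file rather than at a proxy server.
$ua->proxy('http', 'file:///usr/dist/share/proxy_config/uk.pac');

# Placeholder target URL, not taken from the thread.
my $res = $ua->get('http://www.example.com/');

# status_line() combines the numeric status code and the message.
my $status = $res->status_line;

if ($res->is_success) {
    print $res->content;
}
else {
    print "Request failed: $status\n";
}
```

    In a CGI script the same string could go into the die message, so that it ends up in the web server's error log.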


    > $ua->env_proxy(); # load proxy info from environment variables


    I'm not sure this will work for you, as this is a CGI script: the
    method loads the *_proxy environment variables from the environment of
    the user the *web server* is running as, not from your own login. It
    might be better to load your settings with proxy() and leave this out.

    >
    > $ua->agent("Mozilla/1.1");
    >
    > my $req = new HTTP::Request GET => $url;
    > my $res = $ua->request($req);
    >
    > if ($res->is_success)
    > {
    > $content= $res->content;
    > }
    >
    > else
    > {
    > die "Could not get content";
    > }
    >
    > ----------------------------------------------------------------------
    >
    > The following error messages were shown: -
    >
    > LWP::UserAgent::proxy: ARRAY(0x2d5e60) file:///usr/dist/share/proxy_config/uk.pac
    > LWP::UserAgent::proxy: http file:///usr/dist/share/proxy_config/uk.pac
    > LWP::UserAgent::proxy: https file:///usr/dist/share/proxy_config/uk.pac
    > LWP::UserAgent::proxy: ftp file:///usr/dist/share/proxy_config/uk.pac
    > LWP::UserAgent::request: ()
    > LWP::UserAgent::request: Simple response: Bad Request
    > Could not get content at /export/home/sdltool/www/cgi-bin/automation1-2.cgi line 40.
    >
    > ----------------------------------------------------------------------


    You should also probably call the no_proxy() method somewhere to disable
    proxying for requests to your internal network.

    HTH - keith
     
    ko, Sep 30, 2003
    #4
  5. Bart Lateur

    Hon Guin Lee - Web Producer - SMI Marketing wrote:

    >$ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy


    I don't think that will work. A .pac file is typically a Javascript
    source file.

    Try using a real URL for the proxy.

    --
    Bart.
     
    Bart Lateur, Sep 30, 2003
    #5
  6. On Tue, 30 Sep 2003, Bart Lateur wrote:

    > Hon Guin Lee - Web Producer - SMI Marketing wrote:
    >
    > >$ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy

    >
    > I don't think that will work. A .pac file is typically a Javascript
    > source file.


    Indeed...

    > Try using a real URL for the proxy.


    Well, that file:///... thingy is in some senses a "real URL": maybe
    it would be helpful to mention that the kind of URL that you had in
    mind was something like http://wwwcache.dom.example:8080/
    or http://11.22.33.44:8001/ , substituting appropriate DNS name or
    IP address and port number. A read of the lwp cookbook might also be
    helpful for the original poster.
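
    Put together with the earlier no_proxy() suggestion, the setup might look roughly like this. The proxy URL is just the example form given above and has to be replaced with the site's actual proxy; the domain passed to no_proxy() is only a guess based on the internal host name in the debug output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Example proxy URL from this post; replace host and port with real values.
$ua->proxy(['http', 'ftp'], 'http://wwwcache.dom.example:8080/');

# Assumed internal domain: requests to hosts under it bypass the proxy.
$ua->no_proxy('central.sun.com');

# Called with only a scheme, proxy() returns the current setting.
for my $scheme ('http', 'ftp') {
    my $proxy = $ua->proxy($scheme);
    print "$scheme proxy: $proxy\n";
}
```

    No request is sent here; the loop only echoes the configuration so it can be checked before the script is pointed at a real URL.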

    cheers
     
    Alan J. Flavell, Sep 30, 2003
    #6
  7. Bart Lateur

    Alan J. Flavell wrote:

    >> Try using a real URL for the proxy.

    >
    >Well, that file:///... thingy is in some senses a "real URL": maybe
    >it would be helpful to mention that the kind of URL that you had in
    >mind was something like http://wwwcache.dom.example:8080/
    >or http://11.22.33.44:8001/ , substituting appropriate DNS name or
    >IP address and port number.


    Indeed, I meant the real URL *of the proxy*, not the URL of some
    configuration script.

    --
    Bart.
     
    Bart Lateur, Sep 30, 2003
    #7
