A couple of vague LWP questions

Discussion in 'Perl Misc' started by Franklin H., Apr 25, 2005.

  1. Franklin H.

    Franklin H. Guest

    1) When using LWP::Simple to grab a webpage the GET request
    occasionally and irreproducibly appears to hang and does not return.
    Any clue as to why this could conceivably occur? There doesn't appear
    to be a way to set the request timeout with this particular module but
    perhaps someone may know of a workaround?

    2) When using LWP::UserAgent to grab the same webpage as above the
    webserver somehow seems able to recognize the request as coming from
    an "automated tool". Any idea why this might occur with
    LWP::UserAgent but not with LWP::Simple?

    TYIA,
    Fr.
     
    Franklin H., Apr 25, 2005
    #1

  2. Franklin H.

    Franklin H. Guest

    > 2) When using LWP::UserAgent to grab the same webpage as above the
    > webserver somehow seems able to recognize the request as coming from
    > an "automated tool". Any idea why this might occur with
    > LWP::UserAgent but not with LWP::Simple?

    It would appear that the trick here is to set the user agent string to
    something other than the default "libwww-perl/#.##". Arbitrarily I chose:

    $ua->agent('Mozilla/5.001');
     
    Franklin H., Apr 25, 2005
    #2

  4. Brian Wakem

    Brian Wakem Guest

    Franklin H. wrote:

    >> 2) When using LWP::UserAgent to grab the same webpage as above the
    >> webserver somehow seems able to recognize the request as coming from
    >> an "automated tool". Any idea why this might occur with
    >> LWP::UserAgent but not with LWP::Simple?
    >
    > It would appear that the trick here is to set the user agent string to
    > something other than the default "libwww-perl/#.##". Arbitrarily I chose:
    >
    > $ua->agent('Mozilla/5.001');



    If you are trying to blend in with normal traffic then I suggest using:

    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

    which is IE6 on Windows XP.


    As for your other question: either use LWP::UserAgent and the timeout
    method it provides ( $ua->timeout( $secs ) ), or use alarm:

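    use LWP::Simple;

    my $response;
    eval {
        # die with a trailing newline so the message stays exactly "timeout\n"
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $secs;               # deliver SIGALRM after $secs seconds
        $response = get($url);
        alarm 0;                   # request finished, cancel the alarm
    };
    if ($@ && $@ =~ m/timeout/) {
        # timed out
    }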
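
    For comparison, the first option (LWP::UserAgent with its own timeout)
    might look roughly like the sketch below. It does not rely on signals,
    so it should also suit platforms where alarm is unreliable. The helper
    names make_agent and fetch_page are made up for illustration, and the
    URL is taken from the command line rather than from this thread. Note
    that, as far as the LWP::UserAgent docs describe it, the timeout is an
    inactivity limit on the connection, not a cap on total request time.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that gives up after $secs
# seconds without any activity on the connection.
sub make_agent {
    my ($secs) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->timeout($secs);
    return $ua;
}

# Hypothetical helper: return the page body, or undef on any failure
# (including a timeout, which LWP reports as an error response).
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $page = fetch_page(make_agent(10), $url);
    print defined $page
        ? "fetched " . length($page) . " characters\n"
        : "request failed or timed out\n";
}
```

    A failed or timed-out request comes back as an ordinary response
    object, so checking is_success (or status_line for details) is enough;
    no eval block is needed.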



    --
    Brian Wakem
     
    Brian Wakem, Apr 25, 2005
    #4
  5. Franklin H.

    Franklin H. Guest

    Well, I am trying to make this platform independent and as such would
    hate to run into problems with $SIG{ALRM} on XP.

    Similarly, mightn't "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    be suspicious if the request came from a Linux machine?
     
    Franklin H., Apr 25, 2005
    #5
  6. Charles DeRykus

    Franklin H. wrote:
    >1) When using LWP::Simple to grab a webpage the GET request
    >occasionally and irreproducibly appears to hang and does not return.
    >Any clue as to why this could conceivably occur? There doesn't appear
    >to be a way to set the request timeout with this particular module but
    >perhaps someone may know of a workaround?
    >


    LWP::Simple is built on LWP::UserAgent, so you can import
    $ua and set a timeout on it, e.g.:

    use LWP::Simple qw($ua get); $ua->timeout(10);

    See the LWP::Simple docs for a discussion of the above.

    >2) When using LWP::UserAgent to grab the same webpage as above the
    >webserver somehow seems able to recognize the request as coming from
    >an "automated tool". Any idea why this might occur with
    >LWP::UserAgent but not with LWP::Simple?
    >


    Some servers may be checking the User-Agent header. No idea why
    LWP::Simple would slip by if that's the case. Again, see the
    LWP::UserAgent and LWP::Simple docs for how to alter the setting.

    hth,
    --
    Charles DeRykus
     
    Charles DeRykus, Apr 25, 2005
    #6
  7. Mark Clements

    Franklin H. wrote:
    > Well, I am trying to make this platform independent and as such would
    > hate to run into problems with $SIG{ALRM} on XP.
    >
    > Similarly, mightn't "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    > be suspicious if the request came from a Linux machine?
    >

    Nah. The remote server only sees an HTTP request: it has no idea from
    what type of system the request originated, other than what is in the
    HTTP headers.
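
    To see this for yourself, you can build a request and print it without
    sending anything. The sketch below assumes a placeholder URL (not one
    from this thread) and uses the prepare_request method of LWP::UserAgent
    to copy the agent's default headers onto the request. The protocol
    layer adds a few more headers (such as Host) when the request is
    actually sent, so this is an approximation of what goes over the wire.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.001');    # the string chosen earlier in the thread

# Placeholder URL, used only to have something to build a request for.
my $req = HTTP::Request->new(GET => 'http://www.example.com/');

# Apply the user agent's default headers (including User-Agent)
# to the request object. Nothing is sent over the network here.
$ua->prepare_request($req);

# Print the request line and headers as the client has prepared them.
print $req->as_string;
```

    Nothing in that output says which operating system produced it, other
    than whatever the User-Agent string itself claims.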

    Mark
     
    Mark Clements, Apr 25, 2005
    #7
  8. Joe Smith

    Joe Smith Guest

    Charles DeRykus wrote:

    > Some servers may be checking the user agent id. No idea why
    > LWP::Simple would slip by if that's the case.


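    From perldoc LWP::UserAgent:
    the default agent identifier is "libwww-perl/#.##"

    Line 43 of LWP/Simple.pm:
    $ua->agent("LWP::Simple/$LWP::VERSION");

    So the two modules identify themselves differently by default, and the
    server is presumably rejecting only the "libwww-perl" string.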
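
    A quick way to check the two defaults on your own installation is
    sketched below. It assumes LWP::Simple still exports $ua on request,
    as its documentation describes, and the variable names are only for
    illustration. The version numbers printed will depend on the installed
    LWP.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple qw($ua);    # $ua is the agent object LWP::Simple uses internally

my $plain = LWP::UserAgent->new;

# Capture both default identifiers before changing anything.
my $default_agent = $plain->agent;
my $simple_agent  = $ua->agent;

print "LWP::UserAgent default: $default_agent\n";
print "LWP::Simple default:    $simple_agent\n";

# Override the identifier on the plain agent, as done earlier in the thread.
$plain->agent('Mozilla/5.001');
print "after override:         ", $plain->agent, "\n";
```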
     
    Joe Smith, Apr 25, 2005
    #8