get (url) until match is found

Discussion in 'Perl Misc' started by Lydia Shawn, Jan 26, 2004.

  1. Lydia Shawn

    hi,
    i need someone's good advice on solving the following problem:
    i am matching a html page for certain triggers and want to grep only
    the text in between two triggers.
    as soon as the last trigger has been matched, the html page should not
    be downloaded any further

    here is what i have, which works well, but it requires the entire html
    page to be downloaded before it starts matching:

    input: test.htm
    bla1
    trigger1 bla2 trigger2
    bla3

    output:
    bla2

    assuming bla3 is a long text, i do not wish to waste bandwidth
    downloading it if the match has already been found.

    the script i wrote:

    require 5.004;
    use LWP::Simple;

    $return = get("http://test/test.htm");
    $before = 'trigger1';
    $after = 'trigger2';
    ($match) = $return =~ /$before(.*?)$after/si;
    print $match;


    is there a way i can combine the get command with something like
    "until match = anything" ?

    any help would be greatly appreciated!
    thanks in advance!
    lydia
     
    Lydia Shawn, Jan 26, 2004

  2. Ben Morrow

    (Lydia Shawn) wrote:
    > i am matching a html page for certain triggers and want to grep only
    > the text in between two triggers.
    > as soon as the last trigger has been matched, the html page should not
    > be downloaded any further


    Read perldoc lwpcook "LARGE DOCUMENTS".

    Ben

    --
    Every twenty-four hours about 34k children die from the effects of poverty.
    Meanwhile, the latest estimate is that 2800 people died on 9/11, so it's like
    that image, that ghastly, grey-billowing, double-barrelled fall, repeated
    twelve times every day. Full of children. [Iain Banks]
     
    Ben Morrow, Jan 26, 2004
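Ben's pointer is to the LARGE DOCUMENTS section of lwpcook, which describes processing a document as it arrives by handing the request a code reference instead of keeping the whole response in memory. The part of the original problem that needs care is that network reads do not line up with the triggers: "trigger1" may arrive as "trig" in one chunk and "ger1" in the next. A minimal pure-Perl sketch of a chunk-fed matcher follows. It assumes the triggers are literal strings (hence \Q...\E rather than the raw interpolation in the original script), and the helper name make_matcher and the simulated chunks are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that is fed one chunk at a time.
# It returns the text between the triggers once both have been seen,
# and undef while it still needs more data.
sub make_matcher {
    my ($before, $after) = @_;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        # Rescan the whole buffer so a trigger split across two chunks is
        # still found; \Q...\E treats the triggers as literal text.
        return $buffer =~ /\Q$before\E(.*?)\Q$after\E/si ? $1 : undef;
    };
}

# Simulated network reads: both triggers are split across chunk boundaries.
my @chunks  = ("bla1\ntrig", "ger1 bla2 trig", "ger2\nbla3 ...", "never read");
my $matcher = make_matcher('trigger1', 'trigger2');
my ($match, $chunks_used) = (undef, 0);
for my $chunk (@chunks) {
    $chunks_used++;
    $match = $matcher->($chunk);
    last if defined $match;    # stop reading: the remaining chunks are not needed
}
print defined $match
    ? "match: '$match' after $chunks_used chunk(s)\n"    # match: ' bla2 ' after 3 chunk(s)
    : "no match\n";
```

Rescanning the accumulated buffer on every chunk is the simplest way to survive split triggers; for very large pages one could also discard text that cannot contain the start of trigger1, an optimisation this sketch leaves out. In a real fetch the closure's body would sit inside the content callback, and stopping early means aborting the request rather than leaving a loop.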

  3. Brian McCauley

    (Lydia Shawn) writes:

    > i need someone's good advice on solving the following problem:


    Others have given you a fish, but I would like to show you how you
    could have caught it yourself...

    > $return = get("http://test/test.htm");
    > $before = 'trigger1';
    > $after = 'trigger2';
    > ($match) = $return =~ /$before(.*?)$after/si;
    > print $match;
    >
    >
    > is there a way i can combine the get command with something like
    > "until match = anything" ?


    So what you are saying is that you are using LWP::Simple and you need
    more control.

    So let's take a look at the first paragraph of the DESCRIPTION perldoc
    of LWP::Simple.


    This interface is intended for those who want a simplified view
    of the libwww-perl library. It should also be suitable for
    one-liners. If you need more control [...] you should use the
    full object oriented interface provided by the "LWP::UserAgent"
    module.

    If you now look at the SYNOPSIS of LWP::UserAgent you'll find an
    example like...

    $response = $ua->request($request, \&callback, 4096);

    sub callback { my($data, $response, $protocol) = @_; .... }

    Well that looks like what you were after.

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
     
    Brian McCauley, Jan 26, 2004
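Brian's pointer can be combined with the "stop downloading" requirement: the LWP::UserAgent documentation says a content callback may abort the request by calling die(), and LWP records the message in the response's X-Died header. The sketch below builds a callback with the ($data, $response, $protocol) signature quoted above, stores the text between the triggers, and dies as soon as both have arrived, so no further chunks are read. The helper names make_callback and fetch_between, the "match found" message, and treating the triggers as literal strings are assumptions for illustration. LWP::UserAgent is loaded with require so the callback itself can be exercised without the module or a network connection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a content callback with the
# ($data, $response, $protocol) signature LWP::UserAgent passes in.
# It accumulates chunks, stores the text between the triggers in
# $$result_ref, and dies once both triggers have arrived so LWP
# stops reading the rest of the document.
sub make_callback {
    my ($before, $after, $result_ref) = @_;
    my $buffer = '';
    return sub {
        my ($data, $response, $protocol) = @_;
        $buffer .= $data;
        # Rescan the whole buffer: a trigger may be split across chunks.
        if ($buffer =~ /\Q$before\E(.*?)\Q$after\E/si) {
            $$result_ref = $1;
            die "match found\n";
        }
        return;
    };
}

# Hypothetical helper: fetch $url in chunks of about 4096 bytes and return
# the text between the triggers, or undef if they never both appear.
sub fetch_between {
    my ($url, $before, $after) = @_;
    # Loaded at runtime so make_callback can be used without LWP installed.
    require LWP::UserAgent;
    require HTTP::Request;

    my $ua = LWP::UserAgent->new;
    my $match;
    my $response = $ua->request(HTTP::Request->new(GET => $url),
                                make_callback($before, $after, \$match),
                                4096);
    warn 'request failed: ', $response->status_line, "\n"
        unless $response->is_success;
    # A die() inside the callback is reported in the X-Died header;
    # anything other than our own "match found" is a real error.
    my $died = $response->header('X-Died');
    warn "callback died: $died\n"
        if defined $died && $died !~ /^match found/;
    return $match;
}

if (@ARGV) {
    my $match = fetch_between($ARGV[0], 'trigger1', 'trigger2');
    print defined $match ? "$match\n" : "no match\n";
}
```

Run it with the page URL as the only argument (for example the http://test/test.htm address from the question); with no argument it only defines the subroutines. The 4096 is a read-size hint, so the callback may see chunks of other sizes, which is why the buffer is rescanned as a whole rather than one chunk at a time.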

Similar Threads
  1. Jody Greening: 5 replies, 690 views; last post by Jody Greening, Jan 6, 2005
  2. Jody Greening: 0 replies, 355 views; last post by Jody Greening, Jan 6, 2005
  3. Match until new line: anon1m0us, Feb 7, 2007, in forum: Ruby; 4 replies, 100 views; last post by Brian Candler, Feb 7, 2007
  4. regex - match anything until: John, Sep 10, 2009, in forum: Perl Misc; 1 reply, 153 views
  5. Lester: 2 replies, 98 views; last post by Lester, Sep 25, 2006