Need to extract text between two HTML comments

Discussion in 'Perl Misc' started by mmk16, Jan 22, 2004.

  1. mmk16

    mmk16 Guest

    I need to extract all text in a HTML page between two patterns like
    <!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.

    I am trying something like

    my $start_tag = '<!-- cachedResultsStart -->';
    my $end_tag = '<!-- cachedResultsEnd -->';

    my $num_lines = /$start_tag/ .. /$end_tag/ ;

    if ($num_lines ) {
    /$start_tag(.+?)$end_tag/m ;
    $this_is_what_i_need = $1 ;

    }

    print $this_is_what_id_need ;

    I used this approach from an earlier posting by Uri Guttman
    However, this is not working for me.
    mmk16, Jan 22, 2004
    #1

  2. mmk16 wrote:
    > I need to extract all text in a HTML page between two patterns like
    > <!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.
    >
    > I am trying something like

    --------------^^^^^^^^^^^^^^

    Okay. Would you mind letting us know what you are actually trying?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jan 22, 2004
    #2

  3. mmk16 <> wrote:

    > $this_is_what_i_need = $1 ;


    > print $this_is_what_id_need ;



    One of these things is not like the other, one of these things
    just isn't the same...


    Imagine all the time you would have saved if you had "use strict"
    turned on. It would have found that problem right away.
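
    A minimal sketch of that point, not taken from the original post: under
    strict, the misspelled variable is rejected at compile time. The string
    eval is only there so the error can be captured and printed instead of
    aborting the script; the variable names mirror the ones quoted above.

    ```perl
    use strict;
    use warnings;

    # The same typo as in the quoted code, compiled under "use strict".
    my $code = q{
        use strict;
        my $this_is_what_i_need = 'data';
        print $this_is_what_id_need;    # misspelled: "id" instead of "i"
    };

    # eval returns undef if the code fails to compile; the message lands in $@.
    my $ok    = eval "$code; 1";
    my $error = $@;

    print "strict caught it: $error" unless $ok;
    ```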

    Imagine all the time you would have saved _us_ if you had asked
    for a machine's help *before* asking thousands of people for help.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jan 22, 2004
    #3
  4. mmk16 <> wrote:

    > /$start_tag(.+?)$end_tag/m ;



    The "m" modifier is a no-op there. It does nothing.

    Perhaps you meant //s instead?
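
    A minimal sketch of the difference, assuming the whole page has already
    been read into one string (the $html value below is made-up sample data).
    //m only changes what ^ and $ match, and the pattern uses neither; //s
    lets "." match a newline, which the capture needs to span several lines.
    The \Q...\E quoting is an extra precaution and was not in the original code.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample page; real code would slurp the file into $html.
    my $html = join "\n",
        '<html><body>',
        '<!-- cachedResultsStart -->',
        '<table>',
        '<tr><td>42</td></tr>',
        '</table>',
        '<!-- cachedResultsEnd -->',
        '</body></html>';

    my $start_tag = '<!-- cachedResultsStart -->';
    my $end_tag   = '<!-- cachedResultsEnd -->';

    # /s lets "." match newlines, so (.+?) can cross line boundaries.
    # \Q...\E escapes any regex metacharacters in the marker strings.
    my $wanted;
    if ( $html =~ /\Q$start_tag\E(.+?)\Q$end_tag\E/s ) {
        $wanted = $1;
        print $wanted;
    }
    ```

    Note that this only works on a string holding the full page. If the
    input is read one line at a time, no single line contains both markers.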


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jan 22, 2004
    #4
  5. Uri Guttman

    Uri Guttman Guest

    >>>>> "TM" == Tad McClellan <> writes:

    TM> mmk16 <> wrote:
    >> $this_is_what_i_need = $1 ;


    >> print $this_is_what_id_need ;



    TM> One of these things is not like the other, one of these things
    TM> just isn't the same...


    and he is blaming ME for inspiring him. i tend to strictness and win all
    my variable spelling bees.

    TM> Imagine all the time you would have saved _us_ if you had asked
    TM> for a machine's help *before* asking thousands of people for help.

    i ask for help from my machine all the time but it still can't tell me
    the picks for the next powerball drawing.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Jan 22, 2004
    #5
  6. mmk16

    mmk16 Guest

    Gunnar Hjalmarsson <> wrote in message news:<bung5n$ivjq5$-berlin.de>...
    > mmk16 wrote:
    > > I need to extract all text in a HTML page between two patterns like
    > > <!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.
    > >
    > > I am trying something like

    > --------------^^^^^^^^^^^^^^
    >
    > Okay. Would you mind letting us know what you are actually trying?


    Between <!-- cachedResultsStart --> and <!-- cachedResultsEnd -->
    the HTML page will contain an HTML table with the data that I am
    interested in. I wish to process it further using HTML::TableExtract.
    mmk16, Jan 22, 2004
    #6
  7. Tore Aursand

    Tore Aursand Guest

    On Wed, 21 Jan 2004 19:25:10 -0800, mmk16 wrote:
    > I need to extract all text in a HTML page between two patterns like
    > <!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.


    Working with HTML can be some scary rocket science, but here goes:

    my $start = '<!-- cachedResultsStart -->';
    my $end = '<!-- cachedResultsEnd -->';
    if ( $html =~ m,$start(.*)$end,g ) {
    print $1;
    }

    No need to know how many lines or anything like that. Just grab what you
    need.

    Problems arise, however, if the HTML code contains more than one occurrence
    of the expression above. That's when you should consider doing a while()
    to match all of them.
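
    A minimal sketch of that while() loop, under a few assumptions that go
    beyond the snippet above: the sample $html is invented, the capture is
    made non-greedy so each start marker pairs with the nearest end marker,
    and /s is added so the match can cross newlines.

    ```perl
    use strict;
    use warnings;

    # Hypothetical input with two marked regions.
    my $html = join "\n",
        '<!-- cachedResultsStart -->',
        '<table><tr><td>first</td></tr></table>',
        '<!-- cachedResultsEnd -->',
        '<p>unrelated</p>',
        '<!-- cachedResultsStart -->',
        '<table><tr><td>second</td></tr></table>',
        '<!-- cachedResultsEnd -->';

    my $start = '<!-- cachedResultsStart -->';
    my $end   = '<!-- cachedResultsEnd -->';

    my @chunks;
    # In scalar context, /g resumes where the previous match ended,
    # so every pass through the loop picks up the next marked region.
    while ( $html =~ /\Q$start\E(.*?)\Q$end\E/gs ) {
        push @chunks, $1;
    }

    print scalar(@chunks), " block(s) found\n";
    print for @chunks;
    ```

    With a greedy (.*) and /s, a single match would instead run from the
    first start marker to the last end marker and swallow everything between.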


    --
    Tore Aursand <>
    "I know not with what weapons World War 3 will be fought, but World War
    4 will be fought with sticks and stones." -- Albert Einstein
    Tore Aursand, Jan 22, 2004
    #7
  8. In article <>, "mmk16"
    <> wrote:


    > I need to extract all text in a HTML page between two patterns like
    > <!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.
    >
    > I am trying something like
    >
    > my $start_tag = '<!-- cachedResultsStart -->';
    > my $end_tag = '<!-- cachedResultsEnd -->';
    >
    > my $num_lines = /$start_tag/ .. /$end_tag/ ;
    >
    > if ($num_lines ) {
    > /$start_tag(.+?)$end_tag/m ;
    > $this_is_what_i_need = $1 ;
    >
    > }
    >
    > print $this_is_what_id_need ;
    >
    > I used this approach from an earlier posting by Uri Guttman
    > However, this is not working for me.


    May I suggest using the HTML::Parser module, specifically
    HTML::PullParser, which will easily allow you to get what you want.

    R
    Richard Gration, Jan 22, 2004
    #8
