Need to extract text between two HTML comments

mmk16

I need to extract all text in a HTML page between two patterns like
<!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.

I am trying something like

my $start_tag = '<!-- cachedResultsStart -->';
my $end_tag = '<!-- cachedResultsEnd -->';

my $num_lines = /$start_tag/ .. /$end_tag/ ;

if ($num_lines ) {
/$start_tag(.+?)$end_tag/m ;
$this_is_what_i_need = $1 ;

}

print $this_is_what_id_need ;

I used this approach from an earlier posting by Uri Guttman.
However, this is not working for me.
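For context, the scalar-context range ("flip-flop") operator keeps state between evaluations, so it is meant to be tested once per input line inside a read loop. It does not count lines, and a single evaluation against $_ does not scan a whole page. Below is a minimal sketch of that line-oriented use. It assumes the page is already in a string (the sample $html is hypothetical) and reads it through an in-memory filehandle, keeping only the lines strictly between the two marker lines.

```perl
use strict;
use warnings;

# Hypothetical sample page; in practice this would be the fetched HTML.
my $html = <<'HTML';
<html><body>
<p>header</p>
<!-- cachedResultsStart -->
<table><tr><td>42</td></tr></table>
<!-- cachedResultsEnd -->
<p>footer</p>
</body></html>
HTML

my $start_tag = '<!-- cachedResultsStart -->';
my $end_tag   = '<!-- cachedResultsEnd -->';

# Read the string line by line through an in-memory filehandle.
open my $fh, '<', \$html or die "Cannot open in-memory filehandle: $!";

my $this_is_what_i_need = '';
while ( my $line = <$fh> ) {
    # Scalar-context .. returns a sequence number: 1 on the start line,
    # then 2, 3, ...; on the end line the number has "E0" appended.
    my $seq = ( $line =~ /\Q$start_tag\E/ .. $line =~ /\Q$end_tag\E/ );

    # Skip lines outside the range and the two marker lines themselves.
    next if !$seq || $seq == 1 || $seq =~ /E0\z/;
    $this_is_what_i_need .= $line;
}
close $fh;

print $this_is_what_i_need;
```

If the marker lines should be kept, replace the "next if" test with a plain "next unless $seq;".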
 
Gunnar Hjalmarsson

mmk16 said:
I need to extract all text in a HTML page between two patterns like
<!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.

I am trying something like
--------------^^^^^^^^^^^^^^

Okay. Would you mind letting us know what you are actually trying?
 
Tad McClellan

mmk16 said:
$this_is_what_i_need = $1 ;
print $this_is_what_id_need ;


One of these things is not like the other, one of these things
just isn't the same...


Imagine all the time you would have saved if you had "use strict"
turned on. It would have found that problem right away.
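To illustrate Tad's point, the sketch below compiles the misspelled snippet from the question with "use strict" enabled. A string eval is used only so that the demo can capture the compile-time error in $@ instead of dying. In a normal script, perl would refuse to run and print the same "Global symbol ... requires explicit package name" message.

```perl
use strict;
use warnings;

# The misspelled snippet from the question, compiled under strict.
my $snippet = <<'CODE';
use strict;
my $this_is_what_i_need = 'table data';
print $this_is_what_id_need;
CODE

# A compile error inside a string eval is reported in $@.
eval $snippet;
print "use strict rejected the typo:\n$@" if $@;
```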

Imagine all the time you would have saved _us_ if you had asked
for a machine's help *before* asking thousands of people for help.
 
Uri Guttman

TM> One of these things is not like the other, one of these things
TM> just isn't the same...


and he is blaming ME for inspiring him. i tend to strictness and win all
my variable spelling bees.

TM> Imagine all the time you would have saved _us_ if you had asked
TM> for a machine's help *before* asking thousands of people for help.

i ask for help from my machine all the time but it still can't tell me
the picks for the next powerball drawing.

uri
 
mmk16

Gunnar Hjalmarsson said:
--------------^^^^^^^^^^^^^^

Okay. Would you mind letting us know what you are actually trying?

Between <!-- cachedResultsStart --> and <!-- cachedResultsEnd -->
the HTML page will contain an HTML table with data that I am
interested in. I wish to process it further using HTML::TableExtract.
 
Tore Aursand

I need to extract all text in a HTML page between two patterns like
<!-- cachedResultsStart --> and <!-- cachedResultsEnd -->.

Working with HTML can be some scary rocket science, but here goes:

my $start = '<!-- cachedResultsStart -->';
my $end   = '<!-- cachedResultsEnd -->';
# /s lets . match newlines; .*? stops at the first end marker
if ( $html =~ m,\Q$start\E(.*?)\Q$end\E,s ) {
    print $1;
}

No need to know how many lines or anything like that. Just grab what you
need.

Problems arise, however, if the HTML code contains more than one occurrence
of the expression above. That's when you should consider using a while()
loop to match all of them.
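A minimal sketch of that while() loop, assuming the whole page is already in $html (the two-block sample below is hypothetical). In scalar context, m//g resumes matching from pos($html) on each call, so the loop visits every start/end pair in turn. The non-greedy .*? keeps one block from swallowing the next, and /s lets the match span lines.

```perl
use strict;
use warnings;

# Hypothetical page containing two cached-results blocks.
my $html = <<'HTML';
<!-- cachedResultsStart -->
<table><tr><td>first</td></tr></table>
<!-- cachedResultsEnd -->
<p>between</p>
<!-- cachedResultsStart -->
<table><tr><td>second</td></tr></table>
<!-- cachedResultsEnd -->
HTML

my $start = '<!-- cachedResultsStart -->';
my $end   = '<!-- cachedResultsEnd -->';

my @blocks;
# Each successful match advances pos($html) past the end marker.
while ( $html =~ m,\Q$start\E(.*?)\Q$end\E,gs ) {
    push @blocks, $1;
}

print scalar(@blocks), " block(s) found\n";
print "---\n$_" for @blocks;
```

The same result can be had in one statement with list-context matching, my @blocks = $html =~ m,\Q$start\E(.*?)\Q$end\E,gs; which returns every captured group at once.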
 
Richard Gration

I need to extract all text in a HTML page between two patterns like
<!-- cachedResultsStart --> and <!-- cachedResultsEnd -->. I am trying
something like

my $start_tag = '<!-- cachedResultsStart -->';
my $end_tag = '<!-- cachedResultsEnd -->';
my $num_lines = /$start_tag/ .. /$end_tag/ ;
if ($num_lines ) {
/$start_tag(.+?)$end_tag/m ;
$this_is_what_i_need = $1 ;

}
print $this_is_what_id_need ;

I used this approach from an earlier posting by Uri Guttman. However,
this is not working for me.

May I suggest using the HTML::Parser module, specifically
HTML::PullParser, which will easily allow you to get what you want.

R
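A rough sketch of the HTML::PullParser route follows. It assumes the HTML-Parser distribution is installed and the page is in $html (the sample is hypothetical). Each token is requested with the "event" and "text" argspecs, so the loop can recognize the two marker comments and collect the original source text of everything in between. That text can then be handed to HTML::TableExtract.

```perl
use strict;
use warnings;
use HTML::PullParser;

# Hypothetical sample page.
my $html = <<'HTML';
<p>header</p>
<!-- cachedResultsStart -->
<table><tr><td>42</td></tr></table>
<!-- cachedResultsEnd -->
<p>footer</p>
HTML

# Every token is returned as [ event name, original source text ].
my $p = HTML::PullParser->new(
    doc     => \$html,
    start   => 'event, text',
    end     => 'event, text',
    text    => 'event, text',
    comment => 'event, text',
) or die "Cannot create parser\n";

my ( $inside, $wanted ) = ( 0, '' );
while ( my $token = $p->get_token ) {
    my ( $event, $text ) = @$token;
    if ( $event eq 'comment' ) {
        # The marker comments switch collection on and off.
        $inside = 1 if $text =~ /cachedResultsStart/;
        $inside = 0 if $text =~ /cachedResultsEnd/;
        next;
    }
    $wanted .= $text if $inside;
}

print $wanted;
```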
 
