HTML::TokeParser, problems 'getting' 'til end-tag

Patrick Joly

I am having a hard time parsing HTML with HTML::TokeParser, specifically
fetching text up to the next '/p' end tag in a string with the
get_text() method. In my example, get_text('/p') fetches up to the first
</p>, but the HTML stream appears to get emptied as a by-product. That is,
I can no longer get any more text after the first get_text() call in a loop.
Any ideas as to what I might be doing wrong in Section A below? I wasted all
afternoon on this and I am spent. TIA!

use strict;
use warnings;
use HTML::TokeParser;
my ($t, $p, $tok, $html);
$html = q!
<p><i>¼ cup olive oil</i></p>
<b>¼ cup canned tuna</b>
<p><i>and chopped liver</i></p>!;

#
# Section - A
# there should be more than 1 iteration, but there isn't

$p = new HTML::TokeParser( \$html );
$p->unbroken_text(1);
my $i = 0;
while (my $txt = $p->get_text( '/p' ) ) {
    print $txt;
    print "\n(iteration" . ++$i . ")\n\n";
}
print "see, no text left after first iteration\n\n";

#
# Section - B
# the following shows there indeed are 2 separate
# '/p' tags and the Text tokens print just fine

$p = new HTML::TokeParser( \$html );
$p->unbroken_text(1);
$i = 0;
while (my $tok = $p->get_token ) {
    print 'Token is: ' . $tok->[0] . " -> " . $tok->[1];
    print "\n(iteration" . ++$i . ")\n\n";
}

__END__

Here is the output I get:
-------------------------

+ cup olive oil
(iteration1)

see, no text left after first iteration

Token is: T ->

(iteration1)

Token is: S -> p
(iteration2)

Token is: S -> i
(iteration3)

Token is: T -> ¼ cup olive oil
(iteration4)

Token is: E -> i
(iteration5)

Token is: E -> p
(iteration6)

Token is: T ->

(iteration7)

Token is: S -> b
(iteration8)

Token is: T -> ¼ cup canned tuna
(iteration9)

Token is: E -> b
(iteration10)

Token is: T ->

(iteration11)

Token is: S -> p
(iteration12)

Token is: S -> i
(iteration13)

Token is: T -> and chopped liver
(iteration14)

Token is: E -> i
(iteration15)

Token is: E -> p
(iteration16)
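
A possible explanation, going by the HTML::TokeParser documentation: get_text('/p') returns the text found before the first matching tag and leaves that tag in the stream. The second call in Section A would then see </p> immediately, return an empty string, and end the while loop, since an empty string is false. The stream is not emptied; the parser is just parked on the end tag. Below is a minimal sketch of one way around it, assuming it is acceptable to consume each </p> with get_tag('/p') after reading the text. The helper name text_before_each_p_end is hypothetical, and the sample HTML uses plain "1/4" instead of the fraction character only to keep the example free of encoding concerns.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: collect the text that precedes each </p>,
# consuming the end tag each time so the next read can move on.
sub text_before_each_p_end {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html );
    my @chunks;
    while (1) {
        # Stops before </p>; the end tag itself stays in the stream.
        my $txt = $p->get_text('/p');
        # Consume that </p>. get_tag returns undef once no such tag is left.
        my $end = $p->get_tag('/p');
        last unless defined $end;
        push @chunks, $txt;
    }
    return @chunks;
}

my $html = q!
<p><i>1/4 cup olive oil</i></p>
<b>1/4 cup canned tuna</b>
<p><i>and chopped liver</i></p>!;

my $i = 0;
for my $txt ( text_before_each_p_end($html) ) {
    print $txt;
    print "\n(iteration" . ++$i . ")\n\n";
}
```

Note that the loop ends on the return value of get_tag rather than on the truth of the text, so a chunk that is empty or "0" does not stop it early. The text inside <b> ends up in the second chunk, because get_text('/p') collects everything up to the next </p>.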
 
