HTML::TokeParser, problems 'getting' 'til end-tag

Patrick Joly · Feb 25, 2004

I am having a hard time parsing HTML with HTML::TokeParser: I am
trying to fetch text up to the next '/p' end tag in a string with the
get_text() method. In my example, get_text('/p') fetches text up to the
first </p>, but the HTML stream seems to get emptied as a by-product. That
is, I can no longer get any more text after the first get_text() call in a
loop. Any ideas as to what I might be doing wrong in Section A below? I
wasted all afternoon on this and I am spent. TIA!

use strict;
use warnings;
use HTML::TokeParser;
my ($t, $p, $tok, $html);
$html = q!
<p><i>¼ cup olive oil</i></p>
<b>¼ cup canned tuna</b>
<p><i>and chopped liver</i></p>!;

#
# Section - A
# there should be more than 1 iteration, but there isn't

$p = HTML::TokeParser->new( \$html );
$p->unbroken_text(1);
my $i = 0;
while (my $txt = $p->get_text( '/p' ) ) {
    print $txt;
    print "\n(iteration" . ++$i . ")\n\n";
}
print "see, no text left after first iteration\n\n";

#
# Section - B
# the following shows there indeed are 2 separate
# '/p' tags and the Text tokens print just fine

$p = HTML::TokeParser->new( \$html );
$p->unbroken_text(1);
$i = 0;
while (my $tok = $p->get_token ) {
    print 'Token is: ' . $tok->[0] . " -> " . $tok->[1];
    print "\n(iteration" . ++$i . ")\n\n";
}

__END__

Here is the output I get:
-------------------------

+ cup olive oil
(iteration1)

see, no text left after first iteration

Token is: T ->

(iteration1)

Token is: S -> p
(iteration2)

Token is: S -> i
(iteration3)

Token is: T -> ¼ cup olive oil
(iteration4)

Token is: E -> i
(iteration5)

Token is: E -> p
(iteration6)

Token is: T ->

(iteration7)

Token is: S -> b
(iteration8)

Token is: T -> ¼ cup canned tuna
(iteration9)

Token is: E -> b
(iteration10)

Token is: T ->

(iteration11)

Token is: S -> p
(iteration12)

Token is: S -> i
(iteration13)

Token is: T -> and chopped liver
(iteration14)

Token is: E -> i
(iteration15)

Token is: E -> p
(iteration16)
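
A possible explanation for Section A, offered as a sketch rather than a confirmed diagnosis: get_text('/p') appears to stop just before the </p> it matched and leave that end tag in the stream. If so, the second call sees </p> immediately and returns an empty string, which is false and ends the while loop. Assuming that is the cause, consuming the end tag with get_tag('/p') after each get_text() call should let the loop continue. The helper name chunks_before is hypothetical and used only for illustration, and the $html string is reconstructed from the token dump above.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of HTML::TokeParser): collect the text
# that precedes each occurrence of $end_tag, e.g. '/p'.
sub chunks_before {
    my ($html, $end_tag) = @_;
    my $p = HTML::TokeParser->new( \$html );
    $p->unbroken_text(1);
    my @chunks;
    while (1) {
        # Text up to, but not including, the next $end_tag.
        my $txt = $p->get_text($end_tag);
        # Consume the end tag that get_text() left in the stream;
        # get_tag() returns undef once the input is exhausted.
        last unless $p->get_tag($end_tag);
        push @chunks, $txt;
    }
    return @chunks;   # any text after the final $end_tag is dropped
}

my $html = q!
<p><i>¼ cup olive oil</i></p>
<b>¼ cup canned tuna</b>
<p><i>and chopped liver</i></p>!;

my $i = 0;
for my $txt ( chunks_before($html, '/p') ) {
    print $txt;
    print "\n(iteration" . ++$i . ")\n\n";
}
```

With this input the loop should run twice, once per </p>. The text inside the <b> element ends up in the second chunk because it sits before the second </p>. If only the text inside each paragraph is wanted, looping on get_tag('p') and then calling get_text('/p') or get_trimmed_text('/p') is another option.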
