Return HTML between tags with HTML::TokeParser?

Maqo

Is it possible to use HTML::TokeParser to return the raw HTML between
two <A> tags, as opposed to just the text? My source file contains
several blocks of HTML, each with its own named anchor, that I'm
trying to extract section by section while keeping their formatting.

My code:

my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");
while (my $t = $p->get_tag("a")) {
    my $name = $t->[1]{name};
    next unless $name && ($name eq "anchor");
    print "$name : " . $p->get_text("a");

Example HTML source:

<A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>
<A NAME='anchor2'></a><p>Some text and HTML formatting</p><BR>
....
<A NAME='anchor10'></a><p>Some text and HTML formatting</p><BR>

The above code returns the "text and formatting" portions nicely,
albeit only as text. Is there an easy way to do this using
HTML::Parser to return the desired portion, with HTML markup included?
Many thanks.
 
A. Sinan Unur

> Is it possible to use HTML::TokeParser to return the raw HTML between
> two <A> tags, as opposed to just the text? My source file contains
> several blocks of HTML, each with its own named anchor, that I'm
> trying to extract section by section while keeping their formatting.
>
> My code:
>
> my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");

Cute but counter-productive. Please post real code.

> while (my $t = $p->get_tag("a")) {
>     my $name = $t->[1]{name};
>     next unless $name && ($name eq "anchor");
>     print "$name : " . $p->get_text("a");
>
> Example HTML source:
>
> <A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>
>
> The above code returns the "text and formatting" portions nicely,
> albeit only as text.

Once the bugs are fixed, the code above runs successfully and produces
no output at all. That is exactly what I expected to see based on the
sample data you provided. Problem solved.

Have you read the posting guidelines?

Sinan
 
Michael Wagg

A. Sinan Unur said:
> Cute but counter-productive. Please post real code.

With the exception of the input filename (which was changed from
"digest.html"), this is the exact code being used.

> while (my $t = $p->get_tag("a")) {
>     my $name = $t->[1]{name};
>     next unless $name && ($name eq "anchor");
>     print "$name : " . $p->get_text("a");
>
> Example HTML source:
>
> <A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>
>
> Am I missing something here? There is no text between <a> and </a>
> above.

The above code returns the text between one open tag and the next open
tag (<A> -> <A>), not between one open tag and the subsequent closing
tag (<A> -> </A>).
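For readers landing here: get_text() always strips the markup, so one way to keep it is to walk the token stream with get_token() and append each token's original source text until the next named anchor. The sketch below is a minimal illustration under stated assumptions, not code from the thread: sections_by_anchor is a hypothetical helper name, the input is an inline copy of the first two sample lines (HTML::TokeParser->new also accepts a reference to a string), and a section is assumed to run from one <a name=...> start tag to the next.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns one { name => ..., html => ... } record per
# <a name="..."> start tag, holding the raw HTML up to the next such tag.
sub sections_by_anchor {
    my ($source) = @_;    # a filename, a filehandle, or a reference to a string
    my $p = HTML::TokeParser->new($source)
        or die "Can't open input: $!";

    my (@sections, $current);
    while (my $token = $p->get_token) {
        my $type = $token->[0];
        if ($type eq 'S' && $token->[1] eq 'a' && defined $token->[2]{name}) {
            # A named anchor starts a new section.
            $current = { name => $token->[2]{name}, html => '' };
            push @sections, $current;
            # Skip the anchor's own empty </a> if it comes right after.
            my $next = $p->get_token;
            $p->unget_token($next)
                if $next && !($next->[0] eq 'E' && $next->[1] eq 'a');
            next;
        }
        next unless $current;    # ignore anything before the first anchor

        # Raw source text is the last element of S, E, C, D and PI tokens,
        # but the second element of T (text) tokens.
        $current->{html} .= $type eq 'T' ? $token->[1] : $token->[-1];
    }
    return @sections;
}

# Inline copy of the first two sample lines from the question.
my $html = <<'HTML';
<A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>
<A NAME='anchor2'></a><p>Some text and HTML formatting</p><BR>
HTML

for my $s (sections_by_anchor(\$html)) {
    print "$s->{name} : $s->{html}";
}
```

To read the real file, pass its name instead, e.g. sections_by_anchor("digest.html"). Because the raw token text is copied rather than get_text() output, tags and entities come back exactly as they appear in the source.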
 
Sam Holden

> With the exception of the input filename (which was changed from
> "digest.html"), this is the exact code being used.

That's a really silly || with a constant true value on the left.

Why would you bother with code that can not be executed? Especially
when all it could possibly serve to do is to trick other people,
and perhaps yourself, into thinking there's error checking when
there isn't.
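The difference Sam describes can be seen in a small, self-contained sketch. open_parser is a hypothetical stand-in for HTML::TokeParser->new, which likewise returns undef when the file cannot be opened; the sketch also assumes no file named no-such-file.txt exists in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::TokeParser->new: returns undef (with $!
# set by the failed open) when the file cannot be opened.
sub open_parser {
    my ($file) = @_;
    open(my $fh, '<', $file) or return;
    return $fh;
}

# As posted: || applies to the constant filename, which is always true,
# so die never runs and the failed open goes unnoticed.
my $p1 = open_parser("no-such-file.txt" || die "Can't open file.");
print defined $p1 ? "opened\n" : "no error raised, but \$p1 is undef\n";

# Checked: the low-precedence "or" tests the value returned by the call.
my $p2 = eval { open_parser("no-such-file.txt") or die "Can't open file: $!\n" };
print "caught: $@" if $@;
```

With the real module, the checked form would read my $p = HTML::TokeParser->new("file.txt") or die "Can't open file.txt: $!";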
 
A. Sinan Unur

> With the exception of the input filename (which was changed from
> "digest.html"), this is the exact code being used.

my $p = HTML::TokeParser->new("file.txt")
    or die "Can't open file.";
while (my $t = $p->get_tag("a")) {
    my $name = $t->[1]{name};
    next unless $name && ($name eq "anchor");

Now I realize why it doesn't return anything: There are no anchors named
'anchor' in the data you provided.

Sorry, I don't have time to look at the rest of the stuff right now.
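Sinan's diagnosis points at the fix: the sample anchors are named anchor1 through anchor10, so the exact comparison $name eq "anchor" never succeeds. Below is a minimal sketch of the loop with that test relaxed to a pattern, assuming every section anchor is "anchor" followed by digits. It reads an inline copy of the first two sample lines rather than the real file, and it still prints text only, as the original loop does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Inline copy of the first two sample lines (assumption: the real file
# follows the same naming pattern).
my $html = <<'HTML';
<A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>
<A NAME='anchor2'></a><p>Some text and HTML formatting</p><BR>
HTML

my $p = HTML::TokeParser->new(\$html) or die "Can't parse input: $!";

my @found;
while (my $t = $p->get_tag("a")) {
    my $name = $t->[1]{name};
    # Accept anchor1, anchor2, and so on, instead of the exact string "anchor".
    next unless defined $name && $name =~ /^anchor\d+$/;
    push @found, $name;
    # Text only: get_text("a") drops the tags and stops at the next <a>.
    print "$name : " . $p->get_text("a");
}
```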
 
