[TABLE NOT SHOWN] problem with HTML::Parse

Mitchua

When I run the often-quoted line:
my $ascii =
HTML::FormatText->new->format(HTML::Parse::parse_html($html));
to remove HTML tags from an HTML document, it replaces all tables with
"[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content
parsed too?

Thanks a lot,
Mitchua
 
James E Keenan

Mitchua said:
When I run the often-quoted line:
my $ascii =
HTML::FormatText->new->format(HTML::Parse::parse_html($html));
to remove HTML tags from an HTML document, it replaces all tables with
"[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content
parsed too?
The documentation for HTML::FormatText states: "Formatting of HTML tables
and forms is not implemented." So not with that module. The documentation
refers to HTML::Formatter
(http://search.cpan.org/author/SBURKE/HTML-Format-2.03/lib/HTML/Formatter.pm),
which in turn refers to other modules that may be of some help.
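
One way around the missing table support is to walk the parsed tree yourself and pull the text out of each table cell. The following is only a minimal sketch, not something from the HTML::FormatText documentation: it assumes HTML::TreeBuilder (from the HTML-Tree distribution) is installed, and the sample $html and the @lines array are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input; substitute your own document.
my $html = <<'HTML';
<html><body>
<p>Prices</p>
<table>
<tr><th>Item</th><th>Cost</th></tr>
<tr><td>Tea</td><td>2.50</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Collect one tab-separated line of text per table row.
my @lines;
for my $row ($tree->look_down(_tag => 'tr')) {
    # Match both header (th) and data (td) cells inside this row.
    my @cells = $row->look_down(sub { $_[0]->tag =~ /^t[dh]$/ });
    push @lines, join("\t", map { $_->as_text } @cells);
}

print "$_\n" for @lines;

$tree->delete;    # free the tree explicitly
```

Nested tables would need extra care, since look_down descends into every level; this sketch assumes flat tables only.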
 
James E Keenan

Mitchua said:
Are there any other (easy) ways to remove all html tags (including tricky
tags like comments, etc.) from a web page without using those modules? I'm
looking for a solution beyond a regular expression.
"Easy": no. That's why we have all those modules in the HTML section of
CPAN: the solution is always difficult, messy and "beyond a regular
expression."

I note that in your OP you used HTML::Parse. The one-line description of
that module indicates that it is deprecated. Have you looked into
HTML::Parser? People speak highly of that module.
 
Mitchua

James E Keenan said:
"Easy": no. That's why we have all those modules in the HTML section of
CPAN: the solution is always difficult, messy and "beyond a regular
expression."

I note that in your OP you used HTML::Parse. The one-line description of
that module indicates that it is deprecated. Have you looked into
HTML::Parser? People speak highly of that module.

I found this code on the web that uses it:

use HTML::Parser;
$p = HTML::Parser->new;
$p->parse($notes); # parse the HTML in notes
$p->eof; # signal end of parse file
print $p->as_string; # print out the parsed text

but I get the error "Can't locate ../HTML/Parser/as_string.al". I'm looking
for that file now.

Jonathan
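
As far as I know, HTML::Parser is an event-driven parser and does not document an as_string method, which would explain the failed lookup. A rough sketch of the same idea using its version 3 handler interface follows. It assumes a reasonably recent HTML::Parser; the sample $notes string and the $text accumulator are invented for illustration. Comments get no handler here, so they are simply dropped, and script and style bodies are skipped.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input; substitute your own document.
my $notes = '<p>Hello <!-- hidden --><b>world</b> &amp; more</p>'
          . '<script>var x = 1;</script>';

my $text = '';

my $p = HTML::Parser->new(
    api_version => 3,
    # Append each run of text, with entities decoded, to $text.
    text_h => [ sub { $text .= shift }, 'dtext' ],
);

# Skip the contents of script and style elements entirely.
$p->ignore_elements(qw(script style));

$p->parse($notes);    # parse the HTML in $notes
$p->eof;              # signal end of input

print $text, "\n";
```

This only strips markup; it makes no attempt to lay out tables or preserve paragraph breaks.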
 
