How to use HTML::Parser to remove HTML tags and print result

Mitchua

I am trying to use HTML::Parser to parse an HTML file, remove all HTML tags
(including comments, etc.), decode all entities (e.g. turn &amp; into &), and put the
result into a variable as a string. I figure HTML::Parser itself can
somehow perform the filtering, but how do I get the result back as a string? I'd
appreciate some sample code if anyone has any. Sorry if this is a real n00b
question.

Thanks a lot,
Mitchua
 
Ice Demon

Try this for an example of parsing a web page:
http://www.wdvl.com/Authoring/Languages/Perl/PerlfortheWeb/summarizer.html
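If you are only trying to remove the HTML tags, a single substitution will do.
It does not decode entities such as &amp;, and it can be fooled by a > inside
a comment or attribute value:
$webpage =~ s/<.*?>//gs;   # /s lets . match newlines, so tags spanning lines are removed too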
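If you want HTML::Parser to do all of it (drop tags and comments, decode entities, collect the text in a string), here is a minimal sketch. It assumes HTML::Parser 3.x with api_version => 3. The helper name html_to_text is hypothetical, and skipping <script>/<style> contents is an extra choice not asked for above. The idea: register only a text handler that appends the 'dtext' argument (text with entities already decoded) to a variable. Because no start, end, or comment handlers are registered, tags and comments are simply dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text of an HTML string with tags and
# comments removed and entities (e.g. &amp;) decoded.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Only a text handler is registered, so tags and comments produce
        # no output. 'dtext' is the text with entities already decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    # Assumption: the contents of <script> and <style> are not wanted.
    $p->ignore_elements(qw(script style));

    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering
    return $text;
}

my $html = '<p>Fish &amp; <b>chips</b><!-- a comment --></p>';
print html_to_text($html), "\n";    # prints "Fish & chips"
```

Whitespace and line breaks between elements are kept as they appear in the source, so you may want to collapse runs of whitespace afterwards. To read straight from a file instead of a string, HTML::Parser also provides a parse_file() method.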

Ice Demon
 