Feasibility of using Perl to strip selected html tags and/or attributes (scripting/programming novice)

Spartanicus · May 12, 2005

I'm wondering what it would take for a scripting and/or programming
illiterate such as me to use Perl to strip selected tags, attributes,
and optionally the content between those tags from html files.

Initially I investigated learning regexes since they are supported by my
editor (Homesite). But after skimming through Jeffrey E.F. Friedl's
"Mastering Regular Expressions", I'm inclined to think that regexes are
actually poorly suited for this task. Afaics, the ideal would be to
manipulate the content via a parser that understands html & sgml.
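To make the regex concern concrete, here is a small sketch. It is written in Python rather than Perl only because Python's standard library needs no add-on modules; a Perl regex would behave the same way. The naive pattern `<[^>]*>` assumes a tag ends at the first `>`, but `>` may legally appear inside a quoted attribute value, so part of the tag leaks into the "stripped" text.

```python
import re

# Naive "strip every tag" pattern: a "<", then any run of non-">" characters, then ">".
NAIVE_TAG = re.compile(r"<[^>]*>")

html = '<p title="a > b">Hello <b>world</b></p>'

# The first match stops at the ">" inside the title attribute value,
# so the tail of the start tag (' b">') survives as if it were text.
result = NAIVE_TAG.sub("", html)
print(repr(result))  # ' b">Hello world'
```

Patching the pattern for quoted values, comments, scripts and so on is possible, but each fix adds another special case, which is why a real parser is usually the safer route.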

I didn't manage to find such a tool with a "dummy/GUI" interface. But I
am intrigued by the result of a search that suggested to me that Perl
has such an html & sgml parser, plus a module that seems designed
specifically for this job:
http://search.cpan.org/~ncleaton/HTML-StripScripts-0.03/StripScripts.pm
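For comparison, the parser-based approach can be sketched as follows. This uses Python's standard-library `html.parser` as a stand-in; it is not HTML::StripScripts, and the class `SelectiveStripper`, the helper `strip_html`, and its `strip_tags` / `drop_content` / `strip_attrs` options are hypothetical names chosen for illustration. Perl's HTML::Parser from CPAN is also event-driven (callbacks for start tags, end tags and text), so the structure should carry over, but treat this as a sketch rather than a drop-in tool.

```python
from html import escape
from html.parser import HTMLParser


class SelectiveStripper(HTMLParser):
    """Hypothetical helper: re-emit HTML, dropping chosen tags, attributes, content.

    strip_tags:   tags removed but whose content is kept (e.g. "font")
    drop_content: tags removed together with everything inside them
    strip_attrs:  attribute names removed from every tag that is kept
    """

    def __init__(self, strip_tags=(), drop_content=(), strip_attrs=()):
        # convert_charrefs=False delivers entities such as &amp; as separate
        # events, so ordinary text can be passed through without re-escaping.
        super().__init__(convert_charrefs=False)
        # html.parser lowercases tag/attribute names, so normalise our sets too.
        self.strip_tags = {t.lower() for t in strip_tags}
        self.drop_content = {t.lower() for t in drop_content}
        self.strip_attrs = {a.lower() for a in strip_attrs}
        self.out = []
        self.skip_depth = 0  # > 0 while inside a drop_content element

    def _keep(self, tag):
        return self.skip_depth == 0 and tag not in self.strip_tags

    def _emit_start(self, tag, attrs, slash=""):
        parts = [tag]
        for name, value in attrs:
            if name in self.strip_attrs:
                continue
            # Attribute values arrive unescaped, so escape them again on output.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + slash + ">")

    def handle_starttag(self, tag, attrs):
        if tag in self.drop_content:
            self.skip_depth += 1
        elif self._keep(tag):
            self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no content to drop.
        if tag not in self.drop_content and self._keep(tag):
            self._emit_start(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag in self.drop_content:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self._keep(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if self.skip_depth == 0:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_html(markup, **options):
    """Hypothetical convenience wrapper: parse markup and return the cleaned text."""
    parser = SelectiveStripper(**options)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


src = ('<p style="color:red" class="x">Hi <font size="2">there</font>'
       '<script>alert("x > y")</script><br/></p>')
cleaned = strip_html(src, strip_tags={"font"}, drop_content={"script"},
                     strip_attrs={"style"})
print(cleaned)  # <p class="x">Hi there<br /></p>
```

Because every tag and its attributes arrive as a separate parser event, the decision to keep or drop is made per element rather than per character pattern. One side effect: tags are rebuilt from the parsed pieces, so the original capitalization and quoting inside tags are not preserved.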

I'm looking for estimates on how much work it would be to learn how to
use Perl for this specific purpose.

Tad McClellan · May 12, 2005

Spartanicus said:
I'm wondering what it would take to employ Perl to strip selected tags,
attributes, and optionally the content between those tags from html

perldoc -q html

How do I remove HTML from a string?

Joe Smith · May 13, 2005

Spartanicus said:
I'm looking for estimates on how much work it would be to learn how to
use Perl for this specific purpose.

If you want to learn Perl, you should do so for proper edification.
Approaching the topic as "what is the least amount of learning
I need for one specific problem?" is counterproductive.
