Feasibility of using Perl to strip selected HTML tags and/or attributes (scripting/programming novice)


Spartanicus

I'm wondering what it would take for a scripting and programming
illiterate such as me to use Perl to strip selected tags, attributes,
and optionally the content between those tags, from HTML files.

Initially I investigated learning regexes, since my editor (Homesite)
supports them. But after skimming through Jeffrey E.F. Friedl's
"Mastering Regular Expressions", I'm inclined to think that regexes are
actually poorly suited to this task. As far as I can see, the ideal
would be to manipulate the content with a parser that understands HTML
and SGML.

I didn't manage to find such a tool with a "dummy"/GUI interface. But I
am intrigued by a search result suggesting that Perl has such an HTML
and SGML parser, as well as a module that seems designed specifically
for this job:
http://search.cpan.org/~ncleaton/HTML-StripScripts-0.03/StripScripts.pm
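For a sense of scale, here is a minimal sketch of the parser-based approach, assuming the CPAN module HTML::Parser is installed (it is not part of core Perl). It does not use HTML::StripScripts. The names strip_html and escape_attr and the three tag/attribute lists are hypothetical; you would edit the lists to say which tags to drop (keeping their content), which elements to drop together with their content, and which attributes to remove.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module; assumed to be installed

# Hypothetical settings: edit these lists to taste.
my %drop_tag  = map { $_ => 1 } qw(font center);          # remove the tag, keep its content
my @drop_body = qw(script style);                          # remove the tag AND its content
my %drop_attr = map { $_ => 1 } qw(style bgcolor onclick); # remove these attributes everywhere

# Re-escape an attribute value (HTML::Parser hands us entity-decoded values).
sub escape_attr {
    my ($v) = @_;
    $v =~ s/&/&amp;/g;
    $v =~ s/"/&quot;/g;
    $v =~ s/</&lt;/g;
    $v =~ s/>/&gt;/g;
    return $v;
}

sub strip_html {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr, $attrseq, $text) = @_;
            return if $drop_tag{$tag};
            my @keep = grep { !$drop_attr{$_} } @$attrseq;
            if (@keep == @$attrseq) {   # nothing to remove: copy the tag verbatim
                $out .= $text;
                return;
            }
            # Rebuild the start tag from the attributes we keep, in original order.
            $out .= "<$tag"
                  . join('', map { qq{ $_="} . escape_attr($attr->{$_}) . '"' } @keep)
                  . '>';
        }, 'tagname, attr, attrseq, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $out .= $text unless $drop_tag{$tag};
        }, 'tagname, text' ],
        # Text, comments, declarations etc. are copied through unchanged.
        default_h => [ sub { $out .= $_[0] if defined $_[0] }, 'text' ],
    );
    # Suppress these elements entirely, including everything between their tags.
    $p->ignore_elements(@drop_body);
    $p->parse($html);
    $p->eof;
    return $out;
}

print strip_html('<p style="color:red">Hi <font size="2">there</font></p>'), "\n";
# prints <p>Hi there</p>
```

Because the parser tokenizes the markup, quoted ">" characters, comments and script bodies are handled without hand-written regexes; the remaining work is mostly deciding what goes in the three lists.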

I'm looking for estimates on how much work it would be to learn how to
use Perl for this specific purpose.
 

Tad McClellan

Spartanicus said:
I'm wondering what it would take to employ Perl to strip selected tags,
attributes, and optionally the content between those tags from html


perldoc -q html

How do I remove HTML from a string?
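That FAQ title covers the broader case of removing all markup rather than selected tags. A minimal parser-based sketch of that case, again assuming the CPAN module HTML::Parser is installed; html_to_text is a hypothetical name, and discarding the contents of script and style elements is a choice made here, not a requirement:

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module; assumed to be installed

# Return only the text content of $html, with entities decoded.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'dtext' is the text with HTML entities already decoded;
        # tags, comments etc. have no handler, so they are dropped.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    # Drop script/style blocks entirely, including their contents.
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;
    return $text;
}

print html_to_text('<p>Fish &amp; chips<script>x = 1;</script></p>'), "\n";
# prints Fish & chips
```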
 

Joe Smith

Spartanicus said:
I'm looking for estimates on how much work it would be to learn how to
use Perl for this specific purpose.

If you want to learn Perl, you should do so for proper edification.
Approaching the topic as "what is the least amount of learning I need
for one specific problem?" is counterproductive.
 
