Spartanicus
I'm wondering what it would take for someone with no scripting or
programming experience, such as me, to use Perl to strip selected tags,
attributes, and optionally the content between those tags from HTML files.
Initially I investigated learning regexes since they are supported by my
editor (Homesite). But after skimming through Jeffrey E.F. Friedl's
"Mastering Regular Expressions" I'm inclined to think that regexes are
actually poorly suited for this task. As far as I can see, the ideal would
be to manipulate the content via a parser that understands HTML and SGML.
I didn't manage to find such a tool with a "dummy/GUI" interface. But I
am intrigued by a search result that suggested Perl has such an HTML and
SGML parser, and a module that seems designed specifically for this job:
http://search.cpan.org/~ncleaton/HTML-StripScripts-0.03/StripScripts.pm
I'm looking for estimates on how much work it would be to learn how to
use Perl for this specific purpose.