Deleting element tags

Rafal Konopka

Hi,

I need to delete element tags from many HTML files. The elements in
question are 'b' and 'strong', but only if they have a child element 'a'.

I'm using the HTML::TreeBuilder module and HTML::Element methods. The code
correctly identifies the elements I need. For debugging purposes, I build
a hash that maps each file name to all the elements matching my
condition. Now the big question: how do I remove just the tags? I looked
in several modules, but I couldn't find a method like (see below)
$bx->starttag->delete() / $bx->endtag->delete().

A secondary question: how can I output newlines after some element
tags, to prettify the HTML output?
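
One thing that may be worth trying for the pretty-printing part: as far as I can tell from the HTML::Element documentation, as_HTML takes an indent string as its second argument and a hash reference of optional end tags as its third. The sketch below assumes that signature; the sample markup is made up for illustration, and I have not checked how the output looks for every kind of element.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample markup; real input would come from $tree->parse_file($f).
my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><body><ul><li>one<li>two</ul></body></html>');
$tree->eof;

# 1st arg: characters to entity-encode (undef means the module's default set).
# 2nd arg: indent string; when defined, the output is indented using it.
# 3rd arg: hash ref of tags whose end tag may be omitted; an empty hash
#          asks for every end tag to be written out.
my $pretty = $tree->as_HTML(undef, "  ", {});
print $pretty, "\n";

$tree->delete;
```

Passing an empty hash as the third argument is a design choice for readability: with the default, end tags such as </li> and </p> can be left out of the output.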

Here's my solution so far:

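#!perl -w

use strict;
use HTML::TreeBuilder;

# Recursive bare listing of all .htm files (it's run on Windows XP)
chomp(my @filelist = `DIR *.htm /s /b`);
my %main_hash = ();   # file name => list of matching elements as HTML

foreach my $f (@filelist) {

    my $tree = HTML::TreeBuilder->new();
    $tree->parse_file($f);
    my @bs = $tree->find_by_tag_name('strong','b');

    foreach my $bx (@bs) {

        # Note: find_by_tag_name matches any 'a' at or under $bx,
        # not only direct children.
        if ( $bx->find_by_tag_name('a') ) {
            push(@{$main_hash{$f}}, $bx->as_HTML);
        }
    }

    # Print the whole tree, indented with one space per level
    print $tree->as_HTML(''," "), "\n";
    $tree->delete;
}

# Debugging output: matching elements grouped by file
foreach my $f (keys %main_hash) {
    print "File $f\n";
    print join("", @{$main_hash{$f}}), "\n";
}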
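
A possible direction for the tag removal itself: HTML::Element documents a replace_with_content method, which replaces an element in its tree with its own children. That looks like the effect of deleting the start and end tags while keeping everything between them. The following is only a minimal sketch under that assumption. unwrap_bold_links is a hypothetical helper name, and the sample markup is made up; neither comes from the script above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Strip <b>/<strong> wrappers that contain an <a>, keeping what was inside.
# unwrap_bold_links is a hypothetical helper name, not part of any module.
sub unwrap_bold_links {
    my ($source) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($source);
    $tree->eof;

    # Collect the matches first so the tree is not modified while searched.
    my @targets = grep { scalar $_->find_by_tag_name('a') }
                  $tree->find_by_tag_name('strong', 'b');

    for my $el (@targets) {
        $el->replace_with_content;   # children take the element's place
        $el->delete;                 # discard the now empty, detached element
    }

    my $out = $tree->as_HTML;
    $tree->delete;
    return $out;
}

# Made-up sample input standing in for one of the *.htm files.
my $sample = '<html><body><p><b><a href="x.htm">link</a></b>'
           . ' and <b>plain</b></p></body></html>';
print unwrap_bold_links($sample), "\n";
```

In the per-file loop, the same two calls could go where the matching elements are currently pushed onto the hash, with the result of as_HTML written back to the file instead of printed. The 'b' without a link is left alone because it never enters @targets.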
 
