Rafal Konopka
Hi,
I need to delete element tags from many HTML files. The elements in
question are 'b' and 'strong', but only if they have a child element 'a'.
I'm using the HTML::TreeBuilder module and HTML::Element methods. The code
correctly identifies the elements I need. For debugging
purposes, I create a hash associating each file name with all the
elements that match my condition. Now the big question is: how do I
remove just the tags? I looked in several modules, but I couldn't find a
method like (see below) $bx->starttag->delete()/$bx->endtag->delete().
A secondary question: how can I output newlines after some element
tags, if I want to prettify the HTML output?
Here's my solution so far:
#!perl -w
use HTML::TreeBuilder;

chomp(my @filelist = `DIR *.htm /s /b`); #it's run on Windows XP
my %main_hash = ();   # file name => list of matching elements as HTML

foreach my $f (@filelist) {
    my $tree = HTML::TreeBuilder->new();
    $tree->parse_file($f);

    # all 'strong' and 'b' elements anywhere in the document
    my @bs = $tree->find_by_tag_name('strong','b');
    foreach my $bx (@bs) {
        # keep only those that contain an 'a' element
        if ( $bx->find_by_tag_name('a') ) {
            push(@{$main_hash{$f}},$bx->as_HTML);
        }
    }
    print $tree->as_HTML(''," "), "\n";
    $tree->delete;
}

# debugging: list the matches found in each file
foreach my $f (keys %main_hash) {
    print "File $f\n";
    print join("",@{$main_hash{$f}}), "\n";
}
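For reference, here is a minimal sketch of the kind of thing I'm after. It assumes that HTML::Element's replace_with_content() method does what its documentation describes, namely splicing an element's children into its parent in place of the element itself. The sample markup in $html is made up for illustration and stands in for one of my files. I haven't verified this against my real files, so treat it as a starting point rather than a finished solution.

```perl
#!perl -w
use strict;
use HTML::TreeBuilder;

# Hypothetical sample markup standing in for one of the .htm files.
my $html = '<html><head><title>t</title></head><body>'
         . '<p><b><a href="x.htm">link</a></b> and <b>plain</b></p>'
         . '</body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# Collect the matching elements first, then modify the tree afterwards,
# so the tree is not changed while it is being searched.
my @wrappers = grep { $_->find_by_tag_name('a') }
               $tree->find_by_tag_name('strong', 'b');

foreach my $bx (@wrappers) {
    # replace_with_content() puts the element's children where the
    # element was and returns the now-empty element, which is deleted.
    $bx->replace_with_content->delete;
}

# A non-empty second argument to as_HTML() turns on indented output,
# which also adds newlines around block-level elements.
my $out = $tree->as_HTML('<>&', '  ');
print $out, "\n";
$tree->delete;
```

In this sketch the 'b' around the link should disappear while the link itself and the plain 'b' stay in place. The same loop could go inside the per-file loop above, right after the matches are pushed onto %main_hash.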