Kevin B
I have the following short script that I'm using to clean up the source of a
web page in order to index and search the page:
#!/usr/bin/perl
#striphtml.pl
undef $/;
open FD, "< testfile1.txt" or die $!;
while (<FD>) {
    #s/\r\n//gs;
    #s/^\s+$//;
    s/<.*?>//gs;
    trim();
    print "$_";
}
sub trim {
    my @out = @_ ? @_ : $_;
    $_ = join(' ', split(' ')) for @out;
    return wantarray ? @out : "@out";
}
The problem is that it leaves blank lines in the output, and using chomp
does not clean them up. What am I missing to clean up the lines?
Kevin
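
For anyone reading along, two things in the script above seem to explain the blank lines. Called with no arguments, trim() copies $_ into @out, changes only the copy, and its return value is thrown away, so $_ is printed unchanged. And with $/ undefined the whole file arrives as one string, in which case chomp removes nothing. Below is a minimal sketch of one possible fix, not the only one. The helper name strip_html and the idea of passing the file name on the command line are assumptions made for illustration; the tag-stripping regex is the same naive one used in the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes the whole
# page as one string, removes tags, and drops lines that end up empty.
sub strip_html {
    my ($text) = @_;
    $text =~ s/<.*?>//gs;      # same naive tag removal as the original
    $text =~ s/\r\n?/\n/g;     # normalize CRLF and bare CR to LF
    my @lines;
    for my $line (split /\n/, $text) {
        # split ' ' skips leading whitespace, so this trims both ends
        # and collapses inner runs of whitespace to a single space
        $line = join ' ', split ' ', $line;
        push @lines, $line if length $line;   # skip lines that are now empty
    }
    return join("\n", @lines) . "\n";
}

# Assumed usage: perl striphtml.pl testfile1.txt
if (@ARGV) {
    local $/;   # slurp mode, limited to this block
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print strip_html(scalar <$fh>);
    close $fh;
}
```

The sketch keeps one output line per non-empty input line. If a single line of text is good enough for indexing, assigning the result back in the original loop, $_ = trim();, would also work, since split(' ') treats newlines as whitespace.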