Reading text file


Kevin B

I have the following short script that I'm using to clean up the source of a
web page in order to index and search the page:

#!/usr/bin/perl
#striphtml.pl

undef $/;
open FD, "< testfile1.txt" or die $!;

while (<FD>) {
    #s/\r\n//gs;

    #s/^\s+$//;
    s/<.*?>//gs;
    trim();
    print "$_";
}

sub trim {
    my @out = @_ ? @_ : $_;
    $_ = join(' ', split(' ')) for @out;
    return wantarray ? @out : "@out";
}


The problem is that it leaves blank lines in the output, and using chomp does
not clean them up. What am I missing to clean up the lines?

Kevin
 

Roy Johnson

This newsgroup is defunct. You will reach more people if you post in
comp.lang.perl.misc instead.

Kevin B said:
undef $/;

Ok, you're slurping the whole file in at once...
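One common way to keep that slurp setting from leaking into the rest of a program is to scope it with local inside a do block, e.g. (with $html just as an illustrative name)

my $html = do { local $/; <FD> };   # read to end of file; $/ is restored when the block exits

though for a short one-shot script like this it makes little practical difference.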
open FD, "< testfile1.txt" or die $!;

while (<FD>) {

No real point in a while, if you're getting the whole file in one
read. Just do
$_ = <FD>;
s/<.*?>//gs;

strip out all the tags...
print "$_";

No need for the quotes. In this case, no need for an argument at all.
Just
print;
The problem is that it leaves blank lines in the output, and using chomp does
not clean them up. What am I missing to clean up the lines?

Maybe something like
tr/\n//s;
or
s/\n\s*\n/\n/g;
?
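
Putting those suggestions together, a minimal cleaned-up sketch of the script could look like this (keeping the testfile1.txt input from the original post; the whitespace squashing that trim() was meant to do is folded into a plain substitution, and the \r strip covers the commented-out \r\n attempt):

#!/usr/bin/perl
# striphtml.pl -- strip tags and collapse blank lines
use strict;
use warnings;

local $/;    # slurp mode: read the whole file in one go
open my $fd, '<', 'testfile1.txt' or die "Can't open testfile1.txt: $!";
my $text = <$fd>;
close $fd;

$text =~ s/\r//g;           # drop carriage returns, if the source is DOS-style
$text =~ s/<.*?>//gs;       # strip out all the tags
$text =~ s/[ \t]+/ /g;      # squash runs of spaces and tabs (what trim() aimed for)
$text =~ s/\n\s*\n/\n/g;    # collapse blank lines (Roy's second suggestion)

print $text;

The blank lines in the original output most likely come from source lines that contained nothing but tags, so it's that last substitution that actually removes them.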
 
