I am new to the perl thing and i am trying to extract some date from
some web pages and am having problems.... can someone please tell me
what i am doing wrong... i think i have become a charter member of the
"idiots 'r' us" club...

Nope; Perl is IMO harder to learn than some other languages. You're not
helping yourself enough though. I'll get to your problem in a moment,
but first some things you should do (a) to help you find your problems
before posting here, and (b) to get better and quicker help here.
1. Always code "use strict;" and "use warnings;". Had you done so you
might have picked up the logic problem in your code, and it will
certainly ensure that you pick up many others.
2. Code not only a test program (well done for doing that) but also
some suitable data. I had to make some in order to do the testing.
3. Learn to use the Perl debugger (perl -d yourprog.pl) and to use the
breakpoint and examine commands. Doing that I found your problem in
one pass through the program.
this is my script... pretty simple so far, i am just trying to get one
piece of info working to start. i can traverse the directory and print
the filenames, but it only seems to get the data and do the pattern
matching from the first file in the directory....
What you mean is that once it has found a file with a match it then
finds that match in all subsequent files even if they themselves don't
have it. I recommend you try to be very precise about your problem.
Actually, showing your incorrect output is very precise and saves extra
thought on your part!
#!/usr/bin/perl
$dir="/Users/test/";
If you code "use strict" you'll need to put "my $dir", and the same
elsewhere in the file.
opendir(DIRECTORY, $dir) || die("Cannot open directory");
@thefiles= readdir(DIRECTORY);
This is OK as far as it goes but assumes you have enough memory to read
in the whole directory listing. Better practice is to read the directory
entry by entry, as you've (partly) done line by line with the file.
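Reading the directory one entry at a time might look something like
this. It is only a sketch: "each_entry" is a name I made up, and "." is
a stand-in for your real directory.

```perl
use strict;
use warnings;

# Hypothetical helper: calls $callback once for each non-dot entry in $dir.
sub each_entry {
    my ($dir, $callback) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # In scalar context readdir returns one entry per call, so the
    # whole listing never has to sit in memory at once.
    while (defined(my $entry = readdir $dh)) {
        next if $entry =~ /^\./;    # skip ".", ".." and other dot files
        $callback->($entry);
    }
    closedir($dh);
    return;
}

# Assumption: "." stands in for the real directory.
my $n = 0;
each_entry(".", sub { my ($file) = @_; $n++; print "$n:$file\n" });
```

The lexical handle ($dh) closes itself when it goes out of scope, which
is one less thing to forget.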
closedir(DIRECTORY);
foreach $file (@thefiles) {
unless ( ($file eq ".") || ($file eq "..") || ($file eq ".DS_Store")
) {
A regex could do this (untested)
unless ( $file =~ /^\.{1,2}$|^\.DS_Store$/ ) {
.... but if you could just reject all "dot" files it would be even easier
unless ( $file =~ /^\./ )
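A quick way to convince yourself what the dot-file test lets through
(the file names here are made up for the illustration):

```perl
use strict;
use warnings;

# Made-up sample names, for illustration only.
my @names = (".", "..", ".DS_Store", "index.html", "page.2.html");

# Keep only the names that do not start with a dot.
my @wanted = grep { !/^\./ } @names;

print "$_\n" for @wanted;    # prints index.html, then page.2.html
```

Note that a dot in the middle of a name does no harm; only a leading
dot gets the entry rejected.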
open FILE, "$dir/$file" or die "Can't open $file : $!";
Well done for checking the file open result. Lots of beginners don't.
while( <FILE> ) {
s/\t//; # ignore tabs by erasing them
next if /^(\s)*$/; # skip blank lines
chomp; # remove trailing newline characters
push @lines, $_; # push the data line onto the array
Again, you're assuming that you always have enough memory for the whole
file.
Your problem is here. Because you didn't code "use strict" you aren't
forcing yourself to take control of the scope of your variables. Perl
has allocated "@lines" once for the whole program; when you process the
next file in the directory you push the lines on the bottom; the match
for the HTML title then fires every time. If you'd coded "my @lines"
just before the "while (<FILE>)" line then you'd have got a new "@lines"
each time and your program would have worked as you wanted it to.
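To see the difference in isolation, here is a small sketch. The hash of
"file contents" is made up so the example runs without any real files;
@shared plays the part of your program-wide @lines.

```perl
use strict;
use warnings;

# Made-up stand-ins for the lines read from two files.
my %lines_of = (
    "a.html" => ["<span class=searchtitle><B> First</B></span><BR>"],
    "b.html" => ["<p>no title in this one</p>"],
);

my @shared;    # declared once, like the original @lines
my @report;
for my $file (sort keys %lines_of) {
    my @lines;    # a fresh, empty array on every pass
    push @shared, @{ $lines_of{$file} };
    push @lines,  @{ $lines_of{$file} };
    my $shared_hit = "@shared" =~ /searchtitle/ ? "match" : "no match";
    my $fresh_hit  = "@lines"  =~ /searchtitle/ ? "match" : "no match";
    push @report, "$file: shared=$shared_hit fresh=$fresh_hit";
}
print "$_\n" for @report;
```

For b.html the shared array still "matches", because it is still
carrying the lines of a.html around; the fresh array does not.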
}
close FILE;
$string = "@lines";
This is ugly, and produces a slap on the wrist from Perl when you code
"use strict; use warnings". Not that it doesn't give you what you want,
though ... it's up to you as to whether you want to write with good style.
When "strict" forces you to declare your file counter with "my $n",
you'll have to put the declaration outside the directory-read loop,
otherwise the count starts again from scratch for every file.
print "$n:$file:";
$string =~ /<span class=searchtitle><B> (.*?)<\/B><\/span><BR>/is;
print "$1\n"; # print html page title
Always check that the match succeeded before you use the extracted
text. When I fixed your program so it only examined the text of the
current file I got warnings from this statement every time it failed
to find a match.
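One way to do that check is to use $1 only inside the branch where the
match is known to have worked. A sketch, with "page_title" being a name
I invented and the two test strings being made up:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the title, or undef when the pattern is absent.
sub page_title {
    my ($html) = @_;
    if ($html =~ /<span class=searchtitle><B> (.*?)<\/B><\/span><BR>/is) {
        return $1;    # only trusted because the match succeeded
    }
    return;
}

# Made-up sample inputs: one with a title, one without.
for my $html ('<span class=searchtitle><B> Home</B></span><BR>', '<p>nothing</p>') {
    my $title = page_title($html);
    my $shown = defined $title ? $title : "(no title found)";
    print "$shown\n";
}
```

Testing the match itself, rather than the truth of $1, also copes with
a page whose title happens to be "0".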
Here's a minimally-fixed version of your program which "works", in the
sense that it finds the HTML titles. It still needs quite a lot of
cleaning up and more Perlish idiom.
#!/usr/bin/perl
# Jim Goodman's problem April 9
use strict; use warnings; # I added this
#$dir="/Users/test/";
my $dir="F:/scratch"; # My directory instead of his
opendir(DIRECTORY, $dir) || die("Cannot open directory");
my @thefiles= readdir(DIRECTORY);
closedir(DIRECTORY);
my $n;
foreach my $file (@thefiles) {
unless ( ($file eq ".") || ($file eq "..") || ($file eq ".DS_Store")
) {
open FILE, "$dir/$file" or die "Can't open $file : $!";
my @lines = ();
while( <FILE> ) {
s/\t//g; # ignore tabs by erasing all of them, not just the first
next if /^(\s)*$/; # skip blank lines
chomp; # remove trailing newline characters
push @lines, $_; # push the data line onto the array
}
close FILE;
my $string = "@lines";
$n++;
print "$n:$file:";
$string =~ /<span class=searchtitle><B> (.*?)<\/B><\/span><BR>/is;
print "$1\n" if $1; # print html page title
}
}
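For comparison, here is roughly where I'd expect the cleaning up to
lead. Treat it as a sketch, not the one true version: the sub names are
invented, it slurps each file whole (so it still assumes a file fits in
memory), and a title that spans lines will keep its newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the first title found in $path, or undef.
sub title_of_file {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Can't open $path: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    return unless defined $text;
    return $text =~ /<span class=searchtitle><B> (.*?)<\/B><\/span><BR>/is
        ? $1
        : undef;
}

# Hypothetical helper: prints "count:file:title" for each plain,
# non-dot file in $dir and returns the number of files examined.
sub print_titles {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my $n = 0;
    while (defined(my $file = readdir $dh)) {
        next if $file =~ /^\./;
        next unless -f "$dir/$file";
        $n++;
        my $title = title_of_file("$dir/$file");
        print "$n:$file:", (defined $title ? $title : ""), "\n";
    }
    closedir($dh);
    return $n;
}

# Run only when a directory is named on the command line.
print_titles($ARGV[0]) if @ARGV;
```

The three-argument open and the lexical handles ($fh, $dh) are the
main idiom changes; the logic is the same as above.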
But I think I'd feel inclined to use "grep" to find the files that had
the relevant string in them, and pipe the output into a much smaller
Perl program to find the HTML titles and print them out. You'd lose the
incrementing count of the files, though.
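The Perl end of such a pipe could be as small as this. It is a sketch
under some assumptions of mine: grep is run with -H so each record looks
like "filename:matching line", the whole title sits on one line, and no
file name contains a colon. "titles_from_grep" is an invented name, and
the sample input is made up so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "filename:matching line" records from $fh
# and returns a list of [filename, title] pairs.
sub titles_from_grep {
    my ($fh) = @_;
    my @found;
    while (my $record = <$fh>) {
        chomp $record;
        my ($file, $line) = split /:/, $record, 2;
        next unless defined $line;
        push @found, [$file, $1]
            if $line =~ /<span class=searchtitle><B> (.*?)<\/B><\/span><BR>/i;
    }
    return @found;
}

# Made-up sample of grep output; in real use pass \*STDIN instead.
my $sample = "a.html:<span class=searchtitle><B> Home</B></span><BR>\n"
           . "b.html:<span class=searchtitle><B> News</B></span><BR>\n";
open(my $in, "<", \$sample) or die "Can't open in-memory sample: $!";
print "$_->[0]:$_->[1]\n" for titles_from_grep($in);
close $in;
```

With the sample replaced by \*STDIN and the script saved under some name
of your choosing (say titles.pl), the command line would be along the
lines of: grep -iH 'class=searchtitle' /Users/test/* | perl titles.pl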