Geoff said:
On Sun, 07 Dec 2003 18:02:07 GMT, Geoff Cox
I should have made things a bit clearer - so here is the whole code
and a sample of html which it is to work on .. can any one see why it
doesn't get the name and address info?!
Cheers
Geoff
My code is as follows but it does not work!
-----------------------------^^^^^^^^^^^^^
A much more specific description of what your code does/doesn't do is
called for in a newsgroup posting. Please state exactly what it does
that it shouldn't do, or what it doesn't do that it should do. "Doesn't
work" is next to meaningless -- we can't read your mind.
use warnings;
print ("name of html file?\n");
my $namehtml = <STDIN>;
print ("name of email list file?\n");
my $newhtml = <STDIN>;
open(IN, "$namehtml");
open(OUT, ">>$newhtml");
my $line = <IN>;
Since you didn't modify $/, this will read only one line. I think
that's your fundamental problem. Try:
my $line;
{local $/;$line=<IN>} #slurp the input
and see if that works better.
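
To make the difference concrete, here is a minimal sketch, not part of your program. It assumes a made-up three-line string ($text) in place of your HTML file and reads it through an in-memory filehandle, so it runs without any input file. The names $text, $one_line and $whole are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the HTML file.
my $text = "first line\nsecond line\nthird line\n";

# Open the string as an in-memory filehandle so no real file is needed.
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";
my $one_line = <$in>;                 # $/ is "\n" by default: one line only
close($in);

open($in, '<', \$text) or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$in> };   # $/ undefined: read up to end of file
close($in);

print "line read:  ", length($one_line), " characters\n";
print "slurp read: ", length($whole), " characters\n";
```

The local $/ sits inside its own block so that the input record separator goes back to its normal value as soon as the block ends, and later reads behave as usual.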
while (defined($line=<IN>)) {
Here you are reading the rest of the lines of filehandle IN, but one at
a time. You will have skipped the first line (which was read above).
If you slurp the input, you should get rid of the while loop.
# if ($line =~ / (.*?)<\/H6>/i) {
# print OUT ("$1 \n");
# }
if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
.+?
Address.+?<TD[^>]+>([^<]+)
/isx ) {
print OUT ("Name: $1\nAddress: $2\n");
}
}
close (IN);
close (OUT);
-----------------------------
which is working on for example
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR> ...
Geoff
Yes: you read the first line of your file and throw it away. That was
the line with "Head Teacher" in it. But even if you didn't do that, the
remainder of the lines are read one at a time, and no single line contains
enough text to match your pattern. Slurp it all, and your pattern
might match. Here is a slightly modified, standalone version of your
program that you can copy, paste, and execute, and that looks like it
might "work":
use strict;
use warnings;
#print ("name of html file?\n");
#my $namehtml = <STDIN>;
#print ("name of email list file?\n");
#my $newhtml = <STDIN>;
#open(IN, "$namehtml");
#open(OUT, ">>$newhtml");
my $line;
{local $/;$line = <DATA>} #slurp the file
#while (defined($line=<DATA>)) {
# if ($line =~ / (.*?)<\/H6>/i) {
# print OUT ("$1 \n");
# }
if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
.+?
Address.+?<TD[^>]+>([^<]+)
/isx ) {
print ("Name: $1\nAddress: $2\n");
}
#}
#close (IN);
#close (OUT);
__DATA__
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR>
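
If you want to see for yourself why reading line by line cannot work, the following sketch tries the same pattern both ways on your sample HTML. It is only an illustration under my own assumptions: the heredoc and the variable names ($html, $pattern, $line_matches, $name, $address) are mine, not part of your program.

```perl
use strict;
use warnings;

# The sample HTML from the post, held in a string for the demonstration.
my $html = <<'END_HTML';
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR>
END_HTML

# Same pattern as in the program, precompiled with qr// for reuse.
my $pattern = qr/Head\s+Teacher.+?<TD[^>]+>([^<]+)
                 .+?
                 Address.+?<TD[^>]+>([^<]+)
                /isx;

# Line by line: count how many individual lines match on their own.
my $line_matches = grep { $_ =~ $pattern } split /^/m, $html;

# Whole text at once: the pattern can now span several lines.
my ($name, $address) = $html =~ $pattern;

print "lines matching on their own: $line_matches\n";
print "Name: $name\nAddress: $address\n" if defined $name;
```

Note that the address is captured with its embedded newline, since it spans two lines in the HTML; you may want to replace that newline with a space before writing it out.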
HTH.