Printing only a portion of a matched regex -- newbie question

DIAMOND Mark R. · Aug 9, 2004

My apologies to begin with. I am a relatively new and infrequent user of
Perl.

I have a series of HTML files with contact information for doctors. The
files have enormous amounts of other stuff in them, including scripts, image
links and so on.
But the names all appear between a particular opening tag and its closing tag,
with words like "level7Name" or "level2Contact" (the quotes are in the
tag) marking the particular spans.
Line breaks don't seem to follow any particular pattern. The two structures
(the tagged span around nametoprint, and the equivalent for the
contact address) are quite distinct, with no strange embedding of one in the other.

What I'd like to do is print out the names and the contact information, but
I've obviously gone wrong somewhere. I couldn't work out whether I should
have a /g (global) modifier at the end of the s///, but either way I still
have a problem. Any help would be very much appreciated.

$/ = ".\n";
$doctorlistfile = "c:\\tmp\\doctors.tmp";
open(DOCTORLISTFILE, "> $doctorlistfile" ) || die "Can't open $doctorlistfile: $!\n";
while(<>) {
s/([^<]*)<\/b>/ $1 /;
print DOCTORLISTFILE $1;
s/([^<]*)<\/b>/ $1 /;
print DOCTORLISTFILE $1;
}
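The distinction the thread later turns on ("I just want to match, not change") can be sketched as follows. This is a minimal illustration, not the poster's code. The sample string is hypothetical, and because the original opening tags were lost from the post, a plain `<b>...</b>` stands in for the real markup.

```perl
use strict;
use warnings;

# Hypothetical record: the real opening tags were lost from the post,
# so a plain <b> stands in for the "level7Name" markup.
my $record = 'Dr <b>Jane Roe</b> at <b>12 High St</b>';

# s/// edits the string in place and returns the number of substitutions;
# the capture is still left in $1 afterwards.
my $copy  = $record;
my $count = ($copy =~ s/<b>([^<]*)<\/b>/ $1 /);
my $from_subst = $1;

# m// only tests and captures; the string is left untouched.
my ($from_match) = $record =~ m/<b>([^<]*)<\/b>/;

print "s///: $count substitution, \$1 = '$from_subst', string now '$copy'\n";
print "m//:  \$1 = '$from_match', string still '$record'\n";
```

Both forms leave the name in `$1`. Only `s///` also rewrites the input, and that is unnecessary when the goal is just to print the captured text.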

DIAMOND Mark R. · Aug 9, 2004

I should have added that I have searched the newsgroup on Google Groups, but part
of the problem is that I'm not quite sure what I should be searching for:
"print only match OR matching" pointed me to solutions which printed only
*lines* with an appropriate match.
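The two kinds of answer can be compared side by side. Printing "only matching lines" keeps the whole line, while capturing parentheses plus `$1` keep only the matched portion. The sketch below uses hypothetical input lines, with `<b>...</b>` as an assumed stand-in for the lost name tags.

```perl
use strict;
use warnings;

# Hypothetical input lines; <b>...</b> stands in for the lost name tags.
my @lines = (
    'junk <b>Dr A. Smith</b> more junk',
    'no name on this line',
    '<b>Dr B. Jones</b>',
);

my (@whole, @captured);
for my $line (@lines) {
    # "Print only matching lines": keeps the entire line.
    push @whole, $line if $line =~ m/<b>[^<]*<\/b>/;
    # Keep only the text inside the capturing parentheses.
    push @captured, $1 if $line =~ m/<b>([^<]*)<\/b>/;
}

print "line:    $_\n" for @whole;
print "capture: $_\n" for @captured;
```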

mark

DIAMOND Mark R. · Aug 9, 2004

Thanks, Brian. You are quite right. I just want to match, not change. And I
do want those newlines. But it only prints the first instance of a name. I
have made two slight changes. The first makes the print conditional; the
second is because I realised that the tag that marks the end of the name or
contact is not always the same, so in the following I check only for the
beginning of that tag.

$/ = ".\n";
$doctorlistfile = "c:\\tmp\\doctors.tmp";
open(DOCTORLISTFILE, "> $doctorlistfile" ) || die "Can't open $doctorlistfile: $!\n";
while(<>) {
print DOCTORLISTFILE "$1\n" if m/([^<]*)</;
print DOCTORLISTFILE "$1\n" if m/([^<]*)</;
}

but as I say, only a single name (the first correct match) is extracted from
the file.

Another question to which I am unsure of the answer is whether the second
appearance of $1 is correct, or whether the capture numbers keep increasing
throughout the loop rather than restarting within each regex; i.e. is the first
capture in the second regex actually called $2?

Cheers.

Joe Smith · Aug 9, 2004

DIAMOND said:
$/ = ".\n";
while(<>) {

If your file does not have any lines that end with a period, then
the entire file will be read in by <>, and the code inside the while{}
block will be executed only once. Try
print "$. = '$_'\n";
as a debugging aid.
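Joe's point can be checked without the real files. The following is a minimal sketch, assuming the HTML contains no line ending in a period. An in-memory filehandle (Perl 5.8 and later) stands in for the file, and the sample text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical file contents: no line ends with a period.
my $text = "<b>Dr A</b>\n<b>Dr B</b>\n<b>Dr C</b>\n";

my @records;
{
    local $/ = ".\n";    # the record separator from the original post
    # An in-memory filehandle stands in for the real HTML file.
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    while (<$fh>) {
        push @records, $_;
        print "$. = '$_'\n";    # the suggested debugging aid
    }
    close $fh;
}

print scalar(@records), " record(s) read\n";
```

Because ".\n" never occurs, the loop body runs once and `$_` holds the whole text.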

print DOCTORLISTFILE "$1\n" if m/([^<]*)</;
print DOCTORLISTFILE "$1\n" if m/([^<]*)</;

Another question to which I am unsure of the answer is whether the second
appearance of $1 is correct

In each regex, $1 corresponds to the first set of capturing parentheses in
that regex. The presence of any other regex in the file does not change this.
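A short sketch of the numbering rule, with two separate regexes against the same string. The `<span class="...">` markup is an assumption, since the real tags were stripped from the thread.

```perl
use strict;
use warnings;

# Hypothetical markup; the real tags were lost from the thread.
my $html = '<span class="level7Name">Dr A. Smith</span> '
         . '<span class="level2Contact">1 Main St</span>';

my ($name, $contact);
# Each regex numbers its own capturing groups starting from 1.
$name    = $1 if $html =~ m/"level7Name">([^<]*)</;
$contact = $1 if $html =~ m/"level2Contact">([^<]*)</;   # still $1, not $2

print "name: $name\ncontact: $contact\n";
```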
-Joe

gnari · Aug 9, 2004

DIAMOND Mark R. said:
Thanks, Brian. You are quite right. I just want to match, not change. And I
do want those newlines. But it only prints the first instance of a name. I
have made two slight changes. The first makes the print conditional; the
second is because I realised that the tag that marks the end of the name or
contact is not always the same, so in the following I check only for the
beginning of that tag.

$/ = ".\n";

this looks a bit tentative in light of your first post.
skip it

$doctorlistfile = "c:\\tmp\\doctors.tmp";
open(DOCTORLISTFILE, "> $doctorlistfile" ) || die "Can't open $doctorlistfile: $!\n";
while(<>) {
print DOCTORLISTFILE "$1\n" if m/([^<]*)</;

you were almost there.
change the if to a while and add a /g:
print DOCTORLISTFILE "$1\n" while m/([^<]*)</g;

but as I say, only a single name (the first correct match) is extracted from
the file.

consistent with your $/, probably
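The effect of switching from `if` to `while` with `/g` can be sketched on a single record holding several names, which is what happens when the whole file arrives in one read. The page content and the `<span class="...">` markup are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical page read as a single record (as happens when no line
# ends with ".\n"); the span markup is assumed, not taken from the post.
$_ = join "\n",
    '<span class="level7Name">Dr A. Smith</span><br>',
    '<span class="level7Name">Dr B. Jones</span>',
    '<span class="level2Contact">1 Main St</span>';

# With "if", only the first match in the record is seen.
my @first_only;
push @first_only, $1 if m/"level7Name">([^<]*)</;

# With "while" and /g, each match resumes where the previous one ended
# (tracked by pos()), so every name in the record is visited.
my @names;
push @names, $1 while m/"level7Name">([^<]*)</g;

print "if:    @first_only\n";
print "while: @names\n";
```

In list context, `my @names = m/"level7Name">([^<]*)</g;` collects the same captures in one step.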

Another question to which I am unsure of the answer is whether the second
appearance of $1 is correct, or whether the capture numbers keep increasing
throughout the loop rather than restarting within each regex; i.e. is the first
capture in the second regex actually called $2?

each successful match resets the $n variables
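Only a successful match resets `$1`. A failed match leaves it holding whatever the previous successful match captured. This is why printing `$1` after an unchecked `s///`, as in the first version of the code in this thread, can print a stale value. A minimal sketch with hypothetical input:

```perl
use strict;
use warnings;

my $first  = 'Dr <b>A. Smith</b>';     # hypothetical input
my $second = 'no name on this line';

$first =~ m/<b>([^<]*)</;
my $after_first = $1;                  # 'A. Smith'

# The second match fails, so $1 is NOT reset: it still holds 'A. Smith'.
$second =~ m/<b>([^<]*)</;
my $after_second = $1;

# Guarding the use of $1 with the match result avoids the stale value.
my $guarded = $second =~ m/<b>([^<]*)</ ? $1 : undef;

print "after first:  $after_first\n";
print "after second: $after_second\n";
print "guarded:      ", defined $guarded ? $guarded : '(no match)', "\n";
```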

gnari

DIAMOND Mark R. · Aug 10, 2004

Many thanks to all. I have solved my problem and learned quite a bit.
