Novice - help with pattern matching needed

Robert Day

Hi

I am using a very basic Perl script to parse a file and extract just
the elements I need but one aspect is causing me trouble and I am sure
the answer is probably quite simple. Below are examples of two of the
lines (watch wrapping) - the value I seek is that between the date on
the left and the "UV Port" on the right.

Enter bookmobile session location code (or NONE) : NONE 06 FEB 2004
March Mobile A UV Port 51
Circulation
06 FEB 2004 Papworth Library
UV Port 50

The section of code dealing with this is currently

if (/UV/) {
    $library = $`;
    $library =~ s/^\s+\d{2}\s\w{3}\s\d{4}\s+//;
    $library =~ s/- CAMBOOK//g;
    $library =~ s/(\w+)/\u\L$1/g;
    print "$library\n";
}

The 2nd and 3rd pattern matches deal with other lines in the data (not
shown) in which the value I seek is all CAPS or has "- CAMBOOK"
appended. This code works fine on line 2 of the sample data given
above but I don't know how to get rid of "Enter bookmobile session
location code (or NONE) : NONE" when it appears (as it does on a few
entries). I have tried various patterns and I am sure the solution is
simple but it eludes me at present. Can anyone help?

Robert
 
gnari

Robert Day said:
... the value I seek is that between the date on
the left and the "UV Port" on the right.

Enter bookmobile session location code (or NONE) : NONE 06 FEB 2004
March Mobile A UV Port 51
Circulation
06 FEB 2004 Papworth Library
UV Port 50

The section of code dealing with this is currently

if(/UV/) {
$library = $`;
$library =~ s/^\s+\d{2}\s\w{3}\s\d{4}\s+//;

Are you sure about the '^' here?
$library =~ s/- CAMBOOK//g;
$library =~ s/(\w+)/\u\L$1/g;
print "$library\n";
}

I would just do something like:
if ( ($library) = /\d\d \w\w\w \d{4} (.*?)(- CAMBOOK)? UV/ ) {
    print "$library\n";
}

gnari
 
Gunnar Hjalmarsson

Robert said:
I am using a very basic Perl script to parse a file and extract
just the elements I need ...

I don't know how to get rid of "Enter bookmobile session location
code (or NONE) : NONE" when it appears (as it does on a few
entries). I have tried various patterns and I am sure the solution
is simple but it eludes me at present. Can anyone help?

As regards the approach I have to ask: If you want to extract
something, why do you not write code that does just that rather than
deleting everything that you do not want to keep?

$library =~ s/^\s+\d{2}\s\w{3}\s\d{4}\s+//;
----------------------^
What are your considerations behind beginning the pattern with the ^
metacharacter?

perldoc perlvar points out that the $` variable "anywhere in a program
imposes a considerable performance penalty on all regular expression
matches". There appears not to be any reason to use it here.
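For cases where the text before a match really is wanted, perlvar also documents that $` is the same as substr($var, 0, $-[0]) after a successful match. A minimal sketch of that alternative follows; the helper name text_before_uv and the sample line (unwrapped onto one line) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything in front of the first "UV",
# without mentioning $` anywhere in the program.
sub text_before_uv {
    my ($line) = @_;
    return unless $line =~ /UV/;
    # $-[0] is the offset where the last successful match started.
    return substr( $line, 0, $-[0] );
}

my $before = text_before_uv('06 FEB 2004 Papworth Library UV Port 50');
print "[$before]\n";    # prints "[06 FEB 2004 Papworth Library ]"
```

On Perl 5.10 and later, the /p modifier together with ${^PREMATCH} is another documented way to get the same text.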
$library = $`;
$library =~ s/^\s+\d{2}\s\w{3}\s\d{4}\s+//;
$library =~ s/- CAMBOOK//g;

You may want to replace those three lines with:

my ($library) = /\d{2} \w{3} \d{4}\s+(.+?)(?:- CAMBOOK)?\s+UV/;
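To see what that pattern captures, here is a minimal sketch run against the two sample lines from the first post. It assumes each record sits on one physical line (the posted samples were wrapped); the third line with "- CAMBOOK" is invented for illustration, as is the helper name extract_library. The capture keeps the space in front of "- CAMBOOK", so the sketch trims it before title-casing.

```perl
use strict;
use warnings;

# Hypothetical helper: return the library name found between the date
# and "UV", or nothing if the line does not match.
sub extract_library {
    my ($line) = @_;
    # Pattern copied from the suggestion above.
    my ($library) = $line =~ /\d{2} \w{3} \d{4}\s+(.+?)(?:- CAMBOOK)?\s+UV/;
    return unless defined $library;
    $library =~ s/\s+$//;            # drop the space left before "- CAMBOOK"
    $library =~ s/(\w+)/\u\L$1/g;    # title-case, as in the original script
    return $library;
}

my @lines = (
    'Enter bookmobile session location code (or NONE) : NONE 06 FEB 2004 March Mobile A UV Port 51 Circulation',
    '  06 FEB 2004 Papworth Library UV Port 50',
    '06 FEB 2004 CAMBRIDGE CENTRAL - CAMBOOK UV Port 12',    # invented example
);

for my $line (@lines) {
    my $library = extract_library($line);
    print "$library\n" if defined $library;
}
# prints:
#   March Mobile A
#   Papworth Library
#   Cambridge Central
```

Because the pattern is not anchored with ^, whatever precedes the date (such as the "Enter bookmobile ..." prompt) is simply skipped.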
 
Robert

Gunnar Hjalmarsson said:
As regards the approach I have to ask: If you want to extract
something, why do you not write code that does just that rather than
deleting everything that you do not want to keep?

It seemed simpler because there is consistency in the stuff to remove but
the value I want to keep could be one of 70 different values, with a variety
of different formats.
$library =~ s/^\s+\d{2}\s\w{3}\s\d{4}\s+//;
----------------------^
What are your considerations behind beginning the pattern with the ^
metacharacter?

This is a leftover from the way the code worked before the introduction of
entries with the "Enter bookmobile....." line. At that time the dates were
always the leftmost item so always matched the ^ metacharacter.
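The effect of that leftover ^ can be sketched as follows. The two strings stand in for the text in front of "UV" on each kind of line; their exact whitespace is an assumption, and the helper names strip_anchored and strip_prefix are made up for this illustration. The second substitution is only one possible way to keep the removal approach working.

```perl
use strict;
use warnings;

# The original substitution: the date must be the first thing on the line.
sub strip_anchored {
    my ($s) = @_;
    $s =~ s/^\s+\d{2}\s\w{3}\s\d{4}\s+//;
    return $s;
}

# A variant that also removes anything in front of the date.
sub strip_prefix {
    my ($s) = @_;
    $s =~ s/^.*?\d{2}\s\w{3}\s\d{4}\s+//;
    return $s;
}

my $plain    = '  06 FEB 2004 Papworth Library ';
my $prefixed = 'Enter bookmobile session location code (or NONE) : NONE 06 FEB 2004 March Mobile A ';

print '[', strip_anchored($plain),    "]\n";   # [Papworth Library ]
print '[', strip_anchored($prefixed), "]\n";   # unchanged: ^\s+ cannot match "Enter"
print '[', strip_prefix($prefixed),   "]\n";   # [March Mobile A ]
```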
You may want to replace those three lines with:

my ($library) = /\d{2} \w{3} \d{4}\s+(.+?)(?:- CAMBOOK)?\s+UV/;

Thanks. I'll give it a go (and then try to understand exactly what it is
doing!)
Robert
 
Gunnar Hjalmarsson

Robert said:
It seemed simpler because there is consistency in the stuff to
remove but the value I want to keep could be one of 70 different
values, with a variety of different formats.

Okay. As you can see from both my and gnari's examples, that should
not prevent you from capturing rather than removing stuff.
Thanks. I'll give it a go (and then try to understand exactly what
it is doing!)

It can also be written:

my $library;
if ( /\d{2} \w{3} \d{4}\s+(.+?)(?:- CAMBOOK)?\s+UV/ ) {
    $library = $1;
}

Please study perldoc perlre about capturing, the meaning of the $1
variable, etc.
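As a short illustration of the two forms side by side, here is a sketch using one of the sample lines (unwrapped onto a single line, which is an assumption about the real data). The variable names are made up for the example.

```perl
use strict;
use warnings;

my $re   = qr/\d{2} \w{3} \d{4}\s+(.+?)(?:- CAMBOOK)?\s+UV/;
my $line = '06 FEB 2004 Papworth Library UV Port 50';

# List context: a successful match returns the captured groups.
# (?:...) groups without capturing, so only one value comes back.
my ($from_list) = $line =~ $re;

# Boolean context: the match returns true or false, and $1 holds the
# first capture. Read $1 only inside the block guarded by the match.
my $from_dollar1;
if ( $line =~ $re ) {
    $from_dollar1 = $1;
}

print "$from_list\n";       # Papworth Library
print "$from_dollar1\n";    # Papworth Library

# A failed match in list context returns an empty list,
# so the variable stays undefined.
my ($missing) = 'no date on this line' =~ $re;
print defined $missing ? "matched\n" : "no match\n";    # no match
```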
 
R Day

Gunnar Hjalmarsson said:
It can also be written:

my $library;
if ( /\d{2} \w{3} \d{4}\s+(.+?)(?:- CAMBOOK)?\s+UV/ ) {
    $library = $1;
}

Thanks. This works as required.
Please study perldoc perlre about capturing, the meaning of the $1
variable, etc.

I will do.

Robert
 
