help with some capturing syntax

  • Thread starter Matt Williamson

Matt Williamson

Given the following, is there an easy way to preface the print $status,
"\n"; line with job started, job ended or job completion status? I've been
reading about capturing in the blue camel, but I can't figure out if
or how to make it work.

foreach my $line (@content){
    if ($line =~ /(?:job started|job ended|job completion status)/i) {
        $line =~ /:(.*)$/;
        my $status = $1;
        chomp $status;
        for ($status) {
            s/^\s+//;
            s/\s+$//;
        }
        print |insert the status that matched above| $status, "\n";
    }
}

TIA

Matt
 

Matt Williamson

I figured it out. If there is a more efficient way to code it though, I'm
open.

foreach my $line (@content){
    if ($line =~ /((job started)|(job ended)|(job completion status))/i) {
        my $label = $1;
        $line =~ /:(.*)$/;
        my $status = $1;
        chomp $status;
        for ($status) {
            s/^\s+//;
            s/\s+$//;
        }
        print $label, " : ",$status, "\n";
    }
}
 

Paul Lalli

Matt said:
Given the following, is there an easy way to preface the print $status,
"\n"; line with job started, job ended or job completion status? I've been
reading about capturing in the blue camel, but I can't figure out if
or how to make it work.

foreach my $line (@content){
if ($line =~ /(?:job started|job ended|job completion status)/i) {

Adding the ?: above specifically makes this *not* capture. If you
wanted to capture them, why are you specifically telling perl *not* to
capture them? Capture it, and then assign a permanent variable to $1,
so you can later print it out:

if ($line =~ /(job started|job ended|job completion status)/i) {
my $job_type = $1;

$line =~ /:(.*)$/;
my $status = $1;
chomp $status;

More succinctly written: chomp (my ($status) = $line =~ /:(.*)$/);
for ($status) {
s/^\s+//;
s/\s+$//;
}

What is the point of a loop that iterates only once? Are you just
trying to avoid writing "$status" twice instead of once? Does that
really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

Of course, you could have equally well just not captured the whitespace
in your original match.
print |insert the status that matched above| $status, "\n";
}
}

If I were to write this whole code, to do what I *think* you're trying
to accomplish, it would look something like:

foreach my $line (@content){
    if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
        my ($job_type, $status) = ($1, $2);
        print "$job_type: $status\n";
    }
}

Of course, without any sample input or output to go by, I'm only
guessing.
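
Since no sample input was posted, here is a minimal sketch of the single-regex version run against made-up lines. The contents of @content and the @report array are assumptions for illustration only; the real log format may differ.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real @content was never posted.
my @content = (
    "Job started: Mon Jan  2 03:00:00\n",
    "Files copied: 1200\n",
    "Job completion status:   Successful  \n",
);

my @report;
foreach my $line (@content) {
    # $1 is the label; $2 is the value, with surrounding whitespace
    # left outside the capture by \s* on either side.
    if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
        push @report, "$1: $2";
    }
}
print "$_\n" for @report;
```

With these lines, only the first and third entries match, and the value is already trimmed, so no separate chomp or s/// pass is needed.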

Paul Lalli
 

Paul Lalli

Matt said:
I figured it out. If there is a more efficient way to code it though, I'm
open.

foreach my $line (@content){
if ($line =~ /((job started)|(job ended)|(job completion status))/i) {

What do you think those three inner parentheses are doing?

(see my previous post for a critique of the rest of the code)
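
To make the question concrete, a small sketch of what the nested parentheses capture. The input line is hypothetical.

```perl
use strict;
use warnings;

my $line = "Job ended: 04:15";   # hypothetical input

# In list context a match returns every capture group: $1, $2, $3, $4.
my @groups = $line =~ /((job started)|(job ended)|(job completion status))/i;

# The inner groups that did not take part in the match come back as undef.
my @shown = map { defined $_ ? "'$_'" : 'undef' } @groups;
print join(', ', @shown), "\n";

# One capturing group, with (?:...) used only to group the alternation.
my ($label) = $line =~ /(job (?:started|ended|completion status))/i;
print "$label\n";
```

The outer group alone already holds the label; the three inner groups only duplicate it into $2, $3 or $4.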

Paul Lalli
 

Matt Williamson

What is the point of a loop that iterates only once? Are you just
trying to avoid writing "$status" twice instead of once? Does that
really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

It's in perlfaq 4 that way. I'm quite new to this, so I can't really say
what does or doesn't make sense. <g>


If I were to write this whole code, to do what I *think* you're trying
to accomplish, it would look something like:

foreach my $line (@content){
    if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
        my ($job_type, $status) = ($1, $2);
        print "$job_type: $status\n";
    }
}

Of course, without any sample input or output to go by, I'm only
guessing.

This is much cleaner code and you've taught me a good bit by posting it.
Thanks!
 

Matt Williamson

Where? The only thing I see that deals with s/\s+// is at
http://perldoc.perl.org/perlfaq4.html#How-do-I-strip-blank-space-from-the-beginning/end-of-a-string?
which gives you those two s///'s, but certainly doesn't recommend
creating a one-iteration loop...

This is what it says in my version of Perlfaq 4

___

How do I strip blank space from the beginning/end of a string?
Although the simplest approach would seem to be

$string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails with
embedded newlines. It is much faster to do this operation in two steps:

$string =~ s/^\s+//;
$string =~ s/\s+$//;

Or more nicely written as:

for ($string) {
s/^\s+//;
s/\s+$//;
}

This idiom takes advantage of the "foreach" loop's aliasing behavior to
factor out common code. You can do this on several strings at once, or
arrays, or even the values of a hash if you use a slice:

# trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
s/\s+$//;
}

___

Since it said "Or more nicely written as" I thought that must be the correct
way to do it. It now seems that it's only better if you have multiple
strings to use it on.

Matt
 

Paul Lalli

Matt said:
This is what it says in my version of Perlfaq 4

___

How do I strip blank space from the beginning/end of a string?
Although the simplest approach would seem to be

$string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails with
embedded newlines. It is much faster to do this operation in two steps:

$string =~ s/^\s+//;
$string =~ s/\s+$//;

Or more nicely written as:

for ($string) {
s/^\s+//;
s/\s+$//;
}

Yeesh. I certainly am glad that FAQ's been updated. :)
This idiom takes advantage of the "foreach" loop's aliasing behavior to
factor out common code. You can do this on several strings at once, or
arrays, or even the values of a hash if you use a slice:

# trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
s/\s+$//;
}

___

Since it said "Or more nicely written as" I thought that must be the correct
way to do it.

One of the things you should learn about Perl is that there's no such
thing as "the" correct way to do anything. Indeed, one of Perl's
mottos is "There Is More Than One Way To Do It". Obviously, someone
thought (and probably still thinks) that aliasing the variable to $_ by
means of a one-iteration foreach was a good way of doing it. I
disagree, as it seems to be needlessly misleading. That doesn't mean
that either my way or that old FAQ's way are "wrong".
It now seems that it's only better if you have multiple strings to use it on.

That is my opinion, yes. There is, of course, something to be said for
extensibility. With the FAQ's way, your code is all set to have more
variables added to it, just by typing them into the foreach's list.
"My" way, you'd have to copy and paste code. You need to decide which
loss is the worse trade-off: readability or extensibility.
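
For reference, a sketch of the multi-variable case where the aliasing loop earns its keep. The variable names follow the FAQ excerpt; the values are made up.

```perl
use strict;
use warnings;

# Hypothetical values, each with stray whitespace.
my $scalar = "  alpha  ";
my @array  = (" beta", "gamma ");
my %hash   = (k => "\tdelta\n");

# foreach aliases $_ to each element in turn, so the substitutions
# modify the original variables rather than copies.
foreach ($scalar, @array, @hash{keys %hash}) {
    s/^\s+//;
    s/\s+$//;
}
print "[$scalar] [@array] [$hash{k}]\n";   # prints "[alpha] [beta gamma] [delta]"
```

Adding another variable to trim means adding one name to the list, not another pair of s/// statements.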

Paul Lalli
 

John W. Krahn

Matt said:
Given the following, is there an easy way to preface the print $status,
"\n"; line with job started, job ended or job completion status? I've been
reading about capturing in the blue camel, but I can't figure out if
or how to make it work.

foreach my $line (@content){
if ($line =~ /(?:job started|job ended|job completion status)/i) {

You are using non-capturing parentheses. If you want to capture the job
status you have to use capturing parentheses:

if ($line =~ /(job (?:started|ended|completion status))/i) {
my $job_status = $1;
$line =~ /:(.*)$/;
my $status = $1;

You should only use the numerical variables after a successful match:

my ( $status ) = $line =~ /:(.*)$/;

chomp $status;

/:(.*)$/ will not match a newline so unless you have changed the value of $/
there is nothing for chomp to remove and in any case the s/\s+$//; later would
remove any trailing newlines.

for ($status) {
s/^\s+//;
s/\s+$//;
}
print |insert the status that matched above| $status, "\n";
}
}
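
A quick sketch of the point about chomp, using a hypothetical line that ends in a newline. It assumes the match succeeds and that the /s modifier is not in use.

```perl
use strict;
use warnings;

my $line = "Job ended:  04:15  \n";   # hypothetical input with a trailing newline

# List-context assignment: $status gets the first capture, or undef on failure.
my ($status) = $line =~ /:(.*)$/;

# '.' does not match "\n" without /s, and '$' can match just before a
# trailing newline, so the capture stops short of the newline.
my $has_newline = $status =~ /\n/ ? 1 : 0;
print "captured: [$status], newline inside: $has_newline\n";
```

The capture still carries the spaces around the value, which is why the trimming substitutions (or \s* outside the group) are still needed.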



John
 

Tad McClellan

Matt Williamson said:
I figured it out. If there is a more efficient way to code it though, I'm
open.


You have the same cardinal sin that the original had though.

$line =~ /:(.*)$/;
my $status = $1;


You should never use the dollar-digit variables unless you
have first ensured that the match *succeeded*.

If you ever get a $line with no colons, then $status will NOT have
the status in it, it will have the same value as $label because
$1 was set way back when _that_ match succeeded.


die "no colon in '$line'" unless $line =~ /:(.*)$/;
my $status = $1; # now it's safe to use $1


This is a common mistake.

I made the very same one here in 1995:

Message-ID: <[email protected]>

Now I'm just returning the favor. :)
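
A minimal sketch of that failure mode, using a hypothetical line that names a job but contains no colon:

```perl
use strict;
use warnings;

my $line = "job started (no details)";   # hypothetical: no colon anywhere

my ($label, $status);
if ($line =~ /(job started|job ended|job completion status)/i) {
    $label = $1;
    $line =~ /:(.*)$/;   # fails, so $1 is left untouched
    $status = $1;        # still the value from the first match
}
print "label=[$label] status=[$status]\n";

# Guarded version: only read $1 when the match succeeded.
my $safe;
if ($line =~ /:(.*)$/) {
    $safe = $1;
}
print "safe is ", (defined $safe ? "[$safe]" : "undef"), "\n";
```

The unguarded code silently reports the label as the status; the guarded code leaves $safe undefined, which is at least detectable.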
 

Tad McClellan

Paul Lalli said:
Matt Williamson wrote:

More succinctly written: chomp (my ($status) = $line =~ /:(.*)$/);


Even more succinctly written: my($status) = $line =~ /:(.*)$/;

.... since there cannot be any newlines in $status anyway. :)
 

Xicheng Jia

Paul said:
Adding the ?: above specifically makes this *not* capture. If you
wanted to capture them, why are you specifically telling perl *not* to
capture them? Capture it, and then assign a permanent variable to $1,
so you can later print it out:

if ($line =~ /(job started|job ended|job completion status)/i) {
my $job_type = $1;



More succinctly written: chomp (my ($status) = $line =~ /:(.*)$/);


What is the point of a loop that iterates only once? Are you just
trying to avoid writing "$status" twice instead of once? Does that
really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

Of course, you could have equally well just not captured the whitespace
in your original match.


If I were to write this whole code, to do what I *think* you're trying
to accomplish, it would look something like:

foreach my $line (@content){
if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {

no need to guess if you change the above *if* statement to the
following:

if ($line =~ /(?=.*?(job (?:started|ended|completion status))).*?:\s*(.*?)\s*$/i)
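
A sketch of where the two patterns differ, on a hypothetical line in which the label is not immediately followed by the colon. The names $strict_re and $loose_re are only for this illustration.

```perl
use strict;
use warnings;

my $line = "Backup job started on host1: OK";   # hypothetical input

# Requires the label to sit directly before the colon.
my $strict_re = qr/(job (?:started|ended|completion status)):\s*(.*?)\s*$/i;

# The lookahead finds the label anywhere in the line without consuming text;
# the rest of the pattern then takes whatever follows the first colon.
my $loose_re = qr/(?=.*?(job (?:started|ended|completion status))).*?:\s*(.*?)\s*$/i;

my @strict = $line =~ $strict_re;   # empty list: no match
my @loose  = $line =~ $loose_re;

printf "strict: %d captures\n", scalar @strict;
print  "loose: $loose[0] / $loose[1]\n";
```

Which behaviour is right depends on the real input; the lookahead form accepts more lines, including ones where the label and the colon are unrelated.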

Xicheng
 

Dr.Ruud

Paul Lalli wrote:
Matt Williamson:

What is the point of a loop that iterates only once? Are you
just trying to avoid writing "$status" twice instead of once? Does
that really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

I like it as

s/^\s+//, s/\s+$// for $status;

but I don't remember ever having checked that idiom for having
optimization benefits or performance penalties.
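
One way to check would be the core Benchmark module. The following is only a sketch: the three sub names and the sample string are made up, and the timings it prints will vary by machine and perl version, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = "   Successful   ";   # hypothetical value to trim

# Three spellings of the same trim, each working on a private copy.
sub trim_two_statements { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s }
sub trim_for_block      { my $s = shift; for ($s) { s/^\s+//; s/\s+$//; } return $s }
sub trim_modifier       { my $s = shift; s/^\s+//, s/\s+$// for $s; return $s }

# Confirm that all three produce the same string before timing them.
my @results = map { $_->($sample) }
    (\&trim_two_statements, \&trim_for_block, \&trim_modifier);
print join('|', @results), "\n";

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    two_statements => sub { trim_two_statements($sample) },
    for_block      => sub { trim_for_block($sample) },
    modifier       => sub { trim_modifier($sample) },
});
```

The sub-call overhead is the same for all three variants, so any difference in the table comes from the loop forms themselves.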
 
