help with some capturing syntax

Matt Williamson · Jul 7, 2006

Given the following, is there an easy way to preface the print $status,
"\n"; line with job started, job ended or job completion status? I've been
reading about capturing in the blue camel, but I can't figure out if
or how to make it work.

foreach my $line (@content){
    if ($line =~ /(?:job started|job ended|job completion status)/i) {
        $line =~ /:(.*)$/;
        my $status = $1;
        chomp $status;
        for ($status) {
            s/^\s+//;
            s/\s+$//;
        }
        print |insert the status that matched above| $status, "\n";
    }
}

TIA

Matt

Matt Williamson · Jul 7, 2006

I figured it out. If there is a more efficient way to code it though, I'm
open.

foreach my $line (@content){
    if ($line =~ /((job started)|(job ended)|(job completion status))/i) {
        my $label = $1;
        $line =~ /:(.*)$/;
        my $status = $1;
        chomp $status;
        for ($status) {
            s/^\s+//;
            s/\s+$//;
        }
        print $label, " : ",$status, "\n";
    }
}

Paul Lalli · Jul 7, 2006

Matt said:
Given the following, is there an easy way to preface the print $status,
"\n"; line with job started, job ended or job completion status? I've been
reading about capturing in the blue camel, but I can't figure out if
or how to make it work.

foreach my $line (@content){
if ($line =~ /(?:job started|job ended|job completion status)/i) {

Adding the ?: above specifically makes this *not* capture. If you
wanted to capture them, why are you specifically telling perl *not* to
capture them? Capture it, and then assign a permanent variable to $1,
so you can later print it out:

if ($line =~ /(job started|job ended|job completion status)/i) {
my $job_type = $1;

$line =~ /:(.*)$/;
my $status = $1;
chomp $status;

More succinctly written: chomp (my ($status) = $line =~ /:(.*)$/);

for ($status) {
s/^\s+//;
s/\s+$//;
}

What is the point of a loop that iterates only once? Are you just
trying to avoid writing "$status" twice instead of once? Does that
really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

Of course, you could have equally well just not captured the whitespace
in your original match.

print |insert the status that matched above| $status, "\n";
}
}

If I were to write this whole code, to do what I *think* you're trying
to accomplish, it would look something like:

foreach my $line (@content){
    if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
        my ($job_type, $status) = ($1, $2);
        print "$job_type: $status\n";
    }
}

Of course, without any sample input or output to go by, I'm only
guessing.
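
Since no sample input was posted, here is a minimal sketch of how that combined pattern behaves on made-up lines. The contents of @content below are assumptions for illustration only, not the poster's real log format.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real log format was never posted.
my @content = (
    "Job started:   7/7/2006 at 1:00 AM  \n",
    "Some unrelated line\n",
    "Job ended: 7/7/2006 at 1:30 AM\n",
    "Job completion status:  Successful\n",
);

my @report;
foreach my $line (@content) {
    # $1 is the label; $2 is the text after the colon, with the
    # surrounding whitespace matched outside the capture.
    if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
        my ($job_type, $status) = ($1, $2);
        push @report, "$job_type: $status";
    }
}

print "$_\n" for @report;
```

Because the colon must directly follow the label, the later colons in the time stamps do not confuse the match.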

Paul Lalli

Paul Lalli · Jul 7, 2006

Matt said:
I figured it out. If there is a more efficient way to code it though, I'm
open.

foreach my $line (@content){
if ($line =~ /((job started)|(job ended)|(job completion status))/i) {

What do you think those three inner parentheses are doing?

(see my previous post for a critique of the rest of the code)
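
To see what the inner parentheses do, a small sketch like the following may help. The input line is a made-up assumption; the pattern is the one from the quoted post.

```perl
use strict;
use warnings;

my $line = "Job ended: 7/7/2006";   # hypothetical input

# Groups are numbered by the position of their opening parenthesis,
# and a match in list context returns all of them.
my @groups = $line =~ /((job started)|(job ended)|(job completion status))/i;

# $1 is the outer group; $2..$4 are the inner alternatives, and only
# the alternative that took part in the match is defined.
for my $i (0 .. $#groups) {
    printf "\$%d = %s\n", $i + 1,
        defined $groups[$i] ? "'$groups[$i]'" : 'undef';
}
```

The outer group already holds the label, so the three inner groups only create extra capture variables that the code never reads.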

Paul Lalli

Matt Williamson · Jul 7, 2006

What is the point of a loop that iterates only once? Are you just
trying to avoid writing "$status" twice instead of once? Does that
really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

It's in perlfaq 4 that way. I'm quite new to this, so I can't really say
what does or doesn't make sense. <g>

If I were to write this whole code, to do what I *think* you're trying
to accomplish, it would look something like:

foreach my $line (@content){
    if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {
        my ($job_type, $status) = ($1, $2);
        print "$job_type: $status\n";
    }
}

Of course, without any sample input or output to go by, I'm only
guessing.

This is much cleaner code and you've taught me a good bit by posting it.
Thanks!

Paul Lalli · Jul 7, 2006

Matt said:
It's in perlfaq 4 that way.

Where? The only thing I see that deals with s/\s+// is at
http://perldoc.perl.org/perlfaq4.html#How-do-I-strip-blank-space-from-the-beginning/end-of-a-string?
which gives you those two s///'s, but certainly doesn't recommend
creating a one-iteration loop...

This is much cleaner code and you've taught me a good bit by posting it.
Thanks!

Quite welcome.

Paul Lalli

Matt Williamson · Jul 7, 2006

Where? The only thing I see that deals with s/\s+// is at
http://perldoc.perl.org/perlfaq4.html#How-do-I-strip-blank-space-from-the-beginning/end-of-a-string?
which gives you those two s///'s, but certainly doesn't recommend
creating a one-iteration loop...

This is what it says in my version of Perlfaq 4

___

How do I strip blank space from the beginning/end of a string?
Although the simplest approach would seem to be

$string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails with
embedded newlines. It is much faster to do this operation in two steps:

$string =~ s/^\s+//;
$string =~ s/\s+$//;

Or more nicely written as:

for ($string) {
s/^\s+//;
s/\s+$//;
}

This idiom takes advantage of the "foreach" loop's aliasing behavior to
factor out common code. You can do this on several strings at once, or
arrays, or even the values of a hash if you use a slice:

# trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
s/\s+$//;
}

___

Since it said "Or more nicely written as" I thought that must be the correct
way to do it. It now seems that it's only better if you have multiple
strings to use it on.

Matt

Paul Lalli · Jul 7, 2006

Matt said:
This is what it says in my version of Perlfaq 4

___

How do I strip blank space from the beginning/end of a string?
Although the simplest approach would seem to be

$string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails with
embedded newlines. It is much faster to do this operation in two steps:

$string =~ s/^\s+//;
$string =~ s/\s+$//;

Or more nicely written as:

for ($string) {
s/^\s+//;
s/\s+$//;
}

Yeesh. I certainly am glad that FAQ's been updated.

This idiom takes advantage of the "foreach" loop's aliasing behavior to
factor out common code. You can do this on several strings at once, or
arrays, or even the values of a hash if you use a slice:

# trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
s/\s+$//;
}

___

Since it said "Or more nicely written as" I thought that must be the correct
way to do it.

One of the things you should learn about Perl is that there's no such
thing as "the" correct way to do anything. Indeed, one of Perl's
mottos is "There Is More Than One Way To Do It". Obviously, someone
thought (and probably still thinks) that aliasing the variable to $_ by
means of a one-iteration foreach was a good way of doing it. I
disagree, as it seems to be needlessly misleading. That doesn't mean
that either my way or that old FAQ's way are "wrong".

It now seems that it's only better if you have multiple strings to use it on.

That is my opinion, yes. There is, of course, something to be said for
extensibility. With the FAQ's way, your code is all set to have more
variables added to it, just by typing them into the foreach's list.
"My" way, you'd have to copy and paste code. You need to decide which
matters more to you: readability or extensibility.
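
As a rough illustration of the extensibility side of that trade-off, the sketch below trims several variables through one aliasing loop. The variable names and values are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data with stray whitespace.
my $scalar = "  one  ";
my @array  = ("  two", "three  ");
my %hash   = (k => "\tfour\n");

# foreach aliases $_ to each element in turn, so the substitutions
# modify the original variables rather than copies.
for ($scalar, @array, @hash{keys %hash}) {
    s/^\s+//;
    s/\s+$//;
}

print "[$scalar] [@array] [$hash{k}]\n";
```

Adding another variable means adding one item to the list, whereas the two-statement form needs two more lines per variable.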

Paul Lalli

John W. Krahn · Jul 7, 2006

Matt said:
Given the following, is there an easy way to preface the print $status,
"\n"; line with job started, job ended or job completion status? I've been
reading about capturing in the blue camel, but I can't figure out if
or how to make it work.

foreach my $line (@content){
if ($line =~ /(?:job started|job ended|job completion status)/i) {

You are using non-capturing parentheses. If you want to capture the job
status you have to use capturing parentheses:

if ($line =~ /(job (?:started|ended|completion status))/i) {
my $job_status = $1;

$line =~ /:(.*)$/;
my $status = $1;

You should only use the numerical variables after a successful match:

my ( $status ) = $line =~ /:(.*)$/;

chomp $status;

/:(.*)$/ will not match a newline, so unless you have changed the value of $/
there is nothing for chomp to remove, and in any case the s/\s+$//; later would
remove any trailing newlines.
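
A quick way to check this is a sketch like the one below. It assumes the pattern in question is /:(.*)$/ and uses a made-up input line.

```perl
use strict;
use warnings;

my $line = "Job ended: done  \n";   # hypothetical input with a trailing newline

# '.' does not match "\n" without /s, and '$' can match just before
# a final newline, so the capture stops short of it.
my ($status) = $line =~ /:(.*)$/;
my $had_newline = $status =~ /\n/ ? 'yes' : 'no';

# Trimming takes care of the remaining blanks.
$status =~ s/^\s+//;
$status =~ s/\s+$//;

print "newline captured: $had_newline\n";
print "status: '$status'\n";
```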

for ($status) {
s/^\s+//;
s/\s+$//;
}
print |insert the status that matched above| $status, "\n";
}
}

John

Tad McClellan · Jul 7, 2006

Matt Williamson said:
I figured it out. If there is a more efficient way to code it though, I'm
open.

You have the same cardinal sin that the original had though.

$line =~ /:(.*)$/;
my $status = $1;

You should never use the dollar-digit variables unless you
have first ensured that the match *succeeded*.

If you ever get a $line with no colons, then $status will NOT have
the status in it, it will have the same value as $label because
$1 was set way back when _that_ match succeeded.

die "no colon in '$line'" unless $line =~ /:(.*)$/;
my $status = $1; # now it's safe to use $1
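
The stale-$1 effect is easy to reproduce. The sketch below assumes the second pattern is /:(.*)$/ and feeds it a made-up line that has a label but no colon.

```perl
use strict;
use warnings;

my $line = "job started with no separator";   # hypothetical line, no colon

my ($label, $stale, $status);
if ($line =~ /(job started|job ended|job completion status)/i) {
    $label = $1;

    # Unchecked: this match fails, so $1 still holds the label.
    $line =~ /:(.*)$/;
    $stale = $1;

    # Checked: a failed match in list context returns an empty list,
    # so $status is left undefined instead of picking up old data.
    ($status) = $line =~ /:(.*)$/;
}

print "unchecked: $stale\n";
print "checked:   ", defined $status ? $status : '(no match)', "\n";
```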

This is a common mistake.

I made the very same one here in 1995:

Message-ID: <[email protected]>

Now I'm just returning the favor.

Tad McClellan · Jul 7, 2006

Paul Lalli said:
Matt Williamson wrote:

More succinctly written: chomp (my ($status) = $line =~ /:(.*)$/);

Even more succinctly written: my($status) = $line =~ /:(.*)$/;

.... since there cannot be any newlines in $status anyway.

Xicheng Jia · Jul 8, 2006

Paul said:
Adding the ?: above specifically makes this *not* capture. If you
wanted to capture them, why are you specifically telling perl *not* to
capture them? Capture it, and then assign a permanent variable to $1,
so you can later print it out:

if ($line =~ /(job started|job ended|job completion status)/i) {
my $job_type = $1;

More succinctly written: chomp (my ($status) = $line =~ /:(.*)$/);

What is the point of a loop that iterates only once? Are you just
trying to avoid writing "$status" twice instead of once? Does that
really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

Of course, you could have equally well just not captured the whitespace
in your original match.

If I were to write this whole code, to do what I *think* you're trying
to accomplish, it would look something like:

foreach my $line (@content){
if ($line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i) {

no need to guess if you change the above *if* statement to the
following:

if ($line =~ /(?=.*?(job (?:started|ended|completion status))).*?:\s*(.*?)\s*$/i)
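
One way to read this variant: the lookahead finds the label anywhere in the line, and the rest of the pattern takes whatever follows the first colon. The sketch below tries it on a made-up line where other text sits between the label and the colon, which the stricter pattern would not match.

```perl
use strict;
use warnings;

# Hypothetical line; the label is not directly followed by the colon.
my $line = "Backup job ended (set 2): OK  \n";

my ($job_type, $status);
if ($line =~ /(?=.*?(job (?:started|ended|completion status))).*?:\s*(.*?)\s*$/i) {
    # $1 comes from inside the lookahead, $2 from after the first colon.
    ($job_type, $status) = ($1, $2);
}

# The stricter pattern wants the colon immediately after the label.
my $strict_matches =
    $line =~ /(job (?:started|ended|completion status)):\s*(.*?)\s*$/i ? 1 : 0;

print "$job_type: $status\n";
print "strict pattern matched: $strict_matches\n";
```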

Xicheng

Dr.Ruud · Jul 9, 2006

Paul Lalli wrote:

Matt Williamson:

What is the point of a loop that iterates only once? Are you
just trying to avoid writing "$status" twice instead of once? Does
that really make sense to you?

$status =~ s/^\s+//;
$status =~ s/\s+$//;

I like it as

s/^\s+//, s/\s+$// for $status;

but I don't remember ever having checked that idiom for having
optimization benefits or performance penalties.
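
For anyone who wants to check, a sketch along these lines using the core Benchmark module could compare the three spellings. The sub names and the sample string are made up for the example, and no timings are claimed here, since they depend on the Perl build and the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $raw = "   Successful \t ";   # hypothetical padded value

# Each sub trims a fresh copy so the runs do not affect one another.
sub two_statements { my $s = $raw; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s }
sub for_block      { my $s = $raw; for ($s) { s/^\s+//; s/\s+$// } return $s }
sub for_modifier   { my $s = $raw; s/^\s+//, s/\s+$// for $s; return $s }

# Run each variant a fixed number of times and print a comparison table.
cmpthese(200_000, {
    two_statements => \&two_statements,
    for_block      => \&for_block,
    for_modifier   => \&for_modifier,
});
```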
