Interesting PERL anomaly - confirmation and/or explanations welcomed

xyz88888

I encountered a problem with a PERL script I wrote, and I would like
to find out if this is just a problem with the ActiveState PERL
interpreter, or a problem in the PERL spec itself. I am using the
ActivePerl interpreter, build 820 version 5.8.8. I looked for other
free downloadable PERL interpreters to compare results, but had no
luck. So I'm hoping to get some feedback from any of you interested
folks.

The input: a multimedia production storyboard in ASCII-text format.
Sections of the file contain dialogue for voice-overs, and I want to
parse out just this stuff and ignore the rest. In this case, I need
all lines starting with the string '(rules script) "' copied to the
output file of my script.

The process: my PERL script reads in the file line-by-line, does
string matching against the first part of each line and based on some
logic, determines if and how that line should be copied to the output
file.

The problem: in some instances, the string to be used in the matching
function needs to be stripped out before copied to the output file. I
do this using "positional parameters" ($1, $2,...) from the match
command. Whenever the string contains non-alphanumeric characters
(with the "\" escape), those characters seem to cause the substitution
command to fail. I concluded this after numerous coding attempts, based on
the various outputs.

The code: For brevity I am only including the combinations of match
conditionals and the substitution commands I tried that are related to
this specific problem. I'll copy the whole script down below for those
who want to see the whole thing, although everything else works fine.
The match function is written to match the string '(rules script) "'
at the beginning of a line.

Variation 1:

} elsif (/(^\(rules script\) ")/) {
s/$1//;
print OUTFILE;
}

Result 1:
(rules script) "If you need.....

Comments:
This was the first attempt. The outer parens go with the elsif, the
middle parens set the value for $1 (I expected!), and the inner parens
are escaped as they are part of the string matching. Based on the
output I concluded no sub-ing of any kind was performed - input
matched output verbatim.

variation 2:

} elsif (/(^\(rules script\) \")/) {
s/$1//;
print OUTFILE;
}

Result 2:
(rules script) "If you need.....

Comments:
My first suspicion was the double-quote at the end of the match. So I
put an "\" before it to ensure it was treated as a literal and copied
into the value for $1. No change.

Variation 3:

} elsif (/(^\(rules script\) ")/) {

print STDERR "\n1=$1\n";
s/\($1\)\s\"//;
print OUTFILE;
}

Result 3:
(to STDERR)1=(rules script) "

OUTPUT: (rules script) "If you need.....

My suspicion was confirmed. $1 has the correct string after the
successful match, but for some reason the sub command is failing. I
hoped that putting in the non-alphanums into the first parameter of
the sub command would work, but it did not. Onto 4 (and qualified
success!)...

Variation 4:

} elsif (/(rules script)/) {

print STDERR "\n1=$1\n";
s/\($1\)\s\"//;
print OUTFILE;
}

Result 4:
(to STDERR)1=(rules script) "

OUTPUT: If you need.....

Comments:
At last, I got the result I wanted! Unfortunately I had to literally
spell it out for the script. Wanted to rule out that it was the
non-alphanums that were mucking up the sub.

Going for broke, I tried doing the sub and matching together in the
conditional test, and that worked fine too.
} elsif (s/^(\(rules script\)\s\")//) {


Conclusions:
* using the positional parameter ($1) in sub-ing = ok
* using non-alphanums in sub-ing = ok
* using a positional parameter containing non-alphanums in sub-ing = NA-AH!

So my curiosity is whether this is a bug in the ActiveState version of
PERL, the PERL spec itself (not likely), or my logic (usually my first
suspicion, but this time I think I'm off the hook of guilt, no?).

Please reply if you have any insight or gave it a go yourself and got
some useful results. Thanks.

- DK

P.S. As promised, here is the whole script.

$x = 0;
$cont = "f";
$csm = "f";

open (INFILE, "lcm01_v1d-TO.txt");
open (OUTFILE, ">outfile.txt");

while (<INFILE>) {
++$x;
chomp;

if (/^(lcm01_\d{3}\w*)\s?/) {
$csm = "f";
print OUTFILE "\n$_\nNARRATOR: ";

} elsif (s/^(\(rules script\)\s\")//) {

print STDERR "\n1=$1\n";
s/\($1\)\s\"//;
print STDERR "\n$x:\n$_\n";

s/"\s*$/ /;
print OUTFILE;
print OUTFILE "When you have completed this exercise, click the Next button to continue.\n";

} elsif ($cont eq "t") {
chomp; chomp; chomp;

if ((/^Notes$/)
|| (/^Correct Answer:$/)
|| (//))
{
$cont = "f";

} elsif (/"\s*$/) {
$cont = "f";
s/"\s*$/\n/;
print OUTFILE;

} else {
print OUTFILE;
}

} elsif ((! /TRAINER:/)
&& (/([A-Z]+):\s+"?(.+)/))
{
$cont = "t";

if ($1 eq "CSM") {
$csm = "t";
$line = "\n$1: $2";

} elsif (($line = $2) =~ m/^Click/) {
$cont = "f";

if ($csm eq "f") {
$line = "\n$line";
} else {
$line = "\nNARRATOR: $line\n";
}

} elsif ($1 eq "NARRATION") {
$line = "\n$1: $2";

} elsif ($csm eq "t") {
$line = "\n$1: $2";

} elsif ($csm eq "f") {
$line = "$2";
}

if ($line =~ m/("\s*)$/) {
$line =~ s/$1/\n/;
$cont = "f";
}

print OUTFILE $line;
}
}

close INFILE;
close OUTFILE;
end;
 
Brian McCauley

I encountered a problem with a PERL script I wrote,

See FAQ: What's the difference between "perl" and "Perl"?
The problem: in some instances, the string to be used in the matching
function needs to be stripped out before copied to the output file. I
do this using "positional parameters" ($1, $2,...) from the match
command. Whenever the string contains non-alphanumeric characters
(with the "\" escape), those characters seem to cause the substitution
command to fail.

The positional parameters ($1 et al) are a red herring here.

A much simpler illustration of your problem.

for my $foo ( 'This string contains [regex] metacharacters!',
              'but this does not' ) {
print "\$foo is: $foo\n";
print "\$foo does not match \$foo!\n" unless $foo =~ /$foo/;
}

When you interpolate a string into a regex any regex metacharacters
are (by default) still treated as meta. (IIRC this will change in
Perl6).

To interpolate the string without metacharacter interpretation...

/\Q$foo\E/
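
Applied to the string from the original post, the difference can be sketched as follows. This is a minimal illustration, not the poster's script: the sample line and the variable names $unquoted, $quoted and $prefix are assumptions made up for the example.

```perl
use strict;
use warnings;

# Sample line in the shape described in the original post (assumed input).
my $line = '(rules script) "If you need help, ask."';

my $unquoted = $line;
my $quoted   = $line;

if ($line =~ /(^\(rules script\) ")/) {
    my $prefix = $1;    # holds: (rules script) "

    # Interpolated as-is, the parentheses form a capture group, so the
    # pattern means 'rules script "' and finds nothing to remove.
    $unquoted =~ s/$prefix//;

    # \Q...\E escapes the metacharacters, so the prefix matches literally.
    $quoted =~ s/\Q$prefix\E//;
}

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

Copying $1 into a named variable first also avoids relying on $1 surviving until the substitution runs.
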
} elsif (/(^\(rules script\) \")/) {
s/$1//;
print OUTFILE;
}

Of course this is much more simply written...

} elsif (s/(^\(rules script\) ")//) {
print OUTFILE;
}

....which side-steps the whole issue in this particular case.
 
Paul Lalli

The problem: in some instances, the string to be used in the matching
function needs to be stripped out before copied to the output file. I
do this using "positional parameters" ($1, $2,...) from the match
command. Whenever the string contains non-alphanumeric characters
(with the "\" escape), those characters seem to cause the substitution
command to fail. I concluded this after numerous coding attempts, based on
the various outputs.

perldoc -f quotemeta

perldoc -q quote
Found in /opt2/Perl5_8_4/lib/perl5/5.8.4/pod/perlfaq6.pod
How can I quote a variable to use in a regex?
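
As a rough sketch of what those two pointers lead to, quotemeta() returns a copy of its argument with every non-word character backslashed, which is the function form of \Q...\E. The sample line and the names $prefix, $safe and $stripped are assumptions for illustration only.

```perl
use strict;
use warnings;

# The literal prefix from the original post.
my $prefix = '(rules script) "';

# quotemeta backslashes every character that is not a letter, digit or
# underscore, so the result is safe to interpolate into a pattern.
my $safe = quotemeta $prefix;

# Sample line (assumed input), stripped of the prefix at line start.
my $line = '(rules script) "If you need help, ask."';
(my $stripped = $line) =~ s/^$safe//;

print "escaped pattern: $safe\n";
print "stripped line:   $stripped\n";
```
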


Paul Lalli
 
Dr.Ruud

(e-mail address removed) wrote:
I encountered a problem with a PERL script I wrote, and I would like
to find out if this is just a problem with the ActiveState PERL
interpreter, or a problem in the PERL spec itself.

The third and most likely error is that you made bad assumptions.

$x = 0;
$cont = "f";
$csm = "f";

Missing:
use strict;
use warnings;
and the proper declaration of the variables.

[...]
while (<INFILE>) {
++$x;

That $x is emulating $., so remove it.
[...]
} elsif (s/^(\(rules script\)\s\")//) {

print STDERR "\n1=$1\n";
s/\($1\)\s\"//;
print STDERR "\n$x:\n$_\n";

Yuck.

m/^\(rules script\)\s+"(.*)"$/ and print $1, "\n";

(and if you wouldn't have chomped, the "\n" at the end can go)
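
A self-contained sketch of that approach, combining the single match-and-capture with $. for the line number, might look like this. The in-memory sample text stands in for the storyboard file, which is not shown in the thread, and the names $sample, $in, $text and @dialogue are made up for the example.

```perl
use strict;
use warnings;

# Stand-in input; the real storyboard file is not shown in the thread.
my $sample = <<'END';
Notes
(rules script) "If you need help, ask."
TRAINER: "Not voice-over"
(rules script) "Click Next to continue."
END

# Read the sample through an in-memory filehandle.
open my $in, '<', \$sample or die "Cannot read sample: $!";

my @dialogue;
while (my $text = <$in>) {
    chomp $text;
    # Match the prefix and capture the quoted dialogue in one step.
    if ($text =~ /^\(rules script\)\s+"(.*)"$/) {
        push @dialogue, "$.: $1";    # $. is the current input line number
    }
}
close $in;

print "$_\n" for @dialogue;
```
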
 
John W. Krahn

I encountered a problem with a PERL script I wrote, and I would like
to find out if this is just a problem with the ActiveState PERL
interpreter, or a problem in the PERL spec itself. I am using the
ActivePerl interpreter, build 820 version 5.8.8. I looked for other
free downloadable PERL interpreters to compare results, but had no
luck. So I'm hoping to get some feedback from any of you interested
folks.

The input: a multimedia production storyboard in ASCII-text format.
Sections of the file contain dialogue for voice-overs, and I want to
parse out just this stuff and ignore the rest. In this case, I need
all lines starting with the string '(rules script) "' copied to the
output file of my script.

The process: my PERL script reads in the file line-by-line, does
string matching against the first part of each line and based on some
logic, determines if and how that line should be copied to the output
file.

The problem: in some instances, the string to be used in the matching
function needs to be stripped out before copied to the output file. I
do this using "positional parameters" ($1, $2,...) from the match
command. Whenever the string contains non-alphanumeric characters
(with the "\" escape), those characters seem to cause the substitution
command to fail. I concluded this after numerous coding attempts, based on
the various outputs.

The code: For brevity I am only including the combinations of match
conditionals and the substitution commands I tried that are related to
this specific problem. I'll copy the whole script down below for those
who want to see the whole thing, although everything else works fine.
The match function is written to match the string '(rules script) "'
at the beginning of a line.

Variation 1:

} elsif (/(^\(rules script\) ")/) {
s/$1//;
print OUTFILE;
}

Put the substitution in the conditional:

} elsif ( s/(^\(rules script\) ")// ) {
print OUTFILE;
}

[ SNIP ]
P.S. As promised, here is the whole script.

use warnings;
use strict;
$x = 0;
$cont = "f";
$csm = "f";

open (INFILE, "lcm01_v1d-TO.txt");
open (OUTFILE, ">outfile.txt");

You should *ALWAYS* verify that the files opened correctly!

open INFILE, '<', 'lcm01_v1d-TO.txt' or die "Cannot open 'lcm01_v1d-TO.txt' $!";
open OUTFILE, '>', 'outfile.txt' or die "Cannot open 'outfile.txt' $!";
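
To see the check in action, here is a small sketch that wraps the same "open or die" idiom in a helper and deliberately points it at a missing file. It uses a lexical filehandle rather than the bareword handles above, which is a substitution on my part; open_for_reading and the file name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with the reason.
sub open_for_reading {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path': $!\n";
    return $fh;
}

# A path assumed not to exist, to show the failure being reported.
my $missing = 'no-such-storyboard-file.txt';

# eval traps the die so the message can be inspected instead of
# ending the program.
my $fh    = eval { open_for_reading($missing) };
my $error = $@;

print defined $fh ? "opened\n" : "open failed: $error";
```

Without the check, a failed open goes unnoticed and the while loop simply reads nothing.
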

while (<INFILE>) {
++$x;
chomp;

if (/^(lcm01_\d{3}\w*)\s?/) {
$csm = "f";
print OUTFILE "\n$_\nNARRATOR: ";

} elsif (s/^(\(rules script\)\s\")//) {

print STDERR "\n1=$1\n";
s/\($1\)\s\"//;
print STDERR "\n$x:\n$_\n";

You don't need the $x variable to get the current line number:

print STDERR "\n$.:\n$_\n";

s/"\s*$/ /;
print OUTFILE;
print OUTFILE "When you have completed this exercise, click the Next button to continue.\n";

} elsif ($cont eq "t") {
chomp; chomp; chomp;

What do you think that using chomp three times will accomplish (that using it
only once will not)?

if ((/^Notes$/)
|| (/^Correct Answer:$/)
|| (//))

What did you think that (//) was going to do?

perldoc perlop
[ SNIP ]

If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead. In this case, only the "g"
and "c" flags on the empty pattern is honoured - the other flags are
taken from the original pattern. If no match has previously succeeded,
this will (silently) act instead as a genuine empty pattern (which
will always match).
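
The behaviour described in that excerpt can be checked with a short sketch. The strings and the @results array are made up for illustration; the first pattern is borrowed from the script.

```perl
use strict;
use warnings;

my @results;

# A successful match; its pattern becomes the "last successfully
# matched regular expression".
'lcm01_001' =~ /^lcm01_\d{3}/ or die "setup match failed";

for my $text ('Notes', 'lcm01_002 intro') {
    # The empty pattern reuses /^lcm01_\d{3}/ here rather than
    # matching every string.
    push @results, ($text =~ //) ? 1 : 0;
}

print "@results\n";
```

So the result of (//) in the script depends on whichever pattern last matched, not on the current line being empty.
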

{
$cont = "f";

} elsif (/"\s*$/) {
$cont = "f";
s/"\s*$/\n/;
print OUTFILE;

You should just put the substitution in the conditional:

} elsif ( s/"\s*$/\n/ ) {
$cont = "f";
print OUTFILE;

} else {
print OUTFILE;
}

} elsif ((! /TRAINER:/)
&& (/([A-Z]+):\s+"?(.+)/))
{
$cont = "t";

if ($1 eq "CSM") {
$csm = "t";
$line = "\n$1: $2";

} elsif (($line = $2) =~ m/^Click/) {
$cont = "f";

if ($csm eq "f") {
$line = "\n$line";
} else {
$line = "\nNARRATOR: $line\n";
}

} elsif ($1 eq "NARRATION") {
$line = "\n$1: $2";

} elsif ($csm eq "t") {
$line = "\n$1: $2";

} elsif ($csm eq "f") {
$line = "$2";
}

if ($line =~ m/("\s*)$/) {
$line =~ s/$1/\n/;
$cont = "f";
}

print OUTFILE $line;
}
}

close INFILE;
close OUTFILE;


John
 
Dave Weaver

$csm = "f";
} elsif ($csm eq "t") {
$line = "\n$1: $2";

} elsif ($csm eq "f") {
$line = "$2";
}

Unrelated to your problem, but I hate the use of strings to represent boolean
values.

Whilst Perl doesn't have built-in 'true'/'false' constants, it does recognise
boolean values, so using one of the natural representations of boolean values
can make your code more readable and less error prone.

For example, the string "f" is actually true in a boolean context, which can be
confusing:

my $csm = "f";
print "true" if $csm;
__END__
true

I like to use a sensibly named variable for my booleans, usually including "is"
or "can", and use the values 1 (true) and 0 (false), e.g.

use constant TRUE => 1;
use constant FALSE => 0;

my $can_sing = FALSE;
my $is_readable = TRUE;

if ( $moon_is_full and $wolf->can_howl() ) { ... }

which is, IMO, much more readable than:

if ( $full_moon eq "t" and $wolf->howl() eq "t" ) { ... }
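
Applied to the script's $csm flag, the idea could be sketched like this. $is_csm is a hypothetical rename, and the output strings are placeholders rather than the script's real text.

```perl
use strict;
use warnings;

# The string "f" is true in boolean context, so a string flag tested
# without "eq" gives the wrong answer.
my $csm_as_string       = "f";
my $string_flag_is_true = $csm_as_string ? 1 : 0;

# With 1/0 the flag can be tested directly.
# $is_csm is a hypothetical rename of the script's $csm.
my $is_csm               = 0;
my $numeric_flag_is_true = $is_csm ? 1 : 0;

# The flag now reads naturally in a conditional expression.
my $line = $is_csm ? "\nCSM: text" : "text";

print "string flag true?  $string_flag_is_true\n";
print "numeric flag true? $numeric_flag_is_true\n";
print "line: $line\n";
```
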
 
