"my" variables and recursive regexp strangeness

Ian

I have something strange happening with a recursive regexp compiled
with qr//x. It is a regular expression to match individual
single-quoted, double-quoted and unquoted strings, i.e. "string",
'string' and string.

It works fine when the sub parts of it are global variables, or
"local" variables, but if I change them to "my" variables, suddenly
they stop matching correctly (or at least start matching differently).

Anybody have any ideas why changing to "my" variables would affect it
this way?

I get the same behaviour using ActivePerl 5.6.1, and perl 5.8.1 on
Knoppix.

Other things I'd like to know, if anybody has any idea:
Is there a simpler way to regexp this kind of thing?
Why does perl crash with some recursive regexps?
Is there any particular reason for the warning generated when this
script is run using perl -W?

I use test input something like:
aaa bbb "ccc"'ddd"ddd'"eee'eee" f\ \ \ ff

Here's the program. If you change the first two vars to "my" variables
it stops working. Changing the others doesn't seem to affect it.


#!perl

# Double-quoted-string data regexp
$dStringData = qr/
([^"\\]|\\.)+ (??{$dStringData})
|
"
/x;

# Single-quoted-string data regexp
$sStringData = qr/
([^'\\]|\\.)+ (??{$sStringData})
|
'
/x;

# Characters that are allowed in unquoted strings
$token = qr/([^\s\\'"]|\\.)/x;

# Unquoted-strings broken up by spaces regexp
$uStringData = qr/
(??{$token})+ (??{$uStringData})
|
\B|\b
/x;

# Matches single-quoted, double-quoted or unquoted strings
$string = qr/
(
(??{$token}) (??{$uStringData})
|
" (??{$dStringData})
|
' (??{$sStringData})
)
/x;

# Test program to identify "STRING"s or 'STRING's or STRINGs in the input

while (<>) {
    my @strings;

    # remove them all one by one
    while (/$string/) {
        push @strings, $1;
        s/$string//;
    }

    # print all of them out one by one
    my $counter = 0;
    foreach (@strings) {
        print "$counter = [$_]\n";
        $counter++;
    }
}
 
Anno Siegel

Ian said:
I have something strange happening with a recursive regexp compiled
with qr//x; It is a regular expression to match individual single
double and un-quoted strings, i.e. "string", 'string' and string.

It works fine when the sub parts of it are global variables, or
"local" variables, but if I change them to "my" variables, suddenly
they stop matching correctly (or at least start matching differently).

Anybody have any ideas why changing to "my" variables would affect it
this way?

It isn't the fact that they're lexical, but you apparently tried
to declare the variables in the same statement that uses them,
as in

my $dStringData = qr/
([^"\\]|\\.)+ (??{$dStringData})
|
"
/x;

You can't use a lexical in the same statement that declares it.
Use an extra "my" statement, and it works.
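
A minimal sketch of the two-statement form, using a simplified version of the double-quoted pattern (one character per recursion step, no capture group). The names $dq_tail and is_dq_string are made up for this illustration and are not from the original script.

```perl
use strict;
use warnings;

# Declare first, assign second: when the qr// below is compiled,
# $dq_tail is already in scope, so the (??{ ... }) block sees the lexical.
my $dq_tail;
$dq_tail = qr/
    (?: [^"\\] | \\. ) (??{ $dq_tail })   # one character, then recurse
  |
    "                                     # or the closing quote
/x;

# The single-statement form would look like this (left commented out):
#   my $dq_tail = qr/ ... (??{ $dq_tail }) ... /x;
# There the inner $dq_tail is not the new lexical, because a "my"
# variable only becomes visible after the statement that declares it.

# Hypothetical helper: true if $text is one complete double-quoted string.
sub is_dq_string {
    my ($text) = @_;
    return $text =~ /\A " (??{ $dq_tail }) \z/x ? 1 : 0;
}

print is_dq_string('"abc"') ? "match\n" : "no match\n";
print is_dq_string('"abc')  ? "match\n" : "no match\n";
```

With "use strict" in effect, recent perls should reject the single-statement form at compile time rather than quietly using a different variable, which makes this mistake easier to spot.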

It would have been better to post the erroneous code, instead of
saying "if I change this, it doesn't work anymore". That way
we wouldn't have to guess your error.

Anno
 
Gunnar Hjalmarsson

Ian said:
I have something strange happening with a recursive regexp compiled
with qr//x; It is a regular expression to match individual single
double and un-quoted strings, i.e. "string", 'string' and string.

It works fine when the sub parts of it are global variables, or
"local" variables, but if I change them to "my" variables, suddenly
they stop matching correctly (or at least start matching
differently).

You'd better my() declare those variables before they are used:

my $dStringData;
$dStringData = qr/

etc. (Otherwise it's too late.)

Ian said:
Other things I'd like to know if anybody has any idea are: Is there
a simpler way to regexp this kind of thing?

This would do something similar:

my $token = qr/[^\s\\'"]|\\./;
while (<>) {
my @strings;
push @strings, $+ while /('[^']*')|("[^"]*")|($token+)/g;
print "$_ = [$strings[$_]]\n" for 0..$#strings;
}
 
Jeff 'japhy' Pinyan

[posted & mailed]

Anybody have any ideas why changing to "my" variables would affect it
this way?

Someone has already answered this. You can't declare and use the lexical
variable in the same statement.

my $rx;
$rx = qr/...(??{ $rx }).../;

But there's another issue here.

# Double-quoted-string data regexp
$dStringData = qr/
([^"\\]|\\.)+ (??{$dStringData})
|
"
/x;
# Matches single or double, single or unquoted strings
$string = qr/
(
" (??{$dStringData})
)
/x;

I've stripped out everything but the double-quoted regexes. WHY are these
recursive? I don't see the value of that at all. Why not just

$dStringData = qr{ (?: [^"\\] | \\. )+ }xs;
$string = qr{ " $dStringData " }x;

$dStringData is not gaining anything by being recursive, since once the
non-closing-quote stuff matches, the next thing that will match *is* the
closing quote. So it "recurses" once. Unless, of course, you never match
a closing quote, in which case your regex tries a whole bunch of
permutations before failing.
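
A sketch of how the same non-recursive idea could be extended to all three kinds of string in the original script. It is an illustration under assumptions, not the original poster's code: it uses * instead of + inside the quotes so that empty strings such as "" also match, and split_strings and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Contents of each kind of string; no recursion anywhere.
my $dq_data = qr{ (?: [^"\\] | \\. )* }xs;        # inside "..."
my $sq_data = qr{ (?: [^'\\] | \\. )* }xs;        # inside '...'
my $bare    = qr{ (?: [^\s\\'"] | \\. )+ }xs;     # unquoted run

# One complete string of any of the three kinds.
my $string  = qr{ " $dq_data " | ' $sq_data ' | $bare }x;

# Hypothetical helper: return every string found in $line, in order.
sub split_strings {
    my ($line) = @_;
    my @strings = $line =~ /($string)/g;
    return @strings;
}

# The test input from the original post.
my @found = split_strings(q{aaa bbb "ccc"'ddd"ddd'"eee'eee" f\ \ \ ff});
print "$_ = [$found[$_]]\n" for 0 .. $#found;
```

Because every alternative either matches in one pass or fails outright, an unterminated quote costs a single failed attempt at that position instead of a long backtracking search.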

Run this code:

print "slow\n";
$rx = qr{ (?: [^\\"] | \\. )+ (??{ $rx }) | " }x;
q{"this thing is too slow} =~ m{ " (??{ $rx }) }x;
print "done\n\n";

print "fast\n";
$rx = qr{ (?: [^\\"] | \\. )+ }x;
q{"this thing is too slow} =~ m{ " $rx " }x;
print "done\n\n";

You'll see the bottom one is MUCH MUCH faster. The reason the top one is
slow is that after it fails the first time, the (?:...)+ part backtracks
a bit, and then the (??{ $rx }) can match the part it gave up, and then
it tries to match a " and fails, and it does this over and over for
every way of splitting up the string. Every character you add to that
string roughly doubles the wait. I took out the "!" at the end of the
string because I got impatient!

And you needn't put $rx inside (??{ ... }) in the outermost regex; it
works fine by itself.
 
Jeff 'japhy' Pinyan

Run this code:

print "slow\n";
$rx = qr{ (?: [^\\"] | \\. )+ (??{ $rx }) | " }x;
q{"this thing is too slow} =~ m{ " (??{ $rx }) }x;
print "done\n\n";

This becomes MUCH MUCH faster if you change $rx to

$rx = qr{ (?: [^\\"] | \\. ) (??{ $rx }) | " }x;

Note that there is no + quantifier on the (?:...) group.
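
To put rough numbers on the difference between the two versions of $rx, here is a sketch that counts how often the (??{ ... }) block runs on an unterminated string. The names $calls, $greedy, $single and recursion_count are made up for this illustration, and the counter is added to the patterns only for measurement.

```perl
use strict;
use warnings;

# Package variables, declared before the patterns that refer to them.
our ($calls, $greedy, $single);

# Version with the + quantifier: the group can hand over to the
# recursion after any number of characters.
$greedy = qr{ (?: [^\\"] | \\. )+ (??{ $calls++; $greedy }) | " }x;

# Version without +: exactly one character per recursion step.
$single = qr{ (?: [^\\"] | \\. )  (??{ $calls++; $single }) | " }x;

# Count how often the (??{ ... }) block runs while matching $text.
sub recursion_count {
    my ($rx, $text) = @_;
    $calls = 0;
    $text =~ m{ " $rx }x;
    return $calls;
}

for my $n (4, 8, 12) {
    my $text = '"' . ('a' x $n);    # opening quote, never closed
    printf "%2d chars: with + %5d, without + %2d\n",
        $n, recursion_count($greedy, $text), recursion_count($single, $text);
}
```

On input like this, the count for the + version should grow roughly exponentially with the length of the string, while the version without + should stay proportional to it.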
 
