"my" variables and recursive regexp strangeness

Discussion in 'Perl Misc' started by Ian, May 13, 2004.

  1. Ian

    I have something strange happening with a recursive regexp compiled
    with qr//x. It is a regular expression to match individual single-quoted,
    double-quoted and unquoted strings, i.e. "string", 'string' and string.

    It works fine when the sub parts of it are global variables, or
    "local" variables, but if I change them to "my" variables, suddenly
    they stop matching correctly (or at least start matching differently).

    Anybody have any ideas why changing to "my" variables would affect it
    this way?

    I get the same behaviour using ActivePerl 5.6.1 and perl 5.81 on
    Knoppix.

    Other things I'd like to know, if anybody has any idea:
    Is there a simpler way to regexp this kind of thing?
    Why does perl crash with some recursive regexps?
    Is there any particular reason for the warning generated when this
    script is run using perl -W?

    I use test input something like:
    aaa bbb "ccc"'ddd"ddd'"eee'eee" f\ \ \ ff

    Here's the program. If you change the first two vars to "my" variables
    it stops working; changing the others doesn't seem to affect it.


    #!perl

    # Double-quoted-string data regexp
    $dStringData = qr/
    ([^"\\]|\\.)+ (??{$dStringData})
    |
    "
    /x;

    # Single-quoted-string data regexp
    $sStringData = qr/
    ([^'\\]|\\.)+ (??{$sStringData})
    |
    '
    /x;

    # Characters that are allowed in unquoted strings
    $token = qr/([^\s\\'"]|\\.)/x;

    # Unquoted-strings broken up by spaces regexp
    $uStringData = qr/
    (??{$token})+ (??{$uStringData})
    |
    \B|\b
    /x;

    # Matches single-quoted, double-quoted or unquoted strings
    $string = qr/
    (
    (??{$token}) (??{$uStringData})
    |
    " (??{$dStringData})
    |
    ' (??{$sStringData})
    )
    /x;

    # Test program to identify "STRING"s or 'STRING's or STRINGs in the input

    while (<>) {
        my @strings;

        # remove them all one by one
        while (/$string/) {
            push @strings, $1;
            s/$string//;
        }

        # print all of them out one by one
        my $counter = 0;
        foreach (@strings) {
            print "$counter = [$_]\n";
            $counter++;
        }
    }
    Ian, May 13, 2004
    #1

  2. Anno Siegel

    Ian wrote in comp.lang.perl.misc:
    > I have something strange happening with a recursive regexp compiled
    > with qr//x; It is a regular expression to match individual single
    > double and un-quoted strings, i.e. "string", 'string' and string.
    >
    > It works fine when the sub parts of it are global variables, or
    > "local" variables, but if I change them to "my" variables, suddenly
    > they stop matching correctly (or at least start matching differently).
    >
    > Anybody have any ideas why changing to "my" variables would affect it
    > this way?


    It isn't the fact that they're lexical, but you apparently tried
    to declare the variables in the same statement that uses them,
    as in

    my $dStringData = qr/
    ([^"\\]|\\.)+ (??{$dStringData})
    |
    "
    /x;

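    You can't use a lexical in the same statement that declares it: the
    new variable only becomes visible in the following statement, so the
    $dStringData inside (??{...}) still refers to the package variable of
    that name. Use a separate "my" statement first, and it works.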
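
    A minimal sketch of that scoping rule, outside of any regexp. The
    variable name $pattern and both sub names are hypothetical, chosen only
    for this illustration:

```perl
use strict;
use warnings;

# Package variable; $pattern is a hypothetical name used only for this sketch.
our $pattern = 'package';

sub declare_and_use {
    # The new lexical is not visible until the next statement, so the
    # $pattern on the right-hand side is still the package variable.
    my $pattern = "lexical built from [$pattern]";
    return $pattern;
}

sub declare_then_use {
    my $pattern;                          # declared on its own first
    $pattern = 'lexical';
    $pattern = "built from [$pattern]";   # the lexical is in scope here
    return $pattern;
}

print declare_and_use(), "\n";    # lexical built from [package]
print declare_then_use(), "\n";   # built from [lexical]
```

    The same thing happens with a (??{ ... }) block inside qr//: a name
    used in the declaring statement is looked up before the new "my"
    variable exists.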

    It would have been better to post the erroneous code, instead of
    saying "if I change this, it doesn't work anymore". That way
    we wouldn't have to guess your error.

    Anno
    Anno Siegel, May 13, 2004
    #2

  3. Ian wrote:
    > I have something strange happening with a recursive regexp compiled
    > with qr//x; It is a regular expression to match individual single
    > double and un-quoted strings, i.e. "string", 'string' and string.
    >
    > It works fine when the sub parts of it are global variables, or
    > "local" variables, but if I change them to "my" variables, suddenly
    > they stop matching correctly (or at least start matching
    > differently).


    You'd better declare those variables with my() before they are used:

    my $dStringData;
    $dStringData = qr/

    etc. (Otherwise it's too late.)

    > Other things I'd like to know if anybody has any idea are: Is there
    > a simpler way to regexp this kind of thing?


    This would do something similar:

    my $token = qr/[^\s\\'"]|\\./;
    while (<>) {
        my @strings;
        push @strings, $+ while /('[^']*')|("[^"]*")|($token+)/g;
        print "$_ = [$strings[$_]]\n" for 0..$#strings;
    }
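
    As a sketch, the same pattern can be wrapped in a sub and run on the
    sample input from the first post. The helper name split_strings() is
    hypothetical and not part of the code above:

```perl
use strict;
use warnings;

# Same token class as above: any character that is not whitespace,
# a backslash or a quote, or else a backslash-escaped character.
my $token = qr/[^\s\\'"]|\\./;

# split_strings() is a hypothetical helper name used only for this sketch.
sub split_strings {
    my ($line) = @_;
    my @strings;
    # $+ holds the text of whichever capture group took part in the match.
    push @strings, $+ while $line =~ /('[^']*')|("[^"]*")|($token+)/g;
    return @strings;
}

my $input = q{aaa bbb "ccc"'ddd"ddd'"eee'eee" f\ \ \ ff};
my @found = split_strings($input);
print "$_ = [$found[$_]]\n" for 0 .. $#found;
```

    On that input this should give six entries: aaa, bbb, "ccc",
    'ddd"ddd', "eee'eee" and f\ \ \ ff. Note that the quoted alternatives
    here do not treat a backslash as an escape, unlike the original regexps.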

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, May 13, 2004
    #3
  4. [posted & mailed]

    On 13 May 2004, Ian wrote:

    >Anybody have any ideas why changing to "my" variables would affect it
    >this way?


    Someone has already answered this. You can't declare a lexical variable
    and use it in the same statement.

    my $rx;
    $rx = qr/...(??{ $rx }).../;

    But there's another issue here.

    ># Double-quoted-string data regexp
    >$dStringData = qr/
    > ([^"\\]|\\.)+ (??{$dStringData})
    > |
    > "
    > /x;


    ># Matches single or double, single or unquoted strings
    >$string = qr/
    > (
    > " (??{$dStringData})
    > )
    > /x;


    I've stripped out everything but the double-quoted regexes. WHY are these
    recursive? I don't see the value of that at all. Why not just

    $dStringData = qr{ (?: [^"\\] | \\. )+ }xs;
    $string = qr{ " $dStringData " }x;

    $dStringData is not gaining anything by being recursive, since once the
    non-closing-quote stuff matches, the next thing that will match *is* the
    closing quote. So it "recurses" once. Unless, of course, you never match
    a closing quote, in which case your regex tries a whole bunch of
    permutations before failing.
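
    A small sketch of the non-recursive version in use, written with "my"
    variables. The helper name first_dquoted() and the test strings are
    made up for the illustration:

```perl
use strict;
use warnings;

# Body of a double-quoted string: ordinary characters or backslash escapes.
my $dStringData = qr{ (?: [^"\\] | \\. )+ }xs;
# The whole string, with both quotes written out in the outer pattern.
my $string = qr{ " $dStringData " }x;

# first_dquoted() is a hypothetical helper name used only for this sketch.
sub first_dquoted {
    my ($text) = @_;
    return $text =~ /($string)/ ? $1 : undef;
}

for my $text (q{say "a \"b\" c" now}, q{"never closed}) {
    my $found = first_dquoted($text);
    print defined $found ? "matched: $found\n" : "no match in: $text\n";
}
```

    Because the body uses +, an empty string "" is not matched; using * in
    place of + would accept it.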

    Run this code:

    print "slow\n";
    $rx = qr{ (?: [^\\"] | \\. )+ (??{ $rx }) | " }x;
    q{"this thing is too slow} =~ m{ " (??{ $rx }) }x;
    print "done\n\n";

    print "fast\n";
    $rx = qr{ (?: [^\\"] | \\. )+ }x;
    q{"this thing is too slow} =~ m{ " $rx " }x;
    print "done\n\n";

    You'll see the bottom one is MUCH MUCH faster. The top one is slow
    because, after it fails the first time, the (?:...)+ part backtracks a
    bit, and then the (??{ $rx }) can match the part it gave up, and then it
    tries to match a " and fails, and it does this over and over for every
    way of splitting the string into chunks. Every character you add to that
    string roughly doubles the wait, so the time grows exponentially. I took
    out the "!" at the end of the string because I got impatient!
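
    One rough way to watch that growth, as a sketch: put a counter into the
    code blocks and try unterminated strings of increasing length. The
    counter $calls and the sub count_calls() are hypothetical additions, and
    the exact numbers may depend on the perl version:

```perl
use strict;
use warnings;

our ($rx, $calls);   # package variables, as in the code above

# The recursive pattern from above, with a counter added to the code block.
$rx = qr{ (?: [^\\"] | \\. )+ (??{ $calls++; $rx }) | " }x;

# count_calls() is a hypothetical helper: it returns how many times a
# (??{ ... }) block ran while failing to match an unterminated string.
sub count_calls {
    my ($length) = @_;
    my $text = '"' . ('a' x $length);   # opening quote, no closing quote
    $calls = 0;
    $text =~ m{ " (??{ $calls++; $rx }) }x;
    return $calls;
}

printf "%2d chars: %d calls\n", $_, count_calls($_) for 2, 4, 6, 8, 10, 12;
```

    The lengths are kept short on purpose, since each extra character is
    expected to multiply the work.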

    And you needn't put $rx inside (??{ ... }) in the outermost regex; it
    works fine by itself.

    --
    Jeff Pinyan RPI Acacia Brother #734 RPI Acacia Corp Secretary
    "And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
    years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
    Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
    Jeff 'japhy' Pinyan, May 13, 2004
    #4
  5. On Thu, 13 May 2004, Jeff 'japhy' Pinyan wrote:

    >Run this code:
    >
    > print "slow\n";
    > $rx = qr{ (?: [^\\"] | \\. )+ (??{ $rx }) | " }x;
    > q{"this thing is too slow} =~ m{ " (??{ $rx }) }x;
    > print "done\n\n";


    This becomes MUCH MUCH faster if you change $rx to

    $rx = qr{ (?: [^\\"] | \\. ) (??{ $rx }) | " }x;

    Note that there is no + quantifier on the (?:...) group.
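
    A sketch of this variant with "my" variables declared ahead of the
    assignment, assuming Perl 5.18 or later so that the code blocks see the
    lexicals. The counter $calls and the sub match_dquoted() are hypothetical
    names added for the illustration:

```perl
use strict;
use warnings;

my ($rx, $calls);    # declared first, assigned in a later statement

# One character (or one escape) per level of recursion, no + quantifier.
$rx = qr{ (?: [^\\"] | \\. ) (??{ $calls++; $rx }) | " }x;

# match_dquoted() is a hypothetical helper: it returns the first
# double-quoted string found (or undef) and the number of block calls.
sub match_dquoted {
    my ($text) = @_;
    $calls = 0;
    my $ok = $text =~ m{ ( " (??{ $calls++; $rx }) ) }x;
    return ($ok ? $1 : undef, $calls);
}

for my $text (q{x "ab\"c" y}, '"' . ('a' x 10)) {
    my ($found, $count) = match_dquoted($text);
    printf "%-14s => %s after %d calls\n",
        $text, defined $found ? $found : 'no match', $count;
}
```

    With only one way to split the string, the number of calls should stay
    close to the length of the string, even when the closing quote is
    missing.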

    Jeff 'japhy' Pinyan, May 13, 2004
    #5
