Lexical variables in (?{...}) regexp constructs

Discussion in 'Perl Misc' started by Ala, Jun 15, 2009.

  1. Ala

    Ala Guest

    Hi all,

    In perlre, the documentation for (?{...}) says:

    "Due to an unfortunate implementation issue, the Perl code contained
    in these blocks is treated as a compile time closure that can have
    seemingly bizarre consequences when used with lexically scoped
    variables inside of subroutines or loops. There are various
    workarounds for this, including simply using global variables instead.
    If you are using this construct and strange results occur then check
    for the use of lexically scoped variables."

    I'm indeed seeing weird things, mainly variables getting undefined for
    no reason.
    Any ideas on what the "various workarounds" that TFM is speaking of
    are? I'd rather stay as far away from global vars as I can.

    Thanks,
    --Ala
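
    For reference, the one workaround the perlre passage does name (package variables) can be kept fairly contained with `our` and `local`. What follows is only a minimal sketch under that assumption: `collect_words` and `@found` are made-up names for illustration, not anything taken from the documentation.

```perl
use strict;
use warnings;

# Package variable: the code block refers to it by name, so no lexical
# ('my') variable is captured when the pattern is compiled.
our @found;

# Hypothetical helper: returns every word in $text, collected by the
# (?{...}) block rather than by the return value of the match.
sub collect_words {
    my ($text) = @_;
    local @found = ();    # fresh array for this call; old value restored on exit
    1 while $text =~ /(\w+)(?{ push @found, $^N })/g;
    return @found;
}

my @first  = collect_words("yes no yes");
my @second = collect_words("maybe");
print "@first\n";
print "@second\n";
```

    Because of the `local`, each call starts with an empty array and the outer value comes back when the sub returns, which limits how far the "global" leaks.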
     
    Ala, Jun 15, 2009
    #1

  2. Ala

    Guest

    Probably the key phrase is "the Perl code contained in these blocks".
    Embedding code inside a regular expression really has limited use
    unless it is used in conjunction with a conditional or to immediately
    store the value of the last capture group.

    It gets better: you can't nest another regular expression in the block
    (since the engine is not reentrant).

    Seemingly better results happen when you call a named subroutine from the
    code block. Here, lexicals seem to work and m// seems to work, but not s///
    (the latter causes a crash on my machine, so be careful not to call a
    Perl function that uses the regex engine).

    It seems the lack of explanation and the numerous caveats are meant as a
    warning to stay clear.

    Below are a few examples.
    The first one tries a lexical within the code block (not too good).
    The second calls a subroutine (that uses lexicals) from within the code block.
    The third is an example of somebody's IP parser I cleaned up that
    shows extended conditionals and code embedding (using 5.10).

    Anyway, it's hit or miss with extended/experimental stuff.

    -sln



    ## ex. 1
    ================
    use strict;
    use warnings;

    my $string = "yes no yes no";

    while ( $string =~ /yes(?{my $test = printmsg(); print "test = '$test'\n";})/g) {}

    sub printmsg
    {
        print "found yes\n";
        '';
    }
    __END__
    Output:
    found yes
    test = ''
    Use of uninitialized value $string in pattern match (m//) at dd.pl line 6.

    ## ex. 2
    ================
    use strict;
    use warnings;

    my $string = "yes no yes no";
    my $test = "this is test";

    while ( $string =~ /yes(?{$test = printmsg($test);})/g) {}
    print "test = '$test'\n";

    sub printmsg
    {
        my $param = shift;
        my $count = 2;

        while ($count--) {
            print "($count)found yes, was passed '$param'\n";
        }
        return '';
        ## Cannot do regex if being called from embedded code
    }
    __END__
    Output:
    (1)found yes, was passed 'this is test'
    (0)found yes, was passed 'this is test'
    (1)found yes, was passed ''
    (0)found yes, was passed ''
    test = ''

    ## ex. 3
    ================
    ## IpMatch_5_10.pl
    ## (To test new Perl 5.10 conditionals)
    ##

    require 5.10.0; # 5.10 only, new extended regex
    use strict;
    use warnings;

    my $Octlimit = 255;

    my $OctetPat = qr/
        \b (\d{1,3}) \b    # capture a 1 to 3 digit number on word boundaries

        (?(?{              # start conditional code block

            # print "$^N\n";    # uncomment to print what matched last
            $^N > $Octlimit     # condition: is number > octet limit ?

        })                 # end code block

            (*FAIL)        # yes, condition is true, force pattern to fail for this number
        )
    /x;

    my $dottedQuadPat = qr/    # Capture quad parts to named variables in the %+ hash
        \s*
        (?<O1>$OctetPat)
        \.
        (?<O2>$OctetPat)
        \.
        (?<O3>$OctetPat)
        \.
        (?<O4>$OctetPat)
        \s*
    /x;

    my $DressedIPv4Pat = qr/   # Capture dressed quad parts to named variables in the %+ hash
        \s* \[
        $dottedQuadPat
        \] \s*
    /x;

    while (my $ip = <DATA>)
    {
        chomp $ip;
        next if !length($ip);
        print "IP:\n'$ip'\n";

        ## Match all valid ip octets
        my @match = $ip =~/$OctetPat/g;
        if (@match)
        {
            print " ++ matched single octets\n";
            for my $val (@match) {
                print " $val\n";
            }
        } else {
            print " -- no single octet match\n";
        }

        ## Match dotted quad ip
        if ($ip =~ /^$dottedQuadPat$/)
        {
            print " ++ matched quad #.#.#.#\n";
            foreach my $key (sort keys %+) {
                print " $key = $+{$key}\n";
            }
        } else {
            print " -- no strict quad match\n";
        }

        ## Match dressed dotted quad ip
        if ($ip =~ /^$DressedIPv4Pat$/)
        {
            print " ++ matched dressed quad [#.#.#.#]\n";
            foreach my $key (sort keys %+) {
                print " $key = $+{$key}\n";
            }
        } else {
            print " -- no strict dressed quad match\n";
        }
    }
    __DATA__

    1.12.123.254.255.256.4872
    1.12.123.254
    [123.254.255.255]
     
    , Jun 16, 2009
    #2

  3. Ala

    Ala Guest

    On Jun 15, 5:00 pm, wrote:

    > Probably the key phrase is "the Perl code contained in these blocks";
    > Embedding code inside of a regular expression really has limited use
    > unless used in conjunction with a conditional or to immediately store
    > the value of the last capture group.


    Thanks for the reply. I realize my post wasn't very informative, and
    that I'm probably playing with fire :)
    Storing the last captured match is exactly what I'm using this
    construct for. The basic idea is that I created a module to help parse
    some non-trivial file format. The constructor of this module allows
    the user to specify which parts of each record to capture and into
    which variable to put it. It returns a compiled regexp that the user
    can use when parsing. Something like this contrived example:

    use R;
    my ($name, $x, $y);
    my $rgx = R->new(-capture => {
        name => \$name,
        locx => \$x,
        locy => \$y,
    });

    #... later on .. use $rgx ..
    while (<$fh>) {
        if (/$rgx/) {
            print "$name is at ($x, $y).\n";
        }
    }

    I used this module as part of a larger module, let's call it M, that
    parses the whole file and defines some other methods to manipulate the
    data.
    For the most part, this works perfectly well. Weird things start to
    happen, seemingly in a random fashion, when I instantiate module M
    multiple times in a loop.

    I guess I shouldn't be relying on experimental features, but since the
    docs mentioned a work-around, I was curious.

    Thanks,
    --Ala
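
    One hedged alternative for this kind of capture-into-variables interface is to skip (?{...}) altogether: use named captures and copy them out of %+ after a successful match. The sketch below assumes Perl 5.10 or later; `make_matcher` is a hypothetical stand-in and not the real R module, and it returns a code ref rather than a compiled regexp, which changes the calling convention slightly.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Hypothetical stand-in for R->new (not the real module): takes a compiled
# pattern with named groups plus a hash of group name => scalar reference,
# and returns a code ref that matches a line and fills in the variables.
sub make_matcher {
    my ($rgx, $targets) = @_;
    return sub {
        my ($line) = @_;
        return 0 unless $line =~ $rgx;
        for my $key (keys %$targets) {
            ${ $targets->{$key} } = $+{$key};    # copy capture to caller's variable
        }
        return 1;
    };
}

my ($name, $x, $y);
my $match = make_matcher(
    qr/(?<name>[A-Za-z]+)\s+(?<locx>\d+),(?<locy>\d+)/,
    { name => \$name, locx => \$x, locy => \$y },
);

for my $line ("Bob 12,34", "no coordinates here") {
    print "$name is at ($x, $y).\n" if $match->($line);
}
```

    The variables are only written after the whole pattern has matched, so nothing is touched by a failed or backtracked attempt.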
     
    Ala, Jun 16, 2009
    #3
  4. Ala

    Guest

    As far as I know, lexicals scoped within the block that initiates the
    regex should be ok, be it a reference or not (haven't tried it, but I assume it's ok).
    Pseudo-example:
    SCOPE:
    {
        my ($name, $x, $y);
        my ($ref_name, $ref_x, $ref_y) = (\$name, \$x, \$y);

        my $rgx = qr/([a-z,A-Z]+)(?{$$name = $^N})(\d\d)(?{$$x = $^N}),(\d\d)(?{$$y = $^N})/;
        while (<$fh>) {
            if (/$rgx/) {
                print "$name is at ($x, $y).\n";
            }
        }
    };

    If what you say is true, this is the case:

    package M;
    use R;
    my ($name, $x, $y);
    my $rgx = R->new();
    .... more code
    if (/$rgx/) { }
    1;

    As an aside, R->new() is being used as just a class function call.
    It does not appear to bless() anything, since it is returning a string scalar.

    Then somewhere else, you create multiple instances of M
    (or just call M methods) in a loop.

    The my ($name, $x, $y) appear to be file-scoped variables.
    In the context I wrote, there is only one instance of ($name,$x,$y) no
    matter how many instances of M you create (class scoped?).
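
    The sharing described here is easy to reproduce in isolation. A minimal sketch, with a hypothetical class `Shared` standing in for M (the method names are made up for illustration):

```perl
use strict;
use warnings;

package Shared;    # hypothetical class, for illustration only

my $name;          # file-scoped: ONE variable, shared by every object

sub new      { my ($class) = @_; return bless {}, $class }
sub set_name { my ($self, $value) = @_; $name = $value; return $self }
sub name     { return $name }

package main;

my $first  = Shared->new;
my $second = Shared->new;
$first->set_name('alpha');
$second->set_name('beta');

# Both objects report the value stored last, because they share $name.
print $first->name, " ", $second->name, "\n";
```

    Creating more objects does not create more copies of the variable, so whichever instance wrote last wins.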

    If you had a method in M that creates many $rgx's, it would have to store
    those qr// in an object-based (M) blessed referent (hash or array) for them,
    and thereby their references ($name,$x,$y), to persist.

    Usually this involves fleshing out either R or M with references to unique
    lexicals.

    So that this is the case (usually):

    while (<$fh>) {
        if (/$obj->{rgx}/) {
            print "$obj->{name} is at ($obj->{x}, $obj->{y}).\n";
        }
    }

    More than likely you would need accessors.

    Hope this helps.
    -sln
     
    , Jun 16, 2009
    #4
  5. Ala

    Guest

    On Tue, 16 Jun 2009 14:12:25 -0700, wrote:

    >On Tue, 16 Jun 2009 12:03:48 -0700 (PDT), Ala <> wrote:
    >

    [snip]
    >More than likely you would need accessors.
    >

    Here's an example of making a custom regex class. This is far removed
    from what you want, though.

    In your case, you would create the regex in the constructor,
    add unique references (to lexicals) in a qr// statement, then
    assign it to the object hash, i.e. $self->{rgx} = qr//, then return
    the instance. The references must be unique if that's what your goal is.
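
    A rough sketch of that constructor idea follows. It assumes a reasonably recent Perl: to my understanding the closure handling of (?{...}) was reworked in 5.18, and on the Perls current when this thread was written the same code may show exactly the stale-lexical problem being discussed. `Parser` and its accessor names are hypothetical, not the original poster's R or M.

```perl
use strict;
use warnings;

package Parser;    # hypothetical class name, for illustration only

sub new {
    my ($class) = @_;
    my %captured;  # a fresh lexical per call, so each object gets its own

    # The code blocks close over this call's %captured.
    my $rgx = qr/
        ([A-Za-z]+) (?{ $captured{name} = $^N })
        \s+
        (\d+)       (?{ $captured{locx} = $^N })
        ,
        (\d+)       (?{ $captured{locy} = $^N })
    /x;

    return bless { rgx => $rgx, captured => \%captured }, $class;
}

# Accessors for the pattern and the per-object captures.
sub rgx  { return $_[0]{rgx} }
sub name { return $_[0]{captured}{name} }
sub locx { return $_[0]{captured}{locx} }
sub locy { return $_[0]{captured}{locy} }

package main;

my $first  = Parser->new;
my $second = Parser->new;

for my $pair ([$first, "Bob 12,34"], [$second, "Eve 56,78"]) {
    my ($obj, $line) = @$pair;
    my $rgx = $obj->rgx;
    print $obj->name, " is at (", $obj->locx, ", ", $obj->locy, ").\n"
        if $line =~ $rgx;
}
```

    Keeping the captures in a separate per-object hash, rather than writing through $self inside the code blocks, avoids a reference cycle between the object and its own pattern.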

    -sln

    -----------------------------------

    ###
    package RxP;
    our @ISA = qw();

    sub new
    {
        my $self;
        my $class = shift;
        if (defined($_[0]) && ref($_[0]) eq 'RxP') {
            %{$self} = %{$_[0]};
            return bless ($self, $class);
        }
        $self = {
            'regex'    => '',
            'code'     => sub{''},
            'type'     => 's',
            'dflt_sub' => \&search
        };
        while (my ($name, $val) = splice (@_, 0, 2)) {
            next if (!defined $val);
            if ('regex' eq lc $name) {
                $self->{'regex'} = $val;
            }
            elsif ('code' eq lc $name && ref($val) eq 'CODE') {
                $self->{'code'} = $val;
            }
            elsif ('type' eq lc $name && $val =~ /(sg|gs|rg|gr|s|r)/i) {
                set_type ($self, lc $1);
            }
        }
        return bless ($self, $class);
    }

    sub get_type
    {
        return $_[0]->{'type'};
    }

    sub set_type
    {
        return 0 unless (defined $_[1]);
        if ($_[1] =~ /(sg|gs|rg|gr|s|r)/i) {
            $_[0]->{'dflt_sub'} = {
                's'  => \&search,
                'sg' => \&search_g,
                'gs' => \&search_g,
                'r'  => \&replace,
                'rg' => \&replace_g,
                'gr' => \&replace_g
            }->{lc $1};
            $_[0]->{'type'} = lc $1;
            return 1;
        }
        return 0;
    }

    sub clone
    {
        # clone self, return new
        return RxP->new( $_[0]);
    }

    sub copy
    {
        # copy other to self, return self
        return $_[0] unless (defined $_[1] && ref($_[1]) eq 'RxP');
        %{$_[0]} = %{$_[1]}; # no need for deep recursion
        return $_[0];
    }

    sub apply
    {
        return 0 unless (defined $_[1]);
        return &{$_[0]->{'dflt_sub'}};
    }

    sub search
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ /$_[0]->{'regex'}/;
    }

    sub search_g
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ /$_[0]->{'regex'}/g;
    }

    sub replace
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/e;
    }

    sub replace_g
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/ge;
    }
     
    , Jun 16, 2009
    #5
  6. Ala

    Guest

    On Tue, 16 Jun 2009 14:12:25 -0700, wrote:

    >As far as I know, lexicals scoped within the block that initiates the
    >regex, should be ok, be it a reference or not (haven't tried it but assume its ok).
    >pseudo - example:
    >SCOPE:
    >{
    > my ($name, $x, $y);
    > my ($ref_name, $ref_x, $ref_y) = (\$name, \$x, \$y);
    >
    > my $rgx = qr/([a-z,A-Z]+)(?{$$name = $^N})(\d\d)(?{$$x = $^N}),(\d\d)(?{$$y = $^N})/;

    Correction: those should be $$ref_name, $$ref_x and $$ref_y.

    Sorry, I just mindlessly do this, depending too much on the compiler/interpreter to catch stuff.
    But that's what crafts are... fix/debug/refactor, rinse, repeat.
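
    With that correction applied, the pseudo-example can be turned into something runnable. This is only a sketch: the input lines are invented, the character class is written as [a-zA-Z] (dropping the comma from the original), and everything sits at file scope, where the lexical problem is least likely to bite.

```perl
use strict;
use warnings;

my ($name, $x, $y);
my ($ref_name, $ref_x, $ref_y) = (\$name, \$x, \$y);

# Each code block dereferences its reference and stores the most recently
# closed capture group ($^N) in the variable behind it.
my $rgx = qr/([a-zA-Z]+)(?{ $$ref_name = $^N })(\d\d)(?{ $$ref_x = $^N }),(\d\d)(?{ $$ref_y = $^N })/;

my @report;    # collected here so the result is easy to inspect
for my $line ("Bob12,34", "Eve56,78", "garbage") {
    push @report, "$name is at ($x, $y)." if $line =~ $rgx;
}
print "$_\n" for @report;
```

    Note that a code block can also run during an attempt that later fails or backtracks, so the variables should only be trusted right after a successful match, as in the loop above.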

    -sln
     
    , Jun 17, 2009
    #6