Odd regex behavior

Discussion in 'Perl Misc' started by Mintcake, Oct 1, 2007.

  1. Mintcake

    Mintcake Guest

    I would be grateful to anyone who can shed some light on the unexpected
    results from the regex in the following program.

    #!/usr/local/bin/perl -l

    use strict;

    my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';

    for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
    {
        print "1) $_";
        print "2) [$1][$2]";

        my $x = /(\w+)=['"](.*)["']/;
        print "3) [$x] [$1][$2]";

        my $x = /(\w+)=['"](.*)["']/;
        print "4) [$x] [$1][$2]";

        my $x = /(\w+)=['"](.*)["']/;
        print "5) [$x] [$1][$2]";

        print "";
    }
    __END__

    The results I get are as follows

    1) href="/foo/bar?d=1&c=2&f=1&cards=1"
    2) [ x="123"][123]
    3) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    4) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    5) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]

    1) /foo/bar?d=1&c=2&f=1&cards=1
    2) [href][/foo/bar?d=1&c=2&f=1&cards=1]
    3) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    4) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    5) [] [=2&f=][]

    1) x="123"
    2) [=2&f=][]
    3) [1] [x][123]
    4) [1] [x][123]
    5) [1] [x][123]

    1) 123
    2) [x][123]
    3) [] [x][123]
    4) [] [x][123]
    5) [] [x][123]

    Now I accept that this code is sloppy for several reasons, but in my
    defence I have to say that it is not my code.

    1. A while loop would probably be better than a foreach loop.

    2. The first regex is attempting to break the string into a list of
    att="value" strings, but it returns both att="value" and the value
    itself, so the .*? should not be parenthesized.

    3. No attempt is made to ensure that the same type of quote is used at
    the start and end of the value.
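    For point 3, one way to require the same quote at both ends is a
    backreference. This is only a minimal sketch: the input string $attrs
    and the @pairs array are made up for illustration and are not part of
    the original program.

```perl
use strict;
use warnings;

# Hypothetical input: the last attribute opens with " but closes with '.
my $attrs = q{ href="it's" x='123' bad="oops'};

my @pairs;
# (['"]) captures the opening quote; \2 requires the same character to close.
while ($attrs =~ /(\w+)=(['"])(.*?)\2/g) {
    push @pairs, "$1 => $3";
}

print "$_\n" for @pairs;
```

    The mismatched attribute is simply skipped, and a single quote inside
    a double-quoted value no longer ends the value early.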

    What I cannot explain is the results from the second iteration of the
    loop. The same regex is executed three times and each time it fails
    (correctly); however, the third time the $1 and $2 values are
    overwritten. I have always believed that the $digit variables would be
    preserved if the regex failed to match. Reading the Camel indicates
    that this should indeed be the case.

    No matter how many times the regex is executed within the loop, it is
    only on the final one that $1 and $2 are overwritten.
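    The documented rule referred to above can be checked in isolation.
    The following is a small sketch, not taken from the original program;
    @seen and $ok are illustrative names.

```perl
use strict;
use warnings;

my @seen;

'abc' =~ /(b)/;           # succeeds: $1 is now 'b'
push @seen, $1;

my $ok = 'abc' =~ /(z)/;  # fails: $1 should still hold 'b'
push @seen, $ok ? 'matched' : 'failed', $1;

print "@seen\n";
```

    Within a single scope the failed match leaves $1 alone, which is why
    the overwritten values in the second iteration look wrong.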
     
    Mintcake, Oct 1, 2007
    #1

  2. Ben Bullock

    Ben Bullock Guest

    On Sun, 30 Sep 2007 20:37:13 -0700, Mintcake wrote:

    > I would be grateful to anyone who can shed some light on the
    > unexpected
    > results from the regex in the following program.
    >
    > #!/usr/local/bin/perl -l
    >
    > use strict;


    Adding the line

    use warnings;

    to your script gives the answer to your problem.
     
    Ben Bullock, Oct 1, 2007
    #2

  3. Paul Lalli

    Paul Lalli Guest

    On Sep 30, 11:37 pm, Mintcake <> wrote:
    > I would be grateful to anyone who can shed some light on the
    > unexpected
    > results from the regex in the following program.
    >
    > #!/usr/local/bin/perl -l
    >
    > use strict;


    Why are you asking people for help before asking Perl for help? Why
    haven't you enabled warnings?

    >
    > my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';
    >
    > for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
    > {
    > print "1) $_";
    > print "2) [$1][$2]";
    >
    > my $x = /(\w+)=['"](.*)["']/;
    > print "3) [$x] [$1][$2]";
    >
    > my $x = /(\w+)=['"](.*)["']/;
    > print "4) [$x] [$1][$2]";
    >
    > my $x = /(\w+)=['"](.*)["']/;
    > print "5) [$x] [$1][$2]";
    >
    > print "";}
    >
    > __END__


    > Now I accept that this code is sloppy for several reasons but in my
    > defence I have to say that it is not my code.
    >
    > 1. A while loop would probably be better than a foreach loop


    No, not probably. Definitely. They do not do the same thing at all
    in this case, because m//g has very different meanings when evaluated
    in a list vs a scalar context.
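    To illustrate the difference between the two contexts, here is a
    minimal sketch using the string from the original post. The arrays
    @all and @names are illustrative, and the scalar-context pattern is
    rearranged slightly so that $1 holds the attribute name.

```perl
use strict;
use warnings;

my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';

# List context: the match runs once and returns every capture of every match.
my @all = $y =~ /(\s+\w+=['"](.*?)["'])/gs;
print scalar(@all), " items\n";

# Scalar context: each evaluation resumes where the previous match ended.
my @names;
while ($y =~ /\s+(\w+)=['"](.*?)["']/gs) {
    push @names, $1;
}
print "@names\n";
```

    With two capture groups and two attributes, the list-context form
    yields four items, which is why the foreach loop in the original
    program runs four times rather than twice.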

    > The thing I cannot explain are the results from the second
    > iteration of the loop. The same regex is executed three times


    No it's not. It's only executed once, because you evaluated it in a
    list context and then iterated over the results of that one
    evaluation, rather than iterating it repeatedly (and progressively) in
    a scalar context.

    Paul Lalli
     
    Paul Lalli, Oct 1, 2007
    #3
  4. Paul Lalli

    Paul Lalli Guest

    On Oct 1, 9:34 am, Paul Lalli <> wrote:
    > On Sep 30, 11:37 pm, Mintcake <> wrote:
    >
    > > I would be grateful to anyone who can shed some light on the
    > > unexpected results from the regex in the following program.


    > > my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';

    >
    > > for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
    > > {
    > > print "1) $_";
    > > print "2) [$1][$2]";

    >
    > > my $x = /(\w+)=['"](.*)["']/;
    > > print "3) [$x] [$1][$2]";

    >
    > > my $x = /(\w+)=['"](.*)["']/;
    > > print "4) [$x] [$1][$2]";

    >
    > > my $x = /(\w+)=['"](.*)["']/;
    > > print "5) [$x] [$1][$2]";

    >
    > > print "";}



    My profuse apologies. I completely misparsed what your post was
    getting at, and came back with a completely wrong answer. Having run
    your code, I am also confused as to what's happening. How is $1 being
    set to '=2&f=', and how is $2 becoming undefined, especially since, as
    you said, the pattern match is failing? I'm going to keep staring at
    it, but I look forward to other responses to this thread...

    Paul Lalli
     
    Paul Lalli, Oct 1, 2007
    #4
  5. Greg Bacon

    Greg Bacon Guest

    Looks like you've found a bug. Please file a report!

    Greg
    --
    When man attempts to rise above Nature, he usually falls below it.
    -- Sherlock Holmes
     
    Greg Bacon, Oct 1, 2007
    #5
  6. Mintcake

    Guest

    On Oct 1, 6:08 am, Ben Bullock <> wrote:
    > On Sun, 30 Sep 2007 20:37:13 -0700, Mintcake wrote:
    > > I would be grateful to anyone who can shed some light on the
    > > unexpected
    > > results from the regex in the following program.

    >
    > > #!/usr/local/bin/perl -l

    >
    > > use strict;

    >
    > Adding the line
    >
    > use warnings;
    >
    > to your script gives the answer to your problem.


    No, warnings have nothing to do with this.

    Yves
     
    , Oct 1, 2007
    #6
  7. Mintcake

    Guest

    On Oct 1, 5:37 am, Mintcake <> wrote:
    > I would be grateful to anyone who can shed some light on the
    > unexpected
    > results from the regex in the following program.
    >
    > #!/usr/local/bin/perl -l
    >
    > use strict;
    >
    > my $y = ' href="/foo/bar?d=1&c=2&f=1&cards=1" x="123"';
    >
    > for ($y =~ /(\s+\w+=['"](.*?)["'])/gs)
    > {
    > print "1) $_";
    > print "2) [$1][$2]";
    >
    > my $x = /(\w+)=['"](.*)["']/;
    > print "3) [$x] [$1][$2]";
    >
    > my $x = /(\w+)=['"](.*)["']/;
    > print "4) [$x] [$1][$2]";
    >
    > my $x = /(\w+)=['"](.*)["']/;
    > print "5) [$x] [$1][$2]";
    >
    > print "";}
    >
    > __END__
    >
    > The results I get are as follows
    >
    > 1) href="/foo/bar?d=1&c=2&f=1&cards=1"
    > 2) [ x="123"][123]
    > 3) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    > 4) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    > 5) [1] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    >
    > 1) /foo/bar?d=1&c=2&f=1&cards=1
    > 2) [href][/foo/bar?d=1&c=2&f=1&cards=1]
    > 3) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    > 4) [] [href][/foo/bar?d=1&c=2&f=1&cards=1]
    > 5) [] [=2&f=][]


    This is a bug for sure. Notice that '=2&f=' is the same length as
    'cards'. How it ends up at that offset I'm not sure, and I haven't
    debugged it to see what's up.

    The good news is that I already fixed this for 5.10, although it's hard
    to say which fix was responsible; a number of fixes related to
    capturing, rollbacks and the like were made in the 5.9.x line.

    The bad news is that the patch is highly unlikely to be backported to
    5.8.x :-(
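    Whichever perl is in use, one defensive habit is to copy the captures
    into lexical variables only when the match succeeds, so that nothing
    reads $1 or $2 after a failed match. This is a general idiom and not
    the fix described above; the input strings and variable names are
    made up for illustration.

```perl
use strict;
use warnings;

my @report;
for my $item (' href="/foo/bar"', '/foo/bar') {
    # In list context a failed match returns an empty list,
    # so the assignment is false and the captures are never read.
    if (my ($name, $value) = $item =~ /(\w+)=['"](.*)["']/) {
        push @report, "[$name][$value]";
    }
    else {
        push @report, "no match";
    }
}
print "$_\n" for @report;
```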

    Interesting bug tho. Cheers.

    Yves
     
    , Oct 1, 2007
    #7
  8. On Oct 1, 4:37 am, Mintcake <> wrote:
    > I would be grateful to anyone who can shed some light on the
    > unexpected
    > results from the regex in the following program.


    I suspect that this is pretty much the same issue as was discussed
    here recently

    http://groups.google.com/group/comp.lang.perl.misc/msg/d128a5c4d28a917b

    Here's a much simpler way to reproduce it

    use strict;
    use warnings;

    'From outside loop' =~ /(.*)/;

    for my $pass ( 1, 2 ) {
        print "$1\n";
        'From later inside loop' =~ /(.*)/;
    }
    __END__

    The above could reasonably be expected to print 'From outside loop'
    twice but actually prints 'From later inside loop' the second time.

    The work-round is simply to double the {}

    use strict;
    use warnings;

    'From outside loop' =~ /(.*)/;

    for my $pass ( 1, 2 ) {{
        print "$1\n";
        'From later inside loop' =~ /(.*)/;
    }}
    __END__

    I am able to reproduce this in 5.9.5.
     
    Brian McCauley, Oct 2, 2007
    #8
  9. On Oct 2, 5:51 pm, Brian McCauley <> wrote:
    > On Oct 1, 4:37 am, Mintcake <> wrote:
    >
    > > I would be grateful to anyone who can shed some light on the
    > > unexpected
    > > results from the regex in the following program.

    >
    > I suspect that this is pretty much the same issue as was discussed
    > here recently


    Correction - if it wasn't for that issue you probably would not have
    been able to observe the bug.

    There is, of course, as Yves points out a much more serious bug here
    too.
     
    Brian McCauley, Oct 2, 2007
    #9
  10. On Oct 3, 2:14 am, wrote:
    > On Tue, 02 Oct 2007 16:51:17 -0000, Brian McCauley <> wrote:
    > >'From outside loop' =~ /(.*)/;

    >
    > >for my $pass ( 1, 2 ) {
    > > print "$1\n";
    > > 'From later inside loop' =~ /(.*)/;
    > >}
    > >__END__

    >
    > >The above could reasonably be expected to print 'From outside loop'
    > >twice but actually prints 'From later inside loop' the second time.


    > I'm a little unsure of the logic. In your loop, you do a regex behind
    > the print $1.


    Yes, that's the whole point.

    > Wouldn't you expect the result from the last regex?


    No, I'd expect the result from the last regex, excluding those from
    dynamic scopes that have now ended. On the second iteration of the
    loop the dynamic scope from the first iteration has ended, so I should
    not see the result of the regex from that iteration.

    > If regex finally has "scope", you should expect garbage or unreliable results
    > in the first pass.


    No, it is defined that if there has been no successful regex match in
    the current dynamic scope then the parent dynamic scope is examined.
    This is usual for dynamic scopes.
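    The scoping rule described here can be seen with a plain bare block
    instead of a loop. This is a minimal sketch with illustrative strings
    and an illustrative @seen array; it shows the documented behaviour,
    not the loop bug.

```perl
use strict;
use warnings;

my @seen;

'outer' =~ /(\w+)/;
{
    'inner' =~ /(\w+)/;
    push @seen, $1;    # inside the block: the inner match
}
push @seen, $1;        # after the block: the outer match is visible again

print "@seen\n";
```

    Once the block ends, $1 refers to the match from the enclosing scope
    again, which is the behaviour expected on each pass of the loop.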

    > The for { } is scope, the second pass prints the inside.


    Yes, this is the bug I'm reporting.

    > Probably, the $_ should clear the $n variables though, can't remember if it
    > does.


    $_ is not involved anywhere in my example.

    > I didn't try your code.


    I did.
     
    Brian McCauley, Oct 4, 2007
    #10