Bizarre regex behaviour

Discussion in 'Perl Misc' started by Brian Wakem, Aug 5, 2005.

  1. Brian Wakem

    Brian Wakem Guest

    This is driving me mad. I've been chasing a bug in a program for 3 hours and
    have narrowed it down to this complete, working code.

    This is the *exact code* I am running, followed by the actual output.

    I've tried it on two machines, which run different versions of perl and I
    get the same result.

    Please, someone tell me what the hell is going on here, why doesn't the
    regex in sub1 match the 3rd and 4th time through?


    #!/usr/bin/perl

    use strict;
    use warnings;

    sub1();
    sub1();
    if ('a' =~ m/a/) {
        sub1();
    }
    sub1();


    sub sub1 {
        print "This is sub1\n";
        if ('someword' =~ m//) {
            print "Regex matches\n";
        }
        else {
            print "Regex does not match, what on Earth is going on here?\n";
        }
    }




    $ perl -v

    This is perl, v5.8.6 built for i386-linux
    (with 1 registered patch, see perl -V for more detail)

    Copyright 1987-2004, Larry Wall

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using `man perl' or `perldoc perl'. If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    $ perl tmp37.pl
    This is sub1
    Regex matches
    This is sub1
    Regex matches
    This is sub1
    Regex does not match, what on Earth is going on here?
    This is sub1
    Regex does not match, what on Earth is going on here?
    $



    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
     
    Brian Wakem, Aug 5, 2005
    #1

  2. Brian Wakem

    Paul Lalli Guest

    Brian Wakem wrote:
    > Please, someone tell me what the hell is going on here, why doesn't the
    > regex in sub1 match the 3rd and 4th time through?
    >
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > sub1();
    > sub1();
    > if ('a' =~ m/a/) {
    > sub1();
    > }
    > sub1();
    >
    >
    > sub sub1 {
    > print "This is sub1\n";
    > if ('someword' =~ m//) {
    > print "Regex matches\n";
    > }
    > else {
    > print "Regex does not match, what on Earth is going on here?\n";
    > }
    > }


    > $ perl tmp37.pl
    > This is sub1
    > Regex matches
    > This is sub1
    > Regex matches
    > This is sub1
    > Regex does not match, what on Earth is going on here?
    > This is sub1
    > Regex does not match, what on Earth is going on here?
    > $


    I'm having trouble locating the exact place in the perldoc where this
    is talked about. Basically, an 'empty' pattern match is special. It
    repeats the last pattern match executed. By the time m// is seen the
    3rd and fourth times, you had previously tried to match m/a/. This
    pattern is therefore used again in the subsequent empty pattern
    matches. 'someword' does not contain any 'a' characters, so the
    pattern match fails.
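
A minimal sketch of that behaviour, for anyone who wants to see it in isolation. The helper name empty_pattern_matches is made up for this illustration; the three calls mirror what happens to sub1 before and after the m/a/ test in the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether the empty pattern matches $string.
sub empty_pattern_matches {
    my ($string) = @_;
    return $string =~ m// ? 1 : 0;
}

my @results;

# No match has succeeded yet, so m// acts as a genuine empty pattern.
push @results, empty_pattern_matches('someword');

# Once /a/ has matched, m// stands in for /a/ in the calls below.
if ('a' =~ m/a/) {
    push @results, empty_pattern_matches('someword');   # no 'a' in 'someword'
    push @results, empty_pattern_matches('banana');     # 'banana' contains 'a'
}

print "@results\n";   # prints "1 0 1"
```

The last call is the telling one: m// succeeds on 'banana' only because it is really testing for an 'a'.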

    If I manage to find the correct reference, I'll post an update here.

    Paul Lalli
     
    Paul Lalli, Aug 5, 2005
    #2

  3. Brian Wakem

    Paul Lalli Guest

    Paul Lalli wrote:
    > I'm having trouble locating the exact place in the perldoc where this
    > is talked about. Basically, an 'empty' pattern match is special. It
    > repeats the last pattern match executed. By the time m// is seen the
    > 3rd and fourth times, you had previously tried to match m/a/. This
    > pattern is therefore used again in the subsequent empty pattern
    > matches. 'someword' does not contain any 'a' characters, so the
    > pattern match fails.
    >
    > If I manage to find the correct reference, I'll post an update here.


    Found it. I was searching perlre, perlretut, and perlreref. It's
    actually about the m// operator, not regexp's themselves. So...

    perldoc perlop
    m/PATTERN/cgimosx
    ...
    If the PATTERN evaluates to the empty string, the
    last successfully matched regular expression is used
    instead. In this case, only the "g" and "c" flags on
    the empty pattern is honoured - the other flags are
    taken from the original pattern. If no match has
    previously succeeded, this will (silently) act
    instead as a genuine empty pattern (which will
    always match)
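
One way to sidestep this when the pattern arrives in a variable that might be empty is to wrap it in a non-capturing group, so that the regex text is never the empty string. This is only a sketch: naive_match and safe_match are hypothetical helper names, not anything from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: interpolate the pattern directly.
# An empty $pattern triggers the "last successful pattern" rule.
sub naive_match {
    my ($string, $pattern) = @_;
    return $string =~ m/$pattern/ ? 1 : 0;
}

# Hypothetical helper: wrap the pattern in (?:...).
# With an empty $pattern the regex is "(?:)", which is not the empty string.
sub safe_match {
    my ($string, $pattern) = @_;
    return $string =~ m/(?:$pattern)/ ? 1 : 0;
}

# A successful match, so that an empty pattern has something to fall back on.
'a' =~ m/a/ or die "priming match failed";

my $naive = naive_match('someword', '');   # behaves like /a/
my $safe  = safe_match('someword', '');    # genuine empty pattern

print "naive=$naive safe=$safe\n";   # prints "naive=0 safe=1"
```

A literal m/(?:)/ works the same way if you really do want a pattern that always matches.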

    Paul Lalli
     
    Paul Lalli, Aug 5, 2005
    #3
  4. Brian Wakem

    Brian Wakem Guest

    Paul Lalli wrote:

    > Found it. I was searching perlre, perlretut, and perlreref. It's
    > actually about the m// operator, not regexp's themselves. So...
    >
    > perldoc perlop
    > m/PATTERN/cgimosx
    > ...
    > If the PATTERN evaluates to the empty string, the
    > last successfully matched regular expression is used
    > instead. In this case, only the "g" and "c" flags on
    > the empty pattern is honoured - the other flags are
    > taken from the original pattern. If no match has
    > previously succeeded, this will (silently) act
    > instead as a genuine empty pattern (which will
    > always match)
    >
    > Paul Lalli



    Thanks Paul, that certainly explains why I'm seeing this behaviour.

    What I don't understand is why on Earth is that the default behaviour? It
    makes no sense to me. If there's a good reason for it I can't see it. A
    bug, not a feature in my opinion.


    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
     
    Brian Wakem, Aug 5, 2005
    #4
  5. Brian Wakem

    Guest

    Brian Wakem wrote:
    > This is driving me mad. I've been chasing a bug in a program for 3hrs, I
    > have narrowed it down to this complete and working code.
    >
    > This is the *exact code* I am running, followed by the actual output.
    >
    > I've tried it on two machines, which run different versions of perl and I
    > get the same result.
    >
    > Please, someone tell me what the hell is going on here, why doesn't the
    > regex in sub1 match the 3rd and 4th time through?



    As documented in perldoc perlretut,

    If the regexp evaluates to the empty string, the
    regexp in the last successful match is used instead.

    This seems like a mal-feature to me, but there you have it.
    Note that sub1 uses an empty string as the regex.

    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > sub1();


    At this point, there is no last successful match. I would consider the
    behavior to be undefined. Apparently perl just uses an actual empty regex,
    which of course matches anything.

    > sub1();


    At this point, it uses the last successful regex, again the empty one.

    > if ('a' =~ m/a/) {
    > sub1();


    At this point, sub1 uses the last successful regex, which is /a/

    Xho

     
    Xho, Aug 5, 2005
    #5
  6. Brian Wakem

    Eric Amick Guest

    On 05 Aug 2005 22:40:11 GMT, Xho wrote:

    >Brian Wakem <> wrote:
    >> This is driving me mad. I've been chasing a bug in a program for 3hrs, I
    >> have narrowed it down to this complete and working code.
    >>
    >> This is the *exact code* I am running, followed by the actual output.
    >>
    >> I've tried it on two machines, which run different versions of perl and I
    >> get the same result.
    >>
    >> Please, someone tell me what the hell is going on here, why doesn't the
    >> regex in sub1 match the 3rd and 4th time through?

    >
    >
    >As documented in perldoc perlretut,
    >
    > If the regexp evaluates to the empty string, the
    > regexp in the last successful match is used instead.
    >
    >This seems like a mal-feature to me, but there you have it.


    It exists because of historical precedent, I think--a number of Unix
    editors treat an empty regex that way to reduce typing when doing a
    substitution repeatedly.

    --
    Eric Amick
    Columbia, MD
     
    Eric Amick, Aug 6, 2005
    #6
  7. Brian Wakem

    Guest

    Eric Amick wrote:
    > On 05 Aug 2005 22:40:11 GMT, wrote:
    >>Brian Wakem <> wrote:


    >>As documented in perldoc perlretut,
    >>
    >> If the regexp evaluates to the empty string, the
    >> regexp in the last successful match is used instead.
    >>
    >>This seems like a mal-feature to me, but there you have it.


    > It exists because of historical precedent, I think--a number of Unix
    > editors treat an empty regex that way to reduce typing when doing a
    > substitution repeatedly.


    For searching rather than substitution. In Vi, for example, as in Perl,
    a substitution with an empty replacement string is always taken to mean
    that there should be a deletion, which is quite logical.

    Axel
     
    Axel, Aug 6, 2005
    #7
  8. Brian Wakem

    Eric Amick Guest

    On Sat, 06 Aug 2005 14:24:59 GMT, Axel wrote:

    >Eric Amick <> wrote:
    >> On 05 Aug 2005 22:40:11 GMT, wrote:
    >>>Brian Wakem <> wrote:

    >
    >>>As documented in perldoc perlretut,
    >>>
    >>> If the regexp evaluates to the empty string, the
    >>> regexp in the last successful match is used instead.
    >>>
    >>>This seems like a mal-feature to me, but there you have it.

    >
    >> It exists because of historical precedent, I think--a number of Unix
    >> editors treat an empty regex that way to reduce typing when doing a
    >> substitution repeatedly.

    >
    >For searching rather than substitution. Vi for example, as in Perl,
    >a substitution with an empty string is always taken to mean
    >that there should be a deletion. Which is quite logical.


    I was thinking of the target of the substitution, i.e., the string to be
    replaced, when I said that, but you have a point about searching in
    general.

    --
    Eric Amick
    Columbia, MD
     
    Eric Amick, Aug 7, 2005
    #8
  9. Brian Wakem wrote:
    > Paul Lalli wrote:
    >
    >
    >>Found it. I was searching perlre, perlretut, and perlreref. It's
    >>actually about the m// operator, not regexp's themselves. So...
    >>
    >>perldoc perlop
    >> m/PATTERN/cgimosx
    >> ...
    >> If the PATTERN evaluates to the empty string, the
    >> last successfully matched regular expression is used
    >> instead. In this case, only the "g" and "c" flags on
    >> the empty pattern is honoured - the other flags are
    >> taken from the original pattern. If no match has
    >> previously succeeded, this will (silently) act
    >> instead as a genuine empty pattern (which will
    >> always match)
    >>
    >>Paul Lalli

    >
    >
    >
    > Thanks Paul, that certainly explains why I'm seeing this behaviour.
    >
    > What I don't understand is why on Earth is that the default behaviour? It
    > makes no sense to me. If there's a good reason for it I can't see it. A
    > bug, not a feature in my opinion.


    If you've ever used one of the "ed" family of editors (vi, vim, ...),
    you'll find that it is the default behaviour there, too.

    And ... as it is in the documentation, you can hardly call it a bug, can
    you?

    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
     
    Josef Moellers, Aug 8, 2005
    #9
  10. Brian Wakem

    Anno Siegel Guest

    Josef Moellers wrote in comp.lang.perl.misc:
    > Brian Wakem wrote:
    > > Paul Lalli wrote:
    > >
    > >
    > >>Found it. I was searching perlre, perlretut, and perlreref. It's
    > >>actually about the m// operator, not regexp's themselves. So...
    > >>
    > >>perldoc perlop
    > >> m/PATTERN/cgimosx
    > >> ...
    > >> If the PATTERN evaluates to the empty string, the
    > >> last successfully matched regular expression is used
    > >> instead. In this case, only the "g" and "c" flags on
    > >> the empty pattern is honoured - the other flags are
    > >> taken from the original pattern. If no match has
    > >> previously succeeded, this will (silently) act
    > >> instead as a genuine empty pattern (which will
    > >> always match)
    > >>
    > >>Paul Lalli

    > >
    > >
    > >
    > > Thanks Paul, that certainly explains why I'm seeing this behaviour.
    > >
    > > What I don't understand is why on Earth is that the default behaviour? It
    > > makes no sense to me. If there's a good reason for it I can't see it. A
    > > bug, not a feature in my opinion.

    >
    > If you've ever used one of the "ed" family of editors (vi, vim, ...),
    > you'll find that it is the default behaviour there, too.


    ...with one small but essential difference: In Perl, it is necessary
    for a regex to *match successfully* before it is taken as the default
    for m//. The editors accept any (syntactically correct) regex.

    Perl's behavior makes the feature practically useless. If you know
    the regex at coding time, you can always write it out. You would want
    to use the feature when the regex is only given at run time, but then
    you'd have to make it match once. That is, given an arbitrary regex,
    you'd have to construct a string that this regex matches. That is an
    utterly non-trivial task that doesn't always have a solution. You
    wouldn't want to do that just to set a default.
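
To make the difference concrete, here is a small sketch (variable names are only for illustration) showing that a regex which fails to match is not picked up by m//, while the same regex is picked up once it has matched something.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @log;

# /a/ is tried but fails, so it does not become the default for m//.
my $failed = ('xyz' =~ m/a/) ? 1 : 0;
push @log, ('someword' =~ m// ? 1 : 0);   # still a genuine empty pattern

# /a/ now matches, and only from this point does m// behave like /a/.
my $succeeded = ('cat' =~ m/a/) ? 1 : 0;
push @log, ('someword' =~ m// ? 1 : 0);   # reuses /a/, which fails here

print "failed=$failed succeeded=$succeeded log=@log\n";
# prints "failed=0 succeeded=1 log=1 0"
```

So to install a run-time regex as the default you first need some string it matches, which is exactly the hard part.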

    > And ... as it is in the documentation, you can hardly call it a bug, can
    > you.


    I can still call it a misfeature, and I do. I suppose it was an
    implementation error -- it was meant to be "successfully compiled
    regex" but got implemented as "successfully matching regex". Now
    we're stuck with it.

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Aug 8, 2005
    #10
