regex question: extended [^...] concept?

Discussion in 'Perl Misc' started by Werner Lemberg, Mar 23, 2009.

  1. Folks,


    consider this input:

    foo ... foo ... bar

    where `...' doesn't contain the word `foo'. How can I write a regular
    expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
    it a single character as in

    f ... f ... bar

    I could write

    /f[^f]*bar/

    but how can I do something similar for a word? In other words, I'm looking
    for an extension of the [^.] concept which covers a sequence of characters.

    I've looked into both the `perlre' and `perlretut' manual pages (of perl
    5.10.0), but they contain nothing relevant to this problem.


    Werner
    Werner Lemberg, Mar 23, 2009
    #1

  2. smallpond Guest

    On Mar 23, 4:00 pm, Werner Lemberg <> wrote:
    > Folks,
    >
    > consider this input:
    >
    > foo ... foo ... bar
    >
    > where `...' doesn't contain the word `foo'. How can I write a regular
    > expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
    > it a single character as in
    >
    > f ... f ... bar
    >
    > I could write
    >
    > /f[^f]*bar/
    >
    > but how can I do something similar for a word? In other words, I'm looking
    > for an extension of the [^.] concept which covers a sequence of characters.
    >
    > I've looked into both the `perlre' and `perlretut' manual pages (of perl
    > 5.10.0), but they contain nothing relevant to this problem.
    >
    > Werner


    print "OK" if ($source =~ /foo.*bar/ and $source !~ /foo.*foo.*bar/);
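A minimal sketch of smallpond's two-regex approach, wrapped in a helper sub. The name foo_bar_without_second_foo and the test strings are illustrative and do not come from the thread. The check gives a yes/no answer about the whole string. It does not extract the `foo ... bar' part.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): true if some "foo" is
# followed by "bar", false as soon as two "foo"s precede a "bar".
sub foo_bar_without_second_foo {
    my ($source) = @_;
    my $ok = $source =~ /foo.*bar/ && $source !~ /foo.*foo.*bar/;
    return $ok ? 1 : 0;
}

for my $s ('foo ... bar', 'foo ... foo ... bar', 'bar ... foo') {
    printf "%-20s %s\n", $s, foo_bar_without_second_foo($s) ? 'OK' : 'rejected';
}
```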
    smallpond, Mar 23, 2009
    #2

  3. Willem Guest

    Werner Lemberg wrote:
    )
    ) Folks,
    )
    )
    ) consider this input:
    )
    ) foo ... foo ... bar
    )
    ) where `...' doesn't contain the word `foo'. How can I write a regular
    ) expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
    ) it a single character as in
    )
    ) f ... f ... bar
    )
    ) I could write
    )
    ) /f[^f]*bar/
    )
    ) but how can I do something similar for a word? In other words, I'm looking
    ) for an extension of the [^.] concept which covers a sequence of characters.

    That's quite difficult and complicated to do in a single regexp.
    You basically have to cover all cases.

    This might work, but I'm not sure I got all cases right:

    /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/

    You can see that using two regexes one after the other (as mentioned
    crossthread) is a lot easier.


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
    Willem, Mar 23, 2009
    #3
  4. Willem <> wrote:
    > ) consider this input:
    > )
    > ) foo ... foo ... bar
    > )
    > ) where `...' doesn't contain the word `foo'. How can I write a regular
    > ) expression which matches `foo ... bar' but not `foo ... foo ... bar'?
    >
    > That's quite difficult and complicated to do in a single regexp.
    > You basically have to cover all cases.


    > This might work, but I'm not sure I got all cases right:


    > /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/


    > You can see that using two regexes one after the other (as mentioned
    > crossthread) is a lot easier.


    Thanks for the answers. I'm really surprised that there are so many regex
    extensions in Perl but not a single one which covers this. Is this
    difficult to handle in a regex machine, or is there no need normally for
    it?

    Especially in combination with the (?PARNO) stuff (as described in the
    perlre man page) this could be quite handy for recursively parsing nested
    expressions.

    I also wonder why there is no callback mechanism within regular
    expressions. The (?{ code }) construct allows execution of arbitrary Perl
    code but within the regex it always evaluates to true. I would like to have
    a similar construct, say, (!{ code }), which evaluates to true or false
    depending on `code'. Then I could implement my above request by myself,
    simply checking the passed subgroup whether it contains the given string.
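Perl 5.10, the version mentioned above, already gets fairly close to the wished-for (!{ code }). The condition of a (?(condition)yes-pattern) conditional may be a (?{ code }) block, and the (*FAIL) verb, new in 5.10, makes the yes-branch fail unconditionally. The sketch below is not from the thread, and the sub name match_without_inner is hypothetical. It captures the text between `foo' and `bar' and forces a backtrack whenever that text contains `foo'.

```perl
use strict;
use warnings;

# Hypothetical helper. (?(?{ CODE })(*FAIL)) makes the match fail at this
# point whenever CODE returns true, so the engine backtracks and tries
# elsewhere. $^N is the most recently closed capture group, i.e. the text
# between "foo" and "bar". index() is used instead of a nested regex match.
sub match_without_inner {
    my ($text) = @_;
    return $text =~ /foo(.*?)bar(?(?{ index($^N, 'foo') >= 0 })(*FAIL))/
        ? $& : undef;
}

my $m = match_without_inner('foo ... foo ... bar');
print defined $m ? "$m\n" : "no match\n";   # foo ... bar
```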


    Werner
    Werner Lemberg, Mar 23, 2009
    #4
  5. Werner Lemberg <> wrote:
    >consider this input:
    >
    > foo ... foo ... bar
    >
    >where `...' doesn't contain the word `foo'. How can I write a regular
    >expression which matches `foo ... bar' but not `foo ... foo ... bar'?


    reverse() the text and match non-greedy /^rab...oof/, then reverse() the
    found match again.
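A minimal sketch of Jürgen's reverse() trick. The sub name last_foo_bar is hypothetical. The pattern is left unanchored here, because /^rab...oof/ assumes the text ends with `bar'.

```perl
use strict;
use warnings;

# Hypothetical helper: after reversing, the leftmost "rab" is the last "bar"
# of the original, and the non-greedy .*? stops at the nearest "oof", i.e.
# the closest "foo" before that "bar". The match is then reversed back.
sub last_foo_bar {
    my ($text) = @_;
    my $reversed = reverse $text;        # scalar assignment: reverses characters
    return undef unless $reversed =~ /(rab.*?oof)/;
    return scalar reverse $1;            # 'scalar' needed: in list context reverse() returns $1 unchanged
}

my $m = last_foo_bar('foo ... foo ... bar');
print defined $m ? "$m\n" : "no match\n";   # foo ... bar
```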

    jue
    Jürgen Exner, Mar 23, 2009
    #5
  6. smallpond Guest

    On Mar 23, 4:45 pm, Willem <> wrote:
    > Werner Lemberg wrote:
    >
    > )
    > ) Folks,
    > )
    > )
    > ) consider this input:
    > )
    > ) foo ... foo ... bar
    > )
    > ) where `...' doesn't contain the word `foo'. How can I write a regular
    > ) expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
    > ) it a single character as in
    > )
    > ) f ... f ... bar
    > )
    > ) I could write
    > )
    > ) /f[^f]*bar/
    > )
    > ) but how can I do something similar for a word? In other words, I'm looking
    > ) for an extension of the [^.] concept which covers a sequence of characters.
    >
    > That's quite difficult and complicated to do in a single regexp.
    > You basically have to cover all cases.
    >
    > This might work, but I'm not sure I got all cases right:
    >
    > /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/
    >
    > You can see that using two regexes one after the other (as mentioned
    > crossthread) is a lot easier.


    $s="fooafoosdbar";
    print "OK" if ($s =~ /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/);

    OK

    It won't work unless you prevent backtracking, I think. The initial
    foo in your pattern can match the second one in the string.
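You can check smallpond's point by asking where the match starts. The sketch below uses @- to report the starting offset. The sub name willem_match is hypothetical. The result shows that the `OK' comes from a match that begins at the second `foo'. Whether that counts as a failure depends on the intended semantics, which Ilya discusses later in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text and its start offset ($-[0]).
sub willem_match {
    my ($s) = @_;
    return unless $s =~ /foo[^f]*((f[^o]|fo[^o])[^f]*)*f?bar/;
    return ($&, $-[0]);
}

my ($text, $offset) = willem_match('fooafoosdbar');
print "matched '$text' at offset $offset\n";   # matched 'foosdbar' at offset 4
```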
    smallpond, Mar 23, 2009
    #6
  7. > >consider this input:
    > >
    > > foo ... foo ... bar
    > >
    > >where `...' doesn't contain the word `foo'. How can I write a regular
    > >expression which matches `foo ... bar' but not `foo ... foo ... bar'?


    > reverse() the text and match non-greedy /^rab...oof/, then reverse() the
    > found match again.


    Thank you. While this is a solution for my concrete example, it
    unfortunately doesn't help if you want to generalize the [^.] concept.


    Werner
    Werner Lemberg, Mar 24, 2009
    #7
  8. Ben Morrow <> wrote:

    > > foo ... foo ... bar
    > >
    > > where `...' doesn't contain the word `foo'. How can I write a regular
    > > expression which matches `foo ... bar' but not `foo ... foo ... bar'?


    > /foo (?: (?!foo) . )* bar/x


    Aah. I had already thought of a negative lookahead, but it hadn't occurred
    to me to use `(.)*' to provide a `moving anchor' for it. Thanks for the
    idea.
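Here is a short sketch of Ben's pattern in action. The sub name tempered_match and the test strings are illustrative. The negative lookahead is re-checked before every character, so the run between `foo' and `bar' can never swallow another `foo'.

```perl
use strict;
use warnings;

# Hypothetical helper around Ben Morrow's pattern. Each character after the
# opening "foo" is consumed only if "foo" does not start at that position.
# Without /s, '.' does not match a newline, so the match stays on one line.
sub tempered_match {
    my ($s) = @_;
    return $s =~ /foo (?: (?!foo) . )* bar/x ? $& : undef;
}

for my $s ('foo ... bar', 'foo ... foo ... bar', 'foo ... baz') {
    my $m = tempered_match($s);
    printf "%-20s -> %s\n", $s, defined $m ? "'$m'" : 'no match';
}
```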

    > If a suffix of "foo" matches a prefix of "bar" you may end up with false
    > negatives, depending on what you wanted. That is,


    > "foo...fobar" =~ /foo (?: (?!fob) . )* bar/x


    > is false, even though the "fob" you don't want to match is part of the
    > "bar" you do. It is possible to correct this with yet another negative
    > lookahead:


    > /foo (?: (?! fob (?!ar) ) . )* bar/x
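Ben's `fob' example can be reproduced directly. The variable names and the second test string below are illustrative.

```perl
use strict;
use warnings;

my $plain = qr/foo (?: (?!fob) . )* bar/x;           # stops at the "fob" inside "fobar"
my $fixed = qr/foo (?: (?! fob (?!ar) ) . )* bar/x;  # lets "fob" through when "ar" follows

for my $s ('foo...fobar', 'foo...fob...bar') {
    printf "%-16s plain: %-8s fixed: %s\n", $s,
        ($s =~ $plain ? 'match' : 'no match'),
        ($s =~ $fixed ? 'match' : 'no match');
}
```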


    Uuh, a negative lookahead *within* another negative lookahead. How is that
    defined exactly? Is it equivalent to

    (?! fob ) (?! ar )

    ?
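A quick empirical check suggests the answer is no. It is not from the thread. The test strings are illustrative and are anchored with ^, so only position 0 is examined. The inner (?!ar) is tried after `fob' has been matched, three characters further on. So (?! fob (?!ar) ) fails only where `fob' starts and is not followed by `ar'. By contrast, (?! fob ) (?! ar ) tests both lookaheads at the same position, and fails wherever either `fob' or `ar' starts.

```perl
use strict;
use warnings;

my $nested     = qr/^(?!fob(?!ar))/;   # fails only if "fob" here is NOT followed by "ar"
my $sequential = qr/^(?!fob)(?!ar)/;   # fails if "fob" OR "ar" starts here

for my $s ('fobar', 'fobaz', 'arx', 'xyz') {
    printf "%-6s nested: %-4s sequential: %s\n", $s,
        ($s =~ $nested     ? 'yes' : 'no'),
        ($s =~ $sequential ? 'yes' : 'no');
}
# fobar: nested yes, sequential no; arx: nested yes, sequential no
```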


    Werner
    Werner Lemberg, Mar 24, 2009
    #8
  9. On 2009-03-23, Werner Lemberg <> wrote:
    > foo ... foo ... bar
    >
    > where `...' doesn't contain the word `foo'. How can I write a regular
    > expression which matches `foo ... bar' but not `foo ... foo ... bar'?


    This does not make sense, since `foo ... bar' is a substring of `foo
    ... foo ... bar'.

    I assume you want to allow the REX to match this substring, but not
    the larger string. Then the simplest solution would be to fast-track to
    the LATEST occurrence of foo which is followed by bar:

    /^ (?> .* (?=foo .* bar) ) (foo .* bar) /x; # add \b where needed

    or just (depending on the needs)

    / ^ .* (foo .* bar) /x;
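Both of Ilya's `latest foo' patterns can be wrapped in small subs and tested side by side. The sub names and test strings are illustrative. As his comment says, the \b word boundaries are left out.

```perl
use strict;
use warnings;

# Atomic group: the greedy .* backs off only as far as the last position
# where "foo ... bar" still follows, and (?> ) then locks that choice in.
sub latest_atomic {
    my ($s) = @_;
    return $s =~ /^ (?> .* (?=foo .* bar) ) (foo .* bar) /x ? $1 : undef;
}

# Plain greedy prefix: backtracking alone finds the last "foo" that works.
sub latest_greedy {
    my ($s) = @_;
    return $s =~ / ^ .* (foo .* bar) /x ? $1 : undef;
}

for my $s ('foo ... foo ... bar', 'foo ... bar ... foo') {
    printf "%-20s atomic: %-12s greedy: %s\n", $s,
        latest_atomic($s) // 'undef', latest_greedy($s) // 'undef';
}
```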

    If you want to disallow ANY match which contains foo foo bar, then it
    may be as simple as

    / ^ (?! .*? foo .* foo .* bar) .*? ( foo .* bar )/x
    or
    / ^ (?! (?> (?> .*? foo) .*? foo) .* bar) .*? ( foo .* bar )/x
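The two `disallow' variants can be compared the same way. The sub names and test strings are illustrative. Both patterns are anchored with ^, so a rejected string produces no match at all. The atomic groups in the second form appear intended to limit backtracking. On these test strings, both forms give the same answers.

```perl
use strict;
use warnings;

sub no_double_plain {
    my ($s) = @_;
    return $s =~ / ^ (?! .*? foo .* foo .* bar) .*? ( foo .* bar )/x ? $1 : undef;
}

sub no_double_atomic {
    my ($s) = @_;
    return $s =~ / ^ (?! (?> (?> .*? foo) .*? foo) .* bar) .*? ( foo .* bar )/x
        ? $1 : undef;
}

for my $s ('foo ... bar', 'x foo y bar', 'foo ... foo ... bar') {
    printf "%-20s plain: %-12s atomic: %s\n", $s,
        no_double_plain($s) // 'undef', no_double_atomic($s) // 'undef';
}
```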

    However, the problem becomes much trickier if you prohibit using ^...

    Hope this helps,
    Ilya

    P.S. Of course, with "onion rings" implemented (google for it) there
    would be no problem whatsoever...
    Ilya Zakharevich, Mar 24, 2009
    #9
  10. Guest

    On Mon, 23 Mar 2009 20:00:29 +0000 (UTC), Werner Lemberg <> wrote:

    >
    >Folks,
    >
    >
    >consider this input:
    >
    > foo ... foo ... bar
    >
    >where `...' doesn't contain the word `foo'. How can I write a regular
    >expression which matches `foo ... bar' but not `foo ... foo ... bar'? Were
    >it a single character as in
    >
    > f ... f ... bar
    >
    >I could write
    >
    > /f[^f]*bar/
    >
    >but how can I do something similar for a word? In other words, I'm looking
    >for an extension of the [^.] concept which covers a sequence of characters.
    >
    >I've looked into both the `perlre' and `perlretut' manual pages (of perl
    >5.10.0), but they contain nothing relevant to this problem.
    >
    >
    > Werner


    I've always thought these are good ways.

    -sln

    ---------------------------------------
    use strict;
    use warnings;

    if ( "foo ... foo ... bar ... bar" =~ /(foo (?: . (?! foo) ) * bar)/x )
    {
        print "$1\n";
    }
    ## or

    if ( "foo ... foo ... bar ... bar" =~ /(foo (?: . (?! foo) ) *? bar)/x )
    {
        print "$1\n";
    }

    __END__

    foo ... bar ... bar
    foo ... bar
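This form differs from Ben Morrow's in one detail: it checks (?! foo) after consuming each character instead of before. A small comparison suggests this lets a `foo' that immediately follows the opening `foo' slip through. The input `foofoo bar' is an illustrative edge case, not from the thread.

```perl
use strict;
use warnings;

my $check_after  = qr/(foo (?: . (?! foo) )* bar)/x;   # sln: look ahead after each char
my $check_before = qr/(foo (?: (?! foo) . )* bar)/x;   # Ben Morrow: look ahead before each char

for my $s ('foofoo bar', 'foo ... foo ... bar ... bar') {
    my ($after)  = $s =~ $check_after;
    my ($before) = $s =~ $check_before;
    printf "%-28s after: %-22s before: %s\n", $s, $after // 'undef', $before // 'undef';
}
```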
    , Mar 24, 2009
    #10
  11. On 2009-03-23, Werner Lemberg <> wrote:
    *SKIP*
    > Thanks for the answers. I'm really surprised that there are so many regex
    > extensions in Perl but not a single one which covers this. Is this
    > difficult to handle in a regex machine, or is there no need normally for
    > it?


    Watch what you say. Those aren't 'regex extensions in Perl'. Those are
    Perl regex (or 'perlre', for short).

    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Mar 24, 2009
    #11
  12. regex that ever fail (was: regex question: extended [^...] concept?)

    On 2009-03-24, Ben Morrow <> wrote:
    *SKIP*
    > as being something like
    >
    > (?{ if (/pattern/) { fail } })
    >
    > where 'fail' is a hypothetical builtin that causes the surrounding match
    > to fail. (Of course $_ would have to have the appropriate value, as
    > well, which it doesn't.) OTOH, it may not... :)


    I once considered an approach of intentionally failing a match within
    perlre itself (preprocessing wasn't an option).

    I would come up with something like this:

    perl -Mstrict -wle '
    my $x = qr[(??{ substr($`, -1, 1) eq q|f| ? qr/(?<=f)/ : qr/(?!.|$)/
    })];
    foreach ( q||, qw| x f fx xf | ) {
    print m{$x};
    print qq|<$&>|;
    print qq|$_\n|; };
    print q|FIN|'

    Use of uninitialized value $& in concatenation (.) or string at -e line 5.
    <>



    Use of uninitialized value $& in concatenation (.) or string at -e line 5.
    <>
    x

    1
    <>
    f

    1
    <>
    fx

    1
    <>
    xf

    FIN

    But isn't that way too heavy (and I'm not even talking about C<$`>)?

    *CUT*

    Eric Pozharski, Mar 24, 2009
    #12
  13. Ilya Zakharevich <> wrote:

    > Hope this helps,


    Thanks a lot to all the posters who presented further ideas and comments.

    > P.S. Of course, with "onion rings" implemented (google for it) there
    > would be no problem whatsoever...


    Yeah. I've looked at

    http://dev.perl.org/perl6/rfc/198.html

    and this is exactly what I would like to have. In the same document there
    is also

    (?*{code})

    which I would like to see too.


    Werner
    Werner Lemberg, Mar 24, 2009
    #13
  14. Guest

    On Tue, 24 Mar 2009 21:41:10 +0000 (UTC), Werner Lemberg <> wrote:

    >Ilya Zakharevich <> wrote:
    >
    >> Hope this helps,

    >
    >Thanks a lot to all the posters who presented further ideas and comments.
    >
    >> P.S. Of course, with "onion rings" implemented (google for it) there
    >> would be no problem whatsoever...

    >
    >Yeah. I've looked at
    >
    > http://dev.perl.org/perl6/rfc/198.html
    >
    >and this is exactly what I would like to have. In the same document there
    >is also
    >
    > (?*{code})
    >
    >which I would like to see too.
    >
    >
    > Werner


    Note that (?{ code }) always succeeds; it has no real effect on the regex except that
    you can tweak special variables, pos(), etc.

    Hopefully, it won't come down to all that.

    -sln
    , Mar 25, 2009
    #14

Similar Threads
  1. Karl Seguin [MVP]

    Re: asp.net c# concept question

    Karl Seguin [MVP], Mar 16, 2006, in forum: ASP .Net
    Replies:
    0
    Views:
    603
    Karl Seguin [MVP]
    Mar 16, 2006
  2. Rhino

    Concept Question

    Rhino, Jan 13, 2005, in forum: XML
    Replies:
    1
    Views:
    351
    Henri Sivonen
    Jan 13, 2005
  3. Sergey
    Replies:
    6
    Views:
    2,903
    Victor Bazarov
    Apr 1, 2005
  4. jiing

    concept question

    jiing, Apr 21, 2005, in forum: C++
    Replies:
    10
    Views:
    515
    Default User
    Apr 21, 2005
  5. Replies:
    3
    Views:
    730
    Reedick, Andrew
    Jul 1, 2008