Inverted RegEx on list of numbers

Discussion in 'Perl Misc' started by Stephan Mann, Feb 13, 2008.

  1. Stephan Mann

    Stephan Mann Guest

    Hi!

    I'm relatively new to Perl and I'm not a RegEx guru either. I'd like to
    understand, why the following code does not work as I expect.
    Obviously, I'm oblivious to some detail of how the RegEx engine works.

    my @foo = ('one', 'two', 'three');
    print join " ", grep { /[^(two)]/ } @foo, "\n";

    my @bar = ('123:44', '123:45', '123:46');
    print join " ", grep { /[^(123:45)]/ } @bar, "\n";
    print join " ", grep { !/123:45/ } @bar, "\n";

    Output:

    one three
    123:46
    123:44 123:46

    I'm completely lost as to why the second grep doesn't work like the first
    one. The colon doesn't seem to be the problem, since the behavior stays
    the same without it. How or why are numbers handled differently?!
    The third grep works, but it forces me to handle this in a separate
    RegEx which I can't extend if a want to do more with one RegEx.

    tia, stephan


    PS: This is my first post to a non-test group with slrn. Please let me
    know if there is anything wrong with my post.
     
    Stephan Mann, Feb 13, 2008
    #1
    1. Advertising

  2. Stephan Mann schrieb:
    > Hi!
    >
    > I'm relatively new to Perl and I'm not a RegEx guru either. I'd like to
    > understand, why the following code does not work as I expect.
    > Obviously, I'm oblivious to some detail of how the RegEx engine works.
    >
    > my @foo = ('one', 'two', 'three');
    > print join " ", grep { /[^(two)]/ } @foo, "\n";
    >
    > my @bar = ('123:44', '123:45', '123:46');
    > print join " ", grep { /[^(123:45)]/ } @bar, "\n";
    > print join " ", grep { !/123:45/ } @bar, "\n";


    [^(two)] is a character class, meaning a character which is neither '('
    nor ')', nor 't', nor 'w', nor 'o', nor ')'.

    "one" matches by /[^(two)]/, because there is a character 'n', which is
    matched by [^(two)]. The same holds for 'h' in "three".

    On the other hand, there is no character in "123:44" which is not an
    '1', not an '2', and so on.
     
    Damian Lukowski, Feb 13, 2008
    #2
    1. Advertising

  3. Stephan Mann

    Achim Peters Guest

    Stephan Mann schrieb:

    > I'm relatively new to Perl and I'm not a RegEx guru either. I'd like to
    > understand, why the following code does not work as I expect.
    > Obviously, I'm oblivious to some detail of how the RegEx engine works.


    You have a mere RE "problem". There is nothing perl specific in your
    observations. I recommend reading more about RegExps. (Since your
    problem is not perl related, any documentation about REs will do)

    >
    > my @foo = ('one', 'two', 'three');
    > print join " ", grep { /[^(two)]/ } @foo, "\n";
    >
    > my @bar = ('123:44', '123:45', '123:46');
    > print join " ", grep { /[^(123:45)]/ } @bar, "\n";
    > print join " ", grep { !/123:45/ } @bar, "\n";
    >
    > Output:
    >
    > one three
    > 123:46
    > 123:44 123:46
    >
    > I'm completely lost as to why the second grep doesn't work like the first
    > one.


    The second grep _does_ work like the first one. Both just not the way
    you expect them to work. ;-)

    > How or why are numbers handled differently?!


    Numbers per se are _not_ handled differently.

    "[]" matches a single (one!) character, no matter how many characters
    you list in between the "[" and the "]". A character of the expression
    to be "grepped" matches the "[]" if and only if it is listed within the
    "[]" (ranges and classes of matching characters are possible), given the
    first character after the "[" is not a "^", or respectively matches if
    and only if it is *not* listed, given the first character after the "["
    is a "^".

    So, it doesn't matter, whether you write
    grep { /[^(two)]/ }
    or
    grep { /[^otw()]/ }

    In both cases the expression will match _a_ _single_ _character_ which
    is neither "o" nor "t" nor "w" nor "(" nor ")". Parentheses have no
    special meaning within [].

    'one' has a character which fulfills this expression: "n" ('one' has
    even one more character, which does, but since your regex consisted only
    of the [] the list item is already matched by the one matching character.
    'two' does not have any such character (one that is not a "t" nor a "w"
    nor an "o" nor a "(" nor a ")"
    'three' has the "h" --> match (and furthermore the "r" and the "e".

    I hope, you see by now, why
    print join " ", grep { /[^(123:45)]/ } @bar, "\n";
    worked the way it worked.

    Bye
    Achim
     
    Achim Peters, Feb 13, 2008
    #3
  4. Stephan Mann

    Stephan Mann Guest

    On 2008-02-13, Damian Lukowski <-aachen.de> wrote:
    > Stephan Mann schrieb:
    >> my @foo = ('one', 'two', 'three');
    >> print join " ", grep { /[^(two)]/ } @foo, "\n";
    >>
    >> my @bar = ('123:44', '123:45', '123:46');
    >> print join " ", grep { /[^(123:45)]/ } @bar, "\n";
    >> print join " ", grep { !/123:45/ } @bar, "\n";

    >
    > [^(two)] is a character class, meaning a character which is neither '('
    > nor ')', nor 't', nor 'w', nor 'o', nor ')'.
    >
    > "one" matches by /[^(two)]/, because there is a character 'n', which is
    > matched by [^(two)]. The same holds for 'h' in "three".
    >
    > On the other hand, there is no character in "123:44" which is not an
    > '1', not an '2', and so on.


    Thank you very much, Damian. I was under the impression that the ()
    inside the character class would be recognized and concatenate the
    characters to a string (If I ever find this website again... Grr!).

    After yet another hour of reading it became clear to me that there is no
    way to write RegEx like "Match anything but this string" other than with
    look-ahead/-behind. So my solution for now is

    print join(" ", grep { /^(?!123:45)/ } @bar), "\n";

    Hopefully this now works as intended, not only by accident.

    greetings, stephan
     
    Stephan Mann, Feb 13, 2008
    #4
  5. Stephan Mann

    Stephan Mann Guest

    On 2008-02-13, Achim Peters <> wrote:
    > Stephan Mann schrieb:
    >
    >> I'm relatively new to Perl and I'm not a RegEx guru either. I'd like to
    >> understand, why the following code does not work as I expect.
    >> Obviously, I'm oblivious to some detail of how the RegEx engine works.

    >
    > You have a mere RE "problem". There is nothing perl specific in your
    > observations. I recommend reading more about RegExps. (Since your
    > problem is not perl related, any documentation about REs will do)
    >
    >> print join " ", grep { !/123:45/ } @bar, "\n";


    The !/../ solution is Perl specific, isn't it? But of course you are
    right, I got the RegEx completely wrong. Thank you and Abigail for the
    detailed explanation.

    greetings, stephan
     
    Stephan Mann, Feb 13, 2008
    #5
  6. Stephan Mann wrote:
    > On 2008-02-13, Damian Lukowski <-aachen.de> wrote:
    >> Stephan Mann schrieb:
    >>> my @foo = ('one', 'two', 'three');
    >>> print join " ", grep { /[^(two)]/ } @foo, "\n";
    >>>
    >>> my @bar = ('123:44', '123:45', '123:46');
    >>> print join " ", grep { /[^(123:45)]/ } @bar, "\n";
    >>> print join " ", grep { !/123:45/ } @bar, "\n";

    >> [^(two)] is a character class, meaning a character which is neither '('
    >> nor ')', nor 't', nor 'w', nor 'o', nor ')'.
    >>
    >> "one" matches by /[^(two)]/, because there is a character 'n', which is
    >> matched by [^(two)]. The same holds for 'h' in "three".
    >>
    >> On the other hand, there is no character in "123:44" which is not an
    >> '1', not an '2', and so on.

    >
    > Thank you very much, Damian. I was under the impression that the ()
    > inside the character class would be recognized and concatenate the
    > characters to a string (If I ever find this website again... Grr!).
    >
    > After yet another hour of reading it became clear to me that there is no
    > way to write RegEx like "Match anything but this string" other than with
    > look-ahead/-behind. So my solution for now is
    >
    > print join(" ", grep { /^(?!123:45)/ } @bar), "\n";
    >
    > Hopefully this now works as intended, not only by accident.


    It looks like from your example you could also do:

    print join( " ", grep !/^123:45$/, @bar ), "\n";

    Or:

    print join( " ", grep $_ ne '123:45', @bar ), "\n";



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
     
    John W. Krahn, Feb 13, 2008
    #6
  7. Stephan Mann schrieb:
    > After yet another hour of reading it became clear to me that there is no
    > way to write RegEx like "Match anything but this string" other than with
    > look-ahead/-behind.


    In case it is an "academic" regular expression (as defined in computer
    science) with no backreferences and zero-width assertions, there is
    always another regex which matches exactly the opposite, as there is an
    algorithm for giving you the complementary regex. If I'm not mistaken
    the complementary regex to /two/ is something like
    /^(?:[^t]|t+[^tw]|(?:t+w)+([^to]|t+[^tw]))*(?:t+w)+o/.

    Damian
     
    Damian Lukowski, Feb 13, 2008
    #7
  8. Damian Lukowski schrieb:
    > If I'm not mistaken
    > the complementary regex to /two/ is something like
    > /^(?:[^t]|t+[^tw]|(?:t+w)+([^to]|t+[^tw]))*(?:t+w)+o/.
    >
    > Damian


    Well, I am mistaken here, as I forgot an important step. :)
    I won't correct it, because it will be more complicated as it is already. :)
     
    Damian Lukowski, Feb 13, 2008
    #8
  9. Stephan Mann

    Stephan Mann Guest

    On 2008-02-13, Damian Lukowski wrote:
    > Stephan Mann schrieb:
    >> After yet another hour of reading it became clear to me that there is no
    >> way to write RegEx like "Match anything but this string" other than with
    >> look-ahead/-behind.

    >
    > In case it is an "academic" regular expression (as defined in computer
    > science) with no backreferences and zero-width assertions, there is
    > always another regex which matches exactly the opposite, as there is an
    > algorithm for giving you the complementary regex. If I'm not mistaken
    > the complementary regex to /two/ is something like
    > /^(?:[^t]|t+[^tw]|(?:t+w)+([^to]|t+[^tw]))*(?:t+w)+o/.


    Let me rephrase that: There is no way to write _a readable_ RegEx... ;-)

    Actually, I thought about this (although I would have failed writing it
    down) but this simply isn't applicable for strings longer than five
    characters and also doesn't work if your search string is variable.

    Thankfully, my problem was very "practical" :)

    thanks again for your efforts,
    stephan
     
    Stephan Mann, Feb 13, 2008
    #9
  10. Stephan Mann

    David Combs Guest

    In article <>,
    Damian Lukowski <-aachen.de> wrote:
    >Damian Lukowski schrieb:
    >> If I'm not mistaken
    >> the complementary regex to /two/ is something like
    >> /^(?:[^t]|t+[^tw]|(?:t+w)+([^to]|t+[^tw]))*(?:t+w)+o/.
    >>
    >> Damian

    >
    >Well, I am mistaken here, as I forgot an important step. :)
    >I won't correct it, because it will be more complicated as it is already. :)


    C'mon, give it a shot.

    After all, you did suck us into this. :=)

    And as something to add to a regexp-tutorial, describe
    how your existing line was supposed to work and accomplish --
    and then what was wrong with it.

    As you attach your fix, sequential thinking to it, with
    the pros and cons you went through on the way there.


    Of course no one would have time to do such a thing,
    but maybe the idea of that kind of annotation will
    stick in someone's head, and actually be done from
    time to time.


    David
     
    David Combs, Mar 8, 2008
    #10
  11. David Combs schrieb:
    > C'mon, give it a shot.
    >
    > After all, you did suck us into this. :=)
    >
    > And as something to add to a regexp-tutorial, describe
    > how your existing line was supposed to work and accomplish --
    > and then what was wrong with it.



    Well, okay.

    The rough approach to invert a regular expression is this:

    - Convert the regex to an equivalent epsilon-NFA.
    - Eliminate epsilon transitions.
    - Convert NFA to equivalent DFA.
    - Invert final state(s) to nonfinal state(s) and vice versa.
    - Convert the inverted DFA back into a regular expression.

    Above, I forgot the fourth step and converted a DFA into a regular
    expression without inverting any states. Thus, the former /two/ should
    be equivalent to the bloated one.

    Damian
     
    Damian Lukowski, Mar 8, 2008
    #11
  12. Stephan Mann

    paul Guest

    On Mar 8, 7:42 pm, Damian Lukowski <-aachen.de> wrote:
    > Well, okay.
    >
    > The rough approach to invert a regular expression is this:
    >
    > - Convert the regex to an equivalent epsilon-NFA.
    > - Eliminate epsilon transitions.
    > - Convert NFA to equivalent DFA.
    > - Invert final state(s) to nonfinal state(s) and vice versa.
    > - Convert the inverted DFA back into a regular expression.
    >
    > Above, I forgot the fourth step and converted a DFA into a regular
    > expression without inverting any states. Thus, the former /two/ should
    > be equivalent to the bloated one.
    >
    > Damian


    Hello.
    Is there any library or tool which can do it for us.
     
    paul, Mar 13, 2008
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Manfred Balik

    Inverted Clock in ACEX1K

    Manfred Balik, Aug 29, 2003, in forum: VHDL
    Replies:
    1
    Views:
    1,300
    Marc Guardiani
    Aug 30, 2003
  2. Jon Maz
    Replies:
    3
    Views:
    1,652
    Jon Maz
    Feb 13, 2004
  3. Sakthi
    Replies:
    0
    Views:
    361
    Sakthi
    Sep 15, 2004
  4. Sakthi
    Replies:
    0
    Views:
    423
    Sakthi
    Sep 15, 2004
  5. AviraM
    Replies:
    2
    Views:
    6,384
    Manish Pandit
    Sep 28, 2006
Loading...

Share This Page