regular expression negate a word (not character)

Discussion in 'Ruby' started by Summercool, Jan 26, 2008.

  1. Summercool

    Summercool Guest

    somebody who is a regular expression guru... how do you negate a word
    and grep for all words that is

    tire

    but not

    snow tire

    or

    snowtire

    so for example, it will grep for

    winter tire
    tire
    retire
    tired

    but will not grep for

    snow tire
    snow tire
    some snowtires

    need to do it in one regular expression
     
    Summercool, Jan 26, 2008
    #1

  2. Summercool

    Summercool Guest

    On Jan 25, 5:16 pm, Summercool <> wrote:
    > somebody who is a regular expression guru... how do you negate a word
    > and grep for all words that is
    >
    > tire
    >
    > but not
    >
    > snow tire
    >
    > or
    >
    > snowtire


    i could think of something like

    /[^s][^n][^o][^w]\s*tire/i

    but what if it is not snow but some 20 character-word, then do we need
    to do it 20 times to negate it? any shorter way?
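A quick check of the per-character idea, as a sketch in Python (the choice of Python's `re` module and the sample list are assumptions for illustration, not part of the original post). Each `[^x]` consumes exactly one character, so the pattern demands four characters in front of `tire`; and because `\s*` may match nothing, the four slots can slide over `now ` and accept `snow tire` anyway.

```python
import re

# The attempt from the post above, unchanged apart from Python syntax.
attempt = re.compile(r"[^s][^n][^o][^w]\s*tire", re.IGNORECASE)

samples = ["winter tire", "tire", "retire", "tired", "snow tire", "snowtire"]

# True means the pattern found a match somewhere in the sample.
results = {s: bool(attempt.search(s)) for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

Under these assumptions only `winter tire` and `snowtire` come out the way the question wants: `tire`, `retire` and `tired` are missed, and `snow tire` is accepted.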
     
    Summercool, Jan 26, 2008
    #2

  3. SpringFlowers AutumnMoon wrote:
    > On Jan 25, 5:16 pm, Summercool <> wrote:
    >>
    >> snowtire

    >
    > i could think of something like
    >
    > /[^s][^n][^o][^w]\s*tire/i
    >
    > but what if it is not snow but some 20 character-word, then do we need
    > to do it 20 times to negate it? any shorter way?


    I took a long look at this and I came up with a number of different
    methods, including an idea like the one you have above. If you have a
    set number of bad/undesirable words then everything falls apart. I
    tried negative look behinds but those don't work well with 0 or more
    spaces because look-behinds have to have a fixed length. I really don't
    think that this could be done elegantly with a single regular expression
    if you have multiple bad/undesirable words. However, if you split this
    into two regular expressions then it becomes rather straightforward.
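The fixed-length restriction on look-behinds can be seen in a small Python sketch (Python's `re` is used here only as an example of an engine with that restriction; the sample strings are assumptions). A look-behind containing `\s*` is rejected outright, while stacking one fixed-width look-behind per allowed gap width works only for the widths that are spelled out.

```python
import re

# A variable-width look-behind is refused by Python's re module.
try:
    re.compile(r"(?<!snow\s*)tire")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Workaround sketch: one fixed-width look-behind for "snow" directly before
# "tire" and one for "snow" plus exactly one whitespace character.
not_snow_tire = re.compile(r"(?<!snow)(?<!snow\s)tire", re.IGNORECASE)

samples = ["winter tire", "retire", "snow tire", "some snowtires",
           "snow tire and regular tire", "snow  tire"]
results = {s: bool(not_snow_tire.search(s)) for s in samples}

print("variable width accepted:", variable_width_ok)
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

The last sample, with two spaces, still matches, which is the "0 or more spaces" problem described above: every additional gap width needs its own look-behind.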

    I really have spent the last 20 minutes trying out different
    possibilities with a single regular expression, but it just doesn't seem
    worth the difficulty =(

    May I ask why there is the requirement for a single regular expression?

    - Joe P
    --
    Posted via http://www.ruby-forum.com/.
     
    Joseph Pecoraro, Jan 26, 2008
    #3
  4. On Jan 25, 2008 6:19 PM, Summercool <> wrote:
    > On Jan 25, 5:16 pm, Summercool <> wrote:
    > > somebody who is a regular expression guru... how do you negate a word
    > > and grep for all words that is
    > >
    > > tire
    > >
    > > but not
    > >
    > > snow tire
    > >
    > > or
    > >
    > > snowtire

    >
    > i could think of something like
    >
    > /[^s][^n][^o][^w]\s*tire/i
    >
    > but what if it is not snow but some 20 character-word, then do we need
    > to do it 20 times to negate it? any shorter way?


    (?!snow)(\S{4})\s*(tire)|^\S{0,3}\s*(tire)

    I'm not thrilled with that, but without look-behind, it's rough to do
    what you're asking.
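For anyone who wants to try the pattern above, here is a small sketch that runs it through Python's `re` module (an assumption; the post does not say which engine was used). The sample `on a tire` is an added case, not one from the thread.

```python
import re

# The pattern from the post above, copied as-is.
pattern = re.compile(r"(?!snow)(\S{4})\s*(tire)|^\S{0,3}\s*(tire)")

samples = [
    "winter tire", "tire", "retire", "tired",
    "snowbird tire", "tired on a snow day", "snow tire and regular tire",
    "snow tire", "some snowtires",
    "on a tire",  # added case: fewer than four non-space characters before "tire"
]
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

It handles every case posted in the thread, but the added `on a tire` is missed, because the first alternative needs four non-space characters right before `tire` and the second only applies at the start of the string.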

    Shameless pluggery: I used RegexpBench to do the experimentation to
    find your answer.

    Judson
    --
    Your subnet is currently 169.254.0.0/16. You are likely to be eaten by a grue.
     
    Judson Lester, Jan 26, 2008
    #4
  5. Summercool

    Summercool Guest

    On Jan 25, 6:35 pm, Joseph Pecoraro <> wrote:
    >
    > I really have spent the last 20 minutes trying out different
    > possibilities with a single regular expression, but it just doesn't seem
    > worth the difficulty =(
    >
    > May I ask why there is the requirement for a single regular expression?
    >
    > - Joe P


    thanks for your post. one reason is that some text editors let users
    search all files using a regular expression... another reason is
    that if two separate tests are applied to each line, then a line that
    contains both tire and snowtire may be rejected as a whole, even
    though we want to grep it because of the first word "tire".
     
    Summercool, Jan 26, 2008
    #5
  6. SpringFlowers AutumnMoon wrote:
    >> May I ask why there is the requirement for a single regular expression?

    >
    > thanks for your post. a reason is that some text editor lets users
    > search all files using a regular expression... another reason is
    > that... if 2 lines are used to test... then what if that line actually
    > has tire and snowtire... then it may negate the whole line as a
    > result, even though we want to grep it due to the first word "tire".


    This is rather interesting to me. I recently (Dec-Jan) wrote a little
    find/replace Ruby script that can deal with multiple files. I call the
    utility rr.

    What you're suggesting is a pretty cool idea and opens up a number of
    possible improvements that I had not thought about. I can extend rr to
    take multiple regular expressions and let the user say "yes, match
    this regex" and "no, do not match this one". I could also simply add
    an option to print only the files where the regular expressions
    match, without performing the find/replace.

    I will have to think this through, especially this Sunday when I have
    more time.

    I am sorry that this doesn't help with your search for a single
    regular expression solution, but I want to repeat that this seems so
    much easier with two regular expressions that I think developing such a
    utility would be worthwhile. I am really looking forward to
    implementing these new ideas. For that I thank you!

    I'm a rather intermediate Ruby programmer but if anyone would like to
    check out rr they can at my blog. Here is a link to the most recent
    article:
    http://bogojoker.com/weblog/2008/01/01/rr-11-in-place-edits-and-multiple-files/
    --
    Posted via http://www.ruby-forum.com/.
     
    Joseph Pecoraro, Jan 26, 2008
    #6
  7. Mark Tolonen, those were the exact Ruby negative look-behinds that I
    used. It's good to see that we had the same idea!
    --
    Posted via http://www.ruby-forum.com/.
     
    Joseph Pecoraro, Jan 26, 2008
    #7
  8. I just wrote up a quick script to do what I was thinking. I decided to
    make a different utility only because of the complications that would
    arise with tons of switches on the command line if I were to add it to a
    find/replace utility. (The user would have to say which regex they
    wanted for the actual replacement, and other inherent problems... moving
    on)

    So without further ado, here is my example
    ------------------------------
    joe[~/code/script]$ cat > input
    winter tire
    tire
    retire
    tired
    snow tire
    snow tire
    some snowtires

    joe[~/code/script]$ grepall -2 tire --neg snow input
    input [1]: winter tire
    input [2]: tire
    input [3]: retire
    input [4]: tired

    joe[~/code/script]$ grepall
    usage: grepall [-#] ( [-n] regex ) [filenames]
    # - the number of regular expressions, defaults to 1
    regex - regular expessions to be checked on the line
    filenames - names of the input files to be parsed, if blank uses STDIN

    options:
    --neg or -n do not match this regular expression

    special note:
    When using bash, if you want backslashes in the replace portion make
    sure
    to use the multiple argument usage with single quotes for the
    replacement.
    ------------------------------------------------------

    The utility is hopefully easy to understand, although the usage is
    tough to present:
    - line by line processing
    - in the above example the -2 says there will be two regular
    expressions
    - the first is /tire/ and that needs to match
    - the second is /snow/ and that is Negated because of the --neg (or
    just -n) option
    - the last argument is the filename
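The line-by-line logic described in this list can be sketched in a few lines of Python. This is not the grepall script itself (its source is not shown in the thread); the function name `grep_all` and the `(regex, negated)` pair format are hypothetical, chosen only to mirror the description.

```python
import re

def grep_all(lines, patterns, name="input"):
    """Yield 'name [lineno]: line' for lines that satisfy every pattern.

    patterns is a list of (regex, negated) pairs: a plain pattern must
    match the line, a negated pattern must not match it.
    """
    compiled = [(re.compile(rx), negated) for rx, negated in patterns]
    for lineno, line in enumerate(lines, start=1):
        if all(bool(rx.search(line)) != negated for rx, negated in compiled):
            yield f"{name} [{lineno}]: {line}"

lines = ["winter tire", "tire", "retire", "tired",
         "snow tire", "snow tire", "some snowtires"]

# Equivalent in spirit to: grepall -2 tire --neg snow input
hits = list(grep_all(lines, [("tire", False), ("snow", True)]))
for hit in hits:
    print(hit)
```

Like the utility described above, this rejects a whole line as soon as the negated pattern appears anywhere on it.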

    The output needs to be tweaked, maybe so it's more like grep. Right now
    it allows for multiple files, so it prints the filename, [line number],
    and each line on which every regular expression was satisfied (negated
    where requested). Obviously this is very simple at the moment, and it
    doesn't cover the specific situation you mentioned where the words tire
    and snowtire appear on the same line.

    However if that is an issue you can:
    - find and replace all words SNOW with SPECIAL_STRING in all files
    - do what you have to do...
    - turn all SPECIAL_STRINGs back into SNOW in all files

    That can be done rather easily. You will have lost the case sensitivity
    in the word SNOW, but you can get around that by making your
    SPECIAL_STRING something like XsXnXoXwX based on the original case
    values of snow. I hope that made sense.
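A variation on the placeholder idea, sketched in Python: rather than rewriting the files, mask each unwanted `snow tire` occurrence in memory and then look for any remaining `tire`. The helper name `has_plain_tire` and the choice of a NUL character as placeholder are assumptions made for this sketch.

```python
import re

SNOW_TIRE = re.compile(r"snow\s*tire", re.IGNORECASE)
TIRE = re.compile(r"tire", re.IGNORECASE)

# Hypothetical marker; assumed never to occur in the input text.
PLACEHOLDER = "\0"

def has_plain_tire(line):
    """Return True if the line contains a 'tire' that is not part of 'snow tire'."""
    masked = SNOW_TIRE.sub(PLACEHOLDER, line)
    return TIRE.search(masked) is not None

for sample in ["snow tire and regular tire", "snow tire", "some snowtires",
               "tired on a snow day"]:
    print(f"{sample!r}: {has_plain_tire(sample)}")
```

Because the masking happens on a copy, the original case of `snow` is never lost, and a line containing both `tire` and `snowtire` is still reported.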

    Well I better get to bed, you made my night interesting!
    --
    Posted via http://www.ruby-forum.com/.
     
    Joseph Pecoraro, Jan 26, 2008
    #8
  9. Summercool

    Summercool Guest

    to add to the test cases, the regular expression must be able to grep


    snowbird tire
    tired on a snow day
    snow tire and regular tire
     
    Summercool, Jan 26, 2008
    #9
  10. Summercool

    Guest

    Summercool:
    > to add to the test cases, the regular expression must be able to grep
    > snow tire and regular tire


    I presume there only the second tire has to be found.

    This is my first try:

    text = """
    tire
    word tire word
    word retire word
    word tired word
    snowbird tire word
    tired on a snow day word
    snow tire and regular tire word
    word snow tire word
    word snow tire word
    word some snowtires word
    """

    import re

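    def finder(text):
        # Capture the word characters (if any) leading up to each "tire".
        patt = re.compile(r"\b (\w*) \s* (tire)", re.VERBOSE)
        for mo in patt.finditer(text):
            # Skip matches whose preceding word ends in "snow".
            if not mo.group(1).endswith("snow"):
                yield mo.start(2)

    for start in finder(text):
        print(start)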

    The (lazy) output is the starting offset of each "tire" that matches:


    1
    11
    28
    43
    63
    73
    120

    Bye,
    bearophile
     
    , Jan 26, 2008
    #10
  11. Summercool

    Paddy Guest

    On Jan 26, 1:16 am, Summercool <> wrote:
    > somebody who is a regular expression guru... how do you negate a word
    > and grep for all words that is
    >
    > tire
    >
    > but not
    >
    > snow tire
    >
    > or
    >
    > snowtire
    >
    > so for example, it will grep for
    >
    > winter tire
    > tire
    > retire
    > tired
    >
    > but will not grep for
    >
    > snow tire
    > snow tire
    > some snowtires
    >
    > need to do it in one regular expression


    Try the answer here:
    http://mail.python.org/pipermail/tutor/2003-August/024902.html
     
    Paddy, Jan 26, 2008
    #11
  12. Summercool

    Guest

    , Jan 26, 2008
    #12
  13. [A complimentary Cc of this posting was sent to
    Summercool
    <>], who wrote in article <>:
    > so for example, it will grep for
    >
    > winter tire
    > tire
    > retire
    > tired
    >
    > but will not grep for
    >
    > snow tire
    > snow tire
    > some snowtires


    This does not describe the problem completely. What about

    thisnow tire
    snow; tire

    etc? Anyway, one of the obvious modifications of

    (^ | \b(?!snow) \w+ ) \W* tire

    should work.
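To see what "one of the obvious modifications" might mean in practice, here is a sketch that tries the pattern as posted and one possible variant in Python's verbose mode. The variant (moving the whole `snow\W*tire` sequence into the look-ahead) is an assumption, not something stated in the post.

```python
import re

# The pattern as posted (extended/verbose syntax, whitespace ignored).
as_posted = re.compile(r"(^ | \b(?!snow) \w+ ) \W* tire", re.VERBOSE)

# One possible modification (an assumption): only refuse a word that
# starts "snow" when that "snow" runs directly into "tire".
modified = re.compile(r"(^ | \b(?!snow\W*tire) \w+ ) \W* tire", re.VERBOSE)

samples = ["winter tire", "tire", "retire", "tired", "snowbird tire",
           "tired on a snow day", "snow tire and regular tire",
           "snow tire", "some snowtires", "thisnow tire"]

posted_results = {s: bool(as_posted.search(s)) for s in samples}
modified_results = {s: bool(modified.search(s)) for s in samples}

for s in samples:
    print(f"{s!r}: posted={posted_results[s]} modified={modified_results[s]}")
```

As posted, the pattern skips `snowbird tire` because the look-ahead refuses any word beginning with `snow`; the variant accepts it. Both accept `thisnow tire`, one of the cases the problem statement leaves undecided.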

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Jan 26, 2008
    #13
  14. Summercool

    Greg Bacon Guest

    The code below at least passes your tests.

    Hope it helps,
    Greg

    #! /usr/bin/perl

    use warnings;
    use strict;

    use constant {
        MATCH => 1,
        NO_MATCH => 0,
    };

    my @tests = (
        [ "winter tire", => MATCH ],
        [ "tire", => MATCH ],
        [ "retire", => MATCH ],
        [ "tired", => MATCH ],
        [ "snowbird tire", => MATCH ],
        [ "tired on a snow day", => MATCH ],
        [ "snow tire and regular tire", => MATCH ],
        [ " tire" => MATCH ],
        [ "snow tire" => NO_MATCH ],
        [ "snow tire" => NO_MATCH ],
        [ "some snowtires" => NO_MATCH ],
    );

    my $not_snow_tire = qr/
        ^ \s* tire |
        ([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
    /xi;

    my $fail;
    for (@tests) {
        my($str,$want) = @$_;
        my $got = $str =~ /$not_snow_tire/;
        my $pass = !!$want == !!$got;

        print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

        ++$fail unless $pass;
    }

    print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

    __END__
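The alternation in `$not_snow_tire` follows a regular shape, so it can be generated for a word of any length, which speaks to the earlier question about a 20-character word. The Python sketch below is one way to do that; the function name `tire_not_after` is hypothetical, and the extra `^` branches (so that a line starting with `now tire` or `ow tire` still matches) are an addition that is not in the Perl pattern above.

```python
import re

def tire_not_after(word, target="tire"):
    """Build a look-around-free regex: target not preceded by word + optional whitespace."""
    alts = []
    for k in range(len(word)):
        ch = re.escape(word[k])
        rest = re.escape(word[k + 1:])
        if rest:
            # The text before target ends with word[k+1:], but what precedes
            # that is either the start of the string or not word[k].
            alts.append(f"(?:^|[^{ch}]){rest}")
        else:
            # The last non-space character before target is not word[-1].
            alts.append(rf"[^{ch}\s]")
    body = "|".join(alts)
    tgt = re.escape(target)
    return re.compile(rf"^\s*{tgt}|(?:{body})\s*{tgt}", re.IGNORECASE)

not_snow_tire = tire_not_after("snow")
samples = ["winter tire", "tire", "retire", "tired", "snowbird tire",
           "tired on a snow day", "snow tire and regular tire",
           "snow tire", "some snowtires", "now tire"]
results = {s: bool(not_snow_tire.search(s)) for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

The generated pattern grows with the length of the word, but nobody has to write the branches by hand, and it should work in tools that offer no look-behind at all.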

    --
    ... all these cries of having 'abolished slavery,' of having 'preserved the
    union,' of establishing a 'government by consent,' and of 'maintaining the
    national honor' are all gross, shameless, transparent cheats -- so trans-
    parent that they ought to deceive no one. -- Lysander Spooner, "No Treason"
     
    Greg Bacon, Jan 28, 2008
    #14
  15. Summercool

    Dr.Ruud Guest

    Greg Bacon schreef:

    > #! /usr/bin/perl
    >
    > use warnings;
    > use strict;
    >
    > use constant {
    > MATCH => 1,
    > NO_MATCH => 0,
    > };
    >
    > my @tests = (
    > [ "winter tire", => MATCH ],
    > [ "tire", => MATCH ],
    > [ "retire", => MATCH ],
    > [ "tired", => MATCH ],
    > [ "snowbird tire", => MATCH ],
    > [ "tired on a snow day", => MATCH ],
    > [ "snow tire and regular tire", => MATCH ],
    > [ " tire" => MATCH ],
    > [ "snow tire" => NO_MATCH ],
    > [ "snow tire" => NO_MATCH ],
    > [ "some snowtires" => NO_MATCH ],
    > );
    > [...]


    I negated the test, to make the regex simpler:

    my $snow_tire = qr/
        snow [[:blank:]]* tire (?!.*tire)
    /x;

    my $fail;
    for (@tests) {
        my($str,$want) = @$_;
        my $got = $str !~ /$snow_tire/;
        my $pass = !!$want == !!$got;

        print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

        ++$fail unless $pass;
    }

    print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

    __END__

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Jan 28, 2008
    #15
  16. Summercool

    Paul McGuire Guest

    On Jan 25, 7:16 pm, Summercool <> wrote:
    > somebody who is a regular expression guru... how do you negate a word
    > and grep for all words that is
    >
    >   tire
    >
    > but not
    >
    >   snow tire
    >
    > or
    >
    >   snowtire
    >


    Too bad pyparsing's not an option. Here's what it would look like:

    data = """
    Match:
    > winter tire
    > tire
    > retire
    > tired


    But not match:
    > snow tire
    > snow tire
    > some snowtires


    snowbird tire
    tired on a snow day
    snow tire and regular tire

    """

    from pyparsing import CaselessLiteral,Literal,line

    # caseless wasn't really necessary but you never know
    # when you'll run into a "Snow tire"
    snow = CaselessLiteral("snow")
    tire = Literal("tire")
    tire.ignore(snow + tire)

    for matchTokens, matchStart, matchEnd in tire.scanString(data):
        print(line(matchStart, data))


    Prints:

    > winter tire
    > tire
    > retire
    > tired

    snowbird tire
    tired on a snow day
    snow tire and regular tire

    -- Paul
     
    Paul McGuire, Jan 28, 2008
    #16
  17. Summercool

    Greg Bacon Guest

    In article <>,
    Dr.Ruud <> wrote:

    : I negated the test, to make the regex simpler: [...]

    Yes, your approach is simpler. I assumed from the "need it all
    in one pattern" constraint that the OP is feeding the regular
    expression to some other program that is looking for matches.

    I dunno. Maybe it was the familiar compulsion with Perl to
    attempt to cram everything into a single pattern.

    Greg
    --
    What light is to the eyes -- what air is to the lungs -- what love is to
    the heart, liberty is to the soul of man.
    -- Robert Green Ingersoll
     
    Greg Bacon, Jan 29, 2008
    #17
  18. Since Ruby does not have a negative look *behind* operator, I just used
    the negative look *ahead* in a backwards way, et voilà!

    >> puts a.reverse.gsub(/erit(?!.*wons)/, '>>>\&<<<').reverse

    somebody who is a regular expression guru... how do you negate a word
    and grep for all words that is

    <<<tire>>>

    but not

    snow tire

    or

    snowtire

    so for example, it will grep for

    winter <<<tire>>>
    <<<tire>>>
    re<<<tire>>>
    <<<tire>>>d

    but will not grep for

    snow tire
    snow tire
    some snowtires

    need to do it in one regular expression
    => nil
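The same reverse-the-data trick can be sketched in Python. One deliberate difference, labeled here as an assumption: the look-ahead is narrowed from `.*wons` to `\s*wons`, so only a `snow` sitting directly in front of `tire` (in the original direction) blocks the match. The helper name `mark_tires` is hypothetical.

```python
import re

# In reversed text, "tire" reads "erit" and a preceding "snow" becomes a
# following "wons", so a negative look-ahead does the job of a look-behind.
REVERSED_TIRE = re.compile(r"erit(?!\s*wons)")

def mark_tires(text):
    """Wrap each 'tire' not preceded by 'snow' (plus optional spaces) in <<< >>>."""
    # The markers are written reversed so they read correctly after flipping back.
    marked = REVERSED_TIRE.sub(lambda m: ">>>" + m.group(0) + "<<<", text[::-1])
    return marked[::-1]

for sample in ["snow tire and regular tire", "some snowtires", "retire",
               "tired on a snow day"]:
    print(mark_tires(sample))
```

With the narrower look-ahead, the second `tire` in `snow tire and regular tire` is marked while the first is left alone.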
    --
    Posted via http://www.ruby-forum.com/.
     
    Suraj Kurapati, Jan 29, 2008
    #18
  19. I think I have a solution that matches the OP's request

    tests = ["winter tire", "tire", "retire", "tired", "snowbird tire",
             "tired on a snow day", "snow tire and regular tire", " tire",
             "snow tire", "snow tire", "some snowtires"]
    m,nm = tests.partition{ |str| str =~ /\A(?>snow *tire|.)*tire/ }
    p m
    => ["winter tire", "tire", "retire", "tired", "snowbird tire",
        "tired on a snow day", "snow tire and regular tire", " tire"]
    p nm
    => ["snow tire", "snow tire", "some snowtires"]

    How is that?

    Daniel
     
    Daniel DeLorme, Jan 30, 2008
    #19
  20. On 29 Jan 2008, at 23:35, Suraj Kurapati wrote:
    > Since Ruby does not have a negative look *behind* operator, I just used
    > the negative look *ahead* in a backwards way, et voilà!
    >
    >>> puts a.reverse.gsub(/erit(?!.*wons)/, '>>>\&<<<').reverse


    Aha! I like your style.

    James Edward Gray's when-all-else-fails-reverse-the-data triumphs again.

    Regards,
    Andy Stewart

    -------
    http://airbladesoftware.com
     
    Andrew Stewart, Jan 30, 2008
    #20
