Regular expression to match only strings NOT containing particular words

Discussion in 'Perl Misc' started by Dylan Nicholson, Oct 19, 2007.

  1. I can write a regular expression that will only match strings that are
    NOT the word apple:

    ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$

    But is there a neater way, and how would I do it to match strings that
    are NOT the word apple OR banana? Then what would be needed to match
    only strings that do not CONTAIN the word "apple" or "banana" or
    "cherry"?

    I'd love it if the following worked:

    ^[^(apple)(banana)(cherry)]*$

    But it appears the parentheses are ignored, as

    ^[(apple)(banana)(cherry)]*$

    simply matches any string that consists entirely of the characters
    a,b,c,e,h,l,n,r,p & y.
    Dylan Nicholson, Oct 19, 2007
    #1

  2. Dylan Nicholson wrote:
    > I can write a regular expression that will only match strings that are
    > NOT the word apple:
    >
    > ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
    >
    > But is there a neater way, and how would I do it to match strings that
    > are NOT the word apple OR banana? Then what would be needed to match
    > only strings that do not CONTAIN the word "apple" or "banana" or
    > "cherry"?


    !(/apple/ or /banana/ or /cherry/)

    jue
    Jürgen Exner, Oct 19, 2007
    #2

  3. 2007-10-18, 22:00(-07), Dylan Nicholson:
    > I can write a regular expression that will only match strings that are
    > NOT the word apple:
    >
    > ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
    >
    > But is there a neater way, and how would I do it to match strings that
    > are NOT the word apple OR banana? Then what would be needed to match
    > only strings that do not CONTAIN the word "apple" or "banana" or
    > "cherry"?
    >
    > I'd love it if the following worked:
    >
    > ^[^(apple)(banana)(cherry)]*$
    >
    > But it appears the parentheses are ignored, as
    >
    > ^[(apple)(banana)(cherry)]*$
    >
    > simply matches any string that consists entirely of the characters
    > a,b,c,e,h,l,n,r,p & y.


    With perl regexps:

    perl -ne 'print if /^(?:(?!apple|banana).)*$/'
    or probably better:
    perl -ne 'print if /^(?!.*(?:apple|banana))/'

    But then, why not

    perl -ne 'print if !/apple|banana/'

    Note that vim's regexps have an equivalent negative look-ahead
    operator.
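
    As a rough sketch of how the three one-liners above compare, the script below wraps each pattern in a small sub and runs them over a few sample strings. The sub names and the sample strings are made up for illustration. It assumes single-line input, as with perl -n; for strings with embedded newlines the two look-ahead forms would probably need the /s modifier, since "." does not match a newline by default.

```perl
use strict;
use warnings;

# Hypothetical helper names; each returns 1 when the string contains
# neither "apple" nor "banana", and 0 otherwise.

# Check at every position that no forbidden word starts there.
sub by_stepwise_lookahead {
    my ($s) = @_;
    return $s =~ /^(?:(?!apple|banana).)*$/ ? 1 : 0;
}

# One negative look-ahead at the start that scans the rest of the line.
sub by_single_lookahead {
    my ($s) = @_;
    return $s =~ /^(?!.*(?:apple|banana))/ ? 1 : 0;
}

# Plain match, negated outside the regex.
sub by_negated_match {
    my ($s) = @_;
    return $s !~ /apple|banana/ ? 1 : 0;
}

for my $s ('apple pie', 'fishbanana', 'chips', 'grape') {
    printf "%-12s %d %d %d\n", $s,
        by_stepwise_lookahead($s),
        by_single_lookahead($s),
        by_negated_match($s);
}
```

    On single-line strings the three columns should agree, which is the argument for the plain negated match: it says the same thing with the least machinery.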

    --
    Stéphane
    Stephane CHAZELAS, Oct 19, 2007
    #3
  4. On Thu, 18 Oct 2007 22:00:28 -0700, Dylan Nicholson wrote:

    >I can write a regular expression that will only match strings that are
    >NOT the word apple:
    >
    >^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
    >
    >But is there a neater way, and how would I do it to match strings that
    >are NOT the word apple OR banana? Then what would be needed to match
    >only strings that do not CONTAIN the word "apple" or "banana" or
    >"cherry"?
    >
    >I'd love it if the following worked:
    >
    >^[^(apple)(banana)(cherry)]*$
    >
    >But it appears the parentheses are ignored, as
    >
    >^[(apple)(banana)(cherry)]*$
    >
    >simply matches any string that consists entirely of the characters
    >a,b,c,e,h,l,n,r,p & y.


    A simple way is to write the regex to match apple or banana or cherry,
    do the match and then check the Success property of the match object.
    The strings you want are the ones for which Success is false.

    Run the following mini program:

    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Match() searches anywhere in the string, so no leading
                // or trailing ".*" is needed around the alternation.
                Regex r = new Regex("apple|banana|cherry");
                string[] strings =
                    "apple,banana,cherry,applebanana,applebananacherry,fishapple,chips,chip and apple,apple pie".Split(',');
                foreach (string s in strings)
                {
                    // Success is true when any of the three words occurs.
                    Console.WriteLine("{0} Match? {1}", s, r.Match(s).Success);
                }
                Console.ReadLine();
            }
        }
    }

    You should get this:

    apple Match? True
    banana Match? True
    cherry Match? True
    applebanana Match? True
    applebananacherry Match? True
    fishapple Match? True
    chips Match? False
    chip and apple Match? True
    apple pie Match? True

    --
    http://bytes.thinkersroom.com
    Rad [Visual C# MVP], Oct 19, 2007
    #4
  5. On Thu, 18 Oct 2007 22:00:28 -0700, Dylan Nicholson wrote:

    >But is there a neater way, and how would I do it to match strings that
    >are NOT the word apple OR banana? Then what would be needed to match
    >only strings that do not CONTAIN the word "apple" or "banana" or
    >"cherry"?


    The general answer is that you should use separate regexen and logical
    operators, or an explicit !~, but the subject of negating regexen is
    discussed in some depth in the following thread at PerlMonks:

    http://perlmonks.org/?node_id=588315
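
    A minimal sketch of the "separate regexen and logical operators" idea, for the "does not contain" case. The helper name matches_none and the sample strings are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $s matches none of the given regexes.
sub matches_none {
    my ($s, @regexes) = @_;
    # grep in scalar context returns the number of regexes that matched.
    my $hits = grep { $s =~ $_ } @regexes;
    return $hits == 0 ? 1 : 0;
}

my @forbidden = ( qr/apple/, qr/banana/, qr/cherry/ );

for my $s ('apple pie', 'chips', 'chip and cherry') {
    # Explicit !~ combined with logical operators, written out by hand.
    my $by_hand = ( $s !~ /apple/ and $s !~ /banana/ and $s !~ /cherry/ ) ? 1 : 0;
    printf "%-16s by_hand=%d matches_none=%d\n",
        $s, $by_hand, matches_none( $s, @forbidden );
}
```

    Keeping the patterns separate means each one stays trivial to read, and the negation lives in ordinary Perl logic instead of inside the regex.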


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, Oct 19, 2007
    #5
  6. Dylan Nicholson wrote:

    [
    newsgroup list trimmed, follow-ups set
    There is no reason to cross-post to both c.l.p.misc and m.p.d.l.csharp
    ]

    > I can write a regular expression that will only match strings that are
    > NOT the word apple:
    >
    > ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
    >
    > But is there a neater way, and how would I do it to match strings that
    > are NOT the word apple OR banana?



    When you say "are not" rather than "do not contain", you should
    not be using regular expressions at all.


    unless ( $s eq 'apple' or $s eq 'banana' or $s eq 'cherry' ) {

    ....

    }
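
    For a longer list of exact words, a hash lookup is one way to generalise the eq chain above. The sketch below is only an illustration: %is_forbidden and is_allowed() are made-up names. It also runs the hand-written alternation from the original question over the same strings. As far as I can tell, that alternation has no branch for "appl" followed by something other than "e", and no branch for strings shorter than the prefix being tested, so it rejects strings such as "applx", "appl" and "a" even though none of them is the word apple.

```perl
use strict;
use warnings;

# Hypothetical names: %is_forbidden and is_allowed() are for illustration.
my %is_forbidden = map { $_ => 1 } qw( apple banana cherry );

# Returns 1 when $s is not exactly one of the forbidden words.
sub is_allowed {
    my ($s) = @_;
    return $is_forbidden{$s} ? 0 : 1;
}

# The alternation from the original question, for comparison.
my $hand_written = qr/^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$/;

for my $s ('apple', 'apples', 'applx', 'appl', 'a', 'pear') {
    printf "%-8s is_allowed=%d hand_written=%d\n",
        $s, is_allowed($s), ( $s =~ $hand_written ? 1 : 0 );
}
```

    The two columns differ exactly on the strings the alternation fails to cover, which is a fair argument for plain string comparison here.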


    > Then what would be needed to match only strings that do not
    > CONTAIN the word "apple" or "banana" or "cherry"?


    unless (
    index( $s, 'apple' ) > -1
    index( $s, 'banana' ) > -1
    index( $s, 'cherry' ) > -1
    ) {

    ....

    }

    If you have a long list of words, you could use


    #!/usr/bin/perl

    use strict;
    use warnings;

    use List::MoreUtils qw( first_index );

    my $text = <<EO_TEXT;
    Sed ut perspiciatis unde omnis iste natus error
    sit voluptatem accusantium doloremque laudantium,
    totam rem aperiam, eaque ipsa quae ab illo
    inventore veritatis et quasi architecto beatae
    vitae dicta sunt explicabo. Nemo enim ipsam
    voluptatem quia voluptas sit aspernatur aut odit
    aut fugit, sed quia consequuntur magni dolores eos
    qui ratione voluptatem sequi nesciunt. Neque porro
    quisquam est, qui dolorem ipsum quia dolor sit
    amet, consectetur, adipisci velit, sed quia non
    numquam eius modi tempora incidunt ut labore et
    dolore magnam aliquam quaerat voluptatem. Ut enim
    ad minima veniam, quis nostrum exercitationem
    ullam corporis suscipit laboriosam, nisi ut
    aliquid ex ea commodi consequatur? Quis autem vel
    eum iure reprehenderit qui in ea voluptate velit
    esse quam nihil molestiae consequatur, vel illum
    qui dolorem eum fugiat quo voluptas nulla pariatur
    EO_TEXT

    my @wordlist = qw( hello explicabo reprehenderit random );

    unless ( -1 == first_index { index( $text, $_ ) > -1 } @wordlist ) {
        print "One of the words in the word list appears in the text.\n";
    }

    __END__
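
    The script above reports when a word IS present. For the original question (none of the words may appear), the test can be turned around. The following is a sketch using only core Perl: contains_none() and any_word_regex() are hypothetical helper names, and the regex variant assumes a non-empty word list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when none of @words occurs in $text.
sub contains_none {
    my ($text, @words) = @_;
    for my $word (@words) {
        return 0 if index( $text, $word ) > -1;
    }
    return 1;
}

# Hypothetical helper: builds one regex that matches any word in the list.
# quotemeta keeps characters such as "." or "+" inside a word literal.
# Assumes the list is not empty.
sub any_word_regex {
    my @words = @_;
    my $alternation = join '|', map { quotemeta($_) } @words;
    return qr/$alternation/;
}

my @wordlist = qw( apple banana cherry );
my $any_word = any_word_regex(@wordlist);

for my $s ('apple pie', 'chips', 'chip and cherry') {
    printf "%-16s index: %d  regex: %d\n",
        $s, contains_none( $s, @wordlist ), ( $s !~ $any_word ? 1 : 0 );
}
```

    Both columns answer "does the string contain none of the words?". The index version avoids regexes altogether, while the alternation is convenient when the same list is applied to many strings.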





    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)
    clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>
    A. Sinan Unur, Oct 19, 2007
    #6
  7. "A. Sinan Unur" wrote in
    news:Xns99CE6A93E8341asu1cornelledu@127.0.0.1:


    >> Then what would be needed to match only strings that do not
    >> CONTAIN the word "apple" or "banana" or "cherry"?

    >
    > unless (
    > index( $s, 'apple' ) > -1
    > index( $s, 'banana' ) > -1
    > index( $s, 'cherry' ) > -1
    > ) {


    Oooops.

    unless (
    index( $s, 'apple' ) > -1
    or index( $s, 'banana' ) > -1
    or index( $s, 'cherry' ) > -1
    ) {

    Sinan

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)
    clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>
    A. Sinan Unur, Oct 19, 2007
    #7