Simple pattern matching negation

Discussion in 'Perl Misc' started by Bob, Oct 4, 2005.

  1. Bob

    Bob Guest

    I'm a definite newbie at Perl and I need some basic syntax help.

    This works (checking for 4 numeric digits and doing something if
    it's _not_ true:

    if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}

    but I know there must be an easier syntax to negate the expression
    $zip5=~m/[0-9]{5}/ within the if statement. I tried a couple of
    things that I thought should work but I am missing something really
    basic in the use of "not" or "!".

    Thanks,
     
    Bob, Oct 4, 2005
    #1

  2. On 2005-10-04, Bob <> wrote:
    > This works (checking for 4 numeric digits and doing something if
    > it's _not_ true:


    I'm assuming you mean 5 digits.

    > if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}
    >
    > but I know there must be an easier syntax to negate the expression
    > $zip5=~m/[0-9]{5}/ within the if statement. I tried a couple of
    > things that I thought should work but I am missing something really
    > basic in the use of "not" or "!".


    if($zip5 !~ /[0-9]{5}/){
    print "invalid zip code";
    }

    or (depending on your needs)
    print "invalid zip code" unless $zip5 =~ /\d{5}/;
    or:
    print "invalid zip code" if $zip5 !~ /\d{5}/;

    The !~ operator is documented in perldoc perlop; the alternate
    if/unless statement modifiers are documented in perldoc perlsyn (look
    for 'Statement Modifiers').

    [You'll also note that I used \d instead of [0-9], same thing... a
    couple of characters shorter.]
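A minimal sketch comparing the negation forms discussed above. It uses a hypothetical helper, `is_bad_zip`, which takes the value and a style name. The `!~`, `unless`, `!( ... )` and `not` forms all agree. The last style shows a likely trap with `!`: it binds more tightly than `=~`, so `!$zip5 =~ /.../` tests the pattern against the negated value instead of against the zip code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $zip5 does NOT contain five digits
# in a row, using the negation style named in $style; otherwise returns 0.
sub is_bad_zip {
    my ($zip5, $style) = @_;
    if ($style eq 'negated_bind') {
        return 1 if $zip5 !~ /[0-9]{5}/;        # !~ is the negated form of =~
    }
    elsif ($style eq 'unless') {
        return 1 unless $zip5 =~ /[0-9]{5}/;    # statement modifier
    }
    elsif ($style eq 'bang') {
        return 1 if !($zip5 =~ /[0-9]{5}/);     # ! needs the parentheses
    }
    elsif ($style eq 'not') {
        return 1 if not $zip5 =~ /[0-9]{5}/;    # 'not' has very low precedence
    }
    elsif ($style eq 'wrong') {
        # Pitfall: ! binds tighter than =~, so this parses as
        # (!$zip5) =~ /[0-9]{5}/ and never looks at the zip code itself.
        return 1 if !$zip5 =~ /[0-9]{5}/;
    }
    return 0;
}

for my $zip5 ('12345', 'abc') {
    for my $style (qw(negated_bind unless bang not wrong)) {
        printf "%-6s %-12s %s\n", $zip5, $style,
            is_bad_zip($zip5, $style) ? 'invalid' : 'ok';
    }
}
```

With this input, the 'wrong' style reports "abc" as ok, while the four correct forms report it as invalid.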
    --
    Todd de Gruyl

    http://www.tdegruyl.com
     
    Todd de Gruyl, Oct 4, 2005
    #2

  3. Todd de Gruyl <> wrote in
    news::

    > On 2005-10-04, Bob <> wrote:
    >> This works (checking for 4 numeric digits and doing something if
    >> it's _not_ true:

    >
    > I'm assuming you mean 5 digits.
    >
    >> if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}
    >>
    >> but I know there must be an easier syntax to negate the expression
    >> $zip5=~m/[0-9]{5}/ within the if statement. I tried a couple of
    >> things that I thought should work but I am missing something really
    >> basic in the use of "not" or "!".

    >
    > if($zip5 !~ /[0-9]{5}/){
    > print "invalid zip code";
    > }
    >
    > or (depending on your needs)
    > print "invalid zip code" unless $zip5 =~ /\d{5}/;
    > or:
    > print "invalid zip code" if $zip5 !~ /\d{5}/;


    Assuming that the OP wants $zip5 to contain nothing but the 5 digits,
    you should use anchors:

    print "invalid zip code\n" if $zip5 !~ /^\d{5}$/;
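A sketch of one detail behind the anchoring advice: in Perl, `$` at the end of a pattern also matches just before a trailing newline, so `/^\d{5}$/` accepts "12345\n". `\A` and `\z` anchor to the absolute start and end of the string. The helper names `valid_dollar` and `valid_z` are hypothetical.

```perl
use strict;
use warnings;

# ^...$ tolerates one trailing newline before the end of the string;
# \A...\z requires the match to end exactly at the last character.
sub valid_dollar { my ($zip5) = @_; return $zip5 =~ /^\d{5}$/   ? 1 : 0 }
sub valid_z      { my ($zip5) = @_; return $zip5 =~ /\A\d{5}\z/ ? 1 : 0 }

for my $zip5 ("12345", "12345\n", "123456", "x12345") {
    # Show the newline visibly in the report.
    (my $shown = $zip5) =~ s/\n/\\n/;
    printf "%-10s ^\$: %d   \\A\\z: %d\n",
        $shown, valid_dollar($zip5), valid_z($zip5);
}
```

If the value has already been chomped (for example, after reading it from STDIN), both styles give the same answer.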

    Sinan

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Oct 4, 2005
    #3
  4. Scott Bryce

    Scott Bryce Guest

    Bob wrote:

    > but I know there must be an easier syntax to negate the expression
    > $zip5=~m/[0-9]{5}/


    if ($zip5 !~ /\d{5}/)

    But I have a feeling that that isn't what you really want to do.

    use strict;
    use warnings;

    my @zips = qw(12345 1234 ab12345 abcdefg 123456);

    for my $zip5 (@zips)
    {
        if ($zip5 !~ /\d{5}/)
        {
            print "$zip5 does not match (1).\n";
        }
        else
        {
            print "$zip5 does match (1).\n";
        }

        if ((length($zip5) != 5) || ($zip5 =~ /\D/))
        {
            print "$zip5 does not match (2).\n";
        }
        else
        {
            print "$zip5 does match (2).\n";
        }

        if ($zip5 !~ /^\d{5}$/)
        {
            print "$zip5 does not match (3).\n";
        }
        else
        {
            print "$zip5 does match (3).\n";
        }

        print "\n";
    }

    You may also want to account for zip + 4 or postal codes from outside
    the USA.
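A minimal sketch for the ZIP+4 case, assuming the hyphenated form 12345-6789 is the only extended format to accept; `is_us_zip` is a hypothetical name. Postal codes from other countries would need their own patterns and are not handled here.

```perl
use strict;
use warnings;

# Accepts 12345 or 12345-6789 and nothing else.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
sub is_us_zip {
    my ($zip) = @_;
    return $zip =~ /\A[0-9]{5}(?:-[0-9]{4})?\z/ ? 1 : 0;
}

for my $zip (qw(12345 12345-6789 12345-678 123456789 1234)) {
    print "$zip is ", (is_us_zip($zip) ? 'valid' : 'invalid'), "\n";
}
```

Making the "-dddd" group optional with `(?: ... )?` keeps the five-digit form valid while rejecting a partial extension such as 12345-678.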
     
    Scott Bryce, Oct 4, 2005
    #4
  5. Bob <> wrote:

    > This works



    Completely by accident.

    It could stop working the next time you upgrade perl!


    > if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}

                             ^^^^
    The value of a match in scalar context is true or false. Relying
    on it being any particular true value is asking for a bug.


    > but I know there must be an easier syntax to negate the expression
    > $zip5=~m/[0-9]{5}/ within the if statement.



    The basic idiom for validating data is: anchor the front, anchor
    the back, in between put a pattern that accounts for all you
    want to allow:

    if ( $zip5 !~ m/^\d{5}$/ ) { print "invalid zip code"}
    or
    unless ( $zip5 =~ m/^\d{5}$/ ) { print "invalid zip code"}


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Oct 4, 2005
    #5
  6. Dr.Ruud

    Dr.Ruud Guest

    Todd de Gruyl:

    > [You'll also note that I used \d instead of [0-9], same thing... a
    > couple of characters shorter.]


    \d is not always the same as [0-9]. See \p{IsDigit} in `man perlre`.

    If you want your parser to keep limiting variable names to [A-Za-z0-9_],
    it might be time to stop coding that as \w, because \p{IsWord} can
    contain a lot more characters than [A-Za-z0-9_].

    AFAIK, `man perlre` doesn't explicitly say that \w and \p{IsWord} are
    equal.
    It does say this though: [:^word:] \W \P{IsWord}.

    From perllocale:
    Regular expression checks for safe file names or mail addresses
    using "\w" may be spoofed by an "LC_CTYPE" locale that claims that
    characters such as ">" and "|" are alphanumeric.
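A small demonstration of the \w point, assuming Perl 5.14 or later (newer than the perl used in this thread) for the /a regex modifier, which restricts \d, \s, \w and the POSIX classes to ASCII. The sample string containing U+03B1 GREEK SMALL LETTER ALPHA is an arbitrary choice.

```perl
use strict;
use warnings;
use 5.014;   # the /a regex modifier needs Perl 5.14 or later

# "\x{3B1}" is GREEK SMALL LETTER ALPHA, a letter outside ASCII.
my $name = "x\x{3B1}1";

my $w     = $name =~ /\A\w+\z/           ? 1 : 0;  # Unicode \w: matches
my $ascii = $name =~ /\A[A-Za-z0-9_]+\z/ ? 1 : 0;  # explicit class: no match
my $w_a   = $name =~ /\A\w+\z/a          ? 1 : 0;  # /a: \w is ASCII-only
print "\\w: $w  [A-Za-z0-9_]: $ascii  \\w with /a: $w_a\n";
```

Spelling out [A-Za-z0-9_] works on any perl; /a is a shorter way to get the same restriction on newer ones.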

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Oct 4, 2005
    #6
  7. Dr.Ruud wrote:
    > Todd de Gruyl:
    >
    >>[You'll also note that I used \d instead of [0-9], same thing... a
    >>couple of characters shorter.]

    >
    > \d is not always the same as [0-9]. See \p{IsDigit} in `man perlre`.


    Where in the man page does it say that "\d is not always the same as [0-9]"?


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Oct 4, 2005
    #7
  8. Dr.Ruud

    Dr.Ruud Guest

    John W. Krahn:
    > Dr.Ruud:
    >> Todd de Gruyl:


    >>> [You'll also note that I used \d instead of [0-9], same thing... a
    >>> couple of characters shorter.]

    >>
    >> \d is not always the same as [0-9]. See \p{IsDigit} in `man perlre`.

    >
    > Where in the man page does it say that "\d is not always the same as
    > [0-9]"?


    I just did. Even \d and \p{IsDigit} aren't always the same test.

    Demonstration:

    use warnings;
    use strict;
    use charnames ':full';

    my $text = "\x{00030}"
        . "\x{00660}\x{006F0}"
        . "\x{02460}\x{02474}\x{02488}\x{024F5}"
        . "\x{02673}\x{02680}"
        . "\x{02776}\x{02780}\x{0278A}"
        . "\x{1D7CE}\x{1D7D8}\x{1D7E2}\x{1D7EC}\x{1D7F6}"
        . "\x{E0030}";

    my $n = length($text);

    print '-' x $n, "\n";

    for (my $i = 0; $i < $n; $i++) {
        my $c = substr($text, $i, 1);
        printf "\\x\{%5.5X} %s\n", ord($c), charnames::viacode ord $c;
        print ' [0-9]',        "\n" if $c =~ /[0-9]/;
        print ' \d',           "\n" if $c =~ /\d/;
        print ' \p{IsNumber}', "\n" if $c =~ /\p{IsNumber}/;
        print '-' x $n, "\n";
    }


    Output:

    ------------------
    \x{00030} DIGIT ZERO
    [0-9]
    \d
    \p{IsNumber}
    ------------------
    \x{00660} ARABIC-INDIC DIGIT ZERO
    \d
    \p{IsNumber}
    ------------------
    \x{006F0} EXTENDED ARABIC-INDIC DIGIT ZERO
    \d
    \p{IsNumber}
    ------------------
    \x{02460} CIRCLED DIGIT ONE
    \p{IsNumber}
    ------------------
    \x{02474} PARENTHESIZED DIGIT ONE
    \p{IsNumber}
    ------------------
    \x{02488} DIGIT ONE FULL STOP
    \p{IsNumber}
    ------------------
    \x{024F5} DOUBLE CIRCLED DIGIT ONE
    \p{IsNumber}
    ------------------
    \x{02673} RECYCLING SYMBOL FOR TYPE-1 PLASTICS
    ------------------
    \x{02680} DIE FACE-1
    ------------------
    \x{02776} DINGBAT NEGATIVE CIRCLED DIGIT ONE
    \p{IsNumber}
    ------------------
    \x{02780} DINGBAT CIRCLED SANS-SERIF DIGIT ONE
    \p{IsNumber}
    ------------------
    \x{0278A} DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE
    \p{IsNumber}
    ------------------
    \x{1D7CE} MATHEMATICAL BOLD DIGIT ZERO
    \d
    \p{IsNumber}
    ------------------
    \x{1D7D8} MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO
    \d
    \p{IsNumber}
    ------------------
    \x{1D7E2} MATHEMATICAL SANS-SERIF DIGIT ZERO
    \d
    \p{IsNumber}
    ------------------
    \x{1D7EC} MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO
    \d
    \p{IsNumber}
    ------------------
    \x{1D7F6} MATHEMATICAL MONOSPACE DIGIT ZERO
    \d
    \p{IsNumber}
    ------------------
    \x{E0030} TAG DIGIT ZERO
    ------------------

    perl, v5.8.6 built for i386-freebsd-64int
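The output above predates the /a regex modifier. Assuming Perl 5.14 or later, a sketch like the following shows \d accepting ARABIC-INDIC DIGIT ZERO (one of the characters listed above), [0-9] rejecting it, and \d with /a rejecting it as well.

```perl
use strict;
use warnings;
use 5.014;   # needed for the /a regex modifier

# "\x{660}" is ARABIC-INDIC DIGIT ZERO from the list above.
my $c = "\x{660}";

my $d       = $c =~ /\A\d\z/    ? 1 : 0;   # Unicode \d: matches
my $range   = $c =~ /\A[0-9]\z/ ? 1 : 0;   # explicit range: no match
my $d_ascii = $c =~ /\A\d\z/a   ? 1 : 0;   # /a: \d means [0-9] only
print "\\d: $d  [0-9]: $range  \\d with /a: $d_ascii\n";
```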


    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Oct 4, 2005
    #8
  9. Bob

    Bob Guest

    Thanks folks. Lots of good info. I didn't realize (obviously)
    that the binding operation could use !~ in addition to =~.
    That solves my basic issue. The other suggestions are also
    helpful.

    Bob
     
    Bob, Oct 4, 2005
    #9