Simple pattern matching negation

Bob

I'm a definite newbie at Perl and I need some basic syntax help.

This works (checking for 4 numeric digits and doing something if
it's _not_ true):

if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}

but I know there must be an easier syntax to negate the expression
$zip5=~m/[0-9]{5}/ within the if statement. I tried a couple of
things that I thought should work but I am missing something really
basic in the use of "not" or "!".

Thanks,
 
Todd de Gruyl

Bob said:
This works (checking for 4 numeric digits and doing something if
it's _not_ true):

I'm assuming you mean 5 digits.
if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}

but I know there must be an easier syntax to negate the expression
$zip5=~m/[0-9]{5}/ within the if statement. I tried a couple of
things that I thought should work but I am missing something really
basic in the use of "not" or "!".

if ($zip5 !~ /[0-9]{5}/) {
    print "invalid zip code";
}

or (depending on your needs)
print "invalid zip code" unless $zip5 =~ /\d{5}/;
or:
print "invalid zip code" if $zip5 !~ /\d{5}/;

The !~ operator is documented in perldoc perlop; the if/unless
statement modifiers are documented in perldoc perlsyn (look
for 'Statement Modifiers').

[You'll also note that I used \d instead of [0-9], same thing... a
couple of characters shorter.]
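One plausible reason the attempts with "!" and "not" failed is operator precedence: per perlop, unary ! binds more tightly than =~, so !$zip5 =~ /.../ negates $zip5 first and then matches the result. A minimal sketch of the forms that do work, next to the pitfall; negation_forms is a hypothetical helper name used only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns four 0/1 flags, one per way of writing
# "does NOT contain five digits in a row".
sub negation_forms {
    my ($zip5) = @_;

    my $with_bang = !($zip5 =~ /[0-9]{5}/);     # parentheses are required with !
    my $with_not  = (not $zip5 =~ /[0-9]{5}/);  # 'not' binds more loosely than =~
    my $with_bind = ($zip5 !~ /[0-9]{5}/);      # the negated binding operator

    # How "!$zip5 =~ /.../" is parsed without parentheses: $zip5 is
    # negated first, and the result ("" or 1) never holds five digits.
    my $pitfall = ((!$zip5) =~ /[0-9]{5}/);

    return map { $_ ? 1 : 0 } ($with_bang, $with_not, $with_bind, $pitfall);
}

for my $zip5 (qw(12345 1234)) {
    print join(' ', $zip5, negation_forms($zip5)), "\n";
}
```

The first three flags agree for every input; the fourth is always 0, which is why the unparenthesized ! form silently never reports an invalid code.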
 
A. Sinan Unur

Todd de Gruyl said:
Bob said:
This works (checking for 4 numeric digits and doing something if
it's _not_ true):

I'm assuming you mean 5 digits.
if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}

but I know there must be an easier syntax to negate the expression
$zip5=~m/[0-9]{5}/ within the if statement. I tried a couple of
things that I thought should work but I am missing something really
basic in the use of "not" or "!".

if ($zip5 !~ /[0-9]{5}/) {
    print "invalid zip code";
}

or (depending on your needs)
print "invalid zip code" unless $zip5 =~ /\d{5}/;
or:
print "invalid zip code" if $zip5 !~ /\d{5}/;

Assuming that the OP wants $zip5 to contain nothing but the 5 digits,
you should use anchors:

print "invalid zip code\n" if $zip5 !~ /^\d{5}$/;

Sinan
 
Scott Bryce

Bob said:
but I know there must be an easier syntax to negate the expression
$zip5=~m/[0-9]{5}/

if ($zip5 !~ /\d{5}/)

But I have a feeling that that isn't what you really want to do.

use strict;
use warnings;

my @zips = qw(12345 1234 ab12345 abcdefg 123456);

for my $zip5 (@zips)
{
    # (1) Unanchored: true if five digits appear anywhere in the string.
    if ($zip5 !~ /\d{5}/)
    {
        print "$zip5 does not match (1).\n";
    }
    else
    {
        print "$zip5 does match (1).\n";
    }

    # (2) Exactly five characters long and no non-digit among them.
    if ((length($zip5) != 5) || ($zip5 =~ /\D/))
    {
        print "$zip5 does not match (2).\n";
    }
    else
    {
        print "$zip5 does match (2).\n";
    }

    # (3) Anchored: the whole string must be five digits.
    if ($zip5 !~ /^\d{5}$/)
    {
        print "$zip5 does not match (3).\n";
    }
    else
    {
        print "$zip5 does match (3).\n";
    }

    print "\n";
}

You may also want to account for zip + 4 or postal codes from outside
the USA.
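As a rough sketch of the ZIP+4 suggestion, assuming the only accepted shapes are "12345" and "12345-6789" (the helper name is_us_zip is hypothetical, and postal codes from other countries are not handled at all):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 for five ASCII digits, optionally
# followed by a hyphen and four more digits; returns 0 otherwise.
sub is_us_zip {
    my ($zip) = @_;
    # (?: ... )? makes the "-6789" part optional without capturing it.
    # \z anchors at the very end of the string.
    return $zip =~ /^[0-9]{5}(?:-[0-9]{4})?\z/ ? 1 : 0;
}

for my $zip (qw(12345 12345-6789 12345-678 123456 ab12345)) {
    printf "%-12s %s\n", $zip, is_us_zip($zip) ? 'ok' : 'invalid';
}
```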
 
Tad McClellan

Bob said:
This works


Completely by accident.

It could stop working the next time you upgrade perl!

if ($zip5=~m/[0-9]{5}/ != 1) { print ("invalid zip code");}
                       ^^^^
                       ^^^^
The value of a match in scalar context is true or false. Relying
on it being any particular true value is asking for a bug.

but I know there must be an easier syntax to negate the expression
$zip5=~m/[0-9]{5}/ within the if statement.


The basic idiom for validating data is: anchor the front, anchor
the back, in between put a pattern that accounts for all you
want to allow:

if ( $zip5 !~ m/^\d{5}$/ ) { print "invalid zip code"}
or
unless ( $zip5 =~ m/^\d{5}$/ ) { print "invalid zip code"}
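One caveat when anchoring the back: $ also matches just before a trailing newline, so a value such as "12345\n" (for instance, input read without chomp) still passes /^\d{5}$/. A small sketch comparing $ with \z, which matches only at the very end of the string; both helper names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helpers that differ only in the end-of-string anchor.
sub valid_with_dollar { my ($zip5) = @_; return $zip5 =~ /^\d{5}$/  ? 1 : 0 }
sub valid_with_z      { my ($zip5) = @_; return $zip5 =~ /^\d{5}\z/ ? 1 : 0 }

my $unchomped = "12345\n";   # e.g. a line read from STDIN without chomp

print "with \$:  ", valid_with_dollar($unchomped), "\n";
print "with \\z: ", valid_with_z($unchomped), "\n";
```

Whether the trailing newline matters depends on where the value comes from; chomping the input first makes the two patterns behave the same.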
 
Dr.Ruud

Todd de Gruyl:
[You'll also note that I used \d instead of [0-9], same thing... a
couple of characters shorter.]

\d is not always the same as [0-9]. See \p{IsDigit} in `man perlre`.

If you want your parser to keep limiting variable names to [A-Za-z0-9_],
it might be time to stop coding that as \w, because \p{IsWord} can
contain a lot more characters than [A-Za-z0-9_].

AFAIK, `man perlre` doesn't explicitly say that \w and \p{IsWord} are
equal. It does say this though: [:^word:] \W \P{IsWord}.

From perllocale:
Regular expression checks for safe file names or mail addresses
using "\w" may be spoofed by an "LC_CTYPE" locale that claims that
characters such as ">" and "|" are alphanumeric.
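A minimal sketch of the difference, using a name that starts with GREEK SMALL LETTER ALPHA (U+03B1). The helper names are hypothetical, and the third variant relies on the /a modifier, which is an assumption about your perl: it is only available from perl 5.14 onward and restricts \w, \d and \s to ASCII.

```perl
use strict;
use warnings;

# Hypothetical helpers: three ways to test "is this all word characters?"
sub word_loose    { my ($s) = @_; return $s =~ /^\w+\z/           ? 1 : 0 }
sub word_explicit { my ($s) = @_; return $s =~ /^[A-Za-z0-9_]+\z/ ? 1 : 0 }
sub word_ascii    { my ($s) = @_; return $s =~ /^\w+\z/a          ? 1 : 0 }  # needs perl 5.14+

my $name = "\x{03B1}bc";   # alpha followed by "bc"

print "\\w:             ", word_loose($name),    "\n";
print "[A-Za-z0-9_]:   ", word_explicit($name), "\n";
print "\\w with /a:     ", word_ascii($name),    "\n";
```

On a perl without /a, the explicit character class is the portable way to say "ASCII only".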
 
John W. Krahn

Dr.Ruud said:
Todd de Gruyl:
[You'll also note that I used \d instead of [0-9], same thing... a
couple of characters shorter.]

\d is not always the same as [0-9]. See \p{IsDigit} in `man perlre`.

Where in the man page does it say that "\d is not always the same as [0-9]"?


John
 
Dr.Ruud

John W. Krahn:
Dr.Ruud:
Todd de Gruyl:
[You'll also note that I used \d instead of [0-9], same thing... a
couple of characters shorter.]

\d is not always the same as [0-9]. See \p{IsDigit} in `man perlre`.

Where in the man page does it say that "\d is not always the same as
[0-9]"?

I just did. Even \d and \p{IsDigit} aren't always the same test.

Demonstration:

use warnings;
use strict;
use charnames ':full';

my $text = "\x{00030}"
. "\x{00660}\x{006F0}"
. "\x{02460}\x{02474}\x{02488}\x{024F5}"
. "\x{02673}\x{02680}"
. "\x{02776}\x{02780}\x{0278A}"
. "\x{1D7CE}\x{1D7D8}\x{1D7E2}\x{1D7EC}\x{1D7F6}"
. "\x{E0030}";

my $n = length($text);

print '-'x $n, "\n";

for (my $i=0; $i<$n; $i++) {
    my $c = substr($text, $i, 1);
    printf "\\x\{%5.5X} %s\n", ord($c), charnames::viacode ord $c;
    print ' [0-9]' , "\n" if $c =~ /[0-9]/;
    print ' \d' , "\n" if $c =~ /\d/;
    print ' \p{IsNumber}', "\n" if $c =~ /\p{IsNumber}/;
    print '-'x $n, "\n";
}


Output:

------------------
\x{00030} DIGIT ZERO
[0-9]
\d
\p{IsNumber}
------------------
\x{00660} ARABIC-INDIC DIGIT ZERO
\d
\p{IsNumber}
------------------
\x{006F0} EXTENDED ARABIC-INDIC DIGIT ZERO
\d
\p{IsNumber}
------------------
\x{02460} CIRCLED DIGIT ONE
\p{IsNumber}
------------------
\x{02474} PARENTHESIZED DIGIT ONE
\p{IsNumber}
------------------
\x{02488} DIGIT ONE FULL STOP
\p{IsNumber}
------------------
\x{024F5} DOUBLE CIRCLED DIGIT ONE
\p{IsNumber}
------------------
\x{02673} RECYCLING SYMBOL FOR TYPE-1 PLASTICS
------------------
\x{02680} DIE FACE-1
------------------
\x{02776} DINGBAT NEGATIVE CIRCLED DIGIT ONE
\p{IsNumber}
------------------
\x{02780} DINGBAT CIRCLED SANS-SERIF DIGIT ONE
\p{IsNumber}
------------------
\x{0278A} DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE
\p{IsNumber}
------------------
\x{1D7CE} MATHEMATICAL BOLD DIGIT ZERO
\d
\p{IsNumber}
------------------
\x{1D7D8} MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO
\d
\p{IsNumber}
------------------
\x{1D7E2} MATHEMATICAL SANS-SERIF DIGIT ZERO
\d
\p{IsNumber}
 
Bob

Thanks folks. Lots of good info. I didn't realize (obviously)
that the binding operation could use !~ in addition to =~.
That solves my basic issue. The other suggestions are also
helpful.

Bob
 
