Puzzled over regexp

IanW

Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~ /^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/) {
    print "Standards compliant";
}
else {
    print "Not standards compliant";
}

If you put a '.' or ':' or ';' etc in front of the address it comes back
with not compliant but when it's a comma it comes back as compliant. Is this
a bug or is the fault in my code?

Thanks
Ian
 
thundergnat

IanW said:
Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
{
print "Standards compliant";
}
else {
print "Not standards compliant";
}

If you put a '.' or ':' or ';' etc in front of the address it comes back
with not compliant but when it's a comma it comes back as compliant. Is this
a bug or is the fault in my code?

Thanks
Ian

Hyphens are significant inside character classes.

Move the hyphen to the end or beginning of the character class or escape it.
You are including the range [+-\/] in your search which equates to +,-./
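For anyone who wants to see the effect in isolation, here is a minimal sketch that tests only the local-part character class, once as posted and once with the hyphen moved to the end. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Class as posted: the unescaped '-' between '+' and '\/' forms the range '+' through '/'
my $buggy = qr/^[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+$/;

# Same class with the hyphen moved to the end, where it is a literal hyphen
my $fixed = qr/^[!\#\$%&'*+\/=?^_`{|}~.a-zA-Z0-9-]+$/;

# Hypothetical local parts, for illustration only
for my $local (',ian', 'ian-w', 'ian') {
    printf "%-6s buggy=%-8s fixed=%s\n", $local,
        ($local =~ $buggy ? 'match' : 'no match'),
        ($local =~ $fixed ? 'match' : 'no match');
}
```

The leading comma should only get through the first pattern, while a plain hyphen is accepted by both, which is why testing '-' on its own did not reveal the problem.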
 
IanW

thundergnat said:
IanW wrote:
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)

Hyphens are significant inside character classes.

I wondered about that, except that when I tested each character within the
square brackets to see if it came back with "standards compliant", it did so
when I put a '-' into the local part of the email, which made me think it
hadn't been treated as a special character on its own. Likewise the *, which
would normally mean 0 or more apostrophes, comes back as standards compliant
too. I may not be explaining this very well! Is there anything else in this
context I should escape that I'm not doing?
Move the hyphen to the end or beginning of the character class or escape
it.

however, that works nicely.. thanks :)
You are including the range [+-\/] in your search which equates to +,-./

how coincidental is that? That is, it's not a V, it's an escaped forward
slash!

Regards
Ian
 
Gunnar Hjalmarsson

IanW said:
Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
--------------------^^^^

Those characters are interpreted as a range ( \053-\057 ), which happens
to include comma ( \054 ). Put the '-' at the beginning or end of the
character class.
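A quick way to check which characters a range like this covers is to enumerate the code points between its endpoints. This is only a sketch using core Perl; the octal values it prints are the ones quoted above.

```perl
use strict;
use warnings;

# Every character from '+' to '/' inclusive, i.e. what [+-\/] covers
my @range = map { chr } ord('+') .. ord('/');

# Print each character with its octal code
printf "\\%03o  %s\n", ord($_), $_ for @range;
```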
{
print "Standards compliant";
}
else {
print "Not standards compliant";
}

Btw, which standard are you referring to?
 
DJ Stunks

IanW said:
Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
{
print "Standards compliant";
}
else {
print "Not standards compliant";
}

Hey, I hate to sound like a broken record, but why reinvent the wheel?

C:\tmp>cat tmp.pl
#!/usr/bin/perl

use strict;
use warnings;

use Regexp::Common qw{ Email::Address };

while ( my $em = <DATA> ) {
    chomp $em;
    printf "%-25s : ",$em;
    if ($em =~ m{^ $RE{Email}{Address} $}x) {
        print "Standards compliant\n";
    }
    else {
        print "Not standards compliant\n";
    }
}

__DATA__
,[email protected]
;[email protected]
:[email protected]
(e-mail address removed)
(e-mail address removed)

C:\tmp>tmp.pl
,[email protected] : Not standards compliant
;[email protected] : Not standards compliant
:[email protected] : Not standards compliant
(e-mail address removed) : Standards compliant
(e-mail address removed) : Standards compliant

-jp
 
IanW

Gunnar Hjalmarsson said:
IanW said:
Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
--------------------^^^^

Those characters are interpreted as a range ( \053-\057 ), which happens
to include comma ( \054 ). Put the '-' at the beginning or end of the
character class.

I see, thanks
Btw, which standard are you referring to?

RFC2822, according to wikipedia

Ian
 
IanW

Hey, I hate to sound like a broken record, but why reinvent the wheel?

C:\tmp>cat tmp.pl
#!/usr/bin/perl

use strict;
use warnings;

use Regexp::Common qw{ Email::Address };

while ( my $em = <DATA> ) {
chomp $em;
printf "%-25s : ",$em;
if ($em =~ m{^ $RE{Email}{Address} $}x) {
print "Standards compliant\n";
}
else {
print "Not standards compliant\n";
}
}

Well, thanks - could do it that way, but it's only a one-liner anyway and
serves as a useful reminder what characters are permitted.. also, I use a
variation of it to pull out email addresses from within larger strings.

Ian
 
Gunnar Hjalmarsson

IanW said:
RFC2822, according to wikipedia

There are certainly RFC 2822 compliant addresses that won't match your
regex. Quoted strings and domain literals come to mind. There are
probably others.
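To make that concrete, here is a small sketch that runs the pattern from this thread (with the hyphen fix applied) against one address of each kind. The addresses are hypothetical examples, not taken from the original post.

```perl
use strict;
use warnings;

# The thread's pattern with the hyphen moved to the end of each character class
my $re = qr/^(?!\.)[!\#\$%&'*+\/=?^_`{|}~.a-zA-Z0-9-]+(?<!\.)\@(?!\.)[a-zA-Z0-9.-]+$/;

# Hypothetical addresses, for illustration only
my @samples = (
    'someone@example.com',       # plain dot-atom form
    '"john doe"@example.com',    # quoted-string local part
    'someone@[192.0.2.1]',       # domain literal
);

for my $addr (@samples) {
    printf "%-26s %s\n", $addr, ($addr =~ $re ? 'match' : 'no match');
}
```

Only the first form gets through, because neither the double quote nor the square brackets appear in the character classes.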
 
A. Sinan Unur

Gunnar Hjalmarsson said:
IanW said:
Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
--------------------^^^^

....
Btw, which standard are you referring to?

RFC2822, according to wikipedia

I have not examined or tried your code (or whatever you lifted from
Wikipedia), but knowing what Email::Address does, I doubt that the
one-liner above can be used to discern whether an email address is
RFC2822 compliant.

You may choose to educate yourself by reading:

http://search.cpan.org/src/CWEST/Email-Address-1.80/lib/Email/Address.pm

as well as the Frequently Asked Questions list (which you should have
done before posting here):

perldoc -q "How do I check a valid mail address?"

Sinan
 
Rick Scott

IanW said:
Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
...

Just out of curiosity, are you trying to validate the format of
email addresses? If so, you will be far better served by using a CPAN
module than by trying to roll your own.

Mail::RFC822::Address
Email::Valid
perldoc -q 'valid mail'




Rick
 
Eric Schwartz

IanW said:
Well, thanks - could do it that way, but it's only a one-liner anyway and
serves as a useful reminder what characters are permitted.

Which is all well and good, as long as you don't mind rejecting valid
addresses.
also, I use a
variation of it to pull out email addresses from within larger strings.

Which would also be a useful place to use Email::Address. Don't you
think?

-=Eric
 
Rick Scott

IanW said:
Well, thanks - could do it that way, but it's only a one-liner
anyway and serves as a useful reminder what characters are
permitted.. also, I use a variation of it to pull out email
addresses from within larger strings.

You could do it your way and get the wrong answer in one line,
or use one of the CPAN modules and get the right answer in two.
Your call.

Have a look at your regex, have a look at all the ways RFC 2822 says
you can specify an address, then have a look at the correct regex:
http://www.faqs.org/rfcs/rfc2822.html
perl -MEmail::Address -e 'print $Email::Address::mailbox'

Email::Address even throws in the extraction you want to do, for free:

#!/usr/bin/perl

use strict;
use warnings;

use Email::Address;

while (<DATA>) {
    print "Found: " . (join ' ', Email::Address->parse($_)) . "\n";
}

__DATA__
xyzzy etc,[email protected], garble someone@localhost
,,"Dr. Fred M. Bogo" <[email protected]>,asdf


Found: <[email protected]> <someone@localhost>
Found: "Dr. Fred M. Bogo" <[email protected]>




Rick
 
robic0

Can anyone tell me why the following doesn't return "Not standards
compliant"?

my $em = ',[email protected]'; # note the comma in front of the email address
if ($em =~
/^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
{
print "Standards compliant";
}
else {
print "Not standards compliant";
}

If you put a '.' or ':' or ';' etc in front of the address it comes back
with not compliant but when it's a comma it comes back as compliant. Is this
a bug or is the fault in my code?

Thanks
Ian
Came late, just read a sample.
If no one has already answered with the definitive solution, say so
and I will.
 
robic0

IanW ([email protected]) wrote on MMMMDLIII September MCMXCIII in
<URL::)
:) :) > IanW wrote:
:) >> Can anyone tell me why the following doesn't return "Not standards
:) >> compliant"?
:) >>
:) >> my $em = ',[email protected]'; # note the comma in front of the email address
:) >> if ($em =~
:) >> /^(?!\.)[!\#\$%&'*+-\/=?^_`{|}~.a-zA-Z0-9]+(?<!\.)\@(?!\.)[a-zA-Z0-9-.]+$/)
:) > --------------------^^^^
:) >
:) > Those characters are interpreted as a range ( \053-\057 ), which happens
:) > to include comma ( \054 ). Put the '-' at the beginning or end of the
:) > character class.
:)
:) I see, thanks
:)
:) >> {
:) >> print "Standards compliant";
:) >> }
:) >> else {
:) >> print "Not standards compliant";
:) >> }
:) >
:) > Btw, which standard are you referring to?
:)
:) RFC2822, according to wikipedia


"perl misc"@abigail.nl

is a valid email address according to RFC2822 (and if you mail to it, you
should get an autorespond), but it won't match your regexp.


Abigail

Hey big gurl, what in the **** in this in your sig? :

perl -e '* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
/ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %;
BEGIN {% % = ($ _ = " " => print "Just Another Perl Hacker\n")}'
 
IanW

Gunnar Hjalmarsson said:
There are certainly RFC 2822 compliant addresses that won't match your
regex. Quoted strings and domain literals come to mind. There are probably
others.

Well, quoted strings are discouraged according to RFC 2821 and so I'm quite
happy to ignore such addresses.

Ian
 
IanW

You could do it your way and get the wrong answer in one line,
or use one of the CPAN modules and get the right answer in two.
Your call.

Have a look at your regex, have a look at all the ways RFC 2822 says
you can specify an address, then have a look at the correct regex:
http://www.faqs.org/rfcs/rfc2822.html
perl -MEmail::Address -e 'print $Email::Address::mailbox'

Email::Address even throws in the extraction you want to do, for free:

#!/usr/bin/perl

use strict;
use warnings;

use Email::Address;

while (<DATA>) {
print "Found: " . (join ' ', Email::Address->parse($_)) . "\n";
}

__DATA__
xyzzy etc,[email protected], garble someone@localhost
,,"Dr. Fred M. Bogo" <[email protected]>,asdf


Found: <[email protected]> <someone@localhost>
Found: "Dr. Fred M. Bogo" <[email protected]>

Hmm, yes it does seem to do the trick, though this variation more suits my
needs:

use strict;
use warnings;

use Email::Address;

my @addresses = Email::Address->parse(<DATA>);
foreach (@addresses) {
    print $_->address . "\n";
}

__DATA__
xyzzy etc,[email protected], garble someone@localhost
,,"Dr. Fred M. Bogo" <[email protected]>,asdf

Thanks
Ian
 
Eric Schwartz

IanW said:
Hmm, yes it does seem to do the trick, though this variation more suits my
needs:

use strict;
use warnings;

use Email::Address;

my @addresses = Email::Address->parse(<DATA>);
foreach (@addresses) {
print $_->address . "\n";
}

__DATA__
xyzzy etc,[email protected], garble someone@localhost
,,"Dr. Fred M. Bogo" <[email protected]>,asdf

It's all in your needs, of course, but I would point out that your
version reads the entire file into memory at once, which could get
very expensive as the file gets very large, whereas Rick's original
only reads a line at a time, and so scales very nicely as the size of
the file increases.
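Here is a rough sketch of the line-at-a-time shape that still collects bare addresses. To keep it runnable with core Perl only, it uses a hypothetical stand-in, extract_addresses(), with a deliberately naive pattern; in real code you would presumably call Email::Address->parse($line) there and take ->address of each result. Note also that, if parse takes a single string as its documentation suggests, passing it <DATA> in list context would probably only look at the first line, so the loop may matter for correctness as well as memory.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Email::Address->parse($line): a deliberately
# simple pattern, NOT an RFC 2822 parser. It handles one string at a time.
sub extract_addresses {
    my ($line) = @_;
    return $line =~ /([\w.+-]+\@[\w.-]+)/g;
}

# In-memory filehandle with made-up data so the sketch runs without an input file
my $text = "xyzzy someone\@localhost garble\nasdf other\@example.org qwerty\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @found;
while (my $line = <$fh>) {    # only one line is held in memory at a time
    push @found, extract_addresses($line);
}
close $fh;

print "$_\n" for @found;
```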

-=Eric
 
