Regular expression help

Benedict White

I need a bit of help cleaning up a mess.

I need to write a regular expression that looks for invalid email
addresses. I know that valid addresses do not contain numerical
characters or unusual combinations of letters.

What I was hoping for was something that would locate all emails with,
say, a 2 before the @, addressed to the domain I am looking after,
example.com.

I tried ^[A-Za-z2._%+-][email protected]; however, the pattern does not
actually require the 2. The numbers could be anywhere in the string
before the @. It will miss email addresses with other numbers in them,
but will also pick up any without. I need to find all addresses with
numbers in them at the example.com domain.

Then I want to extend it to look for odd combinations of letters, like
xb, which would then have to appear together but anywhere in the
string.

Kind regards


Benedict White
 
Ben Morrow

Quoth Benedict White:
> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that valid addresses do not contain numerical
> characters or unusual combinations of letters.

It would be better to use a module that knows how to parse email
addresses.

It may be better to start with a list of valid addresses, and proceed from
there; however, this may not be possible.

> What I was hoping for was something that would locate all emails with,
> say, a 2 before the @, addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-][email protected]; however, the pattern does not
> actually require the 2. The numbers could be anywhere in the string
> before the @. It will miss email addresses with other numbers in them,
> but will also pick up any without. I need to find all addresses with
> numbers in them at the example.com domain.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.

Something like

#!/usr/bin/perl

use warnings;
use strict;

use Email::Address;

my $domain = 'example.com';

my $invalid = qr/ \d | xb /x;

my @addrs = qw{
    abc%
    a@a<[email protected]>
    (e-mail address removed)
    (e-mail address removed)
    (e-mail address removed)
    (e-mail address removed)
};

# Report the verdict for the current address, then move on to the next one.
sub result {
    my ($reason) = @_;
    warn "$reason\n";
    no warnings 'exiting';    # leaving the sub via 'next' is deliberate
    next ADDR;
}

ADDR: for my $addr (@addrs) {
    # parse returns a list of address objects; keep the first one
    my ($parsed) = Email::Address->parse($addr)
        or result "invalid address: $addr";

    # reject anything that is more than a bare address
    $parsed->original eq $addr
        or result "extra gunk around address in '$addr'";

    $parsed->host eq $domain
        or result "'$addr' not at '$domain'";

    # a digit or the pair 'xb' in the local part is forbidden
    $parsed->user =~ $invalid
        and result "'$addr' contains a forbidden string";

    result "'$addr' is valid";
}

__END__

should work.

Ben
 
Benedict White

Many thanks.

Is there no simple regex for saying that a part of the text (the bit
before the @ in this case) can contain anything you like, as long as
it contains say the number 2?

Kind regards


Benedict White
 
Tad McClellan

Benedict White said:
> Many thanks.

To who?

For what?

Please quote some context in followups.

> Is there no simple regex for saying that a part of the text (the bit
> before the @ in this case) can contain anything you like, as long as
> it contains say the number 2?

print "matched\n" if $text =~ /.*2.*\@/s;
 
Gunnar Hjalmarsson

Benedict said:
> I need to write a regular expression that looks for invalid email
> addresses. I know that valid addresses do not contain numerical
> characters

/\d[^@]*@/

> or unusual combinations of letters.
>
> Then I want to extend it to look for odd combinations of letters, like
> xb, which would then have to appear together but anywhere in the
> string.

Seems like an unusual requirement... What if there is some Max Borke,
with the address (e-mail address removed) ?
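As a rough illustration of the pattern above, the sketch below ties it to the example.com domain and adds the two-letter check. The sample addresses and the sub name flag_reason are made up for this example, and treating "xb" as suspicious is only the stated requirement, not a general rule.

```perl
use strict;
use warnings;

# Hypothetical helper: returns why the local part looks suspicious,
# or undef if nothing is flagged. Only example.com addresses are checked.
sub flag_reason {
    my ($addr) = @_;
    my ($local) = $addr =~ /\A([^@\s]+)\@example\.com\z/i
        or return;                        # other domain, or not address-like
    return 'digit' if $local =~ /\d/;     # a digit anywhere before the @
    return 'xb'    if $local =~ /xb/i;    # assumed "odd" letter pair
    return;
}

for my $addr (qw(bob2@example.com maxb@example.com alice@example.com joe9@example.org)) {
    my $why = flag_reason($addr);
    printf "%-20s %s\n", $addr, defined $why ? "flagged ($why)" : 'ok';
}
```

Note that the invented maxb address is flagged, which is exactly the false positive the question above is worried about.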
 
Jürgen Exner

Benedict said:
> Is there no simple regex for saying that a part of the text (the bit
> before the @ in this case) can contain anything you like, as long as
> it contains say the number 2?

/.*2.*\@/

will do that.

jue
 
Paul Lalli

> /.*2.*\@/
>
> will do that.

Interesting that both you and Tad put the useless starting .* in your
regexp. It makes me wonder if it's not as useless as I think it is.
Is there any difference between that and
/2.*\@/
?

Paul Lalli
 
Jürgen Exner

Paul said:
> Interesting that both you and Tad put the useless starting .* in your
> regexp. It makes me wonder if it's not as useless as I think it is.
> Is there any difference between that and
> /2.*\@/

Well, you are right. I can't see any reason to put it there, either.

jue
 
Josef Moellers

Jürgen Exner said:
> Well, you are right. I can't see any reason to put it there, either.

I feel like carrying owls to Athens, but in principle there is a
difference between the two: In the former case (/.*2.*\@/), $PREMATCH
will be empty, in the latter case, (/2.*\@/) it won't.
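A small sketch of that difference, assuming a single-line sample string invented for the purpose. It uses the short names $` and $& rather than the English.pm aliases $PREMATCH and $MATCH.

```perl
use strict;
use warnings;

my $text = 'ab2cd@example.com';    # made-up sample address

# With the leading .*, the match starts at the beginning of the string,
# so nothing is left in front of it.
$text =~ /.*2.*\@/ or die "no match";
my ($pre_a, $match_a) = ($`, $&);

# Without it, the match starts at the 2 and "ab" lands in the prematch.
$text =~ /2.*\@/ or die "no match";
my ($pre_b, $match_b) = ($`, $&);

printf "with .*:    prematch=[%s] match=[%s]\n", $pre_a, $match_a;
printf "without .*: prematch=[%s] match=[%s]\n", $pre_b, $match_b;
```

Whether either pattern matches at all is the same in both cases; only the reported prematch and match text differ.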
 
Charlton Wilbur

PL> Interesting that both you and Tad put the useless starting .*
PL> in your regexp. It makes me wonder if it's not as useless as
PL> I think it is. Is there any difference between [ /.*2.*\@/ ]
PL> and /2.*\@/ ?

To me, it indicates that the author of the regular expression is
thinking of the 2 as happening somewhere to the left of the @ sign in
the part of the string he cares about, as opposed to the 2 being at
the beginning of the part of the string he cares about.

Starting with the 2 is technically correct, and is only going to make
a difference if $& and company are involved somewhere.

Charlton
 
Benedict White

> I need a bit of help cleaning up a mess.
>
> I need to write a regular expression that looks for invalid email
> addresses. I know that valid addresses do not contain numerical
> characters or unusual combinations of letters.
>
> What I was hoping for was something that would locate all emails with,
> say, a 2 before the @, addressed to the domain I am looking after,
> example.com.
>
> I tried ^[A-Za-z2._%+-][email protected]; however, the pattern does not
> actually require the 2. The numbers could be anywhere in the string
> before the @. It will miss email addresses with other numbers in them,
> but will also pick up any without. I need to find all addresses with
> numbers in them at the example.com domain.

I seem to have found a regex that works:
[A-Za-Z]?[0-9][A-Za-Z][email protected]

Which matches emails to the example.com domain containing numbers.

Kind regards


Benedict White
 
Tad McClellan

Benedict White said:
> I seem to have found a regex that works:


It has at least 4 different problems.

What does "works" mean when you say it?

> [A-Za-Z]?[0-9][A-Za-Z][email protected]
>
> Which matches emails to the example.com domain containing numbers.


a-Z is not a valid range.

at-signs need to be escaped in double-quotish contexts such
as a pattern.

There are 2 ".com" substrings required by the pattern.

It only allows a single character between the digit and
the at-sign. It doesn't match '(e-mail address removed)' ...
 
Benedict White

Benedict White said:
>> I seem to have found a regex that works:
>
> It has at least 4 different problems.
>
> What does "works" mean when you say it?
>
>> [A-Za-Z]?[0-9][A-Za-Z][email protected]
>> Which matches emails to the example.com domain containing numbers.
>
> a-Z is not a valid range.
>
> at-signs need to be escaped in double-quotish contexts such
> as a pattern.
>
> There are 2 ".com" substrings required by the pattern.
>
> It only allows a single character between the digit and
> the at-sign. It doesn't match '(e-mail address removed)' ...


Oops, that was a typo. It should have read:

[A-Za-z]?[0-9][A-Za-z][email protected]

Which does work with egrep without escaping the @.


However, you are right: it will only match 1 letter after the last
number before the @. Adding a * makes it two, but making it totally
wild matches everything :(.


Kind regards

Benedict White
 
Tad McClellan

Benedict White said:
> Benedict White said:
>>> I seem to have found a regex that works:
>>
>> It has at least 4 different problems.
>>
>> What does "works" mean when you say it?
>>
>>> [A-Za-Z]?[0-9][A-Za-Z][email protected]
>>> Which matches emails to the example.com domain containing numbers.
>>
>> a-Z is not a valid range.
>>
>> at-signs need to be escaped in double-quotish contexts such
>> as a pattern.
>>
>> There are 2 ".com" substrings required by the pattern.
>>
>> It only allows a single character between the digit and
>> the at-sign. It doesn't match '(e-mail address removed)' ...
>
> Oops, that was a typo.


Please copy/paste code rather than attempting to rekey it, and
introducing typos that are not in your actual code.

> It should have read:
>
> [A-Za-z]?[0-9][A-Za-z][email protected]
>
> Which does work with egrep without escaping the @.


Silly me.

I thought we were talking Perl here in the Perl newsgroup.

> However, you are right: it will only match 1 letter after the last
> number before the @. Adding a * makes it two,


Adding an asterisk where?

Got code?

Did you mean to say "replacing the ? with *" like:

[A-Za-z]?[0-9][A-Za-z]*@example.com

??

That makes it both less and more than two.

> but making it totally
> wild matches everything :(.


What does "totally wild" mean when you say it?

Got code?
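To make the "less and more than two" remark concrete, here is a small sketch that runs the starred pattern, written with Perl escaping, against a few made-up addresses. The sample addresses and variable names are assumptions for illustration. The second pattern is only one possible way to allow any characters between the digit and the @, built on the \d[^@]*@ idea suggested earlier in the thread.

```perl
use strict;
use warnings;

# The starred pattern from above, with @ and . escaped for Perl.
my $starred = qr/[A-Za-z]?[0-9][A-Za-z]*\@example\.com/;

# Assumed alternative: a digit, then anything that is not an @, up to the @.
my $any_tail = qr/\d[^@]*\@example\.com\z/;

my @samples = qw(
    a2@example.com
    a2bcd@example.com
    a2b.c@example.com
    abc@example.com
);

my %hits;
for my $addr (@samples) {
    # Record one flag per pattern: 1 if the address matches, 0 otherwise.
    $hits{$addr} = [ $addr =~ $starred ? 1 : 0, $addr =~ $any_tail ? 1 : 0 ];
    printf "%-20s starred=%d any_tail=%d\n", $addr, @{ $hits{$addr} };
}
```

The starred pattern accepts zero letters after the digit (a2) as well as several (a2bcd), but it rejects a2b.c because of the dot. The [^@]* version accepts all three and still leaves the digit-free address alone.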
 
