Validate EMail with RegExp...

Dag Sunde

My understanding of regular expressions is rudimentary,
at best.

I have this RegExp to do a very simple validation of an
email address, but it turns out that it refuses to
accept mail addresses with hyphens in them.

Can anybody please help me adjust it so it will accept
addresses like (e-mail address removed) too?

Here's what I got:

function validateEmail(eMail) {
  return /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

TIA...
 
Julian Turner

Dag said:
My understanding of regular expressions is rudimentary,
at best.

I have this RegExp to do a very simple validation of an
email address, but it turns out that it refuses to
accept mail addresses with hyphens in them.

Can anybody please help me adjust it so it will accept
addresses like (e-mail address removed) too?

Here's what I got:

function validateEmail(eMail) {
return /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

TIA...

--
Dag.

Work is the curse of the drinking classes
-- Oscar Wilde


Hi. A quick search on google threw up a million (well perhaps a
billion) examples of e-mail validation reg-exps.

Here is one I found to tinker with (worked when I tested it):-

var r = /^([a-zA-Z][\w\.-]*[a-zA-Z0-9])@([a-zA-Z0-9][\w-]*[a-zA-Z0-9])\.([a-zA-Z][a-zA-Z\.]*[a-zA-Z])$/;

I suspect it has limits, such as needing at least 2 characters in each
part of the e-mail address.
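
A minimal sketch of how that suspicion could be checked, assuming made-up sample addresses (the `check` helper and every address below are hypothetical and only serve as illustration):

```javascript
// The expression quoted above, unchanged.
var r = /^([a-zA-Z][\w\.-]*[a-zA-Z0-9])@([a-zA-Z0-9][\w-]*[a-zA-Z0-9])\.([a-zA-Z][a-zA-Z\.]*[a-zA-Z])$/;

// Hypothetical helper: true when the whole string matches the expression.
function check(address) {
  return r.test(address);
}

// Hypothetical sample addresses, chosen to probe the suspected limits.
var samples = [
  "dag-sunde@example.com", // hyphen in the local part
  "a@example.com",         // one-character local part
  "ab@x.com",              // one-character domain label
  "1ab@example.com"        // local part starting with a digit
];

samples.forEach(function (address) {
  console.log(address + " -> " + check(address));
});
```

Each of the three parts is written as "first character, optional middle, last character", so each part needs at least two characters, and the local part has to start with a letter.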

Julian
 
Thomas 'PointedEars' Lahn

Dag said:
I have this RegExp to do a very simple validation of an
email address, but it turns out that it refuses to
accept mail addresses with hyphens in them.

Can anybody please help me adjust it so it will accept
addresses like (e-mail address removed) too?

Here's what I got:

function validateEmail(eMail) {
return /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

/\w/ (word character) is equivalent to /[A-Za-z0-9_]/ which does not
include the hyphen.

function validateEmail(eMail)
{
  return /^(\w+\.)*([\w-]+)@([\w-]+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}
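
A short sketch comparing the two expressions side by side, assuming a handful of made-up addresses (the variable names and all sample addresses are hypothetical):

```javascript
// Pattern from the question: \w covers only [A-Za-z0-9_], so no hyphen.
var original = /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/;
// Pattern from the reply: the hyphen is added to two of the classes.
var adjusted = /^(\w+\.)*([\w-]+)@([\w-]+\.)+([a-zA-Z]{2,4})$/;

// Hypothetical sample addresses.
var samples = [
  "dag.sunde@example.com",
  "dag-sunde@example.com",
  "dag@my-host.example.com",
  "first-name.last@example.com"
];

// Collect the verdict of both patterns for every sample.
var results = samples.map(function (address) {
  return {
    address: address,
    original: original.test(address),
    adjusted: adjusted.test(address)
  };
});

results.forEach(function (row) {
  console.log(row.address + "  original: " + row.original + "  adjusted: " + row.adjusted);
});
```

The last sample is still rejected by the adjusted pattern, because the leading group `(\w+\.)*` was left without the hyphen. If hyphens should also be allowed before a dot in the local part, that group would presumably need `[\w-]` as well.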

See also:
<http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp>
<http://msdn.microsoft.com/library/en-us/script56/html/js56jsobjregexp.asp>
and <http://www.mozilla.org/js/language/E262-3.pdf>, section 15.10.

To understand Regular Expressions better, you should
definitely read <http://www.oreilly.com/catalog/regex/> pp.
At least the example chapters of "Understanding Regular
Expressions" have proven very useful to me in that regard.

Note that client-side scripting may not be supported,
so you need server-side validation as well.


PointedEars
 
Dag Sunde

Thomas 'PointedEars' Lahn said:
Dag Sunde wrote:
function validateEmail(eMail) {
return /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

/\w/ (word character) is equivalent to /[A-Za-z0-9_]/ which does not
include the hyphen.

function validateEmail(eMail)
{
return /^(\w+\.)*([\w-]+)@([\w-]+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

Thank you!
Brilliant! I really need to read that.

To understand Regular Expressions better, you should
definitely read <http://www.oreilly.com/catalog/regex/> pp.
At least the example chapters of "Understanding Regular
Expressions" have proven very useful to me in that regard.

It is one of my pet shames that I never get around to reading up on
regular expressions... :-(

Note that client-side scripting may not be supported,
so you need server-side validation as well.

Luckily, here I'm God, and I decide what must be supported.
It is an intranet site, and I've been given full freedom concerning
requirements like that.
 
Dag Sunde

Julian Turner said:
Dag Sunde wrote:
Hi. A quick search on google threw up a million (well perhaps a
billion) examples of e-mail validation reg-exps.

Typical... So did I... but only after asking here. (Like a
bloody newbie) :-\

Here is one I found to tinker with (worked when I tested it):-

var r = /^([a-zA-Z][\w\.-]*[a-zA-Z0-9])@([a-zA-Z0-9][\w-]*[a-zA-Z0-9])\.([a-zA-Z][a-zA-Z\.]*[a-zA-Z])$/;

I suspect it has limits, such as, needs at least 2 characters in each
part of the e-mail address.

It is very similar to the one I found...

Thank you for the response, Julian.
 
Thomas 'PointedEars' Lahn

Julian said:
Dag said:
function validateEmail(eMail) {
return /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

[...]

Hi. A quick search on google threw up a million (well perhaps a
billion) examples of e-mail validation reg-exps.

Here is one I found to tinker with (worked when I tested it):-

var r = /^([a-zA-Z][\w\.-]*[a-zA-Z0-9])@([a-zA-Z0-9][\w-]*[a-zA-Z0-9])\.([a-zA-Z][a-zA-Z\.]*[a-zA-Z])$/;

I suspect it has limits, such as, needs at least 2 characters in each
part of the e-mail address.

If I'm not mistaken, a Regular Expression for e-mail addresses adhering
closely to "3.4.1. Addr-spec specification" in RFC2822 "Internet Message
Format" would be:

addr-spec
=°=> local-part "@" domain
=°=> dot-atom / quoted-string / obs-local-part "@" domain
=°=> [CFWS] dot-atom-text [CFWS] / quoted-string / obs-local-part "@" domain

(We ignore comments and folding white space.)

=°=> 1*atext *("." 1*atext) / quoted-string / obs-local-part "@" domain

(We ignore quoted strings and obsolete addressing.)

=°=> 1*atext *("." 1*atext) "@" domain
=°=> [!#$%&'*+"/=?^_`{|}~0-9A-Za-z-]+(\.[!#$%&'*+"/=?^_`{|}~0-9A-Za-z-]+)*
"@" dot-atom / domain-literal / obs-domain

(We ignore obsolete addressing again; and I have yet to see an e-mail
address produced by the domain-literal production, so I ignore this as
well.)

=°=> [!#$%&'*+"/=?^_`{|}~0-9A-Za-z-]+(\.[!#$%&'*+"/=?^_`{|}~0-9A-Za-z-]+)*
@
[!#$%&'*+"/=?^_`{|}~0-9A-Za-z-]+(\.[!#$%&'*+"/=?^_`{|}~0-9A-Za-z-]+)*

Which leaves us with the fairly simple :)

/^[!#$%&'*+"\/=?^_`{|}~0-9A-Za-z-]+(\.
[!#$%&'*+"\/=?^_`{|}~0-9A-Za-z-]+)*@[!#$%&'*+"\/=?^_`{|}~0-9A-Za-z-]+(\.
[!#$%&'*+"\/=?^_`{|}~0-9A-Za-z-]+)*/

(Watch for word wrap!)

Several characters in the above range have adjacent code points in
ASCII, so we can compact it a little bit.

/^[!-'*+=?{-~\/-9A-Z^-z-]+(\.[!-'*+=?
{-~\/-9A-Z^-z-]+)*@[!-'*+=?^-`{-~\/-9A-Z^-z-]+(\.[!-'*+=?
{-~\/-9A-Z^-z-]+)*/

However, since we are on the Internet, I don't think this will suffice.
For example, I don't know a top-level domain with only one character
(I know such a second-level domain, though) and on today's Internet,
the domain-part should be a FQDN (fully qualified domain name):

/^[!-'*+=?{-~\/-9A-Z^-z-]+(\.[!-'*+=?
{-~\/-9A-Z^-z-]+)*@[!#$%&'*+"\/=?^_`{|}~0-9A-Za-z-]+\.[!-'*+=?
{-~\/-9A-Z^-z-]{2,}/

It could be further refined, allowing only existing top-level domains
and possible second-level domains, allowing especially for IDN domain
names and so on, but I think this will suffice for now.

Of course, my tests showed addresses with hyphens are recognized as
correct and no false positives to date (using rather strange examples
and sender addresses of posters here). I'll be happy if you provide
me with one that does not fit :)


HTH

PointedEars
 
Thomas 'PointedEars' Lahn

Thomas said:
/^[!-'*+=?{-~\/-9A-Z^-z-]+(\.[!-'*+=?
{-~\/-9A-Z^-z-]+)*@[!#$%&'*+"\/=?^_`{|}~0-9A-Za-z-]+\.[!-'*+=?
{-~\/-9A-Z^-z-]{2,}/

Consequently, it should be

/^[!-'*+=?{-~\/-9A-Z^-z-]+(\.[!-'*+=?{-~\/-9A-Z^-z-]+)*
@[!-'*+=?{-~\/-9A-Z^-z-]+\.[!-'*+=?{-~\/-9A-Z^-z-]{2,}/

(Watch for word wrap, of course.)
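
Long literals like these are easy to break when they wrap, so here is a rough sketch of an alternative that assembles a comparable expression from named pieces. It is not the exact expression posted above: it takes the atext set as listed in RFC 2822 section 3.2.4 (which does not contain the double quote), anchors the end of the string, and accepts any number of domain labels. The names `ATEXT`, `DOT_ATOM`, `addrSpec` and `fqdnAddrSpec` and all sample addresses are made up for illustration.

```javascript
// atext per RFC 2822 section 3.2.4 as a character class.
// The hyphen comes last so that it is taken literally.
var ATEXT = "[A-Za-z0-9!#$%&'*+\\/=?^_`{|}~-]";

// dot-atom-text: 1*atext *("." 1*atext)
var DOT_ATOM = ATEXT + "+(?:\\." + ATEXT + "+)*";

// addr-spec without comments, folding white space, quoted strings,
// domain literals and obsolete forms: dot-atom "@" dot-atom.
var addrSpec = new RegExp("^" + DOT_ATOM + "@" + DOT_ATOM + "$");

// Stricter variant (an assumption of this sketch): the domain needs at
// least one dot and a last label of two or more characters.
var fqdnAddrSpec = new RegExp(
  "^" + DOT_ATOM + "@" + DOT_ATOM + "\\." + ATEXT + "{2,}$"
);

// Hypothetical sample addresses.
["dag-sunde@example.no", "a..b@example.com", "user@localhost"].forEach(
  function (address) {
    console.log(
      address + "  addrSpec: " + addrSpec.test(address) +
      "  fqdnAddrSpec: " + fqdnAddrSpec.test(address)
    );
  }
);
```

Building the pattern from strings costs a doubled backslash here and there, but each production of the grammar then appears exactly once.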


PointedEars
 
Thomas 'PointedEars' Lahn

David said:

It is to be noted that RFC822 has already been obsoleted by RFC2822
in April 2001 (the document referred to is dated 13/04/2002) where
the Address Specification this Regular Expression is based on differs
in certain regards.

Not to mention that the expression referred to above no longer qualifies
as simple (maybe exhaustive, but still OBSOLETE), and that it validates
the full Address Specification, including sender name, address comments
and folding white space not usually found in e-mail addresses entered
in Web form controls, rather than the up-to-date addr-spec specification
for the e-mail address much more often found there.


PointedEars
 
Thomas 'PointedEars' Lahn

Dag said:
You're kidding me, right?

That is the longest monster of a regular expression I've ever seen!

And probably the most useless one (in this newsgroup), taking into account
that it is based on an obsolete RFC, is highly unlikely to be needed in
forms, and uses features no JS/ECMAScript engine supports.

Nice try, though.


PointedEars
 
Dr John Stockton

JRS: In article <[email protected]>, dated Thu, 3
Nov 2005 09:39:29, seen in Dag Sunde
I have this RegExp to do a very simple validation of an
email address, but it turns out that it refuses to
accept mail addresses with hyphens in them.

Can anybody please help me adjust it so it will accept
addresses like (e-mail address removed) too?

Here's what I got:

function validateEmail(eMail) {
return /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

It is pointless to attempt to validate an E-mail address format in
detail, unless you are writing an E-mail system, or unless the task is
purely pedagogical. A RegExp that will accept (e-mail address removed) will also
accept (e-mail address removed), and for any given message at least one of those
is wrong. There's no way round it; someone typing an E-address just has
to be careful to get it right.

If a test does not accept any RFC-permitted E-address, there will be
complaints; if an allegedly-full test fails to reject any non-permitted
address, there will be complaints. And if the RFCs change the format
range, full-check code will have to be changed too.

Testing with /.+@.+\..+/ or /^.+@.+\..+$/, to ensure at least (e-mail address removed),
does make sense as a check that something like an E-address has been put
in the field, rather than say a price or an arbitrary vulgarity.

See <URL:http://www.merlyn.demon.co.uk/js-valid.htm#VEmA>.
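
A minimal sketch of that loose test wrapped in a function, assuming a form handler would call it before submitting (the name `looksLikeEmail` and the sample inputs are hypothetical):

```javascript
// Loose plausibility check: something, "@", something, ".", something.
// It only shows that the input resembles an address, not that it is valid.
function looksLikeEmail(text) {
  return /^.+@.+\..+$/.test(text);
}

// Hypothetical sample inputs: two address-like strings, a price, a phrase.
["a@b.c", "dag-sunde@my-host.example.no", "19.99", "not an address"].forEach(
  function (input) {
    console.log(JSON.stringify(input) + " -> " + looksLikeEmail(input));
  }
);
```

The check is deliberately permissive: a string such as "a b@c.d" passes too, which is in line with the argument that only the person typing can get the address right.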
 
Dag Sunde

Dr John Stockton said:
JRS: In article <[email protected]>, dated Thu, 3
Nov 2005 09:39:29, seen in Dag Sunde
I have this RegExp to do a very simple validation of an
email address, but it turns out that it refuses to
accept mail addresses with hyphens in them.

Can anybody please help me adjust it so it will accept
addresses like (e-mail address removed) too?

Here's what I got:

function validateEmail(eMail) {
return /^(\w+\.)*(\w+)@(\w+\.)+([a-zA-Z]{2,4})$/.test(eMail);
}

It is pointless to attempt to validate an E-mail address format in
detail, unless you are writing an E-mail system, or unless the task is
purely pedagogical. A RegExp that will accept (e-mail address removed) will also
accept (e-mail address removed), and for any given message at least one of those
is wrong. There's no way round it; someone typing an E-address just has
to be careful to get it right.

Of course.

If a test does not accept any RFC-permitted E-address, there will be
complaints; if an allegedly-full test fails to reject any non-permitted
address, there will be complaints. And if the RFCs change the format
range, full-check code will have to be changed too.

Testing with /.+@.+\..+/ or /^.+@.+\..+$/, to ensure at least (e-mail address removed),
does make sense as a check that something like an E-address has been put
in the field, rather than say a price or an arbitrary vulgarity.

See <URL:http://www.merlyn.demon.co.uk/js-valid.htm#VEmA>.

That is the only thing I want: to verify that the input can at least
be interpreted as an email address. My problem was that my former regexp
balked at 'x' or 'z' with hyphens in them.
 
