Regular Expression for Email Validation

jmhmaine

I use the following Regular Expression to check the format of the text in a
textbox when requesting an email:

ValidationExpression="^[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$"

It works great, but I just discovered a bug if someone provides a single
character before the @ sign. The following doesn't pass:

(e-mail address removed)

Is there a quick fix or better Expression to use? Thanks.
 
Steven Cheng[MSFT]

Hi Jmhmaine,

I also think www.regexlib.com is a good place to find existing,
sophisticated regular expressions.
In addition, since it can be hard to modify a complex regex just to cover
one small requirement, I suggest splitting the validation logic of your
control into two tests. For example, in your validation control's checking
code,

first validate the input with the regex; then, if that fails, check
specifically for the "single character before the @ sign" case. What do you
think?
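
A minimal C# sketch of both options, not a drop-in validator. TwoStep follows the two-test idea above; OnePattern instead wraps the tail of the mailbox part in an optional group so a one-character mailbox passes in a single expression. The class, method and constant names and the example.com addresses are made up for illustration; only the Original pattern comes from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailFormatCheck
{
    // Pattern from the original post: the mailbox needs both a first and a
    // last character, so it must be at least two characters long.
    private const string Original =
        @"^[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$";

    // Hypothetical fallback: exactly one alphanumeric before the @ sign,
    // with the same domain rules as the original.
    private const string SingleCharMailbox =
        @"^[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$";

    // Hypothetical one-pattern variant: the "middle + last character" part
    // of the mailbox is wrapped in an optional group.
    private const string OptionalTail =
        @"^[a-zA-Z0-9]([\w\.-]*[a-zA-Z0-9])?@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$";

    // True when the unchanged pattern from the original post matches.
    public static bool OriginalOnly(string input)
    {
        return Regex.IsMatch(input, Original);
    }

    // Two tests: the original pattern first, then the single-character case.
    public static bool TwoStep(string input)
    {
        return Regex.IsMatch(input, Original) || Regex.IsMatch(input, SingleCharMailbox);
    }

    // One test against the variant with the optional group.
    public static bool OnePattern(string input)
    {
        return Regex.IsMatch(input, OptionalTail);
    }

    public static void Main()
    {
        foreach (string s in new[] { "a@example.com", "ab@example.com", "a.@example.com" })
        {
            Console.WriteLine("{0}: original={1}, two-step={2}, one-pattern={3}",
                s, OriginalOnly(s), TwoStep(s), OnePattern(s));
        }
    }
}
```

The one-pattern form can go straight into ValidationExpression, while the two-step form needs custom checking code. Note that the domain part of the original pattern has the same two-character minimum per piece, which this sketch leaves untouched.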

Thanks.

Regards,

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)
 
David Alexander

I am not sure you want to be using \w, because it matches at least
[a-zA-Z_0-9], which includes underscores (in .NET it also matches non-ASCII
letters and digits unless RegexOptions.ECMAScript is set). I don't think that
underscores are allowed in either mailbox or domain names, and your regular
expression allows first_last@subdomain_domain.com. I think that mailbox names
must be clusters of alphanumerics separated by either periods or hyphens (not
both), and domain names must be clusters of alphanumerics separated by
hyphens, with those clusters in turn separated by periods. Your regular
expression excludes (e-mail address removed) and allows (e-mail address removed).
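
To see what \w actually accepts, here is a small C# sketch. It assumes the .NET Regex class with its default options and, for comparison, RegexOptions.ECMAScript, which narrows \w to [a-zA-Z_0-9] in the same way a JavaScript regex in client-side script treats it. WordClassDemo and IsWordChar are names invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // True when the whole input is a single \w character under the given options.
    public static bool IsWordChar(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, @"^\w$", options);
    }

    public static void Main()
    {
        // \u00e9 is the letter e with an acute accent.
        string[] samples = { "a", "7", "_", "-", "\u00e9" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: default={1}, ECMAScript={2}",
                s,
                IsWordChar(s, RegexOptions.None),
                IsWordChar(s, RegexOptions.ECMAScript));
        }
    }
}
```

If underscores or non-ASCII letters are not wanted, spelling the class out as [a-zA-Z0-9] avoids depending on which option is in effect.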

I have a really long, ugly one that I am using, but if anyone can improve
it, or knows of a mail server that allows mailbox names with underscores, I
would appreciate being corrected. My regular expression limits the domain
type to 2-4 alpha characters. I have not heard of any longer than 4, e.g.,
".info", but if anyone knows of longer legal ones, please speak up.

My regular expression is,

@"^(((([0-9A-Za-z]+(\-[0-9A-Za-z]+)*)|([0-9A-Za-z]+(\.[0-9A-Za-z]+)*))@([0-9A-Za-z]+([-][0-9A-Za-z]+)*)(\.([0-9A-Za-z]+([-][0-9A-Za-z]+)*))*\.[a-zA-Z]{2,4}))$";
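
The following C# sketch exercises the expression above against a few made-up example.com addresses, to show what the rules mean in practice: no underscores, no mixing of hyphens and periods in the mailbox name, and a final domain part of 2-4 letters. OpenEndedTld is a hypothetical variant, not part of the post, that only widens the final {2,4} to {2,} so longer endings pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictPatternDemo
{
    // The expression from the post above, on one line.
    public const string Strict =
        @"^(((([0-9A-Za-z]+(\-[0-9A-Za-z]+)*)|([0-9A-Za-z]+(\.[0-9A-Za-z]+)*))@([0-9A-Za-z]+([-][0-9A-Za-z]+)*)(\.([0-9A-Za-z]+([-][0-9A-Za-z]+)*))*\.[a-zA-Z]{2,4}))$";

    // Hypothetical variant: identical except that the last quantifier has no upper bound.
    public static readonly string OpenEndedTld = Strict.Replace("{2,4}", "{2,}");

    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        string[] samples =
        {
            "a@example.com",           // single-character mailbox
            "first.last@example.com",  // periods only
            "first-last@example.com",  // hyphens only
            "first_last@example.com",  // underscore
            "a-b.c@example.com",       // hyphen and period mixed
            "curator@example.museum"   // six-letter ending
        };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: strict={1}, open-ended={2}",
                s, Matches(Strict, s), Matches(OpenEndedTld, s));
        }
    }
}
```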
 
jmhmaine

Good points. I didn't create this expression; I found it a year ago as a
sample. A few comments on your points:

There is a new top-level domain, .museum:
http://musedoma.museum/
So you should allow at least 6 letters after the final dot (7 characters
counting the dot).

I believe I have seen underscores in email addresses before, though I am not
sure whether they are legal per the RFC. I just ran a test, sending an email
to (e-mail address removed); Outlook 2003 sent it and my ISP forwarded it back
to my POP3 mailbox.

 
jmhmaine

I would like to use the expression from this URL:
http://www.regexlib.com/REDetails.aspx?regexp_id=711

But how do I incorporate the expression:
^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$

into the ValidationExpression attribute? The problem is that this expression
contains both single and double quotes, so I can't assign it to the
ValidationExpression attribute with either kind of delimiter. Is there a way
to escape quotes within ASPX Web Control declarations? Thanks.
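
One approach that may be worth trying (an assumption on my part, not something confirmed in this thread) is to HTML-encode the expression before pasting it into the attribute, on the expectation that the ASPX parser decodes entities such as &quot; and &lt; in attribute values. The C# sketch below only shows the encoding step on the first group of the expression. It uses WebUtility from System.Net, which is newer than the framework version discussed here; AttributeEncodingDemo and ToAttributeText are made-up names.

```csharp
using System;
using System.Net;

public static class AttributeEncodingDemo
{
    // First group of the expression above; it contains ', " and <.
    // Verbatim literal: only the double quote is doubled.
    public const string Fragment =
        @"^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|""((?=[\x01-\x7f])[^""\\]|\\[\x01-\x7f])*""\x20*)*(?<angle><))?";

    // Returns text with &, <, >, " and ' replaced by HTML entities, so it
    // no longer contains either kind of attribute delimiter.
    public static string ToAttributeText(string pattern)
    {
        return WebUtility.HtmlEncode(pattern);
    }

    public static void Main()
    {
        string encoded = ToAttributeText(Fragment);
        Console.WriteLine(encoded);
        // Decoding should give back the original pattern text.
        Console.WriteLine(WebUtility.HtmlDecode(encoded) == Fragment);
    }
}
```

Whether the decoded value also survives the validator's client-side script is a separate question that this sketch does not address.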
 
Steven Cheng[MSFT]

Hi Jmhmaine,

Thanks for your follow-up. I think the problem is that client-side script
escapes single and double quotes differently from server-side .NET code.

In a .NET string we can write \" and \' in place of " and ', but the browser
will not interpret those escapes correctly when it parses the expression on
the client side.

For now, I think a regex this complex needs a custom validation control,
with the regular expression written out by hand separately for the
server-side .NET code and for the client-side script.

Thanks.

Regards,

Steven Cheng
Microsoft Online Support

 
jmhmaine

The problem I have is an ASP.NET compile error because I can't assign the
expression.

I tried with double quotes:
<asp:RegularExpressionValidator ID="valRegExprEmail"
Runat="server"
ControlToValidate="txtEmail"
ValidationExpression="^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$"
ErrorMessage="The email address provided is not formatted correctly."
Display="None" />

And Single Quotes:
<asp:RegularExpressionValidator ID="valRegExprEmail"
Runat="server"
ControlToValidate="txtEmail"
ValidationExpression='^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$'
ErrorMessage="The email address provided is not formatted correctly."
Display="None" />
 
David Alexander

Since underscores are supported by some mail programs, I will revise my
regular expression to allow them. Thanks for letting me know. I spent a fair
amount of time reading RFC 821
(http://www.networksorcery.com/enp/rfc/rfc821.txt) which claims that only
alphanumerics and periods are allowed. The version I looked at was written
in 1982, though. Current practice probably differs a lot.

As a mail server admin, I wouldn't allow underscores, though, because so
many programs (like Outlook) display email addresses formatted as underlined
links. Except for technical people who are familiar with syntactic
requirements, a lot of people would end up thinking the underscore was a
space. (I still talk to people who ask me "No space?" when I am providing a
website URL or email address over the phone.)

 
Steven Cheng[MSFT]

Hi Jmhmaine,

Thanks for your follow-up. I think the compile error also occurs because
the page compiler cannot correctly parse such a complex expression when it
is set inline in the .aspx page. What I currently do is assign the
ValidationExpression in the code-behind, escaping every single quote,
double quote and backslash. For example:

private void Page_Load(object sender, System.EventArgs e)
{
    // Regular (non-verbatim) C# string: every backslash and double quote of
    // the regex is escaped, and the literal must stay on a single line.
    rvEmail.ValidationExpression =
        "^((?>[a-zA-Z\\d!#$%&\'*+\\-/=?^_`{|}~]+\\x20*|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\"\\x20*)*(?<angle><))?((?!\\.)(?>\\.?[a-zA-Z\\d!#$%&\'*+\\-/=?^_`{|}~]+)+|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\")@(((?!-)[a-zA-Z\\d\\-]+(?<!-)\\.)+[a-zA-Z]{2,}|\\[(((?(?<!\\[)\\.)(25[0-5]|2[0-4]\\d|[01]?\\d?\\d)){4}|[a-zA-Z\\d\\-]*[a-zA-Z\\d]:((?=[\\x01-\\x7f])[^\\\\\\[\\]]|\\\\[\\x01-\\x7f])+)\\])(?(angle)>)$";
}

But the client-side and server-side validation still cannot both work with
this expression, so client-side validation has to be turned off (for
example, by setting EnableClientScript="false" on the validator).
Hope this helps.
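
As a possible way to cut down the escaping in the code-behind, a C# verbatim string (prefixed with @) keeps backslashes as they are and only needs each double quote doubled. The sketch below compares the two literal styles on just the quoted-string part of the expression, anchored with ^ and $ for the demonstration; LiteralStylesDemo and the sample inputs are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralStylesDemo
{
    // Regular C# literal: every backslash and double quote of the regex
    // needs a backslash escape.
    public const string Escaped = "^\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\"$";

    // Verbatim literal with the same text: backslashes are left alone and
    // only the double quote is doubled.
    public const string Verbatim = @"^""((?=[\x01-\x7f])[^""\\]|\\[\x01-\x7f])*""$";

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Escaped == Verbatim);
        // The pattern matches a double-quoted ASCII string...
        Console.WriteLine(Regex.IsMatch("\"john smith\"", Verbatim));
        // ...and rejects the same text without the surrounding quotes.
        Console.WriteLine(Regex.IsMatch("john smith", Verbatim));
    }
}
```

The verbatim form stays much closer to the expression as published, which makes it easier to compare the two by eye.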

Thanks & Regards,

Steven Cheng
Microsoft Online Support

 
