regexp validator - wrong?

Dmitry Korolyov

ASP.NET app using C# and framework version 1.1.4322.573 on an IIS 6.0 web server.

A single-line asp:textbox control and regexp validator attached to it.

^\d+$ expression does match an empty string (when you don't enter any values) - this is wrong
d+ expression does not match, for example "g24" string - this is also wrong

The test validator at www.regexplib.com works fine for both cases, i.e. it reports "not match" for the first one and "match" for the second one. I suspect that regexplib uses a different framework version and that this is the source of the discrepancy. Do you have any other ideas?
 
Steve Jansen

Per MSDN on the RegularExpressionValidator control:

"Note: Validation succeeds if the input control is empty. If a value is
required for the associated input control, use a RequiredFieldValidator
control in addition to the RegularExpressionValidator control."

This is why it appears ^\d+$ is matched with an empty string.

Also, "d+" means match one or more "d" characters, which is why it does not
match "g24". You probably intended "^\w+$", meaning a single-line string
containing only word characters [a-zA-Z_0-9], i.e. alphanumerics and underscore.

-Steve Jansen

Dmitry Korolyov [MVP]

Thanks Steve.

1) Why does the first example work the opposite way (compared with what I see and what you have explained) at www.regexplib.com? They have set up a testing area where you can test various regexps and see whether or not they match the strings you enter.

2) \d+ means one or more digits. They can be anywhere within the string. This means "g2323" should match the regular expression, but it doesn't (although it does on the testing area of www.regexplib.com and in any other regexp-compatible language). Note that if I wanted a string which contains digits only, I'd use ^\d+$ regexp.

So I guess there should be some more ideas...

P.S. I'm still thinking of the different versions of the framework.

--
Dmitry Korolyov
MVP: Windows Server - Active Directory


Dmitry Korolyov [MVP]

Obviously I made a typo in the initial message: it should be \d+ instead of just d+.

Marina

1) .NET defines what it means for the regular expression validator to fire. And it ignores the empty string. The documentation says so - and it says to use a required field validator in addition, if your requirements are that the field be filled in.

How others chose to implement their regular expression validator is irrelevant.

2) '\d+' means that the entire string is just one or more digits. Not that there exists a substring of the original string with one or more digits.

The link you keep referring to seems to check whether there is a substring that matches.

For example, I put in '\d' for the expression, and 'asdf2sdf' for the test string.

Now, in reality, '\d' should only match a 1 character string that contains a digit. However, my string matched!

I disagree that this is actually a regular expression match for the string. There is a substring of my string that matches - but not the entire thing. In fact, absolutely anything will match, as long as there is at least one digit somewhere in it.

So the result from this web site is very misleading, and as far as I am concerned incorrect. If I am validating that someone enters a 5-digit zip code, but something like 'a string 12345' is allowed to match - well, that's just plain wrong!
 
Dmitry Korolyov [MVP]

Agree on the first one - if that is how it works by design. Yet it is still wrong from a common-sense standpoint. Since an empty string does not contain one or more digits, it should not match the regexp, yet it does.

"How others chose to implement their regular expression validator is irrelevant." Umm, well, if you say so. I used regexps with Perl for some time before the whole .NET thing was here.

And I disagree on the second. '\d+' means a string containing one or more digits anywhere. An entire string which contains only digits - that would be '^\d+$'. Therefore, the regexp is handled incorrectly here as well. You're making wrong assumptions regarding the entire-string thing. The entire string consists of:
1. Start of the string. This is the '^' character in regular expression syntax (if placed at the very beginning of the regular expression).
2. The string itself. This is the '\d+' pattern we use for "one or more digits".
3. End of the string. This is the '$' character in regular expression syntax (if placed at the very end of the regular expression).
This is documented in the .NET regexp help, so you can look for yourself.

In other words, 'asdf2sdf' will match the '\d' regexp, as this is a string which contains a digit. If you need a regexp for a 5-digit zip code, you should use the ^\d\d\d\d\d$ or ^\d{5}$ pattern. The test area at www.regexplib.com returns absolutely correct results - in terms of what is referred to as "regular expressions" by anyone who works with regular expressions. In my initial message I was wondering why my .NET framework gives incorrect results, and my suggestion is that the website uses some different version.
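
A minimal sketch of the semantics described above, assuming the standard System.Text.RegularExpressions API called directly (not through the validator control). The class name is made up for illustration; the test strings are the ones used in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

class UnanchoredVersusAnchored
{
    static void Main()
    {
        // Unanchored patterns succeed if any substring matches.
        Console.WriteLine(Regex.IsMatch("asdf2sdf", @"\d"));   // True
        Console.WriteLine(Regex.IsMatch("g2323", @"\d+"));     // True

        // Anchoring with ^ and $ requires the whole string to match.
        Console.WriteLine(Regex.IsMatch("a string 12345", @"^\d{5}$")); // False
        Console.WriteLine(Regex.IsMatch("12345", @"^\d{5}$"));          // True
    }
}
```

This only shows how the Regex class itself behaves; it says nothing about any extra logic the RegularExpressionValidator may add on top.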

mikeb

Dmitry said:
"1) Why the first example works the opposite (to what I see and what you
have explained) at www.regexplib.com? They have set up a testing area
where you can test various regexps and see if they match or not the
strings you enter."

The RegularExpressionValidator is documented to succeed if the input
control is empty (i.e., the regexp is not even run in this case). That
might not be intuitive for you, but it's how MS decided it should work.

Dmitry said:
"2) \d+ means one or more digits. They can be anywhere within the string.
This means "g2323" should match the regular expression, but it doesn't
(although it does on the testing area of www.regexplib.com and in any
other regexp-compatible language). Note that if I wanted a string which
contains digits only, I'd use ^\d+$ regexp.

So I guess there should be some more ideas..."

Looking at the IL for the RegularExpressionValidator, it appears that MS
made an undocumented decision such that it will return a successful
validation only when the regex matches the entire contents of the control.

I'd agree with you that this is counter-intuitive, and it appears to be
undocumented. I'm not sure whether MS would consider this a bug in
implementation or a bug in documentation. In any case, the behavior
exists in both current versions of the Framework (1.0 SP2 and 1.1).

So, if you want your regexes to match with the behavior that MS
hard-coded for RegularExpressionValidator, the ValidationExpression
should always be bounded by the ^ and $ characters.
 
Dmitry Korolyov [MVP]

That's warmer, Mike. But the regexplib website shows us absolutely correct behavior - or do they use custom handling?

mikeb

Dmitry said:
"That's warmer Mike. But regexplib website shows us absolutely correct
behavior - or do they use custom handling?"

There's extra code in the handling of the RegularExpressionValidator.
Pseudo-code looks something like:

    // length of the text in the control being validated
    int len = controlToValidate.Text.Length;

    if (len == 0) {
        // nothing in the control - automatically validated
        return true;
    }

    Match m = regex.Match(controlToValidate.Text);

    // valid only if the match spans the entire text
    if (m.Success && (m.Length == len)) {
        return true;
    }

    return false;
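
The pseudo-code above can be turned into a small runnable sketch. The class and method names (ValidatorSketch, IsValid) are hypothetical, and this is only an approximation of the logic as described in this post, not the framework's actual source.

```csharp
using System;
using System.Text.RegularExpressions;

static class ValidatorSketch
{
    // Hypothetical helper mirroring the pseudo-code: empty input is always
    // valid; otherwise the first match must cover the entire text.
    public static bool IsValid(string text, string pattern)
    {
        if (text.Length == 0)
        {
            return true; // nothing entered - the regex is never run
        }

        Match m = Regex.Match(text, pattern);
        return m.Success && m.Length == text.Length;
    }

    static void Main()
    {
        Console.WriteLine(IsValid("", @"^\d+$"));      // True  (empty input passes)
        Console.WriteLine(IsValid("g2323", @"\d+"));   // False (match "2323" is shorter than the text)
        Console.WriteLine(IsValid("2323", @"\d+"));    // True  (match covers the whole text)
        Console.WriteLine(IsValid("g2323", @"^\d+$")); // False (no match at all)
    }
}
```

With the pattern bounded by ^ and $, this check and a plain Regex.IsMatch call agree on non-empty input, which fits the advice to always anchor the ValidationExpression.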


