regexp test function behavior

HopfZ

I couldn't understand some behavior of the RegExp.test function.

Example html code:
----------------
<html><head></head><body><script type="text/javascript">
var r = /^https?:\/\//g;
document.write( [
r.test('http://a'),
r.test('http://b'),
r.test('http://c'),
r.test('http://d')
]);
</script></body></html>
---------------------

The page displays true, false, true, false. (in Opera, Firefox and IE)
This is strange because I expected it would display true, true, true,
true. There must be something I didn't know about the function
RegExp.test.
 
Michael Winter

HopfZ wrote:

[snip]
var r = /^https?:\/\//g;
document.write( [
r.test('http://a'),
r.test('http://b'),
r.test('http://c'),
r.test('http://d')
]);
[snip]

The page displays true, false, true, false. (in Opera, Firefox and IE)
This is strange because I expected it would display true, true, true,
true. There must be something I didn't know about the function
RegExp.test.

The global flag is the cause of your confusion. It doesn't even make
sense for it to be included: you're using an expression with an
input-start assertion[1] (^) and that could only ever match once.

The RegExp.prototype.test method is equivalent to the expression,

re.exec(str) != null

and the global flag is significant when the RegExp.prototype.exec method
is used. After a match, the lastIndex property of the regular expression
object is modified to point just beyond the end of the previously
matched sub-string. On the next invocation of the exec method, this
position is used to begin the next search.
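To make the shared state concrete, here is a minimal sketch (the names re and str are only illustrative) in which test and exec take turns on the same global regular expression. Each call resumes from wherever the previous one left lastIndex.

```javascript
// One global regexp, used by test() and exec() in turn.
var re = /x/g;
var str = 'xx';

var first = re.test(str);      // searches from index 0, matches the first "x"
var afterTest = re.lastIndex;  // 1: just past the first match

var second = re.exec(str);     // resumes at index 1, matches the second "x"
var afterExec = re.lastIndex;  // 2: just past the second match

var third = re.test(str);      // nothing left to match from index 2
var afterFail = re.lastIndex;  // 0: a failed match resets lastIndex
```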

At the end of the first call, the lastIndex property will point beyond
the end of the match (to the character, 'a'). Whilst attempting to match
the input-start assertion (^) in the second call, the assertion will
fail (the match is attempted after the start of the string). These
attempts will continue until the end of the string is reached, at which
point the lastIndex property is reset to zero and null is returned. With
the lastIndex property reset, the third call can proceed normally like
the first. The fourth call will be a repeat of the second.
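The sequence described above can be traced with a small sketch. The helper traceTest is a made-up name for illustration; it only records lastIndex before and after each call.

```javascript
// Record what each test() call sees and leaves behind on one global regexp.
function traceTest(pattern, inputs) {
  var log = [];
  for (var i = 0; i < inputs.length; i++) {
    var before = pattern.lastIndex;
    var matched = pattern.test(inputs[i]);
    log.push({
      input: inputs[i],
      before: before,           // where this search started
      matched: matched,
      after: pattern.lastIndex  // where the next search will start
    });
  }
  return log;
}

var trace = traceTest(/^https?:\/\//g,
                      ['http://a', 'http://b', 'http://c', 'http://d']);
// matched: true, false, true, false
// after:   7,    0,     7,    0
```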

Mike


[1] With the multi-line flag, it also acts as a line-start
assertion, but that doesn't apply here.
 
Evertjan.

Michael Winter wrote on 29 Oct 2006 in comp.lang.javascript:
The global flag is the cause of your confusion. It doesn't even make
sense for it to be included: you're using an expression with an
input-start assertion[1] (^) and that could only ever match once.

Even more so, setting the global flag in a test() never makes any sense.

At the end of the first call, the lastIndex property will point beyond
the end of the match

A good explanation.

Even so it is a bug!!!!

The global flag should either lead to an error,
or be disregarded in test().

===================================

Testing:

<script type='text/javascript'>

// IE7 tested

var r = /x/g;
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write('<br>');
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true

document.write('<br>');
r = /x/;
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // true


</script>
 
Michael Winter

Evertjan. wrote:

[snip]
Even more so, setting the global flag in a test() never makes any
sense.

Never? I don't know about that. Rare, certainly.

For example, one way to count the number of occurrences of a pattern
within a string is to use the String.prototype.match method[1]:

var result = string.match(regExp),
count = result ? result.length : 0;

where regExp is a regular expression object with the global flag set.
However, one could also do the same with the RegExp.prototype.test method:

function countMatches(string, pattern) {
var count = 0,
index = pattern.lastIndex = 0;

while (pattern.test(string)) {
++count;
if (pattern.lastIndex == index) {++pattern.lastIndex;}
}
return count;
}

var count = countMatches(string, regExp);

This is marginally easier to use, and slightly more efficient in Fx and Op,
at least in my simple test: a case-sensitive, single-character search. It is
slower in MSIE, although MSIE's performance is still better than that of
either Fx or Op.

[snip]
Even so it is a bug!!!!

The global flag should either lead to an error,
or be disregarded in test().

Not at all. The blame would fall on the developer who used a global flag
where it didn't belong, or failed to reset the lastIndex property after
a previous invocation.

[snip]

Mike


[1] Browsers return null from the String.prototype.match method
when both the global flag is set for the regular expression
object and no matches are found. It seems to me that
15.5.4.10, ECMA-262 3rd Ed. would call for an empty array.
Not that big a deal, but it would make the example above a
bit simpler.
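
A quick check of the behaviour this note describes, with made-up inputs. The `|| []` fallback is just one common way to get an empty array when nothing matches; it is not something proposed in the thread.

```javascript
var found = 'banana'.match(/a/g);    // array of every match: ["a", "a", "a"]
var missing = 'banana'.match(/x/g);  // null, not an empty array

// Falling back to an empty array removes the need for the conditional.
var count = ('banana'.match(/x/g) || []).length;
```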
 
Evertjan.

Michael Winter wrote on 30 Oct 2006 in comp.lang.javascript:
Evertjan. wrote:

[snip]
Even more so, setting the global flag in a test() never makes any
sense.

Never? I don't know about that. Rare, certainly.

Never!

For example, one way to count the number of occurrences of a pattern
within a string is to use the String.prototype.match method[1]:

var result = string.match(regExp),
count = result ? result.length : 0;

Michael, I said: in "test()"

"match()" is not "test()"

[snap]
[snip]
Even so it is a bug!!!!

The global flag should either lead to an error,
or be disregarded in test().

Not at all.

But it is.

The subject of this thread is:
"regexp test function behavior"
not:
"regexp match function behavior"

"match()" is not "test()"

[snap]
[1] Browsers return null from the String.prototype.match method

[snip]

"match()" is not "test()"
 
Michael Winter

Evertjan. said:
Michael Winter wrote on 30 Oct 2006 in comp.lang.javascript:
Evertjan. wrote:

[snip]
Even more so, setting the global flag in a test() never makes any
sense.

Never? I don't know about that. Rare, certainly.

Never!

Care to state a reason?

The information that can be gleaned from using the RegExp.prototype.test
method in this way is limited[1], which is why such usage would be rare.
However, that is a far cry from claiming that it makes no sense. Indeed,
my previous post demonstrated a reasonable use.

The point of the global flag is to allow repetitive processing, where
the lastIndex property indicates the position from which the next
invocation starts. This would allow the test method to assert that there
is more than one match, or even that one begins after a certain point
should the lastIndex property be set explicitly. If that's all that's
required, then there's no need to use a method that would return more
information (and be wasteful, in the process).
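
As a rough sketch of those two uses (the helper names matchesFrom and matchesTwice are invented here and belong to no library):

```javascript
// Both helpers assume a regexp created with the g flag; without it,
// test() ignores lastIndex and always searches from the start.

// Does the pattern match anywhere at or after position `start`?
function matchesFrom(pattern, string, start) {
  pattern.lastIndex = start;
  var found = pattern.test(string);
  pattern.lastIndex = 0; // leave the regexp in a clean state
  return found;
}

// Does the pattern match at least twice? Assumes matches are never empty,
// since an empty match would leave lastIndex where it was.
function matchesTwice(pattern, string) {
  pattern.lastIndex = 0;
  var found = pattern.test(string) && pattern.test(string);
  pattern.lastIndex = 0;
  return found;
}

var later = matchesFrom(/an/g, 'banana', 2);  // true: "an" occurs at index 3
var tooLate = matchesFrom(/an/g, 'banana', 4); // false: no "an" from index 4
var twice = matchesTwice(/a/g, 'banana');      // true
var once = matchesTwice(/b/g, 'banana');       // false
```

Neither helper builds a result array, which is the point being made: only a yes/no answer is needed.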
For example, one way to count the number of occurrences of a pattern
within a string is to use the String.prototype.match method[1]:

var result = string.match(regExp),
count = result ? result.length : 0;

Michael, I said: in "test()"

I read what you wrote.

"match()" is not "test()"

If you read past the part that you quoted, you would notice that I go on
to present an equivalent using the test method and a regular expression
with the global flag set. Mentioning the String.prototype.match method
was merely a comparison.

I wrote a little more than that.

But it is.

Again, would you like to actually provide an explanation?

The test method is /defined/ in terms of the exec method; it is the
behavioural equivalent of:

regExp.exec(string) == null

/including/ all of the side effects that the exec method introduces. The
method should be used with that in mind, and if it's not, then it's the
fault of the developer and nobody else.

Note that an implementation doesn't have to use that exact expression.
Instead, it might copy the algorithm of the exec method (see 15.10.6.2),
except returning false instead of null in step 6, and returning true
instead of steps 12 and 13. This would save some time whilst providing
the same behaviour, and it is the identical behaviour that matters most.

The subject of this thread is:
"regexp test function behavior"
not:
"regexp match function behavior"

I know. I answered the OP's question, did I not? Even so, threads drift.

"match()" is not "test()"

I hope you're going to feel a little silly now after banging on about
that so irrationally.

[1] Browsers return null from the String.prototype.match method

[snip]

"match()" is not "test()"

That comment was an aside, which was why I presented it as an endnote.

Mike


[1] As far as I can see, only three facts can be obtained:

1. Whether the string matched the pattern (the return value
of the method itself),
2. The location of first character to follow the match just
obtained (the value of the lastIndex property), and
3. Whether the pattern matched a zero-length string (the
lastIndex property will not have changed).
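
A small sketch of reading those three facts from one call. The function probe and its property names are hypothetical, and fact 3 is taken exactly as stated above: lastIndex compared before and after the call.

```javascript
// Report the three facts for a single test() call on a global regexp.
function probe(pattern, string) {
  var before = pattern.lastIndex;
  var matched = pattern.test(string);
  return {
    matched: matched,                 // fact 1: the return value
    nextIndex: pattern.lastIndex,     // fact 2: 0 again after a failed match
    lastIndexUnchanged: matched && pattern.lastIndex === before // fact 3
  };
}

var nonEmpty = probe(/an/g, 'banana'); // matches "an" at index 1
var empty = probe(/x*/g, 'banana');    // matches the empty string at index 0
```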
 
Evertjan.

Michael Winter wrote on 30 Oct 2006 in comp.lang.javascript:
I wrote a little more than that.


Again, would you like to actually provide an explanation?

The test method is /defined/ in terms of the exec method; it is the
behavioural equivalent of:

regExp.exec(string) == null

That is neither here nor there. A method is not defined as a behavioural
equivalent; its behaviour is described. Its implementation could be
defined as a behavioural equivalent, but that could make the method
buggy, as it does in this case.

Having a global flag in a test makes no sense, since the result is
stable at the first match, and further searching should be aborted.

The possible "defining" of test() in the sense of having a search
starting point left over by an earlier test(), only if the regex string
variable is not refreshed, is so strange, we can only call that a bug.

I know. I answered the OP's question, did I not? Even so, threads
drift.

You specifically said that my assertion was wrong, by presenting unrelated
code that uses match(), not test().

I hope you're going to feel a little silly now after banging on about
that so irrationally.

Shall we keep on subject, Michael, or do you feel attacked in person?
 
Michael Winter

Evertjan. said:
Michael Winter wrote on 30 Oct 2006 in comp.lang.javascript:
[snip]
The test method is /defined/ in terms of the exec method; it is the
behavioural equivalent of:

regExp.exec(string) == null

A typo on my part: the comparison operator should be not-equal (!=), of
course.
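
With the corrected operator, the equivalence can be checked in a few lines. This is only a sketch: testViaExec and run are names made up for the comparison, and an engine is free to implement test differently as long as the observable behaviour is the same.

```javascript
// Hand-written stand-in for test(), built on exec() with the != comparison.
function testViaExec(regExp, string) {
  return regExp.exec(string) != null;
}

// Run the same four calls through a checker, recording result and lastIndex.
function run(check) {
  var re = /x/g, out = [];
  for (var i = 0; i < 4; i++) {
    out.push(check(re, 'x') + ':' + re.lastIndex);
  }
  return out.join(',');
}

var builtIn = run(function (re, s) { return re.test(s); });
var viaExec = run(testViaExec);
// Both give "true:1,false:0,true:1,false:0"
```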
That is neither here nor there.

How so? It is a very succinct description of the behaviour of the method.

A method is not defined as a behavioural equivalent; its behaviour
is described.

And it is: if the exec method were to return null or undefined, the test
method should return false. By examining the algorithm for the former,
one can ascertain precisely what is returned, where, and for what
reason, and how to modify the process to return booleans instead.

Its implementation could be defined as a behavioural equivalent, but
that could make the method buggy, as it does in this case.

I fail to see how.

Having a global flag in a test makes no sense, since the result is
stable at the first match, and further searching should be aborted.

That depends on what the test method is meant to do. Clearly, you have
decided upon a very limited definition. That does not make something in the
language faulty; it means that your expectations are. The global flag
changes the behaviour of several methods related to regular expressions,
so it should only be used where that behaviour is desired.

The possible "defining" of test() in the sense of having a search
starting point left over by an earlier test(), only if the regex
                                               ^^^^^^^^^^^^^^^^^
string variable is not refreshed, is so strange, we can only call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
that a bug.

What?

[snip]

You specifically said that my assertion was wrong, by presenting
unrelated code that uses match(), not test().

If you really believe that, you didn't read my second post properly. In
fact, that would seem to indicate that you didn't read the previous one
properly, either, where I wrote (emphasis added):

... I go on to present an equivalent *using the test method*
and a regular expression with the global flag set. Mentioning
the String.prototype.match method was merely a comparison.

[snip]

Mike
 
Evertjan.

Michael Winter wrote on 31 Oct 2006 in comp.lang.javascript:

As I wrote about before in this thread:

var r = /x/g;
// r, the regex string variable, will not be refreshed here:
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write('<br>');
r = /x/g;
// r, will NOW be refreshed every time:
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
// if a literal regex string is used, that works like refreshed:
document.write(/x/g.test('x')+'<br>'); // true
document.write(/x/g.test('x')+'<br>'); // true
document.write(/x/g.test('x')+'<br>'); // true

[..]
... I go on to present an equivalent *using the test method*
and a regular expression with the global flag set. Mentioning
the String.prototype.match method was merely a comparison.

You could perhaps be correctly explaining the behaviour of test(), but
I still fail to see why an explanation of the behaviour of a js-method
prevents that behaviour from being a bug.

It certainly helps in understanding a bug, so that we can programme "around"
it.

However, the difference in behaviour caused by the above-mentioned refreshing
of the regex string variable is not explained, methinks.

Either way, I am still convinced we should call this a bug.
 
Michael Winter

Evertjan. wrote:

[snip]
var r = /x/g;
// r, the regex string variable, will not be refreshed here:
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write('<br>');
r = /x/g;
// r, will NOW be refreshed every time:
document.write(r.test('x')+'<br>'); // true
r = /x/g;

[snip]

Ah, I see. If you had written "regex object", that would have been more obvious.

Creating a new regular expression object is hardly necessary. Just set
the lastIndex property to zero:

var re = /x/g;

document.write(re.test('x') + '<br>'); // true
document.write(re.test('x') + '<br>'); // false
document.write(re.test('x') + '<br>'); // true
re.lastIndex = 0;
document.write(re.test('x') + '<br>'); // true

[snip]
You could perhaps be correctly explaining the behaviour of test(), but I
still fail to see why an explanation of the behaviour of a js-method
prevents that behaviour from being a bug.

It doesn't, not automatically. Specifications can be badly thought out,
but, in my opinion, that doesn't apply in this case.

[snip]
However, the difference in behaviour caused by the above-mentioned refreshing
of the regex string variable is not explained, methinks.

Each literal evaluates to an object reference (the object itself is
created before execution begins as the literal is scanned), and each of
those objects is completely different - they do not compare as equal
even if the literal is exactly the same. The test method will alter the
lastIndex property of the referenced object, but that object will
eventually be discarded and replaced by a new one.
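
A short sketch of that point, using two separate literals with identical text (the variable names are arbitrary). It deliberately avoids putting a literal inside a loop or a function that is called repeatedly, since how often such a literal yields a fresh object is a separate question.

```javascript
var a = /x/g;
var b = /x/g;             // same source text, but a second object

var sameObject = (a === b); // false: two distinct objects

var t1 = a.test('x');     // true; a.lastIndex becomes 1
var t2 = b.test('x');     // true; b has its own lastIndex, which was still 0
var t3 = a.test('x');     // false; a resumes from its own lastIndex of 1
```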

[snip]

Mike
 
Evertjan.

Michael Winter wrote on 01 nov 2006 in comp.lang.javascript:
[..]
Each literal evaluates to an object reference (the object itself is
created before execution begins as the literal is scanned), and each of
those objects is completely different - they do not compare as equal
even if the literal is exactly the same. The test method will alter the
lastIndex property of the referenced object, but that object will
eventually be discarded and replaced by a new one.

I begin to see.

However, I think this construction, while useful in match() and exec(),
is a bad one in test(). I would never have allowed test() to change any
property of the regex object, even its lastIndex property.
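
For anyone who wants that behaviour today, a wrapper along these lines would give it. This is a sketch only: statelessTest is a hypothetical helper, not a built-in, and it assumes the caller wants a plain "does it match anywhere?" answer.

```javascript
// Hypothetical helper: answers "does it match anywhere?" and puts
// lastIndex back the way it found it.
function statelessTest(pattern, string) {
  var saved = pattern.lastIndex;
  pattern.lastIndex = 0;            // always search from the start
  var matched = pattern.test(string);
  pattern.lastIndex = saved;        // hide the side effect from the caller
  return matched;
}

var r = /x/g;
var results = [
  statelessTest(r, 'x'),
  statelessTest(r, 'x'),
  statelessTest(r, 'x'),
  statelessTest(r, 'x')
]; // true every time, and r.lastIndex is still 0
```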
 
Michael Winter

Michael Winter wrote:

[snip]
function countMatches(string, pattern) {
var count = 0,
index = pattern.lastIndex = 0;

while (pattern.test(string)) {
++count;
if (pattern.lastIndex == index) {++pattern.lastIndex;}
index = pattern.lastIndex;
}
return count;
}

Forgot to update the index variable.
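
For reference, a runnable copy of the corrected function next to the match-based count, with made-up inputs. The helper countWithMatch is just the earlier two-line match example wrapped in a function. The comparison here covers only patterns that consume at least one character; zero-length matches are a trickier case and are not checked.

```javascript
function countMatches(string, pattern) {
  var count = 0,
      index = pattern.lastIndex = 0;

  while (pattern.test(string)) {
    ++count;
    // Step past a match that did not advance lastIndex.
    if (pattern.lastIndex == index) { ++pattern.lastIndex; }
    index = pattern.lastIndex;
  }
  return count;
}

// The match-based equivalent, for comparison.
function countWithMatch(string, pattern) {
  var result = string.match(pattern);
  return result ? result.length : 0;
}

var viaTest = countMatches('banana', /a/g);     // 3
var viaMatch = countWithMatch('banana', /a/g);  // 3
var none = countMatches('banana', /x/g);        // 0
```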

[snip]

Mike
 
