What's wrong with this regexp?

Ronald Fischer

I have a server-side JavaScript function returning a string.
I would like to test whether or not the string contains the following pattern:

- an equal sign,
- followed by one or more characters which are neither an ampersand nor an
equal sign,
- followed by another equal sign.

That is: A return value of that function of
X=ABCY=DEF should match, but
X=ABC&Y=DEF should not match

This is what I came up with:

if(((/=[^&=]+=/).test(get_query_string())) != null)
{
// matches
}
else
{
// does not match
}

The problem is that the function matches too much. For example, if
get_query_string() returns "LANG=EN", it matches too, although the
string contains only a single equal sign!

Any idea of what could be wrong here?

Ronald
 
Grant Wagner

Ronald said:
I have a server-side JavaScript function returning a string.
I would like to test whether or not the string contains the following pattern:

- an equal sign,
- followed by one or more characters which are neither an ampersand nor an
equal sign,
- followed by another equal sign.

That is: A return value of that function of
X=ABCY=DEF should match, but
X=ABC&Y=DEF should not match

This is what I came up with:

if(((/=[^&=]+=/).test(get_query_string())) != null)
{
// matches
}
else
{
// does not match
}

The problem is that the function matches too much. For example, if
get_query_string() returns "LANG=EN", it matches too, although the
string contains only a single equal sign!

Any idea of what could be wrong here?

Ronald

I don't know if there's anything wrong with the regex; I haven't gotten that far.
The reason it matches everything is that RegExp.test() returns a boolean (two
possible values, true or false). It can _never_ return null, so the "else" code
block is _never_ executed, even when test() returns false. You also don't need so
many parentheses.

Change: if(((/=[^&=]+=/).test(get_query_string())) != null)

to: if (/=[^&=]+=/.test(get_query_string()))
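The difference between the two conditions can be checked in isolation. This is a minimal sketch; the get_query_string() below is a hypothetical stand-in that always returns "LANG=EN", since the real server-side function is not shown in the thread.

```javascript
// Hypothetical stand-in for the poster's get_query_string(); the real
// function lives on the server and is not shown in the thread.
function get_query_string() {
  return "LANG=EN";
}

const pattern = /=[^&=]+=/;

// test() yields a boolean, never null. "LANG=EN" has one equal sign only,
// so the pattern does not match and the result is false.
const result = pattern.test(get_query_string());

// Original condition: false != null evaluates to true, so the
// "matches" branch is taken no matter what the regexp found.
const buggyCondition = (result != null);

// Corrected condition: use the boolean directly.
const fixedCondition = result;

console.log(typeof result, buggyCondition, fixedCondition);
// prints: boolean true false
```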

....

Now I've had a chance to look at the regex, and it seems right given the criteria
you've specified.

--
| Grant Wagner <[email protected]>

* Client-side Javascript and Netscape 4 DOM Reference available at:
*
http://devedge.netscape.com/library/manuals/2000/javascript/1.3/reference/frames.html

* Internet Explorer DOM Reference available at:
*
http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtml_reference_entry.asp

* Netscape 6/7 DOM Reference available at:
* http://www.mozilla.org/docs/dom/domref/
* Tips for upgrading JavaScript for Netscape 7 / Mozilla
* http://www.mozilla.org/docs/web-developer/upgrade_2.html
 
Thomas 'PointedEars' Lahn

Ronald said:
I have a server-side JavaScript function returning a string.
I would like to test whether or not the string contains the
following pattern:

- an equal sign,
- followed by one or more characters which are neither an
ampersand nor an equal sign,
- followed by another equal sign.

That is: A return value of that function of
X=ABCY=DEF should match, but
X=ABC&Y=DEF should not match

This is what I came up with:

if(((/=[^&=]+=/).test(get_query_string())) != null)

For the sake of legibility, omit some parentheses, then read the
documentation of the test() method. It returns a *boolean* value
(`true' or `false'), which is never equal to `null'; that is
why your test fails. You are looking for

if (/=[^&=]+=/.test(get_query_string()))

However, there are better ways to parse the query part of a URI.
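The post does not say which ways are meant. One possibility, sketched here under that assumption, is to split the string on "&" and "=" instead of pattern-matching it. Both parseQuery() and hasMissingSeparator() are hypothetical names invented for this illustration, and no URL decoding is attempted. The check is not an exact replacement for the regexp: "X==Y", for instance, is flagged here but is not matched by /=[^&=]+=/.

```javascript
// Hypothetical helper, not from the thread: split a query string such as
// "X=ABC&Y=DEF" into name/value pairs at the first "=" of each part.
function parseQuery(query) {
  const pairs = [];
  if (query === "") {
    return pairs;
  }
  const parts = query.split("&");
  for (let i = 0; i < parts.length; i++) {
    const eq = parts[i].indexOf("=");
    if (eq === -1) {
      // A name without a value, e.g. "flag".
      pairs.push({ name: parts[i], value: "" });
    } else {
      pairs.push({
        name: parts[i].substring(0, eq),
        value: parts[i].substring(eq + 1)
      });
    }
  }
  return pairs;
}

// Hypothetical check: a value that still contains "=" suggests that an
// "&" separator is missing between two pairs.
function hasMissingSeparator(query) {
  const pairs = parseQuery(query);
  for (let i = 0; i < pairs.length; i++) {
    if (pairs[i].value.indexOf("=") !== -1) {
      return true;
    }
  }
  return false;
}

console.log(hasMissingSeparator("X=ABCY=DEF"));   // true
console.log(hasMissingSeparator("X=ABC&Y=DEF"));  // false
console.log(hasMissingSeparator("LANG=EN"));      // false
```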
[...]
Ronald said:
The problem is that the function matches too much.

No, it does not.


PointedEars
 
Dr John Stockton

JRS: In article <[email protected]>, seen in
news:comp.lang.javascript said:

Does "contains" mean "consists of only" or "has somewhere in itself" ?
If the former, change the RegExp from /=[^&=]+=/ to /^=[^&=]+=$/

But apparently not.
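To illustrate the two readings of "contains", here is a small sketch comparing the unanchored and the anchored pattern. The variable names and the sample strings are chosen for this illustration only.

```javascript
// "Has somewhere in itself": the pattern may occur anywhere in the string.
const anywhere = /=[^&=]+=/;

// "Consists of only": ^ and $ force the pattern to span the whole string.
const wholeString = /^=[^&=]+=$/;

const sample = "X=ABCY=DEF";

console.log(anywhere.test(sample));        // true
console.log(wholeString.test(sample));     // false, the string starts with "X"
console.log(wholeString.test("=ABCY="));   // true
```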
Ronald said:
That is: A return value of that function of
X=ABCY=DEF should match, but
X=ABC&Y=DEF should not match

This is what I came up with:

if(((/=[^&=]+=/).test(get_query_string())) != null)
. ...

Better to write just

OK = /=[^&=]+=/.test("test string")

for initial test, and

OK = /=[^&=]+=/.test(get_query_string())
if (OK) { ...

for actual use; it seems clearer.
Grant Wagner said:
Now I've had a chance to look at the regex, and it seems right given the
criteria you've specified.

OK by <URL:http://www.merlyn.demon.co.uk/js-quick.htm>
OK by <URL:http://www.merlyn.demon.co.uk/js-valid.htm#RT>
 
Ronald Fischer

Grant Wagner said:
Ronald said:
I would like to test whether or not the string contains the following pattern:

- an equal sign,
- followed by one or more characters which are neither an ampersand nor an
equal sign,
- followed by another equal sign.

That is: A return value of that function of
X=ABCY=DEF should match, but
X=ABC&Y=DEF should not match

This is what I came up with:

if(((/=[^&=]+=/).test(get_query_string())) != null)
{
// matches
}
else
{
// does not match
}
Grant Wagner said:
The reason it's matching everything is because RegExp.test() returns a boolean (two
possible values, true or false). It can _never_ return null, so the "else" code
block is _never_ executed, even when test() returns false.

OK, got that.
Grant Wagner said:
Now I've had a chance to look at the regex, and it seems right given the criteria
you've specified.

Interestingly, it seems to be NEARLY right. The problem is that we need
to catch strings where some of the characters are not in the 7-bit ASCII
character set. One example which occurs in our case is the character
with code 0xA4 (represented on our system as the so-called "international
currency symbol"). It turns out that this character does NOT match the
pattern [^&=]. Obviously, the JavaScript regexp pattern engine bails out
for those characters (maybe because of the settings of the current locale).

I wonder whether there is a portable way to catch such cases too with
a regexp. I think that, as a temporary solution, I will have to
loop through the string first and replace every occurrence of the
offending character 0xA4 with something more harmless (fortunately, this
"loss of information" does not have any impact in my case, but it can't
be regarded as a general solution).
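A minimal sketch of that temporary workaround follows. replaceCurrencySign() is a hypothetical name, and "?" is an arbitrary placeholder chosen for this example. It uses split() and join() with a plain string instead of an explicit loop, so the substitution itself does not rely on the regexp engine whose handling of 0xA4 is in doubt.

```javascript
// Hypothetical helper: replace every occurrence of the character with
// code 0xA4 by a placeholder. Splitting on a string (not a regexp)
// keeps the regexp engine out of the substitution step.
function replaceCurrencySign(text, placeholder) {
  return text.split("\xA4").join(placeholder);
}

const cleaned = replaceCurrencySign("X=AB\xA4CY=DEF", "?");

console.log(cleaned);                    // X=AB?CY=DEF
console.log(/=[^&=]+=/.test(cleaned));   // true
```

As the post notes, this loses information, so it only suits cases where the replaced character is not needed afterwards.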

Ronald
 
Thomas 'PointedEars' Lahn

Ronald said:
[...] The problem is that we need
to catch strings where some of the characters are not in the 7-Bit ASCII
character set. One example which occurs in our case is the character
with code 0xA4 (represented on our system as the so-called "international
currency symbol"). It turns out that this character does NOT match the
pattern [^&=].

It matches here. alert(/[^&=]/.test("\xA4")) yields `true' in
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8a2) Gecko/20040630
Firefox/0.8.0+.
Ronald said:
Obviously, the JavaScript regexp pattern engine bails out
for those characters (maybe because of the settings of the
current locale).

Possibly.

J(ava)Script strings are Unicode strings (more exactly: UTF-16 strings,
as [W3C] DOMStrings are), but only from JavaScript version 1.3 on and,
AIUI, from JScript version 5.5 on. The Unicode character \u00A4 is the
same as \xA4 in ISO-8859-1 (Latin-1) because Unicode shares code points
\xA0 (\u00A0) to \xFF (\u00FF) with that encoding. However, the two
characters should differ if your locale is not UTF-xx and not
ISO-8859-1. For example, \xA4 should equal \u20AC (the Euro sign) in
ISO-8859-15 (Latin-9).

Interestingly, I have LC_ALL=de_DE@euro here, yet \xA4 and \u20AC differ
in my UA which is said to interpret JavaScript 1.5. In that language,
AFAIS in contrast to ECMAScript 3, it is specified that \xA4 means code
point 0xA4 in ISO-8859-1 which is not equal to \u20AC (so my Mozilla is
correct here, however the implementation is IMHO not standards
compliant in this regard as it is not locale-aware). According to the
JScript Reference, \xhh refers to "ASCII characters" there which would
mean only \x00 to \x7F to be valid escape sequences. That would remove
the locale dependency but I am afraid that they meant "Extended ASCII
characters" rather than "US-ASCII characters", which would re-introduce it.
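The observations above can be reproduced with a few comparisons. This sketch assumes an engine that follows the ECMAScript rules for string escapes, under which \xHH denotes the code unit with that numeric value independently of the locale; an engine that deviates from this, as suspected for the server-side one in this thread, may print something different. The variable names are illustrative only.

```javascript
const hexEscape = "\xA4";       // code unit 0x00A4
const unicodeEscape = "\u00A4"; // the same code unit, written as \uHHHH
const euroSign = "\u20AC";      // the Euro sign, a different code unit

console.log(hexEscape === unicodeEscape);            // true
console.log(hexEscape === euroSign);                 // false
console.log(hexEscape.charCodeAt(0).toString(16));   // a4

// The negated class excludes only "&" and "=", so 0xA4 is matched.
console.log(/[^&=]/.test(hexEscape));                // true
```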
Ronald said:
I wonder whether there is a portable way to catch such cases too
with a regexp....

You can use alternation to include characters you require to be matched:

/=([^&=]|\xA4)+=/.test(...)

Use character classes if there is more than one character, e.g.:

/=([^&=]|[\xA0-\xFF])+=/.test(...)
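To compare the three patterns side by side, here is a small sketch with sample strings made up for this illustration. In an engine where [^&=] already matches \xA4 (as reported for Mozilla above), all three patterns agree and the alternation is merely redundant. An engine that really failed on \xA4 would be expected to differ in the first column only, but that behavior could not be reproduced here.

```javascript
const plain = /=[^&=]+=/;
const withAlternation = /=([^&=]|\xA4)+=/;
const withRange = /=([^&=]|[\xA0-\xFF])+=/;

// Illustrative inputs: one with the 0xA4 character between two equal
// signs, one with an ampersand separator, one with a single equal sign.
const samples = ["X=AB\xA4CY=DEF", "X=ABC&Y=DEF", "LANG=EN"];

// One row per sample: [plain, withAlternation, withRange]
const results = samples.map(function (s) {
  return [plain.test(s), withAlternation.test(s), withRange.test(s)];
});

console.log(results);
// [ [ true, true, true ], [ false, false, false ], [ false, false, false ] ]
```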


PointedEars
 
