What's wrong with this regexp????

Discussion in 'Javascript' started by Ronald Fischer, Jun 25, 2004.

  1. I have a server-side JavaScript function returning a string.
    I would like to test whether or not the string contains the following pattern:

    - an equal sign,
    - followed by one or more characters which are neither an ampersand nor an
    equal sign,
    - followed by another equal sign.

    That is: A return value of that function of
    X=ABCY=DEF should match, but
    X=ABC&Y=DEF should not match

    This is what I came up with:

    if(((/=[^&=]+=/).test(get_query_string())) != null)
    {
    // matches
    }
    else
    {
    // does not match
    }

    The problem is that the function matches too much. For example, if
    get_query_string() returns "LANG=EN", it matches too, although the
    string contains only a single equal sign!

    Any idea of what could be wrong here?

    Ronald
     
    Ronald Fischer, Jun 25, 2004
    #1

  2. Grant Wagner

    Ronald Fischer wrote:

    > I have a server-side JavaScript function returning a string.
    > I would like to test whether or not the string contains the following pattern:
    >
    > - an equal sign,
    > - followed by one or more characters which are neither an ampersand nor an
    > equal sign,
    > - followed by another equal sign.
    >
    > That is: A return value of that function of
    > X=ABCY=DEF should match, but
    > X=ABC&Y=DEF should not match
    >
    > This is what I came up with:
    >
    > if(((/=[^&=]+=/).test(get_query_string())) != null)
    > {
    > // matches
    > }
    > else
    > {
    > // does not match
    > }
    >
    > The problem is that the function matches too much. For example, if
    > get_query_string() returns "LANG=EN", it matches too, although the
    > string contains only a single equal sign!
    >
    > Any idea of what could be wrong here?
    >
    > Ronald


    I don't know if there's anything wrong with the regex; I haven't gotten that far.
    It's matching everything because RegExp.test() returns a boolean (two
    possible values, true or false). It can _never_ return null, so the comparison
    != null is always true and the "else" code block is _never_ executed, even when
    test() returns false. You also don't need so many parentheses around the expression.

    Change: if(((/=[^&=]+=/).test(get_query_string())) != null)

    to: if (/=[^&=]+=/.test(get_query_string()))
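
    A minimal sketch of the difference, assuming a stand-in get_query_string()
    that returns the "LANG=EN" value from the question (the real function is
    not shown in this thread):

```javascript
// Hypothetical stand-in for the poster's server-side function.
function get_query_string() {
  return "LANG=EN";
}

var pattern = /=[^&=]+=/;
var result = pattern.test(get_query_string());

console.log(typeof result);   // "boolean"
console.log(result);          // false: "LANG=EN" has only one "="
console.log(result != null);  // true: false is not null, so the old "if" always ran

// Using the boolean directly selects the intended branch.
if (result) {
  console.log("matches");
} else {
  console.log("does not match");
}
```

    With the comparison removed, "LANG=EN" falls into the "does not match"
    branch, as intended.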

    ....

    Now I've had a chance to look at the regex, and it seems right given the criteria
    you've specified.

    --
    | Grant Wagner <>

    * Client-side Javascript and Netscape 4 DOM Reference available at:
    *
    http://devedge.netscape.com/library/manuals/2000/javascript/1.3/reference/frames.html

    * Internet Explorer DOM Reference available at:
    *
    http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtml_reference_entry.asp

    * Netscape 6/7 DOM Reference available at:
    * http://www.mozilla.org/docs/dom/domref/
    * Tips for upgrading JavaScript for Netscape 7 / Mozilla
    * http://www.mozilla.org/docs/web-developer/upgrade_2.html
     
    Grant Wagner, Jun 25, 2004
    #2

  3. Ronald Fischer wrote:
    > I have a server-side JavaScript function returning a string.
    > I would like to test whether or not the string contains the
    > following pattern:
    >
    > - an equal sign,
    > - followed by one or more characters which are neither an
    > ampersand nor an equal sign,
    > - followed by another equal sign.
    >
    > That is: A return value of that function of
    > X=ABCY=DEF should match, but
    > X=ABC&Y=DEF should not match
    >
    > This is what I came up with:
    >
    > if(((/=[^&=]+=/).test(get_query_string())) != null)


    For the sake of legibility, omit some parentheses, then read the
    documentation of the test() method. It returns a *boolean* value
    (`true' or `false'), which is never equal to `null'; that is
    why your test fails. You are looking for

    if (/=[^&=]+=/.test(get_query_string()))

    However, there are better ways to parse the query part of a URI.
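
    One possible approach, sketched here under assumptions, is to split the
    query into name/value pairs and inspect the values. parseQuery is a
    hypothetical helper, not something from this thread or from any library,
    and it does no percent-decoding:

```javascript
// Hypothetical helper: split "a=1&b=2" into { a: "1", b: "2" }.
function parseQuery(query) {
  var result = {};
  if (query === "") {
    return result;
  }
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var pos = pairs[i].indexOf("=");
    // A pair without "=" is treated as a name with an empty value.
    var name = pos < 0 ? pairs[i] : pairs[i].substring(0, pos);
    var value = pos < 0 ? "" : pairs[i].substring(pos + 1);
    result[name] = value;
  }
  return result;
}

var good = parseQuery("X=ABC&Y=DEF");
console.log(good.X, good.Y);   // ABC DEF

// A missing "&" shows up as a value that still contains "=".
var bad = parseQuery("X=ABCY=DEF");
console.log(bad.X);            // ABCY=DEF
```

    A value containing "=" then signals the same condition the regular
    expression was meant to detect.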

    > [...]
    > The problem is that the function matches too much.


    No, it does not.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Jun 25, 2004
    #3
  4. JRS: In article <>, seen in
    news:comp.lang.javascript, Grant Wagner <>
    posted at Fri, 25 Jun 2004 15:28:18 :
    >Ronald Fischer wrote:
    >
    >> I have a server-side JavaScript function returning a string.
    >> I would like to test whether or not the string contains the following
    >> pattern:


    Does "contains" mean "consists of only" or "has somewhere in itself" ?
    If the former, change the RegExp from /=[^&=]+=/ to /^=[^&=]+=$/

    But apparently not.
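
    The two readings can be compared directly. A small sketch using the
    sample string from the question:

```javascript
var contains = /=[^&=]+=/;   // pattern may occur anywhere in the string
var whole = /^=[^&=]+=$/;    // entire string must be "=", some characters, "="

console.log(contains.test("X=ABCY=DEF")); // true
console.log(whole.test("X=ABCY=DEF"));    // false: the string starts with "X"
console.log(whole.test("=ABCY="));        // true
```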

    >> That is: A return value of that function of
    >> X=ABCY=DEF should match, but
    >> X=ABC&Y=DEF should not match
    >>
    >> This is what I came up with:
    >>
    >> if(((/=[^&=]+=/).test(get_query_string())) != null)

    >. ...


    Better to write just

    OK = /=[^&=]+=/.test("test string")

    for initial test, and

    OK = /=[^&=]+=/.test(get_query_string())
    if (OK) { ...

    for actual use; it seems clearer.

    >Now I've had a chance to look at the regex, and it seems right given the
    >criteria
    >you've specified.


    OK by <URL:http://www.merlyn.demon.co.uk/js-quick.htm>
    OK by <URL:http://www.merlyn.demon.co.uk/js-valid.htm#RT>

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
    <URL:http://jibbering.com/faq/> JL / RC : FAQ for news:comp.lang.javascript
    <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
    <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
     
    Dr John Stockton, Jun 25, 2004
    #4
  5. Grant Wagner <> wrote in message news:<>...
    > Ronald Fischer wrote:
    > > I would like to test whether or not the string contains the following pattern:
    > >
    > > - an equal sign,
    > > - followed by one or more characters which are neither an ampersand nor an
    > > equal sign,
    > > - followed by another equal sign.
    > >
    > > That is: A return value of that function of
    > > X=ABCY=DEF should match, but
    > > X=ABC&Y=DEF should not match
    > >
    > > This is what I came up with:
    > >
    > > if(((/=[^&=]+=/).test(get_query_string())) != null)
    > > {
    > > // matches
    > > }
    > > else
    > > {
    > > // does not match
    > > }

    > The reason it's matching everything is because RegExp.test() returns a boolean (two
    > possible values, true or false). It can _never_ return null, so the "else" code
    > block is _never_ executed, even when test() returns false.


    OK, got that.

    > Now I've had a chance to look at the regex, and it seems right given the criteria
    > you've specified.


    Interestingly, it seems to be NEARLY right. The problem is that we need
    to catch strings where some of the characters are not in the 7-Bit ASCII
    character set. One example which occurs in our case is the character
    with code 0xA4 (represented on our system as the so-called "international
    currency symbol"). It turns out that this character does NOT match the
    pattern [^&=]. Obviously, the JavaScript regexp pattern engine bails out
    for those characters (maybe because of the settings of the current locale).

    I wonder whether there is a portable way to catch such cases too with
    a regexp.... I think that, as a temporary solution, I will have to
    loop through the string first and replace every occurrence of the
    offending character 0xA4 by something more harmless (fortunately, this
    "loss of information" does not have any impact in my case, but it can't
    be regarded as a general solution).

    Ronald
     
    Ronald Fischer, Jul 7, 2004
    #5
  6. Ronald Fischer wrote:
    > [...] The problem is that we need
    > to catch strings where some of the characters are not in the 7-Bit ASCII
    > character set. One example which occurs in our case is the character
    > with code 0xA4 (represented on our system as the so-called "international
    > currency symbol"). It turns out that this character does NOT match the
    > pattern [^&=].


    It matches here. alert(/[^&=]/.test("\xA4")) yields `true' in
    Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8a2) Gecko/20040630
    Firefox/0.8.0+.
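
    To check a particular engine more broadly, one could test every code
    unit from 0x80 to 0xFF against the class. This is only a diagnostic
    sketch; in an ECMAScript-conforming engine the list of failures should
    come out empty, while an engine with the problem described above would
    list the offending codes:

```javascript
var failures = [];
for (var code = 0x80; code <= 0xFF; code++) {
  var ch = String.fromCharCode(code);
  // [^&=] should accept any single character other than "&" and "=".
  if (!/[^&=]/.test(ch)) {
    failures.push(code.toString(16));
  }
}

console.log(failures.length === 0
  ? "all characters 0x80-0xFF match [^&=]"
  : "not matched: " + failures.join(" "));
```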

    > Obviously, the JavaScript regexp pattern engine bails out
    > for those characters (maybe because of the settings of the
    > current locale).


    Possibly.

    J(ava)Script strings are Unicode strings (more precisely: UTF-16 strings,
    as [W3C] DOMStrings are), but only from JavaScript version 1.3 on and
    AIUI from JScript version 5.5 on. The Unicode character \u00A4 is the
    same as \xA4 in ISO-8859-1 (Latin-1) because Unicode shares code points
    \xA0 (\u00A0) to \xFF (\u00FF) with that encoding. However, the two
    characters should differ if your locale is not UTF-xx and not
    ISO-8859-1. For example, \xA4 should equal \u20AC (the Euro sign) in
    ISO-8859-15 (Latin-9).

    Interestingly, I have LC_ALL=de_DE@euro here, yet \xA4 and \u20AC differ
    in my UA which is said to interpret JavaScript 1.5. In that language,
    AFAIS in contrast to ECMAScript 3, it is specified that \xA4 means code
    point 0xA4 in ISO-8859-1 which is not equal to \u20AC (so my Mozilla is
    correct here, however the implementation is IMHO not standards
    compliant in this regard as it is not locale-aware). According to the
    JScript Reference, \xhh refers to "ASCII characters" there which would
    mean only \x00 to \x7F to be valid escape sequences. That would remove
    the locale dependency but I am afraid that they meant "Extended ASCII
    characters" rather than "US-ASCII characters", which would re-introduce it.
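
    The relationship between the escapes can be checked in the engine at
    hand. A short sketch; the commented results are what an
    ECMAScript-conforming engine gives, where \xHH denotes the code unit
    with that value regardless of locale. An older server-side engine might
    behave differently, which is exactly what this would reveal:

```javascript
console.log("\xA4" === "\u00A4");      // true: both are code unit 0xA4
console.log("\xA4".charCodeAt(0));     // 164
console.log("\xA4" === "\u20AC");      // false: the Euro sign is a different code unit
console.log("\u20AC".charCodeAt(0));   // 8364
```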

    > I wonder whether there is a portable way to catch such cases too
    > with a regexp....


    You can use alternation to include characters you require to be matched:

    /=([^&=]|\xA4)+=/.test(...)

    Use character classes if there is more than one character, e.g.:

    /=([^&=]|[\xA0-\xFF])+=/.test(...)
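
    As a rough comparison, the sketch below runs the original pattern and
    the widened one over a few sample strings (the samples containing \xA4
    are made up for illustration). In a conforming engine both columns
    agree, because [^&=] already covers \xA0-\xFF; the alternation only
    makes a difference in an engine that mishandles those characters:

```javascript
var plain = /=[^&=]+=/;
var widened = /=([^&=]|[\xA0-\xFF])+=/;

var samples = [
  "X=ABCY=DEF",        // missing "&"
  "X=ABC&Y=DEF",       // well-formed
  "X=A\xA4CY=DEF",     // missing "&", value contains 0xA4
  "X=A\xA4C&Y=DEF",    // well-formed, value contains 0xA4
  "LANG=EN"            // single pair
];

var results = [];
for (var i = 0; i < samples.length; i++) {
  var row = [plain.test(samples[i]), widened.test(samples[i])];
  results.push(row);
  console.log(samples[i], row[0], row[1]);
}
```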


    PointedEars
     
    Thomas 'PointedEars' Lahn, Jul 7, 2004
    #6