validate a text input field. (again)

Discussion in 'Javascript' started by Eddie, Dec 6, 2003.

  1. Eddie

    Eddie Guest

    I need to validate a text input field.

    I just want to say if user enters

    93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
    or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
    93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
    or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
    93410 or 93412

    he can not submit the form. (because we do not service that area)

    Any help would be greatly appreciated.
     
    Eddie, Dec 6, 2003
    #1

  2. > I need to validate a text input field.
    >
    > I just want to say if user enters
    >
    > 93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
    > or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
    > 93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
    > or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
    > 93410 or 93412
    >
    > he can not submit the form. (because we do not service that area)
    >
    > Any help would be greatly appreciated.


    var zone = {
    '93101': 1,
    '93102': 1,
    '93103': 1,
    '93105': 1,
    '93106': 1,
    '93107': 1,
    '93108': 1,
    '93109': 1,
    '93110': 1,
    '93111': 1,
    '93116': 1,
    '93117': 1,
    '93118': 1,
    '93120': 1,
    '93121': 1,
    '93130': 1,
    '93140': 1,
    '93150': 1,
    '93160': 1,
    '93190': 1,
    '93199': 1,
    '93401': 1,
    '93402': 1,
    '93403': 1,
    '93405': 1,
    '93406': 1,
    '93407': 1,
    '93408': 1,
    '93409': 1,
    '93410': 1,
    '93412': 1};


    if (zone[input] == 1) {
    // reject
    } else {
    // accept
    }
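
    A minimal sketch of how the lookup above might be wired to a form
    field. The names `blockedList' and `maySubmit' and the whitespace
    trimming are assumptions for illustration, not part of the post.

```javascript
// The list from the question, with the duplicate 93199 dropped.
var blockedList = '93101 93102 93103 93105 93106 93107 93108 93109 ' +
  '93110 93111 93116 93117 93118 93120 93121 93130 93140 93150 93160 ' +
  '93190 93199 93401 93402 93403 93405 93406 93407 93408 93409 93410 93412';

// Build the lookup object once instead of typing every property.
var zone = {};
var codes = blockedList.split(' ');
for (var i = 0; i < codes.length; i++) {
  zone[codes[i]] = 1;
}

// Returns true when the form may be submitted.
// `field` is anything with a string `value`, e.g. a text input.
function maySubmit(field) {
  var zip = field.value.replace(/^\s+|\s+$/g, ''); // trim whitespace
  // Defensive choice: only own properties of the table count as hits.
  return !Object.prototype.hasOwnProperty.call(zone, zip);
}
```

    It could then be called from the form, e.g.
    onsubmit="return maySubmit(this.elements['zip'])", where 'zip' is a
    hypothetical field name.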

    http://www.JSON.org/
     
    Douglas Crockford, Dec 6, 2003
    #2

  3. Douglas Crockford wrote:

    >> I just want to say if user enters
    >>
    >> 93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
    >> or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
    >> 93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
    >> or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
    >> 93410 or 93412
    >>
    >> he can not submit the form. (because we do not service that area)
    >>
    >> Any help would be greatly appreciated.

    >
    > [Lengthy object definition]
    >
    > if (zone[input] == 1) {
    > // reject
    > } else {
    > // accept
    > }


    OMG. Have you just forgot that there are RegExp?

    function checkMe(o)
    {

    return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
    }

    <form ... onsubmit="return checkMe(this.elements['bla'])">
    <input name="bla">
    </form>


    PointedEars
     
    Thomas 'PointedEars' Lahn, Dec 6, 2003
    #3
  4. Thomas 'PointedEars' Lahn <> writes:

    > OMG. Have you just forgot that there are RegExp?


    Most likely not. He gave a generic way to test for a finite number of
    strings. It works whether there is structure to the strings or not.

    Regexps take more work to make, and are harder to read. And *much*
    harder to extend with new numbers, if it becomes necessary.

    > return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
    > }


    Are you sure that your regexp matches exactly the correct strings? :)
    (It probably does, but comparing RegExps to string is ExpSpace complete
    in general, so very hard to do).
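
    For this particular case the question can be settled by brute force.
    A sketch, assuming it is enough to enumerate every five-digit string
    (the anchored RegExp cannot match anything of another length); the
    variable names are illustrative.

```javascript
// The codes from the question (duplicate 93199 dropped).
var blocked = ('93101 93102 93103 93105 93106 93107 93108 93109 93110 ' +
  '93111 93116 93117 93118 93120 93121 93130 93140 93150 93160 93190 ' +
  '93199 93401 93402 93403 93405 93406 93407 93408 93409 93410 93412')
  .split(' ');

var re = /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/;

// Index the list for quick membership tests.
var inList = {};
for (var i = 0; i < blocked.length; i++) {
  inList[blocked[i]] = true;
}

// Compare RegExp and list on every string from "00000" to "99999".
var mismatches = [];
for (var n = 0; n <= 99999; n++) {
  var s = ('0000' + n).slice(-5); // left-pad with zeros to five digits
  if (re.test(s) !== (inList[s] === true)) {
    mismatches.push(s);
  }
}
```

    An empty `mismatches' array means the RegExp and the list agree on
    all five-digit inputs.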

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Dec 6, 2003
    #4
  5. JRS: In article <>, seen
    in news:comp.lang.javascript, Eddie <> posted at Fri,
    5 Dec 2003 16:38:46 :-

    >I just want to say if user enters
    >
    >93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
    >or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
    >93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
    >or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
    >93410 or 93412
    >
    >he can not submit the form. (because we do not service that area)



    It seems likely, from the above, that all outside 93xxx are likely to
    remain serviceable; OTOH, the list may change.

    To save repetitive typing and gain run-time efficiency, one can first
    test for the 93; after that, it is well to minimise the size of the
    code. Consider, but with the full test list,

    S = '93103'

    OK = S.substring(0, 2) != "93" ||
    '101 102 103 105 106 107 108 109 110'.indexOf(S.substring(2))<0
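
    A sketch of the same idea with the full list. The function name and
    the space delimiters around each entry are assumptions added here;
    the delimiters keep a short input such as "9310" from matching part
    of "101". Input containing spaces is assumed to be rejected earlier.

```javascript
// Three-digit endings of the blocked 93xxx codes, space-delimited.
var SUFFIXES = ' 101 102 103 105 106 107 108 109 110 111 116 117 118 ' +
  '120 121 130 140 150 160 190 199 401 402 403 405 406 407 408 409 ' +
  '410 412 ';

// Returns true when the code is outside the blocked list.
function isServiceable(s) {
  if (s.substring(0, 2) != '93') {
    return true; // cheap first test: everything outside 93xxx is fine
  }
  // Look for the ending as a whole entry, not as a partial match.
  return SUFFIXES.indexOf(' ' + s.substring(2) + ' ') < 0;
}
```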


    If those are postal codes, what do you do if someone enters "SW1A 1AA" ?

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
    <URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
    <URL:http://www.merlyn.demon.co.uk/js-index.htm> JS maths, dates, sources.
    <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/JS/&c., FAQ topics, links.
     
    Dr John Stockton, Dec 6, 2003
    #5
  6. Eddie

    @SM Guest

    Eddie wrote:

    > I need to validate a text input field.
    >
    > I just want to say if user enters
    >
    > 93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
    > or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
    > 93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
    > or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
    > 93410 or 93412
    >
    > he can not submit the form. (because we do not service that area)
    >
    > Any help would be greatly appreciated.


    An easy way to keep this list of numbers up to date could be

    <script type="text/javascript"><!--

    var N = '93101,93102,93103,93105,93106,93107,93108,93109,93110,93111';
    N += ',93116,93117,93118,93120,93121';
    N += ',93130,93140,93150,93160,93190';
    N += ',93199,93199'; // funny: '93199' appears 2 times!?
    N += ',93401,93402,93403,93405,93406,93407,93408,93409,93410,93412';

    // or, if preferred,
    // N = 'all the numbers separated by commas on a single line';
    // (don't forget the first and last quote)

    N = N.split(','); // N = array of all numbers
    var ok = 0; // needed for the onsubmit

    function validTextField(the_value) {
      ok = 0;
      for (var i = 0; i < N.length; i++)
        if (the_value == N[i]) ok = 1;

      // next lines for example, not needed
      if (ok == 1)
        alert('You win !');
      else
        alert('Sorry! Lost ...');
    }

    // --></script>

    <form action="code.php"
    onsubmit="if (ok != 1) { alert('Incorrect code!'); return false; } return true;">
    Enter your code here :
    <input type="text" onchange="validTextField(this.value);">
    <input type="submit" value="Validate">
    </form>
     
    @SM, Dec 6, 2003
    #6
  7. Lasse Reichstein Nielsen wrote:

    > Thomas 'PointedEars' Lahn <> writes:
    >> OMG. Have you just forgot that there are RegExp?

    >
    > Most likely not. He gave a generic way to test for a finite number of
    > strings. It works whether there is structure to the strings or not.


    Undoubtedly. But his method consumes much more memory and computing
    time than mine, no matter if the strings are structured or not. IOW:
    Compared to my method, his is highly inefficient in *every* case.

    > Regexps take more work to make, and are harder to read.


    Not generally, no.

    > And *much* harder to extend with new numbers, if it becomes necessary


    No, see below.

    >> return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
    >> }

    >
    > Are you sure that your regexp matches exactly the correct strings? :)


    Pretty sure.

    > (It probably does, but comparing RegExps to string is ExpSpace complete

    ^^^^^^^^^^^^^^^^
    Define that.

    > in general, so very hard to do).


    Not at all. It is primarily a matter of structured building of the
    RegExp, finding similarities first.

    Look again at the numbers the RegExp should match. I remove the duplicate
    93199 and group the numbers so one sees clearly what they have in common.

    93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109,
    93110, 93111, 93116, 93117, 93118,
    93120, 93121,
    93130, 93140, 93150, 93160, 93190,
    93199,

    93401, 93402, 93403, 93405, 93406, 93407, 93408, 93409
    93410, 93412

    Obviously all numbers begin with 93:

    /^93/

    There are numbers continuing with 1 and with 4:

    /^93(1|4)/

    Numbers continuing with 1 continue with either 0 to 6, or 9:

    /^93(1(0|1|2|3|4|5|6|9)|4)/

    Numbers continuing from there with 0 continue with digits from 1 to 9,
    except 4:

    /^93(1(0[1-35-9]|1|2|3|4|5|6|9)|4)/

    Numbers continuing from there with 1 continue with 0, 1, and 6 to 8:

    /^93(1(0[1-35-9]|1[016-8]|2|3|4|5|6|9)|4)/

    Numbers continuing from there with 2 continue with either 0 or 1:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|3|4|5|6|9)|4)/

    Numbers continuing from there with 3 to 6 and 9 continue with 0:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0)|4)/

    If the fourth digit was 9, also 9 can follow:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4)/

    (One could have also grouped 93190 and 93199 together:
    ...|[3-6]0|9[09])...)

    Numbers having a 4 as third digit continue with either 0 or 1:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4[01])/

    Because the fourth digit is followed by different sets of digits we write

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0|1))/

    instead.

    If the third digit is 4 and the fourth digit is 0, digits from 1 to 3
    and 5 to 9 may follow:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1))/

    If the third digit is 4 and the fourth digit is 1, the fifth may be
    only 0 and 2:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))/

    Because we match whole numbers, we finally add the end-of-text meta character:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))$/

    Now compare that to my RegExp which was built (but only in mind)
    using the same procedure:

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/

    The only difference is that I wrote `40[...]|41[...]' instead of
    `4(0[...]|1[...])' which is semantically equal, though.

    You will not tell me that the above was hard work, will you?

    Because the RegExp was built *this* way, it is easy as well to find out
    the strings it will match, going from left to right, creating a branch
    in a build tree every time we find an alternative (including sets of
    characters):

    93
    931
    9310
    93101
    93102
    93103
    93105
    93106
    93107
    93108
    93109
    9311
    93110
    93111
    93116
    93117
    93118
    9312
    93120
    93121
    9313
    93130
    9314
    93140
    9315
    93150
    9316
    93160
    9319
    93190
    93199
    934
    9340
    93401
    93402
    93403
    93405
    93406
    93407
    93408
    93409
    9341
    93410
    93412

    We take only the leaves of the build tree:

    93101
    93102
    93103
    93105
    93106
    93107
    93108
    93109
    93110
    93111
    93116
    93117
    93118
    93120
    93121
    93130
    93140
    93150
    93160
    93190
    93199
    93401
    93402
    93403
    93405
    93406
    93407
    93408
    93409
    93410
    93412

    Group them:

    93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109
    93110, 93111, 93116, 93117, 93118,
    93120, 93121
    93130, 93140, 93150, 93160, 93190,
    93199,

    93401, 93402, 93403, 93405, 93406, 93407, 93408, 93409
    93410, 93412

    And compare with what was provided (already grouped here and removed
    dupes):

    93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109,
    93110, 93111, 93116, 93117, 93118,
    93120, 93121,
    93130, 93140, 93150, 93160, 93190,
    93199,

    93401, 93402, 93403, 93405, 93406, 93407, 93408, 93409
    93410, 93412

    q.e.d.

    We have only five-digit numbers with few linear exceptions here; one
    should be able to see that the above RegExp matches without writing
    the matches down, especially if one has built the RegExp oneself
    as described above.

    If reading the entire RegExp is still too difficult, one can also
    divide the RegExp into several (say, one for every third or fourth
    digit) and have the tests combined with `&&'.

    So new numbers are not a problem at all. If in doubt, one can simply
    add another alternative: If 93429 should be forbidden, too, the RegExp
    can be simply changed to

    /^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))|93429)$/
      ^                                                             ^^^^^^^

    which, of course, could (later) be optimized to

    /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/

    An additional test may as well be combined with `&&' without wasting too
    much computing time.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Dec 7, 2003
    #7
  8. Thomas 'PointedEars' Lahn <> writes:

    > Lasse Reichstein Nielsen wrote:



    > But his method consumes much more memory and computing time than
    > mine, no matter if the strings are structured or not. IOW: Compared
    > to my method, his is highly inefficient in *every* case.


    Have you tested it? It consumes much less computing time, and the
    memory is constant. I think you underestimate the complexity of
    interpreting a regular expression (or more likely: running the finite
    state automaton it has been compiled into) on a string.

    I created a test that made an array of 100000 random numbers in the
    range 80000-99999. Then it tested both methods against that table, using
    result1 = !!table[data];
    and
    result2 = re.test(data);
    (with a base check of
    result0 = data,false;
    to find the overhead of the other parts of the code not used in the
    actual test)
    The entire test is included below.

    The results were (in milliseconds):
                  base   table   regexp
    IE 6:         1212    1733    23373
    Opera 7.23:    601     681     2944
    Moz FB 0.7:    471     540     2304

    So, your method is, by far, less efficient than a table lookup, and
    even in IE, which (IIRC) uses a linear lookup for object properties.

    The only case where the table lookup loses is in size. It can be made
    better by building the table dynamically:

    var numbers = [101,102,103,105,106,107,108,109,110,111,116,117,118,120,
    121,130,140,150,160,190,199,401,402,403,405,406,407,408,409,410,412];
    var table = {};
    for (var i in numbers) {table[93000+numbers[i]]=true;}

    Still larger than a regular expression, but not significantly.
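
    A sketch of the dynamic table as a self-contained piece, assuming an
    index loop in place of for-in (so only the array elements are
    visited). The names `suffixes' and `isBlocked' are illustrative.

```javascript
// Last three digits of each blocked code; 93000 is added when building.
var suffixes = [101, 102, 103, 105, 106, 107, 108, 109, 110, 111, 116,
  117, 118, 120, 121, 130, 140, 150, 160, 190, 199, 401, 402, 403, 405,
  406, 407, 408, 409, 410, 412];

var table = {};
for (var i = 0; i < suffixes.length; i++) {
  // The numeric key is converted to the property name "93101" etc.
  table[93000 + suffixes[i]] = true;
}

// Works for both string and number input, since property names
// are always strings.
function isBlocked(zip) {
  return table[zip] === true;
}
```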

    >> Regexps take more work to make, and are harder to read.

    >
    > Not generally, no.


    Definitely yes.
    I am very familiar with regular expressions, but I still have to think
    to read and understand one. The table is obvious. And while the table
    might take more space (that is a relevant parameter), it is easier to
    write. Given the list of numbers, it won't take long in Emacs to turn
    it into a table.

    >> And *much* harder to extend with new numbers, if it becomes necessary

    >
    > No, see below.


    To extend the regular expression, you have to either find the place in it
    that requires changing, or rebuild it from scratch. In a (sorted) table,
    you just have to find the correct place and add the line (or add it anywhere
    if you don't sort the table).

    It might not be a big difference, but it is definitely there.
    Regular expressions require thought. The table can be automated.

    >> (It probably does, but comparing RegExps to string is ExpSpace complete

    > ^^^^^^^^^^^^^^^^
    > Define that.


    It's a complexity class.

    The *general* problem of deciding, given a regular expression and
    another efficient description of a language (where language := set of
    strings, not necessarily finite), whether the regular expression
    recognizes exactly the strings of the language, can (worst case)
    require memory space that is exponential in the size of the regular
    expression. I.e., it's bloody slow.

    As a comparison, factorizing (large) numbers only requires polynomial
    space and exponential time, and it's considered too inefficient to
    use in practice.

    > Not at all. It is primarily a matter of structured building of the
    > RegExp, finding similarities first.


    It takes thought and familiarity with regular expressions. You can do
    it. I can do it. Many other people here can too, but there are lots of
    people writing Javascript for web pages who consider regular expressions
    black magic, and just use what they are given. If one of them is going
    to maintain the page with your regular expression, he'll be back here
    to ask for help in changing it when the numbers change.

    > You will not tell me that the above was hard work, will you?


    Hard, no. Work, yes. Building the table was *no* work at all.

    > Because the RegExp was built *this* way, it is easy as well to find out
    > the strings it will match,


    This regular expression is also special in that it only recognizes a
    finite number of strings. That makes it easier to handle than ones
    with "*" or "+" in them. So, the general hardness of the problem
    doesn't necessarily apply to this case.

    > We have only five-digit numbers with few linear exceptions here, one
    > should manage it to see that the above RegExp matches without writing
    > the matches down, especially if one has built the RegExp by themselves
    > as described above.


    Yes. It's (fairly) easy.

    > So new numbers are not be a problem at all. If in doubt, one can simply
    > add another alternative: If 93429 should be forbidden, too, the RegExp
    > can be simply changed to
    >
    > /^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))|93429)$/
    > ^ ^^^^^^^
    >
    > which, of course, could (later) be optimized to
    >
    > /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/


    Yes. It's a relatively simple case.
    But regular expressions are not as obvious to a lot of other people.

    The test:
    ---
    //<script>
    function test(){
      var table = {
        93101 : true, 93102 : true, 93103 : true, 93105 : true, 93106 : true,
        93107 : true, 93108 : true, 93109 : true, 93110 : true, 93111 : true,
        93116 : true, 93117 : true, 93118 : true, 93120 : true, 93121 : true,
        93130 : true, 93140 : true, 93150 : true, 93160 : true, 93190 : true,
        93199 : true, 93401 : true, 93402 : true, 93403 : true, 93405 : true,
        93406 : true, 93407 : true, 93408 : true, 93409 : true, 93410 : true,
        93412 : true
      };
      var re = /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/;
      var testsize = 100000;
      var testdata = [];
      for (var i = 0;i<testsize;i++) {
        testdata[i]=Math.floor(Math.random()*20000+80000);
      }

      // base: loop and assignment overhead only
      var result0 = new Array(testsize);
      var d1 = new Date();
      for(var i =0;i<testsize;i++) {
        result0[i] = testdata[i],false;
      }
      var d2 = new Date();
      var timebase = d2-d1;

      // table lookup
      var result1 = new Array(testsize);
      var d1 = new Date();
      for(var i =0;i<testsize;i++) {
        result1[i] = !!table[testdata[i]];
      }
      var d2 = new Date();
      var timetable = d2-d1;

      // regular expression
      var result2 = new Array(testsize);
      var d1 = new Date();
      for(var i =0;i<testsize;i++) {
        result2[i] = re.test(testdata[i]);
      }
      var d2 = new Date();
      var timere = d2-d1;

      alert([timebase,timetable,timere]);
    }
    test();
    //</script>
    ---

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Dec 7, 2003
    #8
  9. Lasse Reichstein Nielsen wrote:

    > Thomas 'PointedEars' Lahn <> writes:
    >> Lasse Reichstein Nielsen wrote:
    >> But his method consumes much more memory and computing time than
    >> mine, no matter if the strings are structured or not. IOW: Compared
    >> to my method, his is highly inefficient in *every* case.

    >
    > Have you tested it?


    I have not tested it with JavaScript, and I must admit that `*every* case'
    was a bit exaggerated.

    > It consumes much less computing time,


    Well, apparently that depends on the implementation and on the
    complexity of the RegExp. AFAIS Mozilla/5.0's engine is on the
    average much faster on RegExp than other engines of ECMAScript
    implementations.

    > and the memory is constant. I think you underestimate the complexity of
    > interpreting a regular expression (or more likely: running the finite
    > state automaton it has been compiled into) on a string.
    > [...]
    > So, your method is, by far, less efficient than a table lookup, and
    > even in IE, which (IIRC) uses a linear lookup for object properties.


    It depends on how you build the RegExp, i.e. on how it is composed.

    What you overlook here is that I used a RegExp optimized for
    length. Of course, the matching can be also done with the longer
    /^(93101|93102|93103|...)$/, respectively, where the RegExp *wins*
    in matters of speed, size and amount of maintenance effort.

    >>> Regexps take more work to make, and are harder to read.

    >>
    >> Not generally, no.

    >
    > Definitely yes.


    Wrong, see above and below.

    > I am very familiar with regular expressions, but I still have to think
    > to read and understand one. The table is obvious.


    A list of simple-formed alternatives separated by `|' is obvious, too,
    if not even more than a table solution.

    > And while the table might take more space (that is a relevant parameter),
    > it is easier to write.


    /(foo|bar)/ *is* easy to write.

    > Given the list of numbers, it won't take long in Emacs to turn
    > it into a table.


    Although I'd prefer `vi', same goes for RegExps.

    >>> And *much* harder to extend with new numbers, if it becomes necessary

    >>
    >> No, see below.

    >
    > To extend the regular expression, you have to either find the place in it
    > that requires changing, or rebuild it from scratch.


    You do not have to. As I wrote, when in doubt, simply add another
    alternative expression at the lowest subexpression level. Since it
    is not evaluated if the previous does match, it then only takes a
    little bit more of memory, not really of computing or maintenance
    time.

    > In a (sorted) table, you just have to find the correct place and add the
    > line (or add it anywhere if you don't sort the table).


    In a RegExp, you just have to find the place where the `(' and `)'
    for alternatives must be placed and add another alternative. Or for
    testing matches you simply AND-combine the previous test with another
    one testing the new number expression (or a subexpression of it).

    > It might not be a big difference, but it is definitely there.
    > Regular expressions require thought. The table can be automated.


    You can do that with RegExps, too. Using the RegExp(...) constructor
    function and a string argument, you can even accomplish that with
    JavaScript.
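
    A sketch of that approach, assuming the codes are kept in an array of
    digit-only strings (so nothing in them needs escaping for the
    RegExp). The names `codes' and `re' are illustrative.

```javascript
// The blocked codes, kept as plain data.
var codes = ['93101', '93102', '93103', '93105', '93106', '93107',
  '93108', '93109', '93110', '93111', '93116', '93117', '93118',
  '93120', '93121', '93130', '93140', '93150', '93160', '93190',
  '93199', '93401', '93402', '93403', '93405', '93406', '93407',
  '93408', '93409', '93410', '93412'];

// Build /^(?:93101|93102|...)$/ from the data.
var re = new RegExp('^(?:' + codes.join('|') + ')$');

// Adding a number later only means adding it to the array
// before the RegExp is built, e.g. codes.push('93429').
```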

    >>> (It probably does, but comparing RegExps to string is ExpSpace complete

    >> ^^^^^^^^^^^^^^^^
    >> Define that.

    >
    > It's a complexity class.
    > [...]


    Thanks.

    >> Not at all. It is primarily a matter of structured building of the
    >> RegExp, finding similarities first.

    >
    > It takes thought and familiarity with regular expressions.


    It takes thought and at least average familiarity with JavaScript to
    create an object/array (literal) from a given set of strings. Your
    turn.

    >> You will not tell me that the above was hard work, will you?

    >
    > [...] Building the table was *no* work at all.


    I seriously doubt that ;-)

    >> Because the RegExp was built *this* way, it is easy as well to find out
    >> the strings it will match,

    >
    > This regular expression is also special in that it only recognizes a
    > finite number of strings. That makes it easier to handle than ones
    > with "*" or "+" in them. So, the general hardness of the problem
    > doesn't necessarily apply to this case.


    You can easily add alternatives or additional tests no matter how the
    original RegExp was composed.

    >> So new numbers are not be a problem at all. If in doubt, one can simply
    >> add another alternative: If 93429 should be forbidden, too, the RegExp
    >> can be simply changed to
    >>
    >> /^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))|93429)$/
    >> ^ ^^^^^^^
    >>
    >> which, of course, could (later) be optimized to
    >>
    >> /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/

    >
    > Yes. It's a relatively simple case.
    > But regular expressions are not as obvious to a lot of other people.


    It is all about how to add another alternative. The optimization
    for length (which turned out as the opposite regarding computing
    speed) that I performed here is _not_ required.

    > The test:
    > ---
    > //<script>


    Why have you added the `//' *in* *front* of the `<script>' tag?

    > [...]


    Thanks. I tried that and got about the same as you did in the
    mentioned UAs.

    Now guess what changed when using number atoms as alternative
    expressions: The RegExp solution then proved to be about 4 to 6
    (according to repeated tests) times faster than the table solution,
    but (surprisingly) only with Mozilla/5.0. (Seems that IE's and
    Opera's RegExp engines need a little bit of tuning :))

    For Regular Expressions are widely known as *the* efficient method for
    matching strings, the opposite would have been very surprising to me.


    Note:
    Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; Q312461) requires
    the property identifiers here to be quoted as they do not conform to
    valid identifiers in the version of JScript it supports by default.
    For Mozilla also accepts string literals, without having the access
    method to differ from numeric ones, one should always quote the
    identifier if in doubt.
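
    A small sketch of the point about quoting, under the assumption of a
    current ECMAScript engine: property names are strings either way, so
    quoted and unquoted numeric names refer to the same property. The
    variable names are illustrative.

```javascript
// One name quoted, one written as a numeric literal.
var table = { '93101': true, 93102: true };

// Access works with a number or a string, because the number is
// converted to the property name "93101" / "93102".
var viaNumber = table[93101];
var viaString = table['93102'];

// Enumerated property names are always strings.
var keyTypes = [];
for (var k in table) {
  keyTypes.push(typeof k);
}
```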


    PointedEars
     
    Thomas 'PointedEars' Lahn, Dec 7, 2003
    #9
  10. JRS: In article <>, seen in
    news:comp.lang.javascript, Lasse Reichstein Nielsen <>
    posted at Sun, 7 Dec 2003 15:43:59 :-
    >
    >Definitely yes.
    >I am very familiar with regular expressions, but I still have to think
    >to read and understand one. The table is obvious. And while the table
    >might take more space (that is a relevant parameter), it is easier to
    >write. Given the list of numbers, it won't take long in Emacs to turn
    >it into a table.


    Indeed. While all available ability can be used to generate the initial
    code, one should allow for a possible future change, and the need to
    implement it with inferior staff. Table-based methods are easy to read,
    and fairly easy to make minor modifications to. Complex RegExps are
    not, and would need extra bolt-on tests or complete redesign.

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
    <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/&c., FAQqy topics & links;
    <URL:http://www.merlyn.demon.co.uk/clpb-faq.txt> RAH Prins : c.l.p.b mFAQ;
    <URL:ftp://garbo.uwasa.fi/pc/link/tsfaqp.zip> Timo Salmi's Turbo Pascal FAQ.
     
    Dr John Stockton, Dec 7, 2003
    #10
  11. Thomas 'PointedEars' Lahn <> writes:

    > Lasse Reichstein Nielsen wrote:


    >> It consumes much less computing time,

    >
    > Well, apparently that depends on the implementation and on the
    > complexity of the RegExp.


    Obviously. But I still haven't seen a single example where the RegExp
    was even close to being more efficient. More like an order of magnitude
    slower.

    > AFAIS Mozilla/5.0's engine is on the average much faster on RegExp
    > than other engines of ECMAScript implementations.


    It seems so from my results (actually Opera is faster on RegExps), but
    it is also faster at property lookup.

    > It depends on how you build the RegExp, i.e. on how it is composed.


    Probably. But not trivially.

    > What you overlook here is that I used a RegExp optimized for
    > length. Of course, the matching can be also done with the longer
    > /^(93101|93102|93103|...)$/, respectively, where the RegExp *wins*
    > in matters of speed, size and amount of maintenance effort.


    Test it. I did, and it was even slower than the "size optimized"
    version. The RegExp:
    var re = /^(?:93101|93102|93103|93105|93106|93107|93108|93109|93110|93111|93116|93117|93118|93120|93121|93130|93140|93150|93160|93190|93199|93401|93402|93403|93405|93406|93407|93408|93409|93410|93412)$/;

    The results:
                  base   table   short regexp   long regexp
    IE6           1032    1482          20980         27239
    O7.23          570     591           1853          2103
    Moz FB 0.7     431     489           2053          2434

    > A list of simple-formed alternatives separated by `|' is obvious, too,
    > if not even more than a table solution.


    If you build the table from an array of names, the array is simpler.

    > Although I'd prefer `vi', same goes for RegExps.


    If you use simple regular expressions, yes.

    >> In a (sorted) table, you just have to find the correct place and add the
    >> line (or add it anywhere if you don't sort the table).

    >
    > In a RegExp, you just have to find the place where the `(' and `)'
    > for alternatives must be placed and add another alternative.


    I.e., same complexity.

    > You can do that with RegExps, too. Using the RegExp(...) constructor
    > function and a string argument, you can even accomplish that with
    > JavaScript.


    Correct.

    > It takes thought and at least average familiarity with JavaScript to
    > create an object/array (literal) from a given set of strings. Your
    > turn.


    Ok. Let's call it a draw. If we use simple "|"-separated regular
    expressions, writing them is equally simple.

    >> [...] Building the table was *no* work at all.

    >
    > I seriously doubt that ;-)


    It took time, not work :)

    > It is all about how to add another alternative. The optimization
    > for length (which turned out as the opposite regarding computing
    > speed) that I performed here is _not_ required.


    I find that the "long" regExp is slower than the size-optimized
    version in all my browsers.

    >> The test:
    >> ---
    >> //<script>

    >
    > Why have you added the `//' *in* *front* of the `<script>' tag?


    I used the same code and either evaluated it with "eval" or inserted
    it into a new page. This way, it's legal either way :)

    > Thanks. I tried that and got about the same as you did in the
    > mentioned UAs.



    > Now guess what changed when using number atoms as alternative
    > expressions: The RegExp solution then proved to be about 4 to 6
    > (according to repeated tests) times faster than the table solution,
    > but (surprisingly) only with Mozilla/5.0. (Seems that IE's and
    > Opera's RegExp engines need a little bit of tuning :))


    I don't get that. In Mozilla FB 0.7, using the above "long" regular
    expression and the original "size-optimized" regexp, I find that the long
    one is slower (200 ms on 100000 runs, but slower).

    > For Regular Expressions are widely known as *the* efficient method for
    > matching strings, the opposite would have been very surprising to me.


    They are very efficient for *complex*

    > Note:
    > Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; Q312461) requires
    > the property identifiers here to be quoted as they do not conform to
    > valid identifiers in the version of JScript it supports by default.
    > For Mozilla also accepts string literals, without having the access
    > method to differ from numeric ones, one should always quote the
    > identifier if in doubt.


    That led me to one (stupid) mistake I made in my test. I created the
    test data as numbers, not strings, which forced a toString conversion
    in some cases.
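
    A minimal sketch of why the key type matters, assuming a plain
    object is used as the table. The names `zone`, `numericKey` and
    `stringKey` are illustrative and not taken from the test page.
    Property names are always strings, so a numeric key has to be
    converted on every lookup:

```javascript
// Property names are strings: a number used as a key is converted first.
var zone = { "93101": 1, "93412": 1 };

var numericKey = 93101;   // converted to "93101" on every lookup
var stringKey = "93101";  // used as-is

// Both lookups reach the same property, zone["93101"].
var sameEntry = zone[numericKey] === zone[stringKey];

// Enumerating the table shows that every key is a string.
var keyTypes = [];
for (var key in zone) {
  keyTypes.push(typeof key);
}
```

    Feeding strings to the lookup therefore avoids one conversion per
    test, which is the effect described above.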

    I changed the test data to be strings, saving later toString
    conversions. It helped the time for the regular expression tests, but
    not enough to be faster than the table lookup. To make it compatible
    with Netscape 4, I build the table from an array (of strings, not
    numbers). I also build the long regular expression from the same
    data, using RegExp("^("+tableData.join("|")+")$") .
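
    A sketch of that construction, assuming a shortened sample list.
    The helper names `inTable` and `inRegExp` are hypothetical, and the
    code is written for a current engine rather than Netscape 4. It is
    not the code from the test page:

```javascript
// Sample data only; the real list has more entries.
var tableData = ["93101", "93102", "93103", "93401", "93412"];

// Lookup table built from the array of strings.
var table = {};
for (var i = 0; i < tableData.length; i++) {
  table[tableData[i]] = true;
}

// "Long" regular expression built from the same data.
var longRe = new RegExp("^(" + tableData.join("|") + ")$");

function inTable(zip) {
  // hasOwnProperty avoids false hits on inherited names such as "constructor".
  return Object.prototype.hasOwnProperty.call(table, zip);
}

function inRegExp(zip) {
  return longRe.test(zip);
}
```

    Both tests read from the one array, so adding a code means adding
    one string in one place.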


    New results:

             base   table   short re   long re
    IE 6     1612    1983       2694      3525
    O7.23     591     902       1572      1772
    Moz       511     641       1051      1342
    NS 4*     831    1713       1762      1572

    Much better performance for regular expressions (due to less toString
    conversion). Still slower than table look up (but not as much), and
    long RE still slower than short. Except for Netscape 4, where the
    long re is the fastest.
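
    For readers who want to repeat this kind of comparison, here is a
    rough timing sketch. It is not the harness from the test page: the
    sample codes, the input range, the run count and the `timeIt`
    helper are all assumptions, and absolute numbers will differ by
    engine and machine.

```javascript
// Rough timing sketch with a sample subset of the codes.
var codes = ["93101", "93110", "93199", "93401", "93412"];
var table = {};
for (var i = 0; i < codes.length; i++) {
  table[codes[i]] = true;
}
var longRe = new RegExp("^(" + codes.join("|") + ")$");

// Inputs are strings, so no number-to-string conversion is being timed.
var inputs = [];
for (var n = 93000; n < 93500; n++) {
  inputs.push(String(n));
}

// Runs `test` over all inputs `runs` times; returns elapsed ms and hit count.
function timeIt(test, runs) {
  var hits = 0;
  var start = Date.now();
  for (var r = 0; r < runs; r++) {
    for (var j = 0; j < inputs.length; j++) {
      if (test(inputs[j])) {
        hits++;
      }
    }
  }
  return { ms: Date.now() - start, hits: hits };
}

var tableResult = timeIt(function (s) {
  return Object.prototype.hasOwnProperty.call(table, s);
}, 200);
var regexpResult = timeIt(function (s) {
  return longRe.test(s);
}, 200);

console.log("table:  " + tableResult.ms + " ms");
console.log("regexp: " + regexpResult.ms + " ms");
```

    Comparing the two hit counts is a cheap check that both tests
    accept the same inputs before the timings are compared.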

    (Instead of posting the code again, I have uploaded it to
    <URL:http://www.infimum.dk/privat/numberLookup.html>)

    My conclusion stands: Regular expressions are not more efficient than
    table lookup. They might be as simple to write, but then they are not
    as efficient as they can be.

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Dec 7, 2003
    #11
  12. Grant Wagner

    Grant Wagner Guest

    Thomas 'PointedEars' Lahn wrote:

    > Lasse Reichstein Nielsen wrote:
    >
    > > Thomas 'PointedEars' Lahn <> writes:
    > >> OMG. Have you just forgot that there are RegExp?

    > >
    > > Most likely not. He gave a generic way to test for a finite number of
    > > strings. It works whether there is structure to the strings or not.

    >
    > Undoubtedly. But his method consumes much more memory and computing
    > time than mine, no matter if the strings are structured or not. IOW:
    > Compared to my method, his is highly inefficient in *every* case.
    >
    > > Regexps take more work to make, and are harder to read.

    >
    > Not generally, no.
    >
    > > And *much* harder to extend with new numbers, if it becomes necessary

    >
    > No, see below.
    >
    > >> return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
    > >> }

    > >
    > > Are you sure that your regexp matches exactly the correct strings? :)

    >
    > Pretty sure.
    >
    > > (It probably does, but comparing RegExps to string is ExpSpace complete

    > ^^^^^^^^^^^^^^^^
    > Define that.
    >
    > > in general, so very hard to do).

    >
    > Not at all. It is primarily a matter of structured building of the
    > RegExp, finding similarities first.
    >


    You say, "Not at all", then proceed with a hundred-line explanation of how to compose a
    RegExp to match his set of numbers.

    > An additional test may as well be combined with `&&' without wasting too
    > much computing time.
    >
    > PointedEars


    How about this:

    function isValidZIP(theZIP) {
      // A form field value is a string, and switch compares with ===,
      // so convert to a number before matching the numeric case labels.
      switch (Number(theZIP)) {
        case 93101: case 93102: case 93103: case 93105:
        case 93106: case 93107: case 93108: case 93109:
        case 93110: case 93111: case 93116: case 93117:
        case 93118: case 93120: case 93121: case 93130:
        case 93140: case 93150: case 93160: case 93190:
        case 93199: case 93401: case 93402: case 93403:
        case 93405: case 93406: case 93407: case 93408:
        case 93409: case 93410: case 93412:
          // ZIP is in the unserviced list
          return false;
        default:
          // ZIP is acceptable
          return true;
      }
    }

    Self-documenting, you can see AT A GLANCE which ZIP codes are valid, and you can easily add or
    remove additional ZIP codes without having to reconstruct your RegExp.

    With your RegExp, you'd have to add a comment similar to:

    /*
    matches 93101, or 93102 or 93103, or 93105 ....
    or 93412
    */

    because when you come back to work on it in 6 months, you won't remember what it does, and
    you'll have to waste time decoding it to figure out which ZIP codes are valid.

    --
    | Grant Wagner <>

    * Client-side Javascript and Netscape 4 DOM Reference available at:
    * http://devedge.netscape.com/library/manuals/2000/javascript/1.3/reference/frames.html
    * Internet Explorer DOM Reference available at:
    * http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtml_reference_entry.asp
    * Netscape 6/7 DOM Reference available at:
    * http://www.mozilla.org/docs/dom/domref/
    * Tips for upgrading JavaScript for Netscape 7 / Mozilla
    * http://www.mozilla.org/docs/web-developer/upgrade_2.html
     
    Grant Wagner, Dec 10, 2003
    #12
  13. Grant Wagner wrote:

    > Thomas 'PointedEars' Lahn wrote:
    >> Lasse Reichstein Nielsen wrote:
    >>> Thomas 'PointedEars' Lahn <> writes:
    >>>> return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
    >>>> }
    >>>
    >>> Are you sure that your regexp matches exactly the correct
    >>> strings? :) (It probably does, but comparing RegExps to string is
    >>> ExpSpace complete in general, so very hard to do).

    >>
    >> Not at all. It is primarily a matter of structured building of the
    >> RegExp, finding similarities first.

    >
    > You say, "Not at all", then proceed with a hundred-line explanation of
    > how to compose a RegExp to match his set of numbers.


    That was how I built mine and once you are used to it, RegExp are no
    longer difficult (so I explained it in detail for others to learn).
    You can build yours far simpler, and I explained that, too.
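
    The question quoted above, whether the hand-built RegExp matches
    exactly the listed codes, can also be settled by brute force for
    five-digit inputs. This is a sketch; the names `listed`, `wanted`
    and `mismatches` are illustrative, and the list is the one from the
    original question with the duplicate 93199 written once:

```javascript
// Codes from the original question.
var listed = [
  "93101", "93102", "93103", "93105", "93106", "93107", "93108", "93109",
  "93110", "93111", "93116", "93117", "93118", "93120", "93121", "93130",
  "93140", "93150", "93160", "93190", "93199", "93401", "93402", "93403",
  "93405", "93406", "93407", "93408", "93409", "93410", "93412"
];

// The size-optimized RegExp quoted in the thread.
var handBuilt =
  /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/;

// Table of the codes that should match.
var wanted = {};
for (var i = 0; i < listed.length; i++) {
  wanted[listed[i]] = true;
}

// Try every five-digit string and record any disagreement.
var mismatches = [];
for (var n = 0; n <= 99999; n++) {
  var s = ("0000" + n).slice(-5);   // zero-pad to five digits
  if (handBuilt.test(s) !== (wanted[s] === true)) {
    mismatches.push(s);
  }
}
```

    An empty `mismatches` array means the RegExp and the list agree on
    every five-digit string; the `^` and `$` anchors take care of
    inputs of any other length.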

    >> An additional test may as well be combined with `&&' without
    >> wasting too much computing time.
    >> [...]

    >
    > How about this:
    > [switch-case-default-example]
    > Self-documenting, you can see AT A GLANCE which
    > ZIP codes are valid, and you can easily add or remove additional ZIP
    > codes without having to reconstruct your RegExp.
    >
    > With your RegExp, you'd have to add a comment similar to:
    >
    > /* matches 93101, or 93102 or 93103, or 93105 .... or 93412 */
    >
    > because when you come back to work on it in 6 months, you won't
    > remember what it does, and you'll have to waste time decoding it
    > to figure out which ZIP codes are valid.


    OK, you wanted it, you get it:

    function isValidZIP(
    /** @argument number|string */ sInput,
    /** @argument Array of number|string */ aInvalidZIPs)
    /**
    * @author (C) 2003 Thomas Lahn <>
    * @param sInput ZIP code to be checked.
    * @returns <code>true</code> if <code>sInput</code> is
    * a valid ZIP code, <code>false</code> otherwise.
    */
    {
    var rxInvalidZIPs =
    new RegExp("^(" + aInvalidZIPs.join("|") + ")$");
    return !rxInvalidZIPs.test(sInput);
    }

    // Array of invalid ZIP codes
    var aInvZIPs =
      [93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109,
       93110, 93111, 93116, 93117, 93118, 93120, 93121, 93130,
       93140, 93150, 93160, 93190, 93199, 93401, 93402, 93403,
       93405, 93406, 93407, 93408, 93409, 93410, 93412];

    var r = String(Math.floor(Math.random() * 1000)); // integer 0..999
    while (r.length < 3) // add leading zeroes
      r = "0" + r;
    var z = "93" + r; // add prefix
    alert(
      z + " is"
      + (isValidZIP(z, aInvZIPs) ? "" : " NOT")
      + " a valid ZIP code.");

    Happy testing!

    > --


    Your signature separator is borken; do not use Mozilla's HTML editor,
    to avoid that. Besides, your signature is far too long. A signature
    of up to 4 lines with up to 80 characters each is appropriate.

    80 characters per line is also the allowed maximum for Usenet messages
    which your posting exceeds by far. Set your automagic linebreak
    function to a recommended value between 72 to 76 characters per line so
    that a few quoting levels do not extend the 80th.

    And please trim your quotes to the absolute necessary. Especially, do
    not quote signatures (names and so-called signatures as well) if you do
    not refer to them.


    PointedEars
     
    Thomas 'PointedEars' Lahn, Dec 10, 2003
    #13
  14. JRS: In article <>, seen in
    news:comp.lang.javascript, Thomas 'PointedEars' Lahn
    <> posted at Wed, 10 Dec 2003 21:54:35 :-
    >
    >80 characters per line is also the allowed maximum for Usenet messages
    >which your posting exceeds by far. Set your automagic linebreak
    >function to a recommended value between 72 to 76 characters per line so
    >that a few quoting levels do not extend the 80th.


    I know of no reference for an allowed maximum, except one of the order of
    1000 characters. If you know of a lower one, in an authoritative
    document which takes evident cognisance of posting non-text material,
    then cite it.

    There is a strong recommendation that paragraphs of text should be sent
    properly wrapped with hard returns; figures vary from about 64 to 76
    characters. But where a line which ought to be long is to be sent, it
    should not be arbitrarily broken.

    Script for News, therefore, should be composed with that limit in mind;
    anyhow, it seems more readable that way. But script which is longer
    must not be machine-wrapped, unless the machine understands the wrapping
    of indented script.

    Material which is transmitted with lines longer than 70-80 characters
    may be broken by displaying software, but it may be possible for the
    reader to extend those margins, and it should be possible to copy the
    material as transmitted into a file.

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME ©
    Web <URL:http://www.uwasa.fi/~ts/http/tsfaq.html> -> Timo Salmi: Usenet Q&A.
    Web <URL:http://www.merlyn.demon.co.uk/news-use.htm> : about usage of News.
    No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.
     
    Dr John Stockton, Dec 11, 2003
    #14