validate a text input field. (again)

Eddie · Dec 6, 2003

I need to validate a text input field.

I just want to say if user enters

93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
93410 or 93412

he can not submit the form. (because we do not service that area)

Any help would be greatly appreciated.

Douglas Crockford · Dec 6, 2003

I need to validate a text input field.

I just want to say if user enters

93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
93410 or 93412

he can not submit the form. (because we do not service that area)

Any help would be greatly appreciated.

var zone = {
'93101': 1,
'93102': 1,
'93103': 1,
'93105': 1,
'93106': 1,
'93107': 1,
'93108': 1,
'93109': 1,
'93110': 1,
'93111': 1,
'93116': 1,
'93117': 1,
'93118': 1,
'93120': 1,
'93121': 1,
'93130': 1,
'93140': 1,
'93150': 1,
'93160': 1,
'93190': 1,
'93199': 1,
'93199': 1,
'93401': 1,
'93402': 1,
'93403': 1,
'93405': 1,
'93406': 1,
'93407': 1,
'93408': 1,
'93409': 1,
'93410': 1,
'93412': 1};

if (zone[input] == 1) {
// reject
} else {
// accept
}
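Below is a minimal sketch of the same lookup-table idea as a reusable function. It assumes the value comes straight from a text field, so it is a string that may carry stray whitespace. The names `UNSERVICED_ZIPS` and `isUnservicedZip` are hypothetical. Building the table from an array keeps the list in one place. The `hasOwnProperty` test ensures that only codes put into the table count as hits, and inherited names such as `toString` never do.

```javascript
// Hypothetical list of ZIP codes we do not service (duplicate 93199 dropped).
var UNSERVICED_ZIPS = [
  "93101", "93102", "93103", "93105", "93106", "93107", "93108", "93109",
  "93110", "93111", "93116", "93117", "93118", "93120", "93121", "93130",
  "93140", "93150", "93160", "93190", "93199", "93401", "93402", "93403",
  "93405", "93406", "93407", "93408", "93409", "93410", "93412"
];

// Build the lookup table once: property name = ZIP code.
var zone = {};
for (var i = 0; i < UNSERVICED_ZIPS.length; i++) {
  zone[UNSERVICED_ZIPS[i]] = true;
}

// Returns true if the input (string or number) is an unserviced code.
function isUnservicedZip(input) {
  var key = String(input).replace(/^\s+|\s+$/g, ""); // trim whitespace
  return Object.prototype.hasOwnProperty.call(zone, key);
}
```

In a form, this might be wired up as `onsubmit="return !isUnservicedZip(this.elements['zip'].value)"`, where `zip` is a hypothetical field name.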

http://www.JSON.org/

Thomas 'PointedEars' Lahn · Dec 6, 2003

Douglas said:
I just want to say if user enters

93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
93410 or 93412

he can not submit the form. (because we do not service that area)

Any help would be greatly appreciated.

[Lengthy object definition]

if (zone[input] == 1) {
// reject
} else {
// accept
}

OMG. Have you just forgotten that there are RegExps?

function checkMe(o)
{

return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
}

<form ... onsubmit="return checkMe(this.elements['bla'])">
<input name="bla">
</form>

PointedEars

Lasse Reichstein Nielsen · Dec 6, 2003

Thomas 'PointedEars' Lahn said:
OMG. Have you just forgotten that there are RegExps?

Most likely not. He gave a generic way to test for a finite number of
strings. It works whether or not the strings have any structure.

Regexps take more work to make, and are harder to read. And *much*
harder to extend with new numbers, if it becomes necessary.

return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
}

Are you sure that your regexp matches exactly the correct strings?

(It probably does, but comparing RegExps to string is ExpSpace complete
in general, so very hard to do).

/L

Dr John Stockton · Dec 6, 2003

JRS: In article <[email protected]>, seen

I just want to say if user enters

93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
93410 or 93412

he can not submit the form. (because we do not service that area)

It seems, from the above, that all codes outside 93xxx are likely to
remain serviceable; OTOH, the list may change.

To save repetitive typing and gain run-time efficiency, one can first
test for the 93; after that, it is well to minimise the size of the
code. Consider, but with the full test list,

var S = '93103';

var OK = S.substring(0, 2) != "93" ||
'101 102 103 105 106 107 108 109 110'.indexOf(S.substring(2)) < 0;
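Below is a sketch of the same prefix-then-`indexOf` approach with the full list. It adds one assumption that is not in the post above: both the list and the search key are padded with spaces. Without that padding, a fragment such as `'10'` from a truncated input like `'9310'` would match inside `'101'`. `isServiceable` is a hypothetical name.

```javascript
// Last three digits of the unserviced 93xxx codes, each surrounded by spaces.
var SUFFIXES = " 101 102 103 105 106 107 108 109 110 111 116 117 118 120 121" +
  " 130 140 150 160 190 199 401 402 403 405 406 407 408 409 410 412 ";

// Returns true if the code may be submitted, false if it is in the list.
function isServiceable(s) {
  s = String(s);
  // Anything outside 93xxx is taken to be serviceable.
  if (s.substring(0, 2) != "93") return true;
  // Search for the whole suffix only: " 103 " is found, " 10 " is not.
  return SUFFIXES.indexOf(" " + s.substring(2) + " ") < 0;
}
```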

If those are postal codes, what do you do if someone enters "SW1A 1AA" ?

@SM · Dec 6, 2003

Eddie wrote:

I need to validate a text input field.

I just want to say if user enters

93101 or 93102 or 93103 or 93105 or 93106 or 93107 or 93108 or 93109
or 93110 or 93111 or 93116 or 93117 or 93118 or 93120 or 93121 or
93130 or 93140 or 93150 or 93160 or 93190 or 93199 or 93199 or 93401
or 93402 or 93403 or 93405 or 93406 or 93407 or 93408 or 93409 or
93410 or 93412

he can not submit the form. (because we do not service that area)

Any help would be greatly appreciated.

An easy way to keep this list of numbers up to date could be:

<script type="text/javascript"></script>

<form action="code.php"
onsubmit="if (ok == 1) return true; alert('Incorrect code!'); return false;">
Enter your code here :
<input type=text onchange="validTextField(this.value);">
<input type=submit value="Validate">
</form>

Thomas 'PointedEars' Lahn · Dec 7, 2003

Lasse said:
Most likely not. He gave a generic way to test for a finite number of
strings. It works whether or not the strings have any structure.

Undoubtedly. But his method consumes much more memory and computing
time than mine, no matter if the strings are structured or not. IOW:
Compared to my method, his is highly inefficient in *every* case.

Regexps take more work to make, and are harder to read.

Not generally, no.

And *much* harder to extend with new numbers, if it becomes necessary

No, see below.

return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
}

Are you sure that your regexp matches exactly the correct strings?

Pretty sure.

(It probably does, but comparing RegExps to string is ExpSpace complete

^^^^^^^^^^^^^^^^
Define that.

in general, so very hard to do).

Not at all. It is primarily a matter of structured building of the
RegExp, finding similarities first.

Look again at the numbers the RegExp should match. I have removed the duplicate
93199 and grouped the numbers so that one clearly sees what they have in common.

93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109,
93110, 93111, 93116, 93117, 93118,
93120, 93121,
93130, 93140, 93150, 93160, 93190,
93199,

93401, 93402, 93403, 93405, 93406, 93407, 93408, 93409
93410, 93412

Obviously all numbers begin with 93:

/^93/

There are numbers continuing with 1 and with 4:

/^93(1|4)/

Numbers continuing with 1 continue with either 0 to 6, or 9:

/^93(1(0|1|2|3|4|5|6|9)|4)/

Numbers continuing from there with 0 continue with digits from 1 to 9,
except 4:

/^93(1(0[1-35-9]|1|2|3|4|5|6|9)|4)/

Numbers continuing from there with 1 continue with 0, 1, and 6 to 8:

/^93(1(0[1-35-9]|1[016-8]|2|3|4|5|6|9)|4)/

Numbers continuing from there with 2 continue with either 0 or 1:

/^93(1(0[1-35-9]|1[016-8]|2[01]|3|4|5|6|9)|4)/

Numbers continuing from there with 3 to 6 and 9 continue with 0:

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0)|4)/

If the fourth digit is 9, a 9 can also follow:

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4)/

(One could have also grouped 93190 and 93199 together:
...|[3-6]0|9[09])...)

Numbers having a 4 as third digit continue with either 0 or 1:

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4[01])/

Because the fourth digit is followed by different sets of digits, we write

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0|1))/

instead.

If the third digit is 4 and the fourth digit is 0, digits from 1 to 3
and 5 to 9 may follow:

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1))/

If the third digit is 4 and the fourth digit is 1, the fifth may be
only 0 and 2:

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))/

Since we match whole numbers, we finally add the end-of-text meta character:

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))$/

Now compare that to my RegExp, which I built using the same procedure
(but only in my head):

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/

The only difference is that I wrote `40[...]|41[...]' instead of
`4(0[...]|1[...])' which is semantically equal, though.

You will not tell me that the above was hard work, will you?

Because the RegExp was built *this* way, it is easy as well to find out
the strings it will match, going from left to right, creating a branch
in a build tree every time we find an alternative (including sets of
characters):

93
931
9310
93101
93102
93103
93105
93106
93107
93108
93109
9311
93110
93111
93116
93117
93118
9312
93120
93121
9313
93130
9314
93140
9315
93150
9316
93160
9319
93190
93199
934
9340
93401
93402
93403
93405
93406
93407
93408
93409
9341
93410
93412

We take only the leaves of the build tree:

93101
93102
93103
93105
93106
93107
93108
93109
93110
93111
93116
93117
93118
93120
93121
93130
93140
93150
93160
93190
93199
93401
93402
93403
93405
93406
93407
93408
93409
93410
93412

Group them:

93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109
93110, 93111, 93116, 93117, 93118,
93120, 93121
93130, 93140, 93150, 93160, 93190,
93199,

93401, 93402, 93403, 93405, 93406, 93407, 93408, 93409
93410, 93412

And compare with what was provided (already grouped here and removed
dupes):

93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109,
93110, 93111, 93116, 93117, 93118,
93120, 93121,
93130, 93140, 93150, 93160, 93190,
93199,

93401, 93402, 93403, 93405, 93406, 93407, 93408, 93409
93410, 93412

q.e.d.
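The same comparison can also be done mechanically instead of by hand. The sketch below runs the pattern over every five-digit string and compares the hits with the de-duplicated list. `matchesOf` is a hypothetical helper name. The exhaustive loop is practical only because there are just 100,000 candidate strings.

```javascript
var re = /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/;

// The requested list, sorted, with the duplicate 93199 removed.
var wanted = [
  "93101", "93102", "93103", "93105", "93106", "93107", "93108", "93109",
  "93110", "93111", "93116", "93117", "93118", "93120", "93121", "93130",
  "93140", "93150", "93160", "93190", "93199", "93401", "93402", "93403",
  "93405", "93406", "93407", "93408", "93409", "93410", "93412"
];

// Collect every five-digit string the pattern accepts, in ascending order.
function matchesOf(pattern) {
  var hits = [];
  for (var n = 0; n < 100000; n++) {
    var s = String(n);
    while (s.length < 5) s = "0" + s; // left-pad to five digits
    if (pattern.test(s)) hits.push(s);
  }
  return hits;
}

// Both lists are in ascending order, so joining them allows a direct comparison.
matchesOf(re).join(",") === wanted.join(","); // true
```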

We have only five-digit numbers with few linear exceptions here, so one
should be able to see what the above RegExp matches without writing
the matches down, especially if one has built the RegExp oneself
as described above.

If reading the entire RegExp is still too difficult, one can also
split it into several (say, one for every third or fourth digit)
and combine the tests with `&&'.

So new numbers are not a problem at all. If in doubt, one can simply
add another alternative: If 93429 should be forbidden, too, the RegExp
can simply be changed to

/^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))|93429)$/

(the additions being the `(' after `^' and the alternative `|93429)' at the end)

which, of course, could (later) be optimized to

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/
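The quick fix and its optimized form can also be checked against each other mechanically. Below is a sketch that assumes only the two patterns from this post. The quick fix is written here with its outer group closed before `|93429`. `extended`, `optimized` and `agree` are illustrative names.

```javascript
var extended  = /^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))|93429)$/;
var optimized = /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/;

// Both patterns are anchored and only match five-digit strings starting
// with "93", so comparing them on 93000..93999 covers every possible match.
var agree = true;
for (var n = 93000; n <= 93999; n++) {
  var s = String(n);
  if (extended.test(s) !== optimized.test(s)) agree = false;
}

agree;                   // true
optimized.test("93429"); // true
```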

An additional test may as well be combined with `&&' without wasting too
much computing time.

PointedEars

Lasse Reichstein Nielsen · Dec 7, 2003

Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:

But his method consumes much more memory and computing time than
mine, no matter if the strings are structured or not. IOW: Compared
to my method, his is highly inefficient in *every* case.

Have you tested it? It consumes much less computing time, and the
memory is constant. I think you underestimate the complexity of
interpreting a regular expression (or more likely: running the finite
state automaton it has been compiled into) on a string.

I created a test that made an array of 100000 random numbers in the
range 80000-99999. Then it tested both methods against that table, using
result1 = !!table[data];
and
result2 = re.test(data);
(with a base check of
result0 = data,false;
to find the overhead of the other parts of the code not used in the
actual test)
The entire test is included below.

The results were (in milliseconds):
             base  table  regexp
IE 6:        1212   1733   23373
Opera 7.23:   601    681    2944
Moz FB 0.7:   471    540    2304

So, your method is, by far, less efficient than a table lookup, and
even in IE, which (IIRC) uses a linear lookup for object properties.

The only case where the table lookup loses is in size. It can be made
better by building the table dynamically:

var numbers = [101,102,103,105,106,107,108,109,110,111,116,117,118,120,
121,130,140,150,160,190,199,401,402,403,405,406,407,408,409,410,412];
var table = {};
for (var i = 0; i < numbers.length; i++) {table[93000 + numbers[i]] = true;}

Still larger than a regular expression, but not significantly.

Not generally, no.

Definitely yes.
I am very familiar with regular expressions, but I still have to think
to read and understand one. The table is obvious. And while the table
might take more space (that is a relevant parameter), it is easier to
write. Given the list of numbers, it won't take long in Emacs to turn
it into a table.

No, see below.

To extend the regular expression, you have to either find the place in it
that requires changing, or rebuild it from scratch. In a (sorted) table,
you just have to find the correct place and add the line (or add it anywhere
if you don't sort the table).

It might not be a big difference, but it is definitely there.
Regular expressions require thought. The table can be automated.

^^^^^^^^^^^^^^^^
Define that.

It's a complexity class.

The *general* problem of, given a regular expression and another
efficient description of a language (where language := set of strings
not necessarily finite), decide whether the regular expression
recognizes exactly the strings of the language, can (worst case)
require memory space that is exponential in the size of the regular
expression. I.e., it's bloody slow.

As a comparison, factorizing (large) numbers only requires polynomial
space and exponential time, and it's considered too inefficient to
use in practice.

Not at all. It is primarily a matter of structured building of the
RegExp, finding similarities first.

It takes thought and familiarity with regular expressions. You can do
it. I can do it. Many other people here can too, but there are lots of
people writing JavaScript for web pages who consider regular expressions
black magic, and just use what they are given. If one of them is going
to maintain the page with your regular expression, he'll be back here
to ask for help in changing it when the numbers change.

You will not tell me that the above was hard work, will you?

Hard, no. Work, yes. Building the table was *no* work at all.

Because the RegExp was built *this* way, it is easy as well to find out
the strings it will match,

This regular expression is also special in that it only recognizes a
finite number of strings. That makes it easier to handle than ones
with "*" or "+" in them. So, the general hardness of the problem
doesn't necessarily apply to this case.

We have only five-digit numbers with few linear exceptions here, one
should manage it to see that the above RegExp matches without writing
the matches down, especially if one has built the RegExp by themselves
as described above.

Yes. It's (fairly) easy.

So new numbers are not a problem at all. If in doubt, one can simply
add another alternative: If 93429 should be forbidden, too, the RegExp
can be simply changed to

/^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))|93429)$/

(the additions being the `(' after `^' and the alternative `|93429)' at the end)

which, of course, could (later) be optimized to

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/

Yes. It's a relatively simple case.
But regular expressions are not as obvious to a lot of other people.

The test:
---
//<script>
function test(){
var table = {
93101 : true, 93102 : true, 93103 : true, 93105 : true, 93106 : true,
93107 : true, 93108 : true, 93109 : true, 93110 : true, 93111 : true,
93116 : true, 93117 : true, 93118 : true, 93120 : true, 93121 : true,
93130 : true, 93140 : true, 93150 : true, 93160 : true, 93190 : true,
93199 : true, 93401 : true, 93402 : true, 93403 : true, 93405 : true,
93406 : true, 93407 : true, 93408 : true, 93409 : true, 93410 : true,
93412 : true
};
var re = /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/;
var testsize = 100000;
var testdata = [];
// Random integers in the range 80000..99999
for (var i = 0; i < testsize; i++) {
testdata[i] = Math.floor(Math.random()*20000+80000);
}

// Base: loop and assignment overhead only
var result0 = new Array(testsize);
var d1 = new Date();
for (var i = 0; i < testsize; i++) {
result0[i] = testdata[i], false;
}
var d2 = new Date();
var timebase = d2-d1;

// Table lookup
var result1 = new Array(testsize);
var d1 = new Date();
for (var i = 0; i < testsize; i++) {
result1[i] = !!table[testdata[i]];
}
var d2 = new Date();
var timetable = d2-d1;

// RegExp test
var result2 = new Array(testsize);
var d1 = new Date();
for (var i = 0; i < testsize; i++) {
result2[i] = re.test(testdata[i]);
}
var d2 = new Date();
var timere = d2-d1;

alert([timebase,timetable,timere]);
}
test();
//</script>

Thomas 'PointedEars' Lahn · Dec 7, 2003

Lasse said:
Have you tested it?

I have not, at least not with JavaScript, and I must admit that `*every* case'
was a bit of an exaggeration.

It consumes much less computing time,

Well, apparently that depends on the implementation and on the
complexity of the RegExp. AFAIS Mozilla/5.0's engine is on the
average much faster on RegExp than other engines of ECMAScript
implementations.

and the memory is constant. I think you underestimate the complexity of
interpreting a regular expression (or more likely: running the finite
state automaton it has been compiled into) on a string.
[...]
So, your method is, by far, less efficient than a table lookup, and
even in IE, which (IIRC) uses a linear lookup for object properties.

It depends on how you build the RegExp, i.e. on how it is composed.

What you overlook here is that I used a RegExp optimized for
length. Of course, the matching can also be done with the longer
/^(93101|93102|93103|...)$/, in which case the RegExp *wins*
in matters of speed, size and amount of maintenance effort.

Definitely yes.

Wrong, see above and below.

I am very familiar with regular expressions, but I still have to think
to read and understand one. The table is obvious.

A list of simple-formed alternatives separated by `|' is obvious, too,
if not even more so than a table solution.

And while the table might take more space (that is a relevant parameter),
it is easier to write.

/(foo|bar)/ *is* easy to write.

Given the list of numbers, it won't take long in Emacs to turn
it into a table.

Although I'd prefer `vi', same goes for RegExps.

To extend the regular expression, you have to either find the place in it
that requires changing, or rebuild it from scratch.

You do not have to. As I wrote, when in doubt, simply add another
alternative expression at the lowest subexpression level. Since it
is not evaluated if a previous alternative already matches, it only
takes a little more memory, and hardly any more computing or
maintenance time.

In a (sorted) table, you just have to find the correct place and add the
line (or add it anywhere if you don't sort the table).

In a RegExp, you just have to find the place where the `(' and `)'
for alternatives must be placed and add another alternative. Or for
testing matches you simply AND-combine the previous test with another
one testing the new number expression (or a subexpression of it).

It might not be a big difference, but it is definitely there.
Regular expressions require thought. The table can be automated.

You can do that with RegExps, too. Using the RegExp(...) constructor
function and a string argument, you can even accomplish that with
JavaScript.

^^^^^^^^^^^^^^^^
Define that.

It's a complexity class.
[...]
Thanks.

Not at all. It is primarily a matter of structured building of the
RegExp, finding similarities first.

It takes thought and familiarity with regular expressions.

It takes thought and at least average familiarity with JavaScript to
create an object/array (literal) from a given set of strings. Your
turn.

You will not tell me that the above was hard work, will you?

[...] Building the table was *no* work at all.

I seriously doubt that ;-)

This regular expression is also special in that it only recognizes a
finite number of strings. That makes it easier to handle than ones
with "*" or "+" in them. So, the general hardness of the problem
doesn't necessarily apply to this case.

You can easily add alternatives or additional tests no matter how the
original RegExp was composed.

So new numbers are not a problem at all. If in doubt, one can simply
add another alternative: If 93429 should be forbidden, too, the RegExp
can be simply changed to

/^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]))|93429)$/

(the additions being the `(' after `^' and the alternative `|93429)' at the end)

which, of course, could (later) be optimized to

/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/

Yes. It's a relatively simple case.
But regular expressions are not as obvious to a lot of other people.

It is all about how to add another alternative. The optimization
for length (which turned out as the opposite regarding computing
speed) that I performed here is _not_ required.

The test:

Why have you added the `//' *in* *front* of the `<script>' tag?
[...]

Thanks. I tried that and got about the same as you did in the
mentioned UAs.

Now guess what changed when using number atoms as alternative
expressions: The RegExp solution then proved to be about 4 to 6
(according to repeated tests) times faster than the table solution,
but (surprisingly) only with Mozilla/5.0. (Seems that IE's and
Opera's RegExp engines need a little bit of tuning.)

Since regular expressions are widely known as *the* efficient method for
matching strings, the opposite would have been very surprising to me.

Note:
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; Q312461) requires
the property names here to be quoted, as they are not valid
identifiers in the version of JScript it supports by default.
Since Mozilla also accepts string literals, and accesses such properties
no differently from numeric ones, one should always quote the
property name if in doubt.

PointedEars

Dr John Stockton · Dec 7, 2003

JRS: In article <[email protected]>, seen in

news:comp.lang.javascript said:
Definitly yes.
I am very familiar with regular expressions, but I still have to think
to read and understand one. The table is obvious. And while the table
might take more space (that is a relevant parameter), it is easier to
write. Given the list of numbers, it won't take long in Emacs to turn
it into a table.

Indeed. While all available ability can be used to generate the initial
code, one should allow for a possible future change, and the need to
implement it with inferior staff. Table-based methods are easy to read,
and fairly easy to make minor modifications to. Complex RegExps are
not, and would need extra bolt-on tests or complete redesign.

Lasse Reichstein Nielsen · Dec 7, 2003

Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:

Well, apparently that depends on the implementation and on the
complexity of the RegExp.

Obviously. But I still haven't seen a single example where the RegExp
was even close to being more efficient. More like an order of magnitude
slower.

AFAIS Mozilla/5.0's engine is on the average much faster on RegExp
than other engines of ECMAScript implementations.

It seems so from my results (actually Opera is faster on RegExps), but
it is also faster at property lookup.

It depends on how you build the RegExp, i.e. on how it is composed.

Probably. But not trivially.

What you overlook here is that I used a RegExp optimized for
length. Of course, the matching can also be done with the longer
/^(93101|93102|93103|...)$/, in which case the RegExp *wins*
in matters of speed, size and amount of maintenance effort.

Test it. I did, and it was even slower than the "size optimized"
version. The RegExp:
var re = /^(?:93101|93102|93103|93105|93106|93107|93108|93109|93110|93111|93116|93117|93118|93120|93121|93130|93140|93150|93160|93190|93199|93401|93402|93403|93405|93406|93407|93408|93409|93410|93412)$/;

The results:
base table short regexp long regexp
IE6 1032 1482 20980 27239
O7.23 570 591 1853 2103
Moz FB 0.7 431 489 2053 2434

A list of simple-formed alternatives separated by `|' is obvious, too,
if not even more so than a table solution.

If you build the table from an array of names, the array is simpler.

Although I'd prefer `vi', same goes for RegExps.

If you use simple regular expressions, yes.

In a RegExp, you just have to find the place where the `(' and `)'
for alternatives must be placed and add another alternative.

I.e., same complexity.

You can do that with RegExps, too. Using the RegExp(...) constructor
function and a string argument, you can even accomplish that with
JavaScript.

Correct.

It takes thought and at least average familiarity with JavaScript to
create an object/array (literal) from a given set of strings. Your
turn.

OK. Let's call it a draw. If we use simple "|"-separated regular
expressions, writing them is equally simple.

[...] Building the table was *no* work at all.

I seriously doubt that ;-)

It took time, not work.

It is all about how to add another alternative. The optimization
for length (which turned out as the opposite regarding computing
speed) that I performed here is _not_ required.

I find that the "long" regExp is slower than the size-optimized
version in all my browsers.

Why have you added the `//' *in* *front* of the `<script>' tag?

I used the same code and either evaluated it with "eval" or inserted
it into a new page. This way, it's legal either way.

Thanks. I tried that and got about the same as you did in the
mentioned UAs.

Now guess what changed when using number atoms as alternative
expressions: The RegExp solution then proved to be about 4 to 6
(according to repeated tests) times faster than the table solution,
but (surprisingly) only with Mozilla/5.0. (Seems that IE's and
Opera's RegExp engines need a little bit of tuning.)

I don't get that. In Mozilla FB 0.7, using the above "long" regular
expression and the original "size-optimized" regexp, I find that the long
one is slower (200 ms on 100000 runs, but slower).

Since regular expressions are widely known as *the* efficient method for
matching strings, the opposite would have been very surprising to me.

They are very efficient for *complex*

Note:
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; Q312461) requires
the property names here to be quoted, as they are not valid
identifiers in the version of JScript it supports by default.
Since Mozilla also accepts string literals, and accesses such properties
no differently from numeric ones, one should always quote the
property name if in doubt.

That led me to one (stupid) mistake I made in my test. I created the
test data as numbers, not strings, which forced a toString conversion
in some cases.

I changed the test data to be strings, saving later toString
conversions. It helped the time for the regular expression tests, but
not enough to be faster than the table lookup. To make it compatible
with Netscape 4, I build the table from an array (of strings, not
numbers). I also build the long regular expression from the same
data, using RegExp("^("+tableData.join("|")+")$") .

New results:

        base  table  short re  long re
IE 6    1612   1983      2694     3525
O7.23    591    902      1572     1772
Moz      511    641      1051     1342
NS 4*    831   1713      1762     1572

Much better performance for regular expressions (due to less toString
conversion). Still slower than table look up (but not as much), and
long RE still slower than short. Except for Netscape 4, where the
long re is the fastest.

(Instead of posting the code again, I have uploaded it to
<URL:http://www.infimum.dk/privat/numberLookup.html>)

My conclusion stands: Regular expressions are not more efficient than
table lookup. They might be as simple to write, but then they are not
as efficient as they can be.
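Below is a sketch of the revised benchmark described above. It uses string test data, builds the table and the long RegExp from one array via `join("|")`, and checks that both methods agree before their timings are compared. `runBenchmark` and its result fields are hypothetical names. `Date.now()` stands in for the `new Date()` differences of the original test. The timings will vary by engine, so no particular numbers are implied.

```javascript
// Sketch: table lookup vs. a "long" RegExp, both built from one array.
function runBenchmark(testsize) {
  var i;
  var tableData = [
    "93101", "93102", "93103", "93105", "93106", "93107", "93108", "93109",
    "93110", "93111", "93116", "93117", "93118", "93120", "93121", "93130",
    "93140", "93150", "93160", "93190", "93199", "93401", "93402", "93403",
    "93405", "93406", "93407", "93408", "93409", "93410", "93412"
  ];
  var table = {};
  for (i = 0; i < tableData.length; i++) table[tableData[i]] = true;
  var longRe = new RegExp("^(" + tableData.join("|") + ")$");

  // Random codes in the range 80000..99999, stored as strings so that
  // neither method pays for a number-to-string conversion.
  var testdata = new Array(testsize);
  for (i = 0; i < testsize; i++) {
    testdata[i] = String(Math.floor(Math.random() * 20000 + 80000));
  }

  var byTable = new Array(testsize), byRegExp = new Array(testsize);
  var t0 = Date.now();
  for (i = 0; i < testsize; i++) byTable[i] = table[testdata[i]] === true;
  var t1 = Date.now();
  for (i = 0; i < testsize; i++) byRegExp[i] = longRe.test(testdata[i]);
  var t2 = Date.now();

  // The timings only mean something if both methods give the same answers.
  var mismatches = 0;
  for (i = 0; i < testsize; i++) {
    if (byTable[i] !== byRegExp[i]) mismatches++;
  }
  return { tableMs: t1 - t0, regExpMs: t2 - t1, mismatches: mismatches };
}
```

In a browser, `var r = runBenchmark(100000); alert([r.tableMs, r.regExpMs, r.mismatches]);` mirrors the `alert` of the original test. `mismatches` should be 0.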

/L

Grant Wagner · Dec 10, 2003

Thomas said:
Lasse said:

Most likely not. He gave a generic way to test for a finite number of
strings. It works whether or not the strings have any structure.

Undoubtedly. But his method consumes much more memory and computing
time than mine, no matter if the strings are structured or not. IOW:
Compared to my method, his is highly inefficient in *every* case.

Regexps take more work to make, and are harder to read.

Not generally, no.

And *much* harder to extend with new numbers, if it becomes necessary

No, see below.

return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
}

Are you sure that your regexp matches exactly the correct strings?

Pretty sure.

(It probably does, but comparing RegExps to string is ExpSpace complete

^^^^^^^^^^^^^^^^
Define that.

in general, so very hard to do).

Not at all. It is primarily a matter of structured building of the
RegExp, finding similarities first.

You say "Not at all", then proceed with a hundred-line explanation of how to compose a RegExp
to match his set of numbers.

An additional test may as well be combined with `&&' without wasting too
much computing time.

PointedEars

How about this:

function isValidZIP(theZIP) {
// Values read from a text field are strings, but switch compares with
// ===, so convert to a number before comparing with the numeric cases.
switch (Number(theZIP)) {
case 93101: case 93102: case 93103: case 93105:
case 93106: case 93107: case 93108: case 93109:
case 93110: case 93111: case 93116: case 93117:
case 93118: case 93120: case 93121: case 93130:
case 93140: case 93150: case 93160: case 93190:
case 93199: case 93401: case 93402: case 93403:
case 93405: case 93406: case 93407: case 93408:
case 93409: case 93410: case 93412:
// ZIP is invalid (area not serviced)
return false;
default:
// ZIP is valid
return true;
}
}

Self-documenting, you can see AT A GLANCE which ZIP codes are valid, and you can easily add or
remove additional ZIP codes without having to reconstruct your RegExp.

With your RegExp, you'd have to add a comment similar to:

/*
matches 93101, or 93102 or 93103, or 93105 ....
or 93412
*/

because when you come back to work on it in 6 months, you won't remember what it does, and
you'll have to waste time decoding it to figure out which ZIP codes are valid.

--
| Grant Wagner <[email protected]>

* Client-side Javascript and Netscape 4 DOM Reference available at:
* http://devedge.netscape.com/library/manuals/2000/javascript/1.3/reference/frames.html
* Internet Explorer DOM Reference available at:
* http://msdn.microsoft.com/workshop/author/dhtml/reference/dhtml_reference_entry.asp
* Netscape 6/7 DOM Reference available at:
* http://www.mozilla.org/docs/dom/domref/
* Tips for upgrading JavaScript for Netscape 7 / Mozilla
* http://www.mozilla.org/docs/web-developer/upgrade_2.html

Thomas 'PointedEars' Lahn · Dec 10, 2003

Grant said:
Thomas said:

Lasse said:

return(!/^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/.test(o.value));
}

Are you sure that your regexp matches exactly the correct
strings? (It probably does, but comparing RegExps to string is
ExpSpace complete in general, so very hard to do).

Not at all. It is primarily a matter of structured building of the
RegExp, finding similarities first.

You say "Not at all", then proceed with a hundred-line explanation of
how to compose a RegExp to match his set of numbers.

That was how I built mine and once you are used to it, RegExp are no
longer difficult (so I explained it in detail for others to learn).
You can build yours far simpler, and I explained that, too.

An additional test may as well be combined with `&&' without
wasting too much computing time.
[...]

How about this:
[switch-case-default-example]
Self-documenting, you can see AT A GLANCE which
ZIP codes are valid, and you can easily add or remove additional ZIP
codes without having to reconstruct your RegExp.

With your RegExp, you'd have to add a comment similar to:

/* matches 93101, or 93102 or 93103, or 93105 .... or 93412 */

because when you come back to work on it in 6 months, you won't
remember what it does, and you'll have to waste time decoding it
to figure out which ZIP codes are valid.

OK, you wanted it, you get it:

function isValidZIP(
/** @argument number|string */ sInput,
/** @argument Array of number|string */ aInvalidZIPs)
/**
* @author (C) 2003 Thomas Lahn <[email protected]>
* @param sInput ZIP code to be checked.
* @returns <code>true</code> if <code>sInput</code> is
* a valid ZIP code, <code>false</code> otherwise.
*/
{
var rxInvalidZIPs =
new RegExp("^(" + aInvalidZIPs.join("|") + ")$");
return !rxInvalidZIPs.test(sInput);
}

// Array of invalid ZIP codes (full list, duplicate 93199 removed)
var aInvZIPs =
[93101, 93102, 93103, 93105, 93106, 93107, 93108, 93109,
93110, 93111, 93116, 93117, 93118, 93120, 93121, 93130,
93140, 93150, 93160, 93190, 93199, 93401, 93402, 93403,
93405, 93406, 93407, 93408, 93409, 93410, 93412];

var r = String(Math.floor(Math.random() * 1000)); // integer 0..999
while (r.length < 3) // add leading zeroes
r = "0" + r;
var z = "93" + r; // add prefix
alert(
z + " is "
+ (isValidZIP(z, aInvZIPs) ? "" : "NOT ")
+ "a valid ZIP code.");

Happy testing!
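Building the pattern with `join("|")`, as above, is safe here because ZIP codes contain only digits. If the same technique were reused for arbitrary strings, characters such as `.` or `(` would need escaping first. Below is a sketch of that. The helper names `escapeRegExp` and `makeBlacklistTest` are hypothetical. The escaping character class is a commonly used one and does not come from this thread.

```javascript
// Escape every character that has a special meaning in RegExp source text.
function escapeRegExp(s) {
  return String(s).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return a function that is true for exactly the strings in `list`.
function makeBlacklistTest(list) {
  var parts = [];
  for (var i = 0; i < list.length; i++) {
    parts.push(escapeRegExp(list[i]));
  }
  var re = new RegExp("^(?:" + parts.join("|") + ")$");
  return function (s) {
    return re.test(String(s));
  };
}

var isBlocked = makeBlacklistTest(["93101", "93412", "a.b"]);
isBlocked("93412"); // true
isBlocked("a.b");   // true
isBlocked("axb");   // false: the "." was escaped, so it only matches a dot
```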

--

Your signature separator is broken; to avoid that, do not use Mozilla's
HTML editor. Besides, your signature is far too long. An appropriate
signature has up to 4 lines of up to 80 characters each.

80 characters per line is also the allowed maximum for Usenet messages,
which your posting exceeds by far. Set your automagic linebreak
function to a recommended value between 72 and 76 characters per line so
that a few levels of quoting do not push lines past the 80th column.

And please trim your quotes to the absolute minimum. Especially, do
not quote signatures (names and so-called signatures as well) if you do
not refer to them.

PointedEars

Dr John Stockton · Dec 11, 2003

JRS: In article <[email protected]>, seen in
Thomas 'PointedEars' Lahn

80 characters per line is also the allowed maximum for Usenet messages,
which your posting exceeds by far. Set your automagic linebreak
function to a recommended value between 72 and 76 characters per line so
that a few levels of quoting do not push lines past the 80th column.

I know of no reference for an allowed maximum, except one of the order of
1000 characters. If you know of a lower one, in an authoritative
document which takes evident cognisance of posting non-text material,
then cite it.

There is a strong recommendation that paragraphs of text should be sent
properly wrapped with hard returns; figures vary from about 64 to 76
characters. But where a line which ought to be long is to be sent, it
should not be arbitrarily broken.

Script for News, therefore, should be composed with that limit in mind;
anyhow, it seems more readable that way. But script which is longer
must not be machine-wrapped, unless the machine understands the wrapping
of indented script.

Material which is transmitted with lines longer than 70-80 characters
may be broken by displaying software, but it may be possible for the
reader to extend those margins, and it should be possible to copy the
material as transmitted into a file.