Validate URL script help


MJ

For some reason the following script does not work in Netscape/Mozilla, but
works fine in IE and Opera. It is supposed to check the syntax, make sure
there is a valid TLD (yes, those are all of the current TLDs), and allow for
addresses with or without trailing slashes or page addresses.

Anybody have any ideas on how to get this to work in Netscape? I suspect it
has something to do with the regular expression, but I can't get it to work.
Any help would be GREATLY appreciated!

// Validate URL
re3 =
/^(http|https):\/\/\S+\.(ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at
|au|aw|az|ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc
|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|
ec|edu|ee|eg|er|es|et|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov
|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|info|int|io|iq|ir|
is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|l
v|ly|ma|mc|md|mg|mh|mil|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|
mz|na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|
pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|
sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|u
g|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)([/]\S+|)$/i;

function validateURL(textfield){
    if (textfield.value == ""){
        return true;
    } else {
        // Prepend a scheme when the user left it out
        if (textfield.value.substring(0,7) != "http://" &&
            textfield.value.substring(0,8) != "https://") {
            textfield.value = "http://" + textfield.value;
        }
        if (!re3.test(textfield.value)){
            alert("Invalid web site address");
            textfield.focus();
            return false;
        }
        return true;
    }
}

It is being called by a simple:

<input name="Website" type="text" onBlur="validateURL(this)">
 

Lasse Reichstein Nielsen

MJ said:
For some reason the following script does not work in Netscape/Mozilla, but
works fine in IE and Opera.

"does not work" how? Do you get an error message or does it accept the wrong
strings?
/^(http|https):\/\/\S+\.(ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at
|au|aw|az|ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc

Your newsclient has wrapped the line. It should be on one line to work.
g|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)([/]\S+|)$/i;
The slash inside the character class near the end ("[/]") should be escaped.
Change "[/]" to "\/".

Not tested (didn't want to rewrap the lines :)
/L
 
MJ

Ah, it was the regular expression. Escaping the slash fixed it. I could
have sworn I had tried that before, but I guess not.

Thanks for the help! You're a life saver.
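
For reference, a minimal sketch of the script with the fix applied: the pattern sits on one line and the slash is escaped. The TLD list is cut down to a handful of entries purely for readability (an assumption of this sketch; substitute the full list from the first post). The tail is written as (\/\S*)? so that a bare trailing slash is accepted, which is also an assumption of this sketch, and isValidURL is a hypothetical helper name. The function returns false only when the check fails.

```javascript
// Abbreviated TLD list for illustration only; the original post lists every TLD.
var re3 = /^(http|https):\/\/\S+\.(com|net|org|info|de|uk)(\/\S*)?$/i;

// Hypothetical helper: a pure check with no DOM access.
function isValidURL(value) {
  return re3.test(value);
}

function validateURL(textfield) {
  if (textfield.value == "") {
    return true; // an empty field is allowed
  }
  // Prepend a scheme when the user left it out.
  if (textfield.value.substring(0, 7) != "http://" &&
      textfield.value.substring(0, 8) != "https://") {
    textfield.value = "http://" + textfield.value;
  }
  if (!isValidURL(textfield.value)) {
    alert("Invalid web site address");
    textfield.focus();
    return false;
  }
  return true;
}
```

Keeping the test in a separate function makes it possible to try the pattern outside a browser, where alert and the form field are not available.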


 

Thomas 'PointedEars' Lahn

MJ said:
// Validate URL re3 = /^(http|https):

The alternation can be written as /https?/, which is generally
more efficient (the common prefix "http" only has to be matched once).
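
A quick way to convince yourself that the two forms accept the same prefixes (a small sketch; the sample strings and the function name sameVerdicts are made up for illustration):

```javascript
var withAlternation = /^(http|https):\/\//;
var withOptional = /^https?:\/\//;

// Made-up sample inputs, not taken from the original post.
var samples = ["http://a", "https://a", "httpx://a", "ftp://a", "http:/a"];

// Returns true when both patterns give the same verdict for every sample.
function sameVerdicts(list) {
  for (var i = 0; i < list.length; i++) {
    if (withAlternation.test(list[i]) !== withOptional.test(list[i])) {
      return false;
    }
  }
  return true;
}

console.log(sameVerdicts(samples));
```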
\/\/\S+\.(ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at
|au|aw|az|ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc
|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|
ec|edu|ee|eg|er|es|et|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov
|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|info|int|io|iq|ir|
is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|l
v|ly|ma|mc|md|mg|mh|mil|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|
mz|na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|
pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|
sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|u
g|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)

That can be shortened considerably to

\/\/\S+\.(aero|arpa|a[c-gilm-oq-uwz]|biz|b[abd-jm-or-tvwyz]|com|coop
|c[acdf-ik-oruvx-z]|d[ejkmoz]|edu|e[cegr-t]|f[i-kmor]|gov|g[abd-il-np-uwy]
|h[kmnrtu]|info|int|i[del-oqr-t]|j[emop]|k[eghimnrwyz]|l[abcikr-vy]
|museum|mil|m[acdghkl-z]|name|net|n[acefgilopruz]|org|om|pro|p[aefghk-nr-twy]
|qa|r[eouw]|s[a-eg-ort-vyz]|t[cdf-hjkm-prtvwz]|u[agkmsyz]|v[aceginu]
|w[fs]|y[etu]|z[amw])

(if I have not missed a character or two, but I think you get the
idea). That is not only shorter but can also be more efficient than
complete alternation, depending on the type of RegExp engine used.
With an NFA, character classes are much more efficient than alternation
because a class is tested in a single step, whereas alternatives are
tried one after another with backtracking. (See
<http://www.oreilly.com/catalog/regex/chapter/ch04.html>, "Character
Classes vs. Alternation".) In ECMAScript implementations, an NFA
is clearly involved in the matching process, if not the only engine
type used, as backreferences and capturing parentheses are supported.
So it is clearly a Good Thing to replace alternation with character
classes here and avoid alternation where possible.
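
Since a class-based rewrite like this is easy to get subtly wrong, it may be worth checking it mechanically against the explicit list. The sketch below is one way to do that: fullTlds is the alternation from the original post joined back into one string, compactTlds is this sketch's transcription of the class-based form, and findMismatches is a hypothetical helper that tries every two-letter combination plus the longer names.

```javascript
// The explicit TLD alternation from the original post, rejoined into one string.
var fullTlds =
  "ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at|au|aw|az|" +
  "ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|" +
  "ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cu|cv|cx|cy|cz|" +
  "de|dj|dk|dm|do|dz|ec|edu|ee|eg|er|es|et|fi|fj|fk|fm|fo|fr|" +
  "ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|" +
  "hk|hm|hn|hr|ht|hu|id|ie|il|im|in|info|int|io|iq|ir|is|it|" +
  "je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|" +
  "la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|" +
  "ma|mc|md|mg|mh|mil|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|mz|" +
  "na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|" +
  "pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|ru|rw|" +
  "sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|" +
  "tc|td|tf|tg|th|tj|tk|tm|tn|to|tp|tr|tt|tv|tw|tz|" +
  "ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw";

// Class-based form (this sketch's transcription).
var compactTlds =
  "aero|arpa|a[c-gilm-oq-uwz]|biz|b[abd-jm-or-tvwyz]|com|coop|" +
  "c[acdf-ik-oruvx-z]|d[ejkmoz]|edu|e[cegr-t]|f[i-kmor]|gov|g[abd-il-np-uwy]|" +
  "h[kmnrtu]|info|int|i[del-oqr-t]|j[emop]|k[eghimnrwyz]|l[abcikr-vy]|" +
  "museum|mil|m[acdghkl-z]|name|net|n[acefgilopruz]|org|om|pro|p[aefghk-nr-twy]|" +
  "qa|r[eouw]|s[a-eg-ort-vyz]|t[cdf-hjkm-prtvwz]|u[agkmsyz]|v[aceginu]|" +
  "w[fs]|y[etu]|z[amw]";

// Anchor both so that only whole TLD strings are compared.
var full = new RegExp("^(" + fullTlds + ")$", "i");
var compact = new RegExp("^(" + compactTlds + ")$", "i");

// Hypothetical helper: lists every candidate on which the two patterns disagree.
function findMismatches() {
  var letters = "abcdefghijklmnopqrstuvwxyz";
  var candidates = fullTlds.split("|").concat(["example", "comm", "x"]);
  for (var i = 0; i < letters.length; i++) {
    for (var j = 0; j < letters.length; j++) {
      candidates.push(letters.charAt(i) + letters.charAt(j));
    }
  }
  var mismatches = [];
  for (var k = 0; k < candidates.length; k++) {
    if (full.test(candidates[k]) !== compact.test(candidates[k])) {
      mismatches.push(candidates[k]);
    }
  }
  return mismatches;
}

console.log(findMismatches());
```

An empty list means the two patterns agree on every candidate tried; any entry printed points at a class that needs another look. This checks only equivalence, not speed.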

But:

Are you sure you need the top-level domain checked this precisely while
the rest is checked rather sloppily? Are you prepared to maintain
that script as top-level domains evolve? Why don't you allow
IPv4 addresses in URLs? They can be static. Why don't you stick
(closely) to RFC 2396? The BNF grammar can easily be implemented
as a RegExp.
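
As one illustration of leaning on RFC 2396 rather than on a TLD list: Appendix B of the RFC gives a regular expression that splits a URI reference into its components. The sketch below transcribes it into a RegExp literal (with the slashes escaped) and layers a deliberately loose check on top. splitURI and looksLikeHttpURL are hypothetical names, and the acceptance rule is an assumption of this sketch, not an implementation of the full RFC grammar.

```javascript
// RFC 2396, Appendix B. Groups 2, 4, 5, 7 and 9 hold
// scheme, authority, path, query and fragment.
var uriParts = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: split a string into URI components.
// Every part of the pattern is optional, so exec() always finds a match;
// components that are absent come back as undefined.
function splitURI(text) {
  var m = uriParts.exec(text);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// Hypothetical loose check (an assumption of this sketch): http or https
// scheme plus a non-empty authority made of letters, digits, dots and
// hyphens, with an optional port. Host names and dotted IPv4 addresses
// both pass.
function looksLikeHttpURL(text) {
  var parts = splitURI(text);
  var scheme = (parts.scheme || "").toLowerCase();
  return (scheme == "http" || scheme == "https") &&
         /^[a-z0-9.-]+(:\d+)?$/i.test(parts.authority || "");
}

console.log(splitURI("http://www.ics.uci.edu/pub/ietf/uri/#Related"));
console.log(looksLikeHttpURL("http://192.168.0.1/index.html"));
```

Note that the Appendix B expression only parses; it matches any string, so the validation lives entirely in the checks applied to the extracted parts.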
([/]\S+|)$/i;

As Lasse already pointed out, in RegExp literals every forward slash
must be escaped, even in character classes. In fact, the character
class does not help here. But you probably meant /((\/\S+)?)$/i.
Rule of thumb: Do not use alternation when not necessary, see above.


PointedEars

P.S.
Please take heed of Usenet/Internet standards and use an existing "From:"
address. Avoiding spam is no excuse for breaking standards and thus helping
to destroy the functionality of the involved media:
<http://www.interhack.net/pubs/munging-harmful/>
 
