Check if value is a website URL

jwcarlton

This is a tricky one for me. I'm validating a form, and want to check
if a field entered is a legitimate website address. I don't
necessarily need to ensure that the site works (I can do that later),
but I do want to see if what's entered is a likely URL.

I'm currently just checking to see if it begins with "http", but
that's not so great; a less-savvy person might enter
"www.example.com", or even "example.com", and get an error that it's
not a legitimate link.

I've thought about testing to see if it contains at least 1 "." (since
all website addresses would, I think), but that's pretty vague; a less-
savvy person might enter their email address, and it would go through.
I guess that I could also check for an "@", but I can't help but
wonder if there's a smarter / smoother option?
 
dhtml

> This is a tricky one for me. I'm validating a form, and want to check
> if a field entered is a legitimate website address. I don't
> necessarily need to ensure that the site works (I can do that later),
> but I do want to see if what's entered is a likely URL.
>
> I'm currently just checking to see if it begins with "http", but
> that's not so great; a less-savvy person might enter
> "www.example.com", or even "example.com", and get an error that it's
> not a legitimate link.

If you want to require the protocol to be explicit, the UI should
indicate that in some way. For example, use placeholder text that
reads http://www.example.com, or use a label as "Address" or
"Location" instead of "URL".

(Lest the so-called "less-savvy" user actually know what a URL is and
enter a perfectly valid one that your code can't handle (i.e. not
beginning with "http")).

Validate the "location" field with a regexp on the client and on the
server. You might consider using HTML5 pattern attribute where
supported.

> I've thought about testing to see if it contains at least 1 "." (since
> all website addresses would, I think), but that's pretty vague; a less-
> savvy person might enter their email address, and it would go through.
> I guess that I could also check for an "@", but I can't help but
> wonder if there's a smarter / smoother option?

HTML5 INPUT type="email", feature tested, and with a fallback on the
client where the test fails, and a fallback on the server (server side
handling) where JS is disabled.
 
Swifty

> I've thought about testing to see if it contains at least 1 "." (since
> all website addresses would, I think), but that's pretty vague; a less-
> savvy person might enter their email address

My current algorithm test for an interior "." (i.e. not at the ends),
no "@" and no "." at the ends. It is for my own consumption, but I'm
better than Mr Average at naking mistakes. There's another one!

Going further would take me into the land of diminishing returns, but
this decision depends on how accurate you need to be.
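
For illustration only, here is a rough sketch of that kind of check in JavaScript. The function name looksLikeHost is made up here, and this is not Swifty's actual code; it simply encodes the three rules stated above (a "." somewhere in the interior, no "@", and no "." at either end).

```javascript
// Hypothetical helper; the name and the exact rules are assumptions for illustration.
function looksLikeHost(value) {
  // Trim both ends (older browsers lack String.prototype.trim).
  var s = String(value).replace(/^\s+|\s+$/g, "");
  // An "@" suggests an email address rather than a website.
  if (s.indexOf("@") !== -1) return false;
  var firstDot = s.indexOf(".");
  var lastDot = s.lastIndexOf(".");
  // Need at least one dot, and no dot as the first or last character.
  return firstDot > 0 && lastDot < s.length - 1;
}
```

With this sketch, "example.com" and "www.example.com" pass, while "user@example.com", ".com", "example." and "example" are rejected. It accepts plenty of strings that are not real sites, which is the diminishing-returns trade-off mentioned above.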
 
Jukka K. Korpela

> If you want to require the protocol to be explicit, the UI should
> indicate that in some way.

The protocol part is required in absolute URLs. But of course one might
consider prepending http:// if there is no protocol part.
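
A minimal sketch of that prepending step, assuming a made-up helper name withDefaultScheme (it is not from any library). It treats the input as already having a protocol part only when it starts with a scheme followed by "://", so that "example.com:8080" is not mistaken for a scheme named "example.com".

```javascript
// Hypothetical helper: adds "http://" when the input has no protocol part.
function withDefaultScheme(value) {
  // Trim both ends (older browsers lack String.prototype.trim).
  var s = String(value).replace(/^\s+|\s+$/g, "");
  // Leave empty input alone; that case is better reported separately.
  if (s === "") return s;
  // A scheme is a letter followed by letters, digits, "+", "-" or "." (RFC 3986).
  // Requiring "://" after it is a simplification made for this sketch.
  if (/^[a-z][a-z0-9+.\-]*:\/\//i.test(s)) return s;
  return "http://" + s;
}
```

So "www.example.com" becomes "http://www.example.com", while "https://example.com/x" and "foo://example.com" are returned unchanged. Whether a scheme like foo should then be accepted is a separate decision.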

> For example, use placeholder text that
> reads http://www.example.com, or use a label as "Address" or
> "Location" instead of "URL".

"URL" is much more accurate than "Address" or "Location" (which might
refer to postal addresses or geographic locations, for example). "Web
address" might do. Or "Web site address", if that's what one is asking for.

> Validate the "location" field with a regexp on the client and on the
> server.

That's non-trivial. Would you write one that accepts foo://example.com
and reject http://www.sää.fi for example?

If the intent is to check that the URL actually works, then it would be
simplest to do just that, instead of a separate syntax check. Checking
that it works is of course nontrivial, especially since it may involve
dealing with redirections and temporary network and server problems.

> You might consider using HTML5 pattern attribute where
> supported. [...]
>
> I've thought about testing to see if it contains at least 1 "." (since
> all website addresses would, I think), but that's pretty vague; a less-
> savvy person might enter their email address, and it would go through.
> I guess that I could also check for an "@", but I can't help but
> wonder if there's a smarter / smoother option?
>
> HTML5 INPUT type="email", feature tested, and with a fallback on the
> client where the test fails, and a fallback on the server (server side
> handling) where JS is disabled.

Pardon? This is the place for using <input type=url>, isn't it? It's good to
use it even though most browsers will treat it as <input type=text>, so
that any client-side checks will be performed only if coded in
JavaScript and when JavaScript is enabled. (To be honest, there is a
risk in using <input type=url>, or <input type=email> for that matter -
it is useful when you specifically expect an email address. The risk is
that when browsers start supporting them more widely, they will first do
it wildly. It's easy even for people who write browsers to produce code
that checks URLs and email addresses so that correct data is rejected
and incorrect data passes thru.)
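
One common way to feature-test this is to create an input, ask for the new type, and read the type back: a browser that does not know the type reports "text". The sketch below assumes a made-up function name supportsInputType, and takes the document object as a parameter purely so that it can be tried outside a browser; in a page you would pass the global document.

```javascript
// Hypothetical helper: returns true when the browser recognises the input type.
// `doc` is normally the global `document`; passing it in is an assumption of this sketch.
function supportsInputType(doc, type) {
  var input = doc.createElement("input");
  input.setAttribute("type", type);
  // Browsers that do not know the type fall back to "text".
  return input.type === type;
}
```

A page might call supportsInputType(document, "url") and attach its own JavaScript check only when the result is false. As noted above, a true result says nothing about how well the browser's own check works.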
 
P E Schoen

"jwcarlton" wrote in message
> This is a tricky one for me. I'm validating a form, and want to
> check if a field entered is a legitimate website address. I don't
> necessarily need to ensure that the site works (I can do that later),
> but I do want to see if what's entered is a likely URL.
> I'm currently just checking to see if it begins with "http", but
> that's not so great; a less-savvy person might enter
> "www.example.com", or even "example.com", and get an error that
> it's not a legitimate link.
> I've thought about testing to see if it contains at least 1 "." (since
> all website addresses would, I think), but that's pretty vague; a
> less-savvy person might enter their email address, and it would
> go through. I guess that I could also check for an "@", but I can't
> help but wonder if there's a smarter / smoother option?

I found this which may help, but it's in PHP:
http://www.tutorialcode.com/php/link-verifier-check-if-a-url-is-valid-or-not/

And here is a simple regex from geekpedia:

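function CheckValidUrl(strUrl)
{
// Requires an explicit ftp, http or https scheme, then optional
// user:password@, a host (any run of non-space characters),
// an optional :port and an optional path.
// The pattern is not anchored, so it only tests that the string
// contains something of this shape.
var RegexUrl =
/(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;
return RegexUrl.test(strUrl);
}

// Sample use

alert(CheckValidUrl("http://www.geekpedia.com"));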

I have not used either one, but it seems like a handy utility.

Paul
 
dhtml

> The protocol part is required in absolute URLs. But of course one might
> consider prepending http:// if there is no protocol part.

> "URL" is much more accurate than "Address" or "Location" (which might
> refer to postal addresses or geographic locations, for example). "Web
> address" might do. Or "Web site address", if that's what one is asking for.

> That's non-trivial. Would you write one that accepts foo://example.com
> and reject http://www.sää.fi for example?

Point being that it is insufficient to validate only on the client.

> If the intent is to check that the URL actually works, then it would be
> simplest to do just that, instead of a separate syntax check. Checking
> that it works is of course nontrivial, especially since it may involve
> dealing with redirections and temporary network and server problems.

Right. From the client, you're dealing with connectivity problems
(WiFi, 3g, AT&T DSL, etc). From the server, you have to deal with
other servers that may be slow or down. Do you really want the client
to wait while the program is trying to connect to, say,
"jibbering.com"?

> You might consider using HTML5 pattern attribute where
> supported. [...]
>
> I've thought about testing to see if it contains at least 1 "." (since
> all website addresses would, I think), but that's pretty vague; a less-
> savvy person might enter their email address, and it would go through.
> I guess that I could also check for an "@", but I can't help but
> wonder if there's a smarter / smoother option?
>
> HTML5 INPUT type="email", feature tested, and with a fallback on the
> client where the test fails, and a fallback on the server (server side
> handling) where JS is disabled.
>
> Pardon? This is the place for using <input type=url>, isn't it?

Right, I misread, thanks for pointing it out. (I thought he'd also
wanted to validate emails.)

http://www.whatwg.org/specs/web-app...e/states-of-the-type-attribute.html#url-state

"User agents may allow the user to set the value to a string that is
not a valid absolute URL, but may also or instead automatically escape
characters entered by the user so that the value is always a valid
absolute URL"

http://diveintohtml5.org/examples/input-type-url.html

Passes as a valid URL there: L.A://%-@-%\\

> It's good to
> use it even though most browsers will treat it as <input type=text>, so
> that any client-side checks will be performed only if coded in
> JavaScript and when JavaScript is enabled. (To be honest, there is a
> risk in using <input type=url>, or <input type=email> for that matter -
> it is useful when you specifically expect email address. The risk is
> that when browsers start supporting them more widely, they will first do
> it wildly.

We've seen that already with input type="date".
 
Mike Duffy

> This is a tricky one for me. I'm validating a form, and want to check
> if a field entered is a legitimate website address. I don't
> necessarily need to ensure that the site works (I can do that later),

If you are going to do that later anyway, why even bother to try to parse
it first? Are you not just wasting effort?

Let the DNS server do the work.
 
Denis McMahon

> This is a tricky one for me. I'm validating a form, and want to check if
> a field entered is a legitimate website address. I don't necessarily
> need to ensure that the site works (I can do that later), but I do want
> to see if what's entered is a likely URL.

Why are you doing the validation client side?

Is entering a website mandatory?

If it's not mandatory, why validate client side? Validate it server side
(you need to validate everything server side anyway) and just discard if
it's not valid.

If you must have a website entered, why? Consider the personal data
implications. If you don't really need it, see above.

If you really really must have a website entered, then the best you can
do client side is check to see if it looks genuine, and that really means
just looking for a valid host name. This code might get a lot of false
positives, but I don't think it will give any false negatives:

<script type="text/javascript">
function isValidWebsiteUri(str) { return true; }
</script>

If you insist on doing more than that, consider the following:

numeric ips are valid
%-encoded characters are valid
rfc 3986
rfc 2616 (and others it mentions)

I'm not going to try and write javascript code to validate a url, simply
because no matter how complex and all encompassing my code is, someone
will suggest (a) a valid http url that it rejects and (b) an invalid url
that it accepts.

You might be better off doing an ajax exchange with your server and
calling a dns query on the supplied uri following the field's blur event.
Obviously allow for the field being changed from containing an invalid uri
to empty if it's a non mandatory field.

Rgds

Denis McMahon
 
Jukka K. Korpela

> Why are you doing the validation client side?

I think the idea of client-side validation is good, as it often helps
the user (and thus indirectly the site owner). The problem is that
validating a URL client-side is complicated, perhaps so complicated that
it is better to do server-side validation only.

> If it's not mandatory, why validate client side?

For the same reason as for mandatory addresses. For example, if the user
mistakenly types htttp://www.example.com or http://www.example,com, we
would like to tell about the problem immediately so that he can see the
problem and fix it right away.

> If you really really must have a website entered, then the best you can
> do client side is check to see if it looks genuine, and that really means
> just looking for a valid host name.

It's not "just" looking for a valid host name (a vague concept). And a
web site address may well have a path part (as mine does).
 
Dr J R Stockton

In comp.lang.javascript message
<d69e1a77-5741-442e-b783-b9027ab96a30@r21g2000yqr.googlegroups.com>,
Tue, 13 Sep 2011 23:12:08, jwcarlton wrote:
> This is a tricky one for me. I'm validating a form, and want to check
> if a field entered is a legitimate website address. I don't
> necessarily need to ensure that the site works (I can do that later),
> but I do want to see if what's entered is a likely URL.

You don't give any indication of your location, so we can guess nothing
about your circumstances. If, for example, you are a resident Cuban,
you may need to allow for North Korean addresses. But if you are
American, it might even be illegal to accept them.

Use Wikipedia <http://en.wikipedia.org/wiki/Request_for_Comments> and
its links to search the RFCs for the allowable forms of website address,
being sure to use only currently-applicable RFCs.

Remember to allow for dotted quad and IPv6 equivalent.
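
To make that concrete, a rough sketch follows. The helper name hostKind is made up for this example and the patterns are deliberately loose; they are nowhere near a full RFC 3986 check. The point is only that a host can be a dotted quad such as 192.168.0.1 or a bracketed IPv6 literal such as [2001:db8::1], and the latter need not contain a "." at all.

```javascript
// Hypothetical helper: classifies the host part of a URL. Patterns are rough.
function hostKind(host) {
  // Inside a URL, IPv6 literals are written in square brackets.
  if (/^\[[0-9a-f:.]+\]$/i.test(host)) return "ipv6";
  var quad = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (quad) {
    // Each of the four numbers must fit in one byte.
    for (var i = 1; i <= 4; i++) {
      if (Number(quad[i]) > 255) return "invalid";
    }
    return "ipv4";
  }
  // Anything else is treated as a name and not checked further here.
  return "name";
}
```

A check that insists on at least one "." would turn away "[2001:db8::1]", which is one reason to let the user override a refusal.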

If there is any chance that your test may reject a valid and legitimate
address, consider allowing the user to override refusal.

Note that the majority of typos applied to valid addresses yield
possible addresses. Therefore, it is really necessary to rely on the
user being careful enough.
 
Dr J R Stockton

In comp.lang.javascript message <[email protected]>
> If you are going to do that later anyway, why even bother to try to parse
> it first? Are you not just wasting effort?
>
> Let the DNS server do the work.

If one attempts to validate over the Net, the user may have to wait
several seconds for an answer. That is annoying when not necessary.


One should realise that there are three possible states of validation: yes,
no, and don't know.

Only a net test can assure the user that the URL is valid; and even then
it may not remain valid - and it still may not be the right one.

But some strings, liable to be entered in error, can be ruled out as
possible URLs by a local check, with more or less confidence. An empty
string cannot, I think, be a URL, even if a default protocol is added.
Probably there must be at least one dot in the URL, though for all I
know something other than \u002E might be allowed in Asian URLs
nowadays. The final character probably has to be a letter, but perhaps
not necessarily in the ranges A-Z a-z.

One should test for cannot-be-right input at the client end, as far as
that can be reasonably done in safety - which includes not only the
soundness of the intended algorithm, but also the coder's ability to get
it right.
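
A minimal sketch of such a local check, under stated assumptions. The name localUrlVerdict is made up; the function only ever answers "no" or "unknown", never "yes", since only a net test can give that. The "at least one dot" rule is the tentative one from above and is an assumption: it would also reject things like "localhost" or a bracketed IPv6 literal, so it suits only a form that expects ordinary public website addresses.

```javascript
// Hypothetical helper: rules out input that cannot be right.
// Returns "no" or "unknown"; it never claims that a URL is valid.
function localUrlVerdict(value) {
  // Trim both ends (older browsers lack String.prototype.trim).
  var s = String(value).replace(/^\s+|\s+$/g, "");
  if (s === "") return "no";              // nothing entered
  if (/\s/.test(s)) return "no";          // raw whitespace inside the value
  if (s.indexOf(".") === -1) return "no"; // no dot at all (assumption, see above)
  return "unknown";
}
```

With these rules "", "example,com" and "555 1234" come back as "no", while "www.example.com" and "http://www.example.com/path" come back as "unknown" and are left for the server or a net test to decide.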
 
Mike Duffy

> But some strings, liable to be entered in error, can be ruled out as
> possible URLs by a local check, with more or less confidence.

True. But the most common types of miss-spelling (as well as the most
common spells of miss-typing) will not be within the "http://" part of the
URL, which might not even be there if it is assumed to be such by the
application. Usually an error will be an omission, duplication, or
transposition of alphanumeric characters. Mistakes such as this will always
evaluate as valid and will need to be run through the network in any case.

In other words, yes, you can very quickly (client side) notice it when a
"." has been entered as a ",". But most typing mistakes will not be
noticed. Is it worth making your code more complicated to do this?

It will not speed up the obligatory check which will need to be done
afterwards. And it introduces more code which increases the chance of your
user seeing one of those pesky JS error boxes.
 
Dr J R Stockton

In comp.lang.javascript message <[email protected]>
> True. But the most common types of miss-spelling (as well as the most
> common spells of miss-typing) will not be within the "http://" part of the
> URL, which might not even be there if it is assumed to be such by the
> application. Usually an error will be an omission, duplication, or
> transposition of alphanumeric characters. Mistakes such as this will always
> evaluate as valid and will need to be run through the network in any case.
>
> In other words, yes, you can very quickly (client side) notice it when a
> "." has been entered as a ",". But most typing mistakes will not be
> noticed. Is it worth making your code more complicated to do this?

Yes, provided that the code added is short, within the competence of the
coder, and written with a sufficient knowledge of the RFCs.

A moderately careful user will look at the input fields before
submission, but only moderately carefully. Comma-for-dot is an easy
error to make and to miss. Another is entering a wrong value entirely,
perhaps a telephone number. Another is not entering anything.

Such are worth finding, from the user's point of view, because they allow
an immediate response.

> It will not speed up the obligatory check which will need to be done
> afterwards. And it introduces more code which increases the chance of your
> user seeing one of those pesky JS error boxes.

If one cannot, with testing, reliably code a client-side test for
"at least one dot", then one should not be coding such applications.

If the server-side code logs the reasons for its rejections, one may see
other client-side tests that would be useful.
 
