Check if value is a website URL

Discussion in 'Javascript' started by jwcarlton, Sep 14, 2011.

  1. jwcarlton

    jwcarlton Guest

    This is a tricky one for me. I'm validating a form, and want to check
    if a field entered is a legitimate website address. I don't
    necessarily need to ensure that the site works (I can do that later),
    but I do want to see if what's entered is a likely URL.

    I'm currently just checking to see if it begins with "http", but
    that's not so great; a less-savvy person might enter
    "www.example.com", or even "example.com", and get an error that it's
    not a legitimate link.

    I've thought about testing to see if it contains at least 1 "." (since
    all website addresses would, I think), but that's pretty vague; a less-
    savvy person might enter their email address, and it would go through.
    I guess that I could also check for an "@", but I can't help but
    wonder if there's a smarter / smoother option?
     
    jwcarlton, Sep 14, 2011
    #1

  2. dhtml Guest

    On Sep 13, 11:12 pm, jwcarlton <> wrote:
    > This is a tricky one for me. I'm validating a form, and want to check
    > if a field entered is a legitimate website address. I don't
    > necessarily need to ensure that the site works (I can do that later),
    > but I do want to see if what's entered is a likely URL.
    >
    > I'm currently just checking to see if it begins with "http", but
    > that's not so great; a less-savvy person might enter
    > "www.example.com", or even "example.com", and get an error that it's
    > not a legitimate link.


    If you want to require the protocol to be explicit, the UI should
    indicate that in some way. For example, use placeholder text that
    reads http://www.example.com, or use a label as "Address" or
    "Location" instead of "URL".

    (Otherwise the so-called "less-savvy" user may actually know what a URL
    is and enter a perfectly valid one that your code can't handle, e.g. one
    not beginning with "http".)

    Validate the "location" field with a regexp on the client and on the
    server. You might consider using the HTML5 pattern attribute where
    supported.
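    The HTML5 pattern attribute is matched against the entire value, as if
    wrapped in ^(?: and )$, so the same regexp used with a plain
    RegExp.test() call behaves differently. A minimal sketch of sharing one
    pattern between the markup and a script fallback; WEB_ADDRESS_PATTERN and
    matchesPatternAttribute are hypothetical names, and the pattern is a
    deliberately loose assumption, not a complete URL grammar:

```javascript
// Hypothetical, deliberately loose pattern: optional http(s) scheme, at least
// two dot-separated labels, then an optional path. "/" is escaped inside the
// character classes because stricter regexp modes require it there.
// Markup using the same string (the browser does the anchoring itself):
//   <input type="url" name="location" pattern="(https?://)?[^\s\/@.]+(\.[^\s\/@.]+)+(/\S*)?">
var WEB_ADDRESS_PATTERN = "(https?://)?[^\\s\\/@.]+(\\.[^\\s\\/@.]+)+(/\\S*)?";

// Script fallback that mimics the pattern attribute: the whole value must
// match, so the pattern is wrapped in ^(?: ... )$ before testing.
function matchesPatternAttribute(value, pattern) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

console.log(matchesPatternAttribute("www.example.com", WEB_ADDRESS_PATTERN));  // true
console.log(matchesPatternAttribute("user@example.com", WEB_ADDRESS_PATTERN)); // false
```

    Without the anchoring, new RegExp(WEB_ADDRESS_PATTERN).test("user@example.com")
    would succeed, because the pattern matches the "example.com" substring.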

    >
    > I've thought about testing to see if it contains at least 1 "." (since
    > all website addresses would, I think), but that's pretty vague; a less-
    > savvy person might enter their email address, and it would go through.
    > I guess that I could also check for an "@", but I can't help but
    > wonder if there's a smarter / smoother option?


    HTML5 INPUT type="email", feature tested, with a fallback on the
    client where the test fails and a fallback on the server (server-side
    handling) where JS is disabled.
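    A feature test of this kind usually relies on the fact that a browser
    which does not support an input type falls back to "text", so reading the
    type back reveals support. A minimal sketch; supportsInputType is a
    hypothetical name, the document object is passed in only so the function
    can be exercised outside a browser, and the type may equally be "email"
    or "url":

```javascript
// Returns true if the browser keeps the requested input type, false if it
// falls back to "text" (unsupported) or refuses the change altogether.
function supportsInputType(doc, type) {
  var input = doc.createElement("input");
  try {
    input.setAttribute("type", type);
  } catch (e) {
    return false; // defensive: in case a browser throws on an unknown type
  }
  return input.type === type;
}

// In a page: if (!supportsInputType(document, "url")) { /* attach a script check */ }
// The server must validate again either way, since scripts may be disabled.
```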
    --
    Garrett
     
    dhtml, Sep 14, 2011
    #2

  3. Swifty Guest

    On Tue, 13 Sep 2011 23:12:08 -0700 (PDT), jwcarlton
    <> wrote:

    >I've thought about testing to see if it contains at least 1 "." (since
    >all website addresses would, I think), but that's pretty vague; a less-
    >savvy person might enter their email address


    My current algorithm tests for an interior "." (i.e. not at the ends),
    no "@" and no "." at the ends. It is for my own consumption, but I'm
    better than Mr Average at naking mistakes. There's another one!

    Going further would take me into the land of diminishing returns, but
    this decision depends on how accurate you need to be.
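    A sketch of the heuristic just described, assuming "at the ends" means
    the first and last characters of the trimmed input; looksLikeWebAddress is
    a hypothetical name. As the post says, it is crude: it also rejects the
    rare URL that carries a user name (http://user@host/):

```javascript
// At least one interior ".", no "." at either end, and no "@" anywhere
// (which screens out most e-mail addresses).
function looksLikeWebAddress(str) {
  var s = str.trim();
  if (s.indexOf("@") !== -1) return false;                          // probably e-mail
  if (s.charAt(0) === "." || s.charAt(s.length - 1) === ".") return false;
  return s.indexOf(".", 1) !== -1;  // any dot found here is interior
}
```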

    --
    Steve Swift
    http://www.swiftys.org.uk/swifty.html
    http://www.ringers.org.uk
     
    Swifty, Sep 14, 2011
    #3
  4. 14/09/2011 09:57, dhtml wrote:

    > If you want to require the protocol to be explicit, the UI should
    > indicate that in some way.


    The protocol part is required in absolute URLs. But of course one might
    consider prepending http:// if there is no protocol part.
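    A sketch of that idea, assuming a scheme is present only when the input
    starts with RFC 3986 scheme characters followed by "://"; testing for ":"
    alone would mistake a bare "example.com:8080" for a URL whose scheme is
    "example.com". withDefaultScheme is a hypothetical name:

```javascript
// Prepend "http://" when the input does not already start with "scheme://".
function withDefaultScheme(input) {
  var s = input.trim();
  if (s === "") return s;                            // nothing to fix up
  return /^[a-z][a-z0-9+.\-]*:\/\//i.test(s) ? s : "http://" + s;
}
```

    For example, withDefaultScheme("www.example.com") returns
    "http://www.example.com", while "https://example.com" is left unchanged.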

    > For example, use placeholder text that
    > reads http://www.example.com, or use a label as "Address" or
    > "Location" instead of "URL".


    "URL" is much more accurate than "Address" or "Location" (which might
    refer to postal addresses or geographic locations, for example). "Web
    address" might do. Or "Web site address", if that's what one is asking for.

    > Validate the "location" field with a regexp on the client and on the
    > server.


    That's non-trivial. Would you write one that accepts foo://example.com
    and reject http://www.sää.fi for example?
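    To illustrate with a sketch: a typical hand-written ASCII regexp rejects
    the internationalized host name, whereas the WHATWG URL parser (built into
    current browsers and Node.js, though not into the browsers of 2011)
    accepts it and, left to itself, would also accept foo://example.com.
    NAIVE_URL and parseWebUrl are hypothetical names, and limiting the scheme
    to http/https is an assumption about what "web site address" means:

```javascript
// A typical hand-written check: known scheme, ASCII-only host labels.
var NAIVE_URL = /^(https?|ftp):\/\/[a-z0-9\-]+(\.[a-z0-9\-]+)+(\/\S*)?$/i;

// Let the standard parser do the syntax work, then restrict the scheme,
// because new URL("foo://example.com") parses without complaint.
function parseWebUrl(str) {
  var u;
  try {
    u = new URL(str);
  } catch (e) {
    return null; // not an absolute URL at all
  }
  return (u.protocol === "http:" || u.protocol === "https:") ? u : null;
}
```

    NAIVE_URL.test("http://www.sää.fi") is false, while
    parseWebUrl("http://www.sää.fi").hostname is the ASCII ("xn--") form of
    the host name.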

    If the intent is to check that the URL actually works, then it would be
    simplest to do just that, instead of a separate syntax check. Checking
    that it works is of course nontrivial, especially since it may involve
    dealing with redirections and temporary network and server problems.

    > You might consider using HTML5 pattern attribute where
    > supported.

    [...]
    >> I've thought about testing to see if it contains at least 1 "." (since
    >> all website addresses would, I think), but that's pretty vague; a less-
    >> savvy person might enter their email address, and it would go through.
    >> I guess that I could also check for an "@", but I can't help but
    >> wonder if there's a smarter / smoother option?

    >
    > HTML5 INPUT type="email", feature tested, and with a fallback on the
    > client where the test fails, and a fallback on the server (server side
    > handling) where JS is disabled).


    Pardon? This is the place for using <input type=url>, isn't it? It's good
    to use it even though most browsers will treat it as <input type=text>, in
    which case any client-side checks will be performed only if coded in
    JavaScript and when JavaScript is enabled. (To be honest, there is a
    risk in using <input type=url>, or <input type=email> for that matter,
    which is useful when you specifically expect an email address. The risk
    is that when browsers start supporting them more widely, they will first
    do it wildly. It's easy even for people who write browsers to produce code
    that checks URLs and email addresses so that correct data is rejected
    and incorrect data passes through.)

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Sep 14, 2011
    #4
  5. P E Schoen Guest

    "jwcarlton" wrote in message
    news:...

    > This is a tricky one for me. I'm validating a form, and want to
    > check if a field entered is a legitimate website address. I don't
    > necessarily need to ensure that the site works (I can do that later),
    > but I do want to see if what's entered is a likely URL.


    > I'm currently just checking to see if it begins with "http", but
    > that's not so great; a less-savvy person might enter
    > "www.example.com", or even "example.com", and get an error that
    > it's not a legitimate link.


    > I've thought about testing to see if it contains at least 1 "." (since
    > all website addresses would, I think), but that's pretty vague; a
    > less-savvy person might enter their email address, and it would
    > go through. I guess that I could also check for an "@", but I can't
    > help but wonder if there's a smarter / smoother option?


    I found this which may help, but it's in PHP:
    http://www.tutorialcode.com/php/link-verifier-check-if-a-url-is-valid-or-not/

    And here is a simple regex from geekpedia:

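    function CheckValidUrl(strUrl)
    {
        // scheme://[user[:password]@]host[:port][/path]
        // Not anchored with ^...$: returns true if a URL appears
        // anywhere inside strUrl.
        var RegexUrl =
            /(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:([0-9]+))?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;
        return RegexUrl.test(strUrl);
    }

    // Sample use

    alert(CheckValidUrl("http://www.geekpedia.com"));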

    I have not used either one, but it seems like a handy utility.
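    One caution about that regex, sketched below under the assumption that
    the garbled ":)" sequences stand for the port group (:([0-9]+))? of the
    geekpedia original: it is not anchored, so it returns true whenever a URL
    appears anywhere in the string, and its (\S+) group accepts almost any
    host, typos included. The variable names are hypothetical:

```javascript
// The pattern as published (not anchored).
var GEEKPEDIA_URL =
  /(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:([0-9]+))?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;

// The same pattern with ^ and $ added, so the whole input must be the URL.
var GEEKPEDIA_URL_ANCHORED =
  /^(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:([0-9]+))?(\/|\/([\w#!:.?+=&%@!\-\/]))?$/;

console.log(GEEKPEDIA_URL.test("my site is http://www.example.com"));          // true
console.log(GEEKPEDIA_URL_ANCHORED.test("my site is http://www.example.com")); // false
console.log(GEEKPEDIA_URL_ANCHORED.test("http://www.example,com"));            // true: comma typo passes
```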

    Paul
     
    P E Schoen, Sep 14, 2011
    #5
  6. P E Schoen Guest

    P E Schoen, Sep 14, 2011
    #6
  7. dhtml Guest

    On Sep 14, 1:16 am, "Jukka K. Korpela" <> wrote:
    > 14/09/2011 09:57, dhtml wrote:
    > > If you want to require the protocol to be explicit, the UI should
    > > indicate that in some way.

    >
    > The protocol part is required in absolute URLs. But of course one might
    > consider prepending http:// if there is no protocol part.
    >
    > > For example, use placeholder text that
    > > reads http://www.example.com, or use a label as "Address" or
    > > "Location" instead of "URL".

    >
    > "URL" is much more accurate than "Address" or "Location" (which might
    > refer to postal addresses or geographic locations, for example). "Web
    > address" might do. Or "Web site address", if that's what one is asking for.
    >
    > > Validate the "location" field with a regexp on the client and on the
    > > server.

    >
    > That's non-trivial. Would you write one that accepts foo://example.com
    > and reject http://www.sää.fi for example?
    >

    Point being that it is insufficient to validate only on the client.

    > If the intent is to check that the URL actually works, then it would be
    > simplest to do just that, instead of a separate syntax check. Checking
    > that it works is of course nontrivial, especially since it may involve
    > dealing with redirections and temporary network and server problemn.
    >


    Right. From the client, you're dealing with connectivity problems
    (WiFi, 3G, AT&T DSL, etc.). From the server, you have to deal with
    other servers that may be slow or down. Do you really want the client
    to wait while the program is trying to connect to, say,
    "jibbering.com"?

    > > You might consider using HTML5 pattern attribute where
    > > supported.

    > [...]
    > >> I've thought about testing to see if it contains at least 1 "." (since
    > >> all website addresses would, I think), but that's pretty vague; a less-
    > >> savvy person might enter their email address, and it would go through.
    > >> I guess that I could also check for an "@", but I can't help but
    > >> wonder if there's a smarter / smoother option?

    >
    > > HTML5 INPUT type="email", feature tested, and with a fallback on the
    > > client where the test fails, and a fallback on the server (server side
    > > handling) where JS is disabled).

    >
    > Pardon? This is place for using <input type=url>, isn't it?


    Right, I misread, thanks for pointing it out. (I thought he'd also
    wanted to validate emails.)

    http://www.whatwg.org/specs/web-app...e/states-of-the-type-attribute.html#url-state

    "User agents may allow the user to set the value to a string that is
    not a valid absolute URL, but may also or instead automatically escape
    characters entered by the user so that the value is always a valid
    absolute URL"

    http://diveintohtml5.org/examples/input-type-url.html

    Passes as a valid URL there: L.A://%-@-%\\

    > It's good to
    > use it even though most browsers will treat it as <input type=text>, so
    > that any client-side checks will be performed only if coded in
    > JavaScript and when JavaScript is enabled. (To be honest, there is a
    > risk in using <input type=url>, or <input type=email> for that matter -
    > it is useful when you specifically expect email address. The risk is
    > that when browsers start supporting them more widely, they will first do
    > it wildly.


    We've seen that already with input type="date".
    --
    Garrett
     
    dhtml, Sep 14, 2011
    #7
  8. Mike Duffy Guest

    jwcarlton <> wrote in news:d69e1a77-5741-442e-b783-
    :

    > This is a tricky one for me. I'm validating a form, and want to check
    > if a field entered is a legitimate website address. I don't
    > necessarily need to ensure that the site works (I can do that later),


    If you are going to do that later anyway, why even bother to try to parse
    it first? Are you not just wasting effort?

    Let the DNS server do the work.

    --
    http://pages.videotron.ca/duffym/index.htm#
     
    Mike Duffy, Sep 14, 2011
    #8
  9. On Tue, 13 Sep 2011 23:12:08 -0700, jwcarlton wrote:

    > This is a tricky one for me. I'm validating a form, and want to check if
    > a field entered is a legitimate website address. I don't necessarily
    > need to ensure that the site works (I can do that later), but I do want
    > to see if what's entered is a likely URL.


    Why are you doing the validation client side?

    Is entering a website mandatory?

    If it's not mandatory, why validate client side? Validate it server side
    (you need to validate everything server side anyway) and just discard if
    it's not valid.
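    A minimal server-side sketch of that policy for an optional field,
    assuming Node.js (10 or later, for the global URL class);
    cleanOptionalWebsite is a hypothetical name, and prepending "http://" and
    insisting on a dotted host name are assumptions, not rules from the post:

```javascript
// Returns a normalized URL string, or "" when the field is empty or the
// value is not usable (the value is then simply discarded, not reported).
function cleanOptionalWebsite(raw) {
  var s = String(raw == null ? "" : raw).trim();
  if (s === "") return "";                                      // optional, left empty
  if (!/^[a-z][a-z0-9+.\-]*:\/\//i.test(s)) s = "http://" + s; // assumed default scheme
  var u;
  try {
    u = new URL(s);
  } catch (e) {
    return "";                                                  // unparseable
  }
  if (u.protocol !== "http:" && u.protocol !== "https:") return "";
  // Require a dot unless the host is an IPv6 literal such as [2001:db8::1].
  if (u.hostname.charAt(0) !== "[" && u.hostname.indexOf(".") === -1) return "";
  return u.href;                                                // normalized form
}
```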

    If you must have a website entered, why? Consider the personal data
    implications. If you don't really need it, see above.

    If you really really must have a website entered, then the best you can
    do client side is check to see if it looks genuine, and that really means
    just looking for a valid host name. This code might get a lot of false
    positives, but I don't think it will give any false negatives:

    <script type="text/javascript">
    function isValidWebsiteUri(str) { return true; }
    </script>

    If you insist on doing more than that, consider the following:

    numeric IPs are valid
    %-encoded characters are valid
    RFC 3986
    RFC 2616 (and others it mentions)
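    For a rough idea of what the standards allow, here is a sketch using the
    WHATWG URL parser found in current browsers and Node.js (it follows the
    WHATWG URL Standard, which differs from RFC 3986 in details). hostKind is
    a hypothetical helper:

```javascript
// Classify the host of an absolute URL as an IPv4 address, an IPv6 literal,
// or a domain name. Throws a TypeError if str is not an absolute URL.
function hostKind(str) {
  var host = new URL(str).hostname;
  if (host.charAt(0) === "[") return "ipv6";
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) return "ipv4";
  return "name";
}

console.log(hostKind("http://192.0.2.1/"));               // ipv4
console.log(hostKind("http://[2001:db8::1]/"));           // ipv6
console.log(new URL("http://3232235777/").hostname);      // 192.168.1.1
console.log(new URL("http://example.com/a b").pathname);  // /a%20b
```

    Note that even a single decimal number is read as an IPv4 address, and a
    space in the path is percent-encoded rather than rejected.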

    I'm not going to try and write javascript code to validate a url, simply
    because no matter how complex and all encompassing my code is, someone
    will suggest (a) a valid http url that it rejects and (b) an invalid url
    that it accepts.

    You might be better off doing an Ajax exchange with your server and
    running a DNS query on the host in the supplied URI after the field's
    blur event.
    Obviously allow for the field being changed from containing an invalid uri
    to empty if it's a non mandatory field.

    Rgds

    Denis McMahon
     
    Denis McMahon, Sep 15, 2011
    #9
  10. 15.9.2011 20:17, Denis McMahon wrote:

    > On Tue, 13 Sep 2011 23:12:08 -0700, jwcarlton wrote:
    >
    >> This is a tricky one for me. I'm validating a form, and want to check if
    >> a field entered is a legitimate website address. I don't necessarily
    >> need to ensure that the site works (I can do that later), but I do want
    >> to see if what's entered is a likely URL.

    >
    > Why are you doing the validation client side?


    I think the idea of client-side validation is good, as it often helps
    the user (and thus indirectly the site owner). The problem is that
    validating a URL client-side is complicated, perhaps so complicated that
    it is better to do server-side validation only.

    > If it's not mandatory, why validate client side?


    For the same reason as for mandatory addresses. For example, if the user
    mistakenly types htttp://www.example.com or http://www.example,com, we
    would like to report it immediately so that he can see the problem and
    fix it right away.
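    A sketch of that kind of immediate check, limited to the two mistakes
    just mentioned plus an empty field; obviousProblem is a hypothetical name,
    and treating only http, https and ftp as acceptable schemes is an
    assumption. It returns a message to show the user, or null when nothing
    is obviously wrong:

```javascript
function obviousProblem(value) {
  var s = value.trim();
  if (s === "") return "Please enter an address.";
  // Scheme present? Accept only a few well-known ones.
  var m = /^([a-z][a-z0-9+.\-]*):\/\//i.exec(s);
  if (m && !/^(https?|ftp)$/i.test(m[1])) {
    return 'Unknown scheme "' + m[1] + '"; did you mean http?';
  }
  // Host part: everything up to the first "/", "?" or "#".
  var rest = m ? s.slice(m[0].length) : s;
  var host = rest.split(/[\/?#]/)[0];
  if (host.indexOf(",") !== -1) return "The host name contains a comma; did you mean a dot?";
  return null;
}
```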

    > If you really really must have a website entered, then the best you can
    > do client side is check to see if it looks genuine, and that really means
    > just looking for a valid host name.


    It's not "just" looking for a valid host name (a vague concept). And a
    web site address may well have a path part (as mine does).

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Sep 15, 2011
    #10
  11. In comp.lang.javascript message <d69e1a77-5741-442e-b783-b9027ab96a30@r2
    1g2000yqr.googlegroups.com>, Tue, 13 Sep 2011 23:12:08, jwcarlton
    <> posted:

    >This is a tricky one for me. I'm validating a form, and want to check
    >if a field entered is a legitimate website address. I don't
    >necessarily need to ensure that the site works (I can do that later),
    >but I do want to see if what's entered is a likely URL.


    You don't give any indication of your location, so we can guess nothing
    about your circumstances. If, for example, you are a resident Cuban,
    you may need to allow for North Korean addresses. But if you are
    American, it might even be illegal to accept them.

    Use Wikipedia <http://en.wikipedia.org/wiki/Request_for_Comments> and
    its links to search the RFCs for the allowable forms of website address,
    being sure to use only currently-applicable RFCs.

    Remember to allow for dotted quad and IPv6 equivalent.

    If there is any chance that your test may reject a valid and legitimate
    address, consider allowing the user to override refusal.

    Note that the majority of typos applied to valid addresses yield
    possible addresses. Therefore, it is really necessary to rely on the
    user being careful enough.

    --
    (c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Web <http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms and links;
    Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
    No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.
     
    Dr J R Stockton, Sep 15, 2011
    #11
  12. In comp.lang.javascript message <Xns9F60C00978E66nevermind@94.75.214.39>
    , Wed, 14 Sep 2011 22:53:01, Mike Duffy <> posted:

    >jwcarlton <> wrote in news:d69e1a77-5741-442e-b783-
    >:
    >
    >> This is a tricky one for me. I'm validating a form, and want to check
    >> if a field entered is a legitimate website address. I don't
    >> necessarily need to ensure that the site works (I can do that later),

    >
    >If you are going to do that later anyway, why even bother to try to parse
    >it first? Are you not just wasting effort?
    >
    >Let the DNS server do the work.


    If one attempts to validate over the Net, the user may have to wait
    several seconds for an answer. That is annoying when not necessary.


    One should realise that there are three possible states of validation:
    yes, no, and don't know.

    Only a net test can assure the user that the URL is valid; and even then
    it may not remain valid - and it still may not be the right one.

    But some strings, liable to be entered in error, can be ruled out as
    possible URLs by a local check, with more or less confidence. An empty
    string cannot, I think, be a URL, even if a default protocol is added.
    Probably there must be at least one dot in the URL, though for all I
    know something other than \u002E might be allowed in Asian URLs
    nowadays. The final character probably has to be a letter, but perhaps
    not necessarily in the ranges A-Z a-z.
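    The three-state idea can be sketched as a local check that only ever
    answers "no" or "unknown", leaving "yes" to a network test. The rules
    below are the tentative ones from this post (non-empty, at least one dot,
    final host character a letter in any script), plus exemptions for
    dotted-quad and IPv6 addresses; localVerdict is a hypothetical name, and
    \p{L} needs a JavaScript engine with Unicode property escapes (ES2018):

```javascript
function localVerdict(str) {
  var s = str.trim();
  if (s === "") return "no";                                // empty cannot be a URL
  var rest = s.replace(/^[a-z][a-z0-9+.\-]*:\/\//i, "");   // drop any scheme
  if (rest.charAt(0) === "[") return "unknown";             // IPv6 literal
  var host = rest.split(/[\/?#:]/)[0];                      // drop path and port
  if (host.indexOf(".") === -1) return "no";                // probably needs a dot
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) return "unknown"; // dotted quad
  if (!/\p{L}/u.test(host.charAt(host.length - 1))) return "no"; // probably ends in a letter
  return "unknown";                                         // only the Net can say "yes"
}
```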

    One should test for cannot-be-right input at the client end, as far as
    that can be reasonably done in safety - which includes not only the
    soundness of the intended algorithm, but also the coder's ability to get
    it right.

    --
    (c) John Stockton, nr London UK. ?@merlyn.demon.co.uk IE8 FF3 Op12 Sf5 Cr12
    news:comp.lang.javascript FAQ <http://www.jibbering.com/faq/index.html>.
    <http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
    <http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
     
    Dr J R Stockton, Sep 16, 2011
    #12
  13. Mike Duffy Guest

    Dr J R Stockton <> wrote in news:qaYF
    $:

    > But some strings, liable to be entered in error, can be ruled out as
    > possible URLs by a local check, with more or less confidence.


    True. But the most common types of miss-spelling (as well as the most
    common spells of miss-typing) will not be within the "http://" part of the
    URL, which might not even be there if it is assumed to be such by the
    application. Usually an error will be an omission, duplication, or
    transposition of alphanumeric characters. Mistakes such as this will always
    evaluate as valid and will need to be run through the network in any case.

    In other words, yes, you can very quickly (client side) notice it when a
    "." has been entered as a ",". But most typing mistakes will not be
    noticed. Is it worth making your code more complicated to do this?

    It will not speed up the obligatory check which will need to be done
    afterwards. And it introduces more code which increases the chance of your
    user seeing one of those pesky JS error boxes.

    --
    http://pages.videotron.ca/duffym/index.htm#
     
    Mike Duffy, Sep 17, 2011
    #13
  14. In comp.lang.javascript message <Xns9F62E15E755DCnevermind@94.75.214.39>
    , Sat, 17 Sep 2011 02:09:16, Mike Duffy <> posted:

    >Dr J R Stockton <> wrote in news:qaYF
    >$:
    >
    >> But some strings, liable to be entered in error, can be ruled out as
    >> possible URLs by a local check, with more or less confidence.

    >
    >True. But the most common types of miss-spelling (as well as the most
    >common spells of miss-typing) will not be within the "http://" part of the
    >URL, which might not even be there if it is assumed to be such by the
    >application. Usually an error will be an omission, duplication, or
    >transposition of alphanumeric characters. Mistakes such as this will always
    >evaluate as valid and will need to be run through the network in any case.
    >
    >In other words, yes, you can very quickly (client side) notice it when a
    >"." has been entered as a ",". But most typing mistakes will not be
    >noticed. Is it worth making your code more complicated to do this?


    Yes, provided that the code added is short, within the competence of the
    coder, and written with a sufficient knowledge of the RFCs.

    A moderately careful user will look at the input fields before
    submission, but only moderately carefully. Comma-for-dot is an easy
    error to make and to miss. Another is entering a wrong value entirely,
    perhaps a telephone number. Another is not entering anything.

    Such errors are worth finding, from the user's point of view, because they allow
    an immediate response.

    >It will not speed up the obligatory check which will need to be done
    >afterwards. And it introduces more code which increases the chance of your
    >user seeing one of those pesky JS error boxes.


    If one cannot, with testing, reliably code a client-side test for
    "at least one dot", then one should not be coding such applications.

    If the server-side code logs the reasons for its rejections, one may see
    other client-side tests that would be useful.
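    A sketch of that feedback loop, assuming the server-side check already
    produces a short reason string for each rejection; makeRejectionLog is a
    hypothetical name, and a real deployment would persist the counts rather
    than keep them in memory:

```javascript
function makeRejectionLog() {
  var counts = Object.create(null); // no prototype, so any reason string is safe
  return {
    record: function (reason) {
      counts[reason] = (counts[reason] || 0) + 1;
    },
    // The n most frequent reasons as [reason, count] pairs.
    top: function (n) {
      return Object.keys(counts)
        .sort(function (a, b) { return counts[b] - counts[a]; })
        .slice(0, n)
        .map(function (r) { return [r, counts[r]]; });
    }
  };
}
```

    Reasons that turn up often and are cheap to detect locally (a comma in
    the host, an empty field) are candidates for a client-side check.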

    --
    (c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Web <http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
    Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
    Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
     
    Dr J R Stockton, Sep 18, 2011
    #14
