URL normalization

Discussion in 'Java' started by Chris, May 3, 2004.

  1. Chris

    Chris Guest

    Chris, May 3, 2004
    #1

  2. On Mon, 3 May 2004 12:55:12 -0500, Chris wrote:

    > Does anyone know of code to normalize a URL? We're processing a lot of URLs,
    > and we get a lot of bad ones:


    Where do they originate?

    It is probably a better strategy to check
    and clean up URLs at the input stage, if
    you can, so you never get these bad URLs.

    > http://mydomain.com/mydir/../anotherdir/ (/../)


    So does this exist?
    http://mydomain.com/anotherdir/

    URLs, AFAIR, will resolve that correctly
    if it actually points somewhere that exists.
    (And if it does not, how do you determine what
    the user actually meant?)

    > http://mydomain.com/mydir\/anotherdir/ (backslashes)
    > http://mydomain.com/mydir//anotherdir/ (extra slashes)
    > http://mydomain.com/mydir/file name here.htm (filename with spaces)


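    The last one can be solved by using
    URLEncoder.encode() on the file name alone.
    Applied to the whole URL it would also escape
    the ':' and '/' characters, and it writes
    spaces as '+' rather than '%20'.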
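
    A minimal sketch of that idea, assuming the URL has no query string or
    fragment, so that everything after the last '/' is the file name. The
    class and method names are made up for illustration, and swapping '+'
    for "%20" is an extra step, since URLEncoder targets form data rather
    than URL paths.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeFileName {

    // Hypothetical helper: encodes only the text after the last '/'.
    static String encodeLastSegment(String url) {
        int slash = url.lastIndexOf('/');
        String prefix = url.substring(0, slash + 1);
        String name = url.substring(slash + 1);
        try {
            // URLEncoder writes a space as '+'; a literal '+' in the name
            // becomes %2B, so replacing the remaining '+' signs is safe.
            return prefix + URLEncoder.encode(name, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            encodeLastSegment("http://mydomain.com/mydir/file name here.htm"));
        // prints http://mydomain.com/mydir/file%20name%20here.htm
    }
}
```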

    > I'm sure there's a lot of stuff out there that I haven't seen yet, either.
    > I'm wondering if anyone has already written a class that will clean this
    > stuff up.


    It sounds like it needs to have DWIMNWIS
    functionality, or implement the Psychic
    interface. ;-)

    --
    Andrew Thompson
    http://www.PhySci.org/ Open-source software suite
    http://www.PhySci.org/codes/ Web & IT Help
    http://www.1point1C.org/ Science & Technology
     
    Andrew Thompson, May 3, 2004
    #2

  3. Real Gagnon

    Real Gagnon Guest

    > Does anyone know of code to normalize a URL? We're processing a lot of
    > URLs, and we get a lot of bad ones:
    >
    > http://mydomain.com/mydir/../anotherdir/ (/../)
    > http://mydomain.com/mydir\/anotherdir/ (backslashes)
    > http://mydomain.com/mydir//anotherdir/ (extra slashes)
    > http://mydomain.com/mydir/file name here.htm (filename with spaces)


    You may want to check the java.net.URI class. Its normalize() method
    removes "." and ".." path segments, which covers the first case.
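
    A short sketch of the normalize() call on the first example URL from
    the question. The class and helper names are illustrative only. Note
    that the single-argument parse rejects the URLs containing spaces or
    backslashes, so those would need cleaning up before this step.

```java
import java.net.URI;

class NormalizeDemo {

    // Hypothetical helper: parses the string and removes "." and ".."
    // path segments. URI.create throws IllegalArgumentException if the
    // string is not a syntactically valid URI.
    static String normalize(String url) {
        return URI.create(url).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://mydomain.com/mydir/../anotherdir/"));
        // prints http://mydomain.com/anotherdir/
    }
}
```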

    Bye.
    --
    Real Gagnon from Quebec, Canada
    * Looking for Java or PB snippets ? Visit Real's How-to
    * http://www.rgagnon.com/howto.html
     
    Real Gagnon, May 4, 2004
    #3
