hash from long url to short url

Discussion in 'C Programming' started by joe, Feb 22, 2008.

  1. joe

    joe Guest

    Hello anyone knows how to write a funtion to genereate a tiny url with
    letters and numbers only. Something almost always unique. THanks.
    joe, Feb 22, 2008
    #1
    1. Advertising

  2. "joe" <> wrote in message
    news:...
    > Hello anyone knows how to write a funtion to genereate a tiny url with
    > letters and numbers only. Something almost always unique. THanks.
    >

    It's inherently impossible to collapse 36^N unique URLs to 36^(N/4) unique
    tiny urls.
    However there's a hash function on my websites. It's in one of the free
    chapters under Basic Algorithms. I recommend that you split the string into
    2, and generate 2 unsigned longs. The chance of a collision is so low as to
    be negligible. Then use the modulus operation to reduce to alpahanumeric.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 22, 2008
    #2
    1. Advertising

  3. joe

    santosh Guest

    Malcolm McLean wrote:

    >
    > "joe" <> wrote in message
    >

    news:...
    >> Hello anyone knows how to write a funtion to genereate a tiny url
    >> with letters and numbers only. Something almost always unique.
    >> THanks.
    >>

    > It's inherently impossible to collapse 36^N unique URLs to 36^(N/4)
    > unique tiny urls.
    > However there's a hash function on my websites. It's in one of the
    > free chapters under Basic Algorithms. I recommend that you split the
    > string into 2, and generate 2 unsigned longs. The chance of a
    > collision is so low as to be negligible. Then use the modulus
    > operation to reduce to alpahanumeric.


    Also a Google search seems to show up a lot of TinyURL generators though
    not, AFAICS, in C. Maybe the OP can study the code and rewrite it in C.
    santosh, Feb 22, 2008
    #3
  4. "Malcolm McLean" <> writes:

    > "joe" <> wrote in message
    > news:...
    >> Hello anyone knows how to write a funtion to genereate a tiny url with
    >> letters and numbers only. Something almost always unique. THanks.
    >>

    > It's inherently impossible to collapse 36^N unique URLs to 36^(N/4)
    > unique tiny urls.
    > However there's a hash function on my websites. It's in one of the
    > free chapters under Basic Algorithms. I recommend that you split the
    > string into 2, and generate 2 unsigned longs. The chance of a
    > collision is so low as to be negligible.


    I have a feeling this is a bad idea[1]. The Bernstein hash function
    (which is the one on your) site uses unsigned long but will work just
    as well with any unsigned integer type. If the OP has access to
    longer integers, it seems safer to simply use a longer integer that
    than generate two hashes from two parts of the string.

    [1] I have no formal argument in support of this, just the feeling
    that, since URLs often have similar parts you are wasting the hash
    function's mixing ability if you split the string. Anyway, even if
    there is no reason to worry here, why take the risk -- unless, of
    course, you don't have longer integer types.

    --
    Ben.
    Ben Bacarisse, Feb 22, 2008
    #4
  5. "Ben Bacarisse" <> wrote in message
    > I have a feeling this is a bad idea[1]. The Bernstein hash function
    > (which is the one on your) site uses unsigned long but will work just
    > as well with any unsigned integer type. If the OP has access to
    > longer integers, it seems safer to simply use a longer integer that
    > than generate two hashes from two parts of the string.
    >
    > [1] I have no formal argument in support of this, just the feeling
    > that, since URLs often have similar parts you are wasting the hash
    > function's mixing ability if you split the string. Anyway, even if
    > there is no reason to worry here, why take the risk -- unless, of
    > course, you don't have longer integer types.
    >

    You might well be right. If two URLs share the same introduction, which is
    extremely plausible, then effectively you are wasting a long.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 22, 2008
    #5
  6. joe

    CBFalconer Guest

    joe wrote:
    >
    > Hello anyone knows how to write a funtion to genereate a tiny url with
    > letters and numbers only. Something almost always unique. THanks.


    Yes. No. You're welcome.

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.



    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Feb 22, 2008
    #6
  7. joe

    Flash Gordon Guest

    Malcolm McLean wrote, On 22/02/08 15:23:
    >
    > "Ben Bacarisse" <> wrote in message
    >> I have a feeling this is a bad idea[1]. The Bernstein hash function
    >> (which is the one on your) site uses unsigned long but will work just
    >> as well with any unsigned integer type. If the OP has access to
    >> longer integers, it seems safer to simply use a longer integer that
    >> than generate two hashes from two parts of the string.
    >>
    >> [1] I have no formal argument in support of this, just the feeling
    >> that, since URLs often have similar parts you are wasting the hash
    >> function's mixing ability if you split the string. Anyway, even if
    >> there is no reason to worry here, why take the risk -- unless, of
    >> course, you don't have longer integer types.
    >>

    > You might well be right. If two URLs share the same introduction, which
    > is extremely plausible, then effectively you are wasting a long.


    There is also a significant possibility of two URLs sharing a
    significant tail. E.g.
    www.somesite/area/somewhere?pageid=1234&action=read&format=prettyformat
    --
    Flash Gordon
    Flash Gordon, Feb 22, 2008
    #7
  8. joe

    joe Guest

    May be i could get the char value of each letter multiply it by the
    position in the string and come up with a number. Just an idea.
    On Feb 22, 2:31 pm, CBFalconer <> wrote:
    > joe wrote:
    >
    > > Hello anyone knows how to write a funtion to genereate a tiny url with
    > > letters and numbers only. Something almost always unique. THanks.

    >
    > Yes. No. You're welcome.
    >
    > --
    > [mail]: Chuck F (cbfalconer at maineline dot net)
    > [page]: <http://cbfalconer.home.att.net>
    > Try the download section.
    >
    > --
    > Posted via a free Usenet account fromhttp://www.teranews.com
    joe, Feb 22, 2008
    #8
  9. joe

    Default User Guest

    Re: hash from long url to short url - TPA

    joe wrote:

    > May be i could get the char value of each letter multiply it by the
    > position in the string and come up with a number. Just an idea.


    Please don't top-post. Your replies belong following or interspersed
    with properly trimmed quotes. See the majority of other posts in the
    newsgroup, or:
    <http://www.caliburn.nl/topposting.html>
    Default User, Feb 22, 2008
    #9
  10. joe <> writes:

    >> > Hello anyone knows how to write a funtion to genereate a tiny url with
    >> > letters and numbers only. Something almost always unique. THanks.


    > May be i could get the char value of each letter multiply it by the
    > position in the string and come up with a number. Just an idea.


    [Top posting corrected]. That is one way, but there has been a lot of
    study of how to turn a string into a number and your method has no
    particular advantage over others that have been found to be useful.
    The advice you've had to use a known hash function is good advice.

    --
    Ben.
    Ben Bacarisse, Feb 23, 2008
    #10
  11. joe

    Paul Hsieh Guest

    On Feb 22, 5:26 am, joe <> wrote:
    > Hello anyone knows how to write a funtion to genereate a tiny url with
    > letters and numbers only. Something almost always unique. THanks.


    As mentioned by santosh, in practice you can essentially try to
    duplicate what the site "tinyUrl.com" does, however, either way it
    requires that you solve the more general problem of mapping limitless
    strings to fixed sized hashes that are "almost always unique". So let
    us solve this latter problem before we address the whole problem.

    To map a set of variable length strings to something "pseudo-unique"
    fundamentally requires an understanding of "The Birthday
    Paradox" (look it up on Wikipedia.) However to make a long story
    short you should study "secure hashing". Open source implementations
    of SHA-256 and WHIRLPOOL exist, which are probably the "safest" bets
    right now. The upshot of this is that you can transform variable
    length strings to something like 160 or 256 bits in a way that even an
    expert adversary cannot find any collisions in.

    Now for something like an URL, however, you are going to need to
    transform the output to text. So use Base-64 or hex encoding or
    something like that to turn the binary to alphanumerics. Lets call
    this a "long encoding" of the URL.

    Now as for making it tiny, the only way I can see how to do this is to
    basically retain a database of all such hashings done and always
    return the shortest possible truncated version of the "long
    encoding" (which is calculated as described above) that is unique.
    Then you can just make the URL something like: www.smalladdress.com/<URL>
    and you would probably use some Apache configuration trick to turn
    this into a call to php or cgi which takes the <URL> and looks it up
    in your DB and produces the original URL as a redirect. I assume that
    this, roughly speaking, is what tinyurl.com does.

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
    Paul Hsieh, Feb 23, 2008
    #11
  12. joe

    Richard Guest

    Re: hash from long url to short url - TPA

    "Default User" <> writes:

    > joe wrote:
    >
    >> May be i could get the char value of each letter multiply it by the
    >> position in the string and come up with a number. Just an idea.

    >
    > Please don't top-post. Your replies belong following or interspersed
    > with properly trimmed quotes. See the majority of other posts in the
    > newsgroup, or:
    > <http://www.caliburn.nl/topposting.html>


    CBFalconer and Default User actively contributing to CLC in their own
    inimitable manner again. Nice.
    Richard, Feb 23, 2008
    #12
  13. Re: hash from long url to short url - TPA

    On 23 Feb 2008 at 20:37, Richard wrote:
    > "Default User" <> writes:
    >
    >> joe wrote:
    >>
    >>> May be i could get the char value of each letter multiply it by the
    >>> position in the string and come up with a number. Just an idea.

    >>
    >> Please don't top-post. Your replies belong following or interspersed
    >> with properly trimmed quotes. See the majority of other posts in the
    >> newsgroup, or:
    >> <http://www.caliburn.nl/topposting.html>

    >
    > CBFalconer and Default User actively contributing to CLC in their own
    > inimitable manner again. Nice.


    Yes - with "contributors" like these, who needs trolls?
    Antoninus Twink, Feb 24, 2008
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. George Marsaglia

    Assigning unsigned long to unsigned long long

    George Marsaglia, Jul 8, 2003, in forum: C Programming
    Replies:
    1
    Views:
    674
    Eric Sosman
    Jul 8, 2003
  2. Daniel Rudy

    unsigned long long int to long double

    Daniel Rudy, Sep 19, 2005, in forum: C Programming
    Replies:
    5
    Views:
    1,186
    Peter Shaggy Haywood
    Sep 20, 2005
  3. David Geering

    longs, long longs, short short long ints . . . huh?!

    David Geering, Jan 8, 2007, in forum: C Programming
    Replies:
    15
    Views:
    558
    Keith Thompson
    Jan 11, 2007
  4. veryhotsausage
    Replies:
    1
    Views:
    1,799
    veryhotsausage
    Jul 4, 2008
  5. rp
    Replies:
    1
    Views:
    517
    red floyd
    Nov 10, 2011
Loading...

Share This Page