# hash from long url to short url

Discussion in 'C Programming' started by joe, Feb 22, 2008.

1. ### joeGuest

Hello anyone knows how to write a funtion to genereate a tiny url with
letters and numbers only. Something almost always unique. THanks.

joe, Feb 22, 2008

2. ### Malcolm McLeanGuest

"joe" <> wrote in message
news:...
> Hello anyone knows how to write a funtion to genereate a tiny url with
> letters and numbers only. Something almost always unique. THanks.
>

It's inherently impossible to collapse 36^N unique URLs to 36^(N/4) unique
tiny urls.
However there's a hash function on my websites. It's in one of the free
chapters under Basic Algorithms. I recommend that you split the string into
2, and generate 2 unsigned longs. The chance of a collision is so low as to
be negligible. Then use the modulus operation to reduce to alpahanumeric.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Malcolm McLean, Feb 22, 2008

3. ### santoshGuest

Malcolm McLean wrote:

>
> "joe" <> wrote in message
>

news:...
>> Hello anyone knows how to write a funtion to genereate a tiny url
>> with letters and numbers only. Something almost always unique.
>> THanks.
>>

> It's inherently impossible to collapse 36^N unique URLs to 36^(N/4)
> unique tiny urls.
> However there's a hash function on my websites. It's in one of the
> free chapters under Basic Algorithms. I recommend that you split the
> string into 2, and generate 2 unsigned longs. The chance of a
> collision is so low as to be negligible. Then use the modulus
> operation to reduce to alpahanumeric.

Also a Google search seems to show up a lot of TinyURL generators though
not, AFAICS, in C. Maybe the OP can study the code and rewrite it in C.

santosh, Feb 22, 2008
4. ### Ben BacarisseGuest

"Malcolm McLean" <> writes:

> "joe" <> wrote in message
> news:...
>> Hello anyone knows how to write a funtion to genereate a tiny url with
>> letters and numbers only. Something almost always unique. THanks.
>>

> It's inherently impossible to collapse 36^N unique URLs to 36^(N/4)
> unique tiny urls.
> However there's a hash function on my websites. It's in one of the
> free chapters under Basic Algorithms. I recommend that you split the
> string into 2, and generate 2 unsigned longs. The chance of a
> collision is so low as to be negligible.

I have a feeling this is a bad idea[1]. The Bernstein hash function
(which is the one on your) site uses unsigned long but will work just
as well with any unsigned integer type. If the OP has access to
longer integers, it seems safer to simply use a longer integer that
than generate two hashes from two parts of the string.

[1] I have no formal argument in support of this, just the feeling
that, since URLs often have similar parts you are wasting the hash
function's mixing ability if you split the string. Anyway, even if
there is no reason to worry here, why take the risk -- unless, of
course, you don't have longer integer types.

--
Ben.

Ben Bacarisse, Feb 22, 2008
5. ### Malcolm McLeanGuest

"Ben Bacarisse" <> wrote in message
> I have a feeling this is a bad idea[1]. The Bernstein hash function
> (which is the one on your) site uses unsigned long but will work just
> as well with any unsigned integer type. If the OP has access to
> longer integers, it seems safer to simply use a longer integer that
> than generate two hashes from two parts of the string.
>
> [1] I have no formal argument in support of this, just the feeling
> that, since URLs often have similar parts you are wasting the hash
> function's mixing ability if you split the string. Anyway, even if
> there is no reason to worry here, why take the risk -- unless, of
> course, you don't have longer integer types.
>

You might well be right. If two URLs share the same introduction, which is
extremely plausible, then effectively you are wasting a long.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Malcolm McLean, Feb 22, 2008
6. ### CBFalconerGuest

joe wrote:
>
> Hello anyone knows how to write a funtion to genereate a tiny url with
> letters and numbers only. Something almost always unique. THanks.

Yes. No. You're welcome.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

CBFalconer, Feb 22, 2008
7. ### Flash GordonGuest

Malcolm McLean wrote, On 22/02/08 15:23:
>
> "Ben Bacarisse" <> wrote in message
>> I have a feeling this is a bad idea[1]. The Bernstein hash function
>> (which is the one on your) site uses unsigned long but will work just
>> as well with any unsigned integer type. If the OP has access to
>> longer integers, it seems safer to simply use a longer integer that
>> than generate two hashes from two parts of the string.
>>
>> [1] I have no formal argument in support of this, just the feeling
>> that, since URLs often have similar parts you are wasting the hash
>> function's mixing ability if you split the string. Anyway, even if
>> there is no reason to worry here, why take the risk -- unless, of
>> course, you don't have longer integer types.
>>

> You might well be right. If two URLs share the same introduction, which
> is extremely plausible, then effectively you are wasting a long.

There is also a significant possibility of two URLs sharing a
significant tail. E.g.
--
Flash Gordon

Flash Gordon, Feb 22, 2008
8. ### joeGuest

May be i could get the char value of each letter multiply it by the
position in the string and come up with a number. Just an idea.
On Feb 22, 2:31 pm, CBFalconer <> wrote:
> joe wrote:
>
> > Hello anyone knows how to write a funtion to genereate a tiny url with
> > letters and numbers only. Something almost always unique. THanks.

>
> Yes. No. You're welcome.
>
> --
> [mail]: Chuck F (cbfalconer at maineline dot net)
> [page]: <http://cbfalconer.home.att.net>
>
> --
> Posted via a free Usenet account fromhttp://www.teranews.com

joe, Feb 22, 2008
9. ### Default UserGuest

Re: hash from long url to short url - TPA

joe wrote:

> May be i could get the char value of each letter multiply it by the
> position in the string and come up with a number. Just an idea.

with properly trimmed quotes. See the majority of other posts in the
newsgroup, or:
<http://www.caliburn.nl/topposting.html>

Default User, Feb 22, 2008
10. ### Ben BacarisseGuest

joe <> writes:

>> > Hello anyone knows how to write a funtion to genereate a tiny url with
>> > letters and numbers only. Something almost always unique. THanks.

> May be i could get the char value of each letter multiply it by the
> position in the string and come up with a number. Just an idea.

[Top posting corrected]. That is one way, but there has been a lot of
study of how to turn a string into a number and your method has no
particular advantage over others that have been found to be useful.

--
Ben.

Ben Bacarisse, Feb 23, 2008
11. ### Paul HsiehGuest

On Feb 22, 5:26 am, joe <> wrote:
> Hello anyone knows how to write a funtion to genereate a tiny url with
> letters and numbers only. Something almost always unique. THanks.

As mentioned by santosh, in practice you can essentially try to
duplicate what the site "tinyUrl.com" does, however, either way it
requires that you solve the more general problem of mapping limitless
strings to fixed sized hashes that are "almost always unique". So let
us solve this latter problem before we address the whole problem.

To map a set of variable length strings to something "pseudo-unique"
fundamentally requires an understanding of "The Birthday
Paradox" (look it up on Wikipedia.) However to make a long story
short you should study "secure hashing". Open source implementations
of SHA-256 and WHIRLPOOL exist, which are probably the "safest" bets
right now. The upshot of this is that you can transform variable
length strings to something like 160 or 256 bits in a way that even an
expert adversary cannot find any collisions in.

Now for something like an URL, however, you are going to need to
transform the output to text. So use Base-64 or hex encoding or
something like that to turn the binary to alphanumerics. Lets call
this a "long encoding" of the URL.

Now as for making it tiny, the only way I can see how to do this is to
basically retain a database of all such hashings done and always
return the shortest possible truncated version of the "long
encoding" (which is calculated as described above) that is unique.
Then you can just make the URL something like: www.smalladdress.com/<URL>
and you would probably use some Apache configuration trick to turn
this into a call to php or cgi which takes the <URL> and looks it up
in your DB and produces the original URL as a redirect. I assume that
this, roughly speaking, is what tinyurl.com does.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Paul Hsieh, Feb 23, 2008
12. ### RichardGuest

Re: hash from long url to short url - TPA

"Default User" <> writes:

> joe wrote:
>
>> May be i could get the char value of each letter multiply it by the
>> position in the string and come up with a number. Just an idea.

>
> with properly trimmed quotes. See the majority of other posts in the
> newsgroup, or:
> <http://www.caliburn.nl/topposting.html>

CBFalconer and Default User actively contributing to CLC in their own
inimitable manner again. Nice.

Richard, Feb 23, 2008
13. ### Antoninus TwinkGuest

Re: hash from long url to short url - TPA

On 23 Feb 2008 at 20:37, Richard wrote:
> "Default User" <> writes:
>
>> joe wrote:
>>
>>> May be i could get the char value of each letter multiply it by the
>>> position in the string and come up with a number. Just an idea.

>>
>> with properly trimmed quotes. See the majority of other posts in the
>> newsgroup, or:
>> <http://www.caliburn.nl/topposting.html>

>
> CBFalconer and Default User actively contributing to CLC in their own
> inimitable manner again. Nice.

Yes - with "contributors" like these, who needs trolls?

Antoninus Twink, Feb 24, 2008