shorten SHA1-hash

Peter Skovgaard

In one of my Rails applications I have a system for uploading and
downloading files. Since I don't want people to browse all the files,
but anyone should be able to link to his/her files, each file has its
own unique URL - all pretty standard.

Right now the URL is just /download/sha-sum-of-the-file (since I have
the SHA anyway), but the length of the URL is bugging me.

Now - the problem: The SHA1 sum is, as usual, represented in base16,
which makes it 40 chars long. However, in a URL we have at least
a-zA-Z0-9 (and also some special characters), which gives us the
opportunity to represent the SHA sum in at least base62, which should
make it about two-thirds the size.
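
A minimal sketch of the base62 idea in Ruby, treating the hex digest as one big integer. The alphabet order and the helper names (`hex_to_base62`, `base62_to_hex`) are assumptions made for illustration, not something from the application described above.

```ruby
require 'digest/sha1'

# Assumed alphabet order: digits, lowercase, uppercase (62 symbols).
BASE62_ALPHABET = (('0'..'9').to_a + ('a'..'z').to_a + ('A'..'Z').to_a).freeze

# Convert a hex digest to base62 by repeated division.
def hex_to_base62(hex)
  n = hex.to_i(16)
  return BASE62_ALPHABET[0] if n.zero?

  digits = []
  while n > 0
    n, rem = n.divmod(62)
    digits << BASE62_ALPHABET[rem]
  end
  digits.reverse.join
end

# Convert back, left-padding with zeros to the original hex width
# (leading zeros are lost in the integer conversion).
def base62_to_hex(str, width = 40)
  n = str.each_char.reduce(0) { |acc, ch| acc * 62 + BASE62_ALPHABET.index(ch) }
  n.to_s(16).rjust(width, '0')
end

sha   = Digest::SHA1.hexdigest('hello world')
short = hex_to_base62(sha)
puts sha
puts short
puts short.length
```

For a 160-bit value the result is at most 27 characters, and `base62_to_hex(short)` recovers the original 40-character digest for the database lookup.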

I know that in this case I could just store an extra little random
string in my database and link to /download/little-string, but that's
not the point here :) I would just like to hear from you: How few
(printable) characters can you shorten a SHA sum down to?
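
As a rough way to put numbers on the question, the sketch below counts how many characters a 160-bit value needs in a few alphabets. The helper name `chars_needed` and the choice of alphabets are assumptions: "base66" here means the characters RFC 3986 lists as unreserved in URLs (letters, digits, `-`, `.`, `_`, `~`), and "base94" means all printable ASCII except space, many of which would need escaping in a URL.

```ruby
# Smallest n such that base**n can cover every bits-bit value.
# Uses exact integer arithmetic to avoid floating-point rounding.
def chars_needed(bits, base)
  n = 0
  n += 1 while base**n < 2**bits
  n
end

ALPHABETS = {
  'hex (base16)'             => 16,
  'base62 (a-zA-Z0-9)'       => 62,
  'base64'                   => 64,
  'base66 (URL unreserved)'  => 66,
  'base94 (printable ASCII)' => 94
}.freeze

ALPHABETS.each do |name, base|
  puts format('%-26s %d chars', name, chars_needed(160, base))
end
```

This prints 40 for hex, 27 for base62, base64 and base66, and 25 for base94, so within the URL-safe characters there seems to be nothing to gain beyond 27 unless the hash itself is truncated.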
 
Tom Werner

Modified Base64 for URLs is probably your best and easiest route.

Wikipedia (http://en.wikipedia.org/wiki/Base64) explains it well:

Base64 encoding can be helpful when fairly lengthy identifying
information is used in an HTTP environment. Hibernate, a database
persistence framework for Java objects, uses Base64 encoding to encode
a relatively large unique id (generally 128-bit UUIDs) into a string for
use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many
applications need to encode binary data in a way that is convenient for
inclusion in URLs, including in hidden web form fields, and Base64 is a
convenient encoding to render them in not only a compact way, but in a
relatively unreadable one when trying to obscure the nature of data from
a casual human observer.

Using a URL-encoder on standard Base64, however, is inconvenient as it
will translate the '+' and '/' characters into special '%XX' hexadecimal
sequences ('+' = '%2B' and '/' = '%2F'). When this is later used with
database storage or across heterogeneous systems, they will themselves
choke on the '%' character generated by URL-encoders (because the '%'
character is also used in ANSI SQL as a wildcard).

For this reason, a *modified Base64 for URL* variant exists, where /no/
padding '=' will be used, and the '+' and '/' characters of standard
Base64 are respectively replaced by '*' and '-', so that using URL
encoders/decoders is no longer necessary and has no impact on the length
of the encoded value, leaving the same encoded form intact for use in
relational databases, web forms, and object identifiers in general.
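
A short Ruby sketch of this approach, assuming Ruby 2.3 or later for the `padding:` keyword. Note that Ruby's `Base64.urlsafe_encode64` follows RFC 4648 and substitutes '-' and '_' for '+' and '/', not the '*' and '-' mentioned in the passage above. The input string is just a placeholder for the file contents.

```ruby
require 'digest/sha1'
require 'base64'

# Encode the raw 20-byte digest, not the 40-character hex string.
raw = Digest::SHA1.digest('hello world')

# padding: false drops the trailing '=' that 20 bytes would otherwise need.
token = Base64.urlsafe_encode64(raw, padding: false)
puts token
puts token.length

# Going back: decode the token and hex-encode it to match a stored hex SHA.
hex = Base64.urlsafe_decode64(token).unpack('H*').first
puts hex
```

The token is 27 characters long and contains only letters, digits, '-' and '_', so it can go straight into /download/... without URL-encoding.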

Tom

--
* Libraries:
Chronic (chronic.rubyforge.org)
God (god.rubyforge.org)
* Site:
rubyisawesome.com
 
