Hash Function (str -> 32 bits)

B

barcaroller

I couldn't find a message-digest newsgroup, so I posted here. I have a C
function that converts a string of arbitrary length to a 32-bit hash value.
I realize this is overkill but I used OpenSSL's sha1() to convert the string
to a SHA-1 160-bit message digest.

The question is: how do I use these 160 bits to get my final 32 bits?
Should I use the first 32 bits or the last 32 bits or is there yet another
method?

Note: I understand that the 32-bit result is much more likely to cause
collisions than the SHA-1 message digest.
 
W

websnarf

barcaroller said:
I couldn't find a message-digest newsgroup, so I posted here.

sci.crypt is the group you want. But comp.programming would have
worked just as well.
[...] I have a C
function that converts a string of arbitrary length to a 32-bit hash value.
I realize this is overkill but I used OpenSSL's sha1() to convert the string
to a SHA-1 160-bit message digest.

Yes that is overkill. If all you want is 32 bits, then you can use the
function found here:

http://www.pobox.com/~qed/hash.html

Its a smaller and much faster piece of code.
The question is: how do I use these 160 bits to get my final 32 bits?
Should I use the first 32 bits or the last 32 bits or is there yet another
method?

You may pick *any* of the 32 bits in the digest. That's kind of the
point of it.
 
B

barcaroller

Yes that is overkill. If all you want is 32 bits, then you can use the
function found here:

http://www.pobox.com/~qed/hash.html

Its a smaller and much faster piece of code.

You may pick *any* of the 32 bits in the digest. That's kind of the
point of it.


Thanks for your response. Two questions for you:

1. Does the function you mentioned cause less collisions than SHA-1
converted to 32 bits?

2. Are you saying that choosing the last 32 bits or the first 32 bits in the
SHA-1 message digest would (on average) result in the same number of
collisions?
 
W

websnarf

barcaroller said:
Thanks for your response. Two questions for you:

1. Does the function you mentioned cause less collisions than SHA-1
converted to 32 bits?

It depends on the situation. If you are trying to formulaically attack
the hash, then SHA-1 is clearly better, since you require a brute force
attack for *each* collision (my hash will fall to deterministic attacks
which can be generalized to any size). If you are expecting random,
uniform, or otherwise typical data, then they will behave practically
the same.
2. Are you saying that choosing the last 32 bits or the first 32 bits in the
SHA-1 message digest would (on average) result in the same number of
collisions?

Yes -- minimal.
 
C

CBFalconer

barcaroller said:
I couldn't find a message-digest newsgroup, so I posted here. I
have a C function that converts a string of arbitrary length to a
32-bit hash value. I realize this is overkill but I used OpenSSL's
sha1() to convert the string to a SHA-1 160-bit message digest.

The question is: how do I use these 160 bits to get my final 32
bits? Should I use the first 32 bits or the last 32 bits or is
there yet another method?

You can find some simple yet effective string hashing functions in
the source code for hashlib, together with some references to other
possible functions. Hashlib is at:

<http://cbfalconer.home.att.net/download/hashlib.zip>

Since hashlib was originally developed to try out hash functions,
it can still be used for the same purpose. It collects suitable
statistics during a run.

Hashlib is implemented in C, but this general subject is better
suited to comp.programming.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
M

Michael Wojcik

As Paul already noted, this is off-topic for comp.lang.c, and SHA-1
is probably not well-suited to your needs.
Yes -- minimal.

More formally, you would have a collision with probability 1/2 after
hashing 2**16 objects in this fashion, per the "birthday paradox".
That's the best you can do for a pseudorandom hash, or for any hash
over random input. (If you know your input is biased, you can bias
your hash accordingly and reduce chances of collision further.)

Note this assumes that SHA-1 is not broken, or is broken in a fashion
that does not bias it.

SHA-1 is a cryptographic message digest. Any non-broken crypto
digest of length N bits must have the full avalanche property, or
it's advertising more bits of strength than it actually has. That
means each input bits affects each output bit independently with
probability 1/2. And that means that the output bits in the final
result are independent.

It doesn't matter how you reduce those N bits (assuming N > 32) to 32
bits, so long as you allow each of your output bits to be fairly
determined by at least bit of entropy from those N bits. You can
take any 32 bits from the digest; you can take 64 bits as two groups
of 32 and XOR them together. Makes no difference.

Obviously you can misapply the input bits and reduce entropy, eg by
ORing two groups of 32 bits together, which will bias bits toward 1.
But any 32 bits picked at random from the output of a non-broken
digest will be equally well-distributed. They each have value 1 with
probability 1/2.
 
W

Walter Roberson

As Paul already noted, this is off-topic for comp.lang.c, and SHA-1
is probably not well-suited to your needs.
More formally, you would have a collision with probability 1/2 after
hashing 2**16 objects in this fashion, per the "birthday paradox".

Minor OT correction: you would need 1L<<16 * pow(2*log(2),1/2.)
That's about 1.1774 times larger than your count.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,241
Latest member
Lisa1997

Latest Threads

Top