Fast 1 -> 1 non md5 hash algorithm

Discussion in 'C Programming' started by thecodemachine@gmail.com, Jan 9, 2006.

  1. Guest

    Hi,

    I'm looking for a fast and simple one to one hash function, suitable
    for longer strings (up to 2048 in length). I'd like keys to be
    relatively short, I doubt I'd be creating more than 256 keys..

    I could use md5 but even that is more than I need, and I assume not the
    fastest algorithm (?), so I'm reluctant to use it.

    I've been looking around a lot, not finding much anything come to mind?

    thanks!
    Calin
    , Jan 9, 2006
    #1
    1. Advertising

  2. Eric Sosman Guest

    wrote On 01/09/06 16:17,:
    > Hi,
    >
    > I'm looking for a fast and simple one to one hash function, suitable
    > for longer strings (up to 2048 in length). I'd like keys to be
    > relatively short, I doubt I'd be creating more than 256 keys..
    >
    > I could use md5 but even that is more than I need, and I assume not the
    > fastest algorithm (?), so I'm reluctant to use it.
    >
    > I've been looking around a lot, not finding much anything come to mind?


    The first thing that comes to mind is that you've found
    the wrong newsgroup. comp.lang.c is about the C programming
    language, not about algorithms.

    The second thing that comes to mind is that a one-to-one
    hash function can be extremely simple and extremely fast:
    just use the hashed object as its own hash code. (Note that
    a code with either fewer or more bits than the original cannot
    be one-to-one.)

    The third thing that comes to mind is that since the
    second thing didn't come to your mind, you may not have a
    clear idea of what you're trying to do.

    --
    Eric Sosman, Jan 9, 2006
    #2
    1. Advertising

  3. In article <>,
    <> wrote:
    >I'm looking for a fast and simple one to one hash function, suitable
    >for longer strings (up to 2048 in length). I'd like keys to be
    >relatively short, I doubt I'd be creating more than 256 keys..


    >I could use md5 but even that is more than I need, and I assume not the
    >fastest algorithm (?), so I'm reluctant to use it.


    There are lot of different hash functions. In order to be able to
    recommend one, we would have to know:

    - do you need a hash function that is *certain* not to produce
    the same hash for values you wish to distinguish; or

    - are you using a "bucket" hashing system; or

    - are you just going to fill the next available bucket when you have
    a hash collision; or

    - are you going to use a series of rehash functions to attempt to
    resolve hash collisions;

    - are you truly looking for a hash function, or are you looking
    for a digital signature?

    - is this a case in which you can take your time hashing during the
    data analysis phase, but then expect to retrieve the data over and
    over again, so theoretical read performance is much more important than
    time to insert into hash structures? If so, then have you considered
    "perfect hash" ?

    - would it be acceptable to put an arbitrary limit on the number of
    keys if by doing so you could create a noticably more efficient hash?
    For example, would either of 251 or 257 be acceptable limits?

    - if you find out later that you need more than 256 keys, must it be
    easy to expand the maximum number of keys, or would it be acceptable for
    noticable "work" (by the computer) to be involved? Though in this
    particular case it sounds like you won't be needing more than 512 Kb for
    your hash, in the general case someone might be hashing over a million
    entries each of which might be a couple of hundred megabytes, and
    so for them, it could be quite important that as the number of keys
    expanded that the previously processed objects continue to hash into
    exactly the same disk storage location that they did before.
    --
    "No one has the right to destroy another person's belief by
    demanding empirical evidence." -- Ann Landers
    Walter Roberson, Jan 9, 2006
    #3
  4. Guest

    Thanks for your reply.

    > - do you need a hash function that is *certain* not to produce
    > the same hash for values you wish to distinguish; or


    Yes

    > - are you using a "bucket" hashing system; or


    No

    > - are you just going to fill the next available bucket when you have
    > a hash collision; or


    Yes

    > - are you going to use a series of rehash functions to attempt to
    > resolve hash collisions;


    No

    > - are you truly looking for a hash function, or are you looking
    > for a digital signature?


    More a digital signature i suppose, not a hash function as is normally
    used with chained hash table & buckets where some collisions are
    acceptable etc.

    > - is this a case in which you can take your time hashing during the
    > data analysis phase, but then expect to retrieve the data over and
    > over again, so theoretical read performance is much more important than
    > time to insert into hash structures?


    Yes...but time to insert can't be "slow" ...anything that might take
    more than 1 sec would be too slow..though obviously that'd vary widely
    on the machine

    If so, then have you considered
    > "perfect hash" ?


    I noticed that but wasn't sure about it, I'll look into it further
    thank you.

    >
    > - would it be acceptable to put an arbitrary limit on the number of
    > keys if by doing so you could create a noticably more efficient hash?
    > For example, would either of 251 or 257 be acceptable limits?
    >


    definately


    > - if you find out later that you need more than 256 keys, must it be
    > easy to expand the maximum number of keys, or would it be acceptable for
    > noticable "work" (by the computer) to be involved?


    Noticable work would be fine, that's an unlikely case and speed is
    paramount.


    thanks very much!
    Calin
    , Jan 10, 2006
    #4
  5. In article <>, writes:
    >
    > [Please preserve attributions for quoted material.]
    >
    > > - do you need a hash function that is *certain* not to produce
    > > the same hash for values you wish to distinguish; or

    >
    > Yes
    >
    > > - are you just going to fill the next available bucket when you have
    > > a hash collision; or

    >
    > Yes


    This is inconsistent with your first answer. If the hash function is
    guaranteed to avoid collisions, then the bucket will always be
    available.

    > > - are you truly looking for a hash function, or are you looking
    > > for a digital signature?

    >
    > More a digital signature i suppose, not a hash function as is normally
    > used with chained hash table & buckets where some collisions are
    > acceptable etc.


    A hash function is not a digital signature. They are completely
    different operations. Many digital signature algorithms are built
    on top of a cryptographic hash (aka "message digest"), but not all
    are (eg encrypt-and-sign modes), and a digital signature built on
    a cryptographic hash is much more than just a hash.

    I would submit that if you think you may be looking for any sort
    of cryptographic primitive, you should know *exactly* what you
    need, and where to find it. Otherwise, either you don't need crypto,
    or you're going to use it incorrectly. The history of correct
    crypto implemented by non-experts is very thin.

    > > If so, then have you considered
    > > "perfect hash" ?

    >
    > I noticed that but wasn't sure about it, I'll look into it further
    > thank you.


    Perfect hashing may be a solution for you.

    In your first message you mentioned cryptographic hashes such as
    MD5 and SHA-1. While they're very unlikely to accidentally produce
    a collision, in practice it's now quite feasible to deliberately
    produce MD5 collisions, and SHA-1 may not be far behind. And they
    are not fast - in fact, they're deliberately slow, to make it more
    difficult to mount certain kinds of attacks.

    --
    Michael Wojcik

    This record comes with a coupon that wins you a trip around the world.
    -- Pizzicato Five
    Michael Wojcik, Jan 12, 2006
    #5
  6. On 12 Jan 2006 18:13:31 GMT, (Michael Wojcik)
    wrote:
    <snip>
    > I would submit that if you think you may be looking for any sort
    > of cryptographic primitive, you should know *exactly* what you
    > need, and where to find it. Otherwise, either you don't need crypto,
    > or you're going to use it incorrectly. The history of correct
    > crypto implemented by non-experts is very thin.
    >

    Concur strongly.

    > In your first message you mentioned cryptographic hashes such as
    > MD5 and SHA-1. While they're very unlikely to accidentally produce
    > a collision, in practice it's now quite feasible to deliberately
    > produce MD5 collisions, and SHA-1 may not be far behind. And they
    > are not fast - in fact, they're deliberately slow, to make it more
    > difficult to mount certain kinds of attacks.


    Crypto-hashes are not deliberately slow, and in fact modern
    (implementations of) them are nearly as fast as CRCs even though they
    (must!) "mix" the input data much "better". Where you want deliberate
    slowness, like in PKCS5 (and 12) and IIRC in password "stretching"
    protocols like SRP et al, you iterate the hash many times.

    Some crypto operations like RSA (which is already pretty slow)
    sometimes need to be implemented as constant-speed to prevent
    "side-channel" (power and timing) attacks, but no hashes AFAIK.
    Although, Bernstein has recently claimed a timing attack on a popular
    speedup for the new US standard symmetric cipher AES or Rijndael,
    although I am among those skeptical his prerequisites hold widely.

    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Jan 23, 2006
    #6
  7. In article <>, Dave Thompson <> writes:
    > On 12 Jan 2006 18:13:31 GMT, (Michael Wojcik)
    > wrote:
    >
    > > In your first message you mentioned cryptographic hashes such as
    > > MD5 and SHA-1. While they're very unlikely to accidentally produce
    > > a collision, in practice it's now quite feasible to deliberately
    > > produce MD5 collisions, and SHA-1 may not be far behind. And they
    > > are not fast - in fact, they're deliberately slow, to make it more
    > > difficult to mount certain kinds of attacks.

    >
    > Crypto-hashes are not deliberately slow, and in fact modern
    > (implementations of) them are nearly as fast as CRCs even though they
    > (must!) "mix" the input data much "better". Where you want deliberate
    > slowness, like in PKCS5 (and 12) and IIRC in password "stretching"
    > protocols like SRP et al, you iterate the hash many times.


    Right, right. I was actually thinking of older digest algorithms
    like Snefru and MD2 that used iteration as a strengthening factor,
    and key expansion, and somehow generalized that into "digests are
    deliberately slow to increase the work factor of brute-force image
    searches", which clearly doesn't make much sense.

    > Some crypto operations like RSA (which is already pretty slow)
    > sometimes need to be implemented as constant-speed to prevent
    > "side-channel" (power and timing) attacks, but no hashes AFAIK.


    I imagine that if a digest used operations with variable power or
    time costs, side-channel attacks would be an issue; but since the
    data-dependent operations of MD5, SHA-*, and RIPE-MD (which represent
    most real-world commercial use) are bitwise operations or integer
    addition on fixed-sized quantitites, all general-purpose CPUs will
    perform them at constant time and dissipation.

    Digest functions built from block ciphers or asymmetric ciphers could
    be vulnerable to side-channel attacks if the underlying ciphers are,
    assuming the channel was modulated by the data being digested.

    But a side-channel attack against a hash probably has a smaller
    attack surface than one for encryption does; leaking some information
    about the preimage seems to be of limited use for most hashing
    applications, for example.

    Anyway, this has drifted off-topic for the group, though if the OP
    is still reading he may by now have gotten the impression that this
    is not a trivial do-it-yourself task.

    --
    Michael Wojcik

    There are some, who do not understand true enjoyment, will tell you that
    rules spoil convivial meetings, and that a merry company becomes a dull
    committee as soon as it is called a club. Do not believe them: the
    precedents are all against them. -- Arthur Ransome
    Michael Wojcik, Jan 23, 2006
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    9
    Views:
    17,056
    John Salerno
    May 8, 2006
  2. Charleees

    Need C# Coding for MD5 Algorithm...

    Charleees, May 29, 2006, in forum: Python
    Replies:
    4
    Views:
    4,724
    Edward Elliott
    May 30, 2006
  3. Replies:
    1
    Views:
    1,820
    Uncle Noah
    Jan 10, 2007
  4. rp
    Replies:
    1
    Views:
    491
    red floyd
    Nov 10, 2011
  5. Peter Woodsky

    create a md5 / md5 passwd with a salt

    Peter Woodsky, Nov 20, 2008, in forum: Ruby
    Replies:
    6
    Views:
    192
    Brian Candler
    Nov 21, 2008
Loading...

Share This Page