python hash function

Discussion in 'Python' started by Damien Morton, Jun 24, 2003.

  1. Ive been doing some investigation of python hashtables and the hash
    function used in Python.

    It turns out, that when running pystone.py, as much time is spent in
    string_hash() as is spent in lookdict_string(). If you look at
    python's hash function, shown below, you can get an idea why - a
    multiply is used for every character in the string.

    static long
    string_hash1(register char *p, int size)
    {
    register int len;
    register long x;

    len = size;
    x = *p << 7;
    while (--len >= 0)
    x = (100003*x) ^ *p++;
    x ^= size;
    return x;
    }

    Looking around the web I found the following hash function, which is
    much faster, and seems to distribute better than the python hash
    function.

    static long
    string_hash3(register char *p, register int size)
    {
    register long x;
    register char c;

    x = 0xd2d84a61;
    while (c = *p++)
    x ^= ( (x<<7) + c + (x>>5) );
    return x;
    }

    I also came up with a hash function of my own, which seems to
    distribute near-perfectly (in the sense of being comparable to
    assigning, as hashes, random numbers to unique strings) and be faster
    yet (at least, on my P4 box).

    static long
    string_hash6(register unsigned short *p, int size)
    {
    register short s;
    register unsigned long x = 0;

    if (size == 0) return 0;

    len = (size+1) >> 1;
    while (len--) {
    x += (x>>14) + (*p++ * 0xd2d84a61);
    }
    x += (x>>14) + (size*0xd2d84a61);

    return x;
    }

    Ive tested these functions by hashing a large set of strings generated
    by instrumenting the python interpeter to emit a string every time a
    string is added to a dictionary. These strings are hashed and thrown
    into the buckets of various sized tables. I then calculate sigma
    statistics (sum((observed-expected)^2)/(N-1)) for the number of items
    in the buckets of those tables.

    Im not sure what other tests to try out. Im hoping that someone on
    c.l.py has some experience in testing hash functions, and can suggest
    some additional tests and/or tweaks.
    Damien Morton, Jun 24, 2003
    #1
    1. Advertising

  2. Damien Morton

    John J. Lee Guest

    (Damien Morton) writes:

    > Ive been doing some investigation of python hashtables and the hash
    > function used in Python.

    [...]

    Have you had a look at the python-dev archives? Bound to be stuff
    there about this general area.


    John
    John J. Lee, Jun 24, 2003
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Red Orchid
    Replies:
    3
    Views:
    1,024
  2. Pieter Claassen
    Replies:
    1
    Views:
    1,093
    CBFalconer
    Aug 4, 2004
  3. Bo Peng
    Replies:
    4
    Views:
    779
  4. rp
    Replies:
    1
    Views:
    491
    red floyd
    Nov 10, 2011
  5. Srijayanth Sridhar
    Replies:
    19
    Views:
    595
    David A. Black
    Jul 2, 2008
Loading...

Share This Page