Hamming Distance

Discussion in 'Python' started by godavemon, Jun 20, 2008.

  1. godavemon

    godavemon Guest

    I need to calculate the Hamming Distance of two integers. The hamming
    distance is the number of bits in two integers that don't match. I
    thought there'd be a function in math or scipy but i haven't been able
    to find one. This is my function but it seems like there should be a
    faster way. I do this computation many times and speed up is
    important.

    def hamdist( a, b , bits = 32):
    def _hamdist( x, bits):
    if bits:
    return (x & 1) + _hamdist(x >> 1, bits-1)
    return x & 1
    return _hamdist( a ^ b, bits)


    Another alternative would be to convert the XOR to a binary string and
    count the # of 1's.

    Which would be fastest? Are there better alternatives?

    Thanks!
     
    godavemon, Jun 20, 2008
    #1
    1. Advertising

  2. On Jun 19, 4:27 pm, godavemon <> wrote:
    > I need to calculate the Hamming Distance of two integers.  The hamming
    > distance is the number of bits in two integers that don't match.  I
    > thought there'd be a function in math or scipy but i haven't been able
    > to find one.  This is my function but it seems like there should be a
    > faster way.  I do this computation many times and speed up is
    > important.


    The simplest speed-up is to break the inputs into n-length blocks and
    then look them up in an n-by-n table.

    Also, re-write your function to use iteration instead of recursion
    (the latter is *very* expensive in Python).

    The fastest way is to write a C function or use Pyrex.


    Raymond
     
    Raymond Hettinger, Jun 20, 2008
    #2
    1. Advertising

  3. godavemon

    John Machin Guest

    On Jun 20, 9:27 am, godavemon <> wrote:
    > I need to calculate the Hamming Distance of two integers. The hamming
    > distance is the number of bits in two integers that don't match. I
    > thought there'd be a function in math or scipy but i haven't been able
    > to find one. This is my function but it seems like there should be a
    > faster way. I do this computation many times and speed up is
    > important.
    >
    > def hamdist( a, b , bits = 32):
    > def _hamdist( x, bits):
    > if bits:
    > return (x & 1) + _hamdist(x >> 1, bits-1)
    > return x & 1
    > return _hamdist( a ^ b, bits)
    >
    > Another alternative would be to convert the XOR to a binary string and


    Isn't it a "binary string" already?

    > count the # of 1's.


    My guess is that counting the 1-bits in (a ^ b) would be hard to beat,
    and that a recursive function is just not in the race.

    >
    > Which would be fastest?


    Implement both and time them.

    > Are there better alternatives?


    I doubt it.

    BTW one presumes that your integers are non-negative ...

    HTH

    Cheers,
    John
     
    John Machin, Jun 20, 2008
    #3
  4. godavemon

    Mensanator Guest

    On Jun 19, 6:27 pm, godavemon <> wrote:
    > I need to calculate the Hamming Distance of two integers.  The hamming
    > distance is the number of bits in two integers that don't match.  I
    > thought there'd be a function in math or scipy but i haven't been able
    > to find one.  This is my function but it seems like there should be a
    > faster way.  I do this computation many times and speed up is
    > important.
    >
    > def hamdist( a, b , bits = 32):
    >     def _hamdist( x, bits):
    >         if bits:
    >             return (x & 1) + _hamdist(x >> 1, bits-1)
    >         return x & 1
    >     return _hamdist( a ^ b, bits)
    >
    > Another alternative would be to convert the XOR to a binary string and
    > count the # of 1's.
    >
    > Which would be fastest?  Are there better alternatives?


    Yep, use the gmpy module.

    >>> import gmpy
    >>> help(gmpy.hamdist)

    Help on built-in function hamdist in module gmpy:
    hamdist(...)
    hamdist(x,y): returns the Hamming distance (number of bit-
    positions
    where the bits differ) between x and y. x and y must be mpz, or
    else
    get coerced to mpz.

    >>> gmpy.hamdist(15,6)

    2
    >>> gmpy.hamdist(2**177149,10**53330)

    61877
    >>> gmpy.hamdist(2**177149-1,10**53330)

    115285


    >
    > Thanks!
     
    Mensanator, Jun 20, 2008
    #4
  5. godavemon

    Robert Kern Guest

    godavemon wrote:
    > I need to calculate the Hamming Distance of two integers. The hamming
    > distance is the number of bits in two integers that don't match. I
    > thought there'd be a function in math or scipy but i haven't been able
    > to find one. This is my function but it seems like there should be a
    > faster way. I do this computation many times and speed up is
    > important.
    >
    > def hamdist( a, b , bits = 32):
    > def _hamdist( x, bits):
    > if bits:
    > return (x & 1) + _hamdist(x >> 1, bits-1)
    > return x & 1
    > return _hamdist( a ^ b, bits)
    >
    >
    > Another alternative would be to convert the XOR to a binary string and
    > count the # of 1's.
    >
    > Which would be fastest? Are there better alternatives?


    I think this works:

    In [26]: hexbits = {'0': 0,
    ....: '1': 1,
    ....: '2': 1,
    ....: '3': 2,
    ....: '4': 1,
    ....: '5': 2,
    ....: '6': 2,
    ....: '7': 3,
    ....: '8': 1,
    ....: '9': 2,
    ....: 'A': 2,
    ....: 'B': 3,
    ....: 'C': 2,
    ....: 'D': 3,
    ....: 'E': 3,
    ....: 'F': 4}

    In [28]: def hamming(a, b):
    ....: return sum(hexbits[h] for h in hex(a^b)[2:])
    ....:

    In [29]: hamming(1,1)
    Out[29]: 0

    In [30]: hamming(1,2)
    Out[30]: 2

    In [31]: hamming(2,2)
    Out[31]: 0

    In [32]: hamming(2,3)
    Out[32]: 1


    This might be a wee faster, I haven't timed it:

    sum(map(h.get, hex(a^b)[2:]))

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Jun 20, 2008
    #5
  6. godavemon

    Matimus Guest

    On Jun 19, 4:27 pm, godavemon <> wrote:
    > I need to calculate the Hamming Distance of two integers.  The hamming
    > distance is the number of bits in two integers that don't match.  I
    > thought there'd be a function in math or scipy but i haven't been able
    > to find one.  This is my function but it seems like there should be a
    > faster way.  I do this computation many times and speed up is
    > important.
    >
    > def hamdist( a, b , bits = 32):
    >     def _hamdist( x, bits):
    >         if bits:
    >             return (x & 1) + _hamdist(x >> 1, bits-1)
    >         return x & 1
    >     return _hamdist( a ^ b, bits)
    >
    > Another alternative would be to convert the XOR to a binary string and
    > count the # of 1's.
    >
    > Which would be fastest?  Are there better alternatives?
    >
    > Thanks!


    I see no good reason to use recursion for this type of thing. Here are
    some of my attempts:

    Code:
    from math import log
    
    def yours(a, b , bits = 32):
         def _hamdist( x, bits):
             if bits:
                 return (x & 1) + _hamdist(x >> 1, bits-1)
             return x & 1
         return _hamdist(a ^ b, bits)
    
    
    def simple(a, b, bits=32):
        x = a ^ b
        return sum((x >> i & 1) for i in xrange(bits))
    
    def lazy(a, b, bits=32):
        x = (a ^ b) & ((1 << bits) - 1)
        tot = 0
        while x:
            tot += x & 1
            x >>= 1
        return tot
    
    def fancy(a, b, bits=32):
        x = (a ^ b) & ((1 << bits) - 1)
        tot = 0
        while x:
            tot += 1
            x ^= 1 << int(log(x, 2))
        return tot
    
    test_vals = (
            ((0xffffffff, 0), 32),
            ((0,0), 0),
            ((1,0), 1),
            ((0x80000000, 0), 1),
            ((0x55555555, 0), 16)
            )
    
    def test(f):
        test_vals = (
                ((0xffffffff, 0), 32), # ALL
                ((0,0), 0), # None
                ((1,0), 1), # First
                ((0x80000000, 0), 1), # Last
                ((0x55555555, 0), 16), # Every Other
                ((0xffff, 0), 16), # First Half
                ((0xffff0000, 0), 16), # Last Half
                )
        for i, (args, exp) in enumerate(test_vals):
            if f(*args) != exp:
                return 0
        return 1
    
    if __name__ == "__main__":
        for f in (yours, simple, lazy, fancy):
            if not test(f):
                print "%s failed"%f.__name__
    
    The python module `timeit` is handy for testing speed:

    python -mtimeit -s"from hamdist import *" "test(yours)"
    10000 loops, best of 3: 95.1 usec per loop

    python -mtimeit -s"from hamdist import *" "test(simple)"
    10000 loops, best of 3: 65.3 usec per loop

    python -mtimeit -s"from hamdist import *" "test(lazy)"
    10000 loops, best of 3: 59.8 usec per loop

    python -mtimeit -s"from hamdist import *" "test(fancy)"
    10000 loops, best of 3: 77.2 usec per loop

    Even the ridiculous `fancy` version beat the recursive version.

    Matt
     
    Matimus, Jun 20, 2008
    #6
  7. Non-recursive, 8-bit block table lookup version:

    def ham(a, b, ht=[hamdist(a,0) for a in range(256)]):
    x = a ^ b
    dist = 0
    while x:
    dist += ht[x & 255]
    x >>= 8
    return dist

    Raymond
     
    Raymond Hettinger, Jun 20, 2008
    #7
  8. godavemon

    godavemon Guest

    Awesome! Thanks a lot.

    On Jun 19, 5:00 pm, Mensanator <> wrote:
    > On Jun 19, 6:27 pm, godavemon <> wrote:
    >
    >
    >
    > > I need to calculate the Hamming Distance of two integers. The hamming
    > > distance is the number of bits in two integers that don't match. I
    > > thought there'd be a function in math or scipy but i haven't been able
    > > to find one. This is my function but it seems like there should be a
    > > faster way. I do this computation many times and speed up is
    > > important.

    >
    > > def hamdist( a, b , bits = 32):
    > > def _hamdist( x, bits):
    > > if bits:
    > > return (x & 1) + _hamdist(x >> 1, bits-1)
    > > return x & 1
    > > return _hamdist( a ^ b, bits)

    >
    > > Another alternative would be to convert the XOR to a binary string and
    > > count the # of 1's.

    >
    > > Which would be fastest? Are there better alternatives?

    >
    > Yep, use the gmpy module.
    >
    > >>> import gmpy
    > >>> help(gmpy.hamdist)

    >
    > Help on built-in function hamdist in module gmpy:
    > hamdist(...)
    > hamdist(x,y): returns the Hamming distance (number of bit-
    > positions
    > where the bits differ) between x and y. x and y must be mpz, or
    > else
    > get coerced to mpz.
    >
    > >>> gmpy.hamdist(15,6)

    > 2
    > >>> gmpy.hamdist(2**177149,10**53330)

    > 61877
    > >>> gmpy.hamdist(2**177149-1,10**53330)

    >
    > 115285
    >
    >
    >
    > > Thanks!
     
    godavemon, Jun 20, 2008
    #8
  9. godavemon

    godavemon Guest

    Great thanks!

    On Jun 19, 5:37 pm, Matimus <> wrote:
    > On Jun 19, 4:27 pm, godavemon <> wrote:
    >
    >
    >
    > > I need to calculate the Hamming Distance of two integers. The hamming
    > > distance is the number of bits in two integers that don't match. I
    > > thought there'd be a function in math or scipy but i haven't been able
    > > to find one. This is my function but it seems like there should be a
    > > faster way. I do this computation many times and speed up is
    > > important.

    >
    > > def hamdist( a, b , bits = 32):
    > > def _hamdist( x, bits):
    > > if bits:
    > > return (x & 1) + _hamdist(x >> 1, bits-1)
    > > return x & 1
    > > return _hamdist( a ^ b, bits)

    >
    > > Another alternative would be to convert the XOR to a binary string and
    > > count the # of 1's.

    >
    > > Which would be fastest? Are there better alternatives?

    >
    > > Thanks!

    >
    > I see no good reason to use recursion for this type of thing. Here are
    > some of my attempts:
    >
    >
    Code:
    > from math import log
    >
    > def yours(a, b , bits = 32):
    >      def _hamdist( x, bits):
    >          if bits:
    >              return (x & 1) + _hamdist(x >> 1, bits-1)
    >          return x & 1
    >      return _hamdist(a ^ b, bits)
    >
    > def simple(a, b, bits=32):
    >     x = a ^ b
    >     return sum((x >> i & 1) for i in xrange(bits))
    >
    > def lazy(a, b, bits=32):
    >     x = (a ^ b) & ((1 << bits) - 1)
    >     tot = 0
    >     while x:
    >         tot += x & 1
    >         x >>= 1
    >     return tot
    >
    > def fancy(a, b, bits=32):
    >     x = (a ^ b) & ((1 << bits) - 1)
    >     tot = 0
    >     while x:
    >         tot += 1
    >         x ^= 1 << int(log(x, 2))
    >     return tot
    >
    > test_vals = (
    >         ((0xffffffff, 0), 32),
    >         ((0,0), 0),
    >         ((1,0), 1),
    >         ((0x80000000, 0), 1),
    >         ((0x55555555, 0), 16)
    >         )
    >
    > def test(f):
    >     test_vals = (
    >             ((0xffffffff, 0), 32), # ALL
    >             ((0,0), 0), # None
    >             ((1,0), 1), # First
    >             ((0x80000000, 0), 1), # Last
    >             ((0x55555555, 0), 16), # Every Other
    >             ((0xffff, 0), 16), # First Half
    >             ((0xffff0000, 0), 16), # Last Half
    >             )
    >     for i, (args, exp) in enumerate(test_vals):
    >         if f(*args) != exp:
    >             return 0
    >     return 1
    >
    > if __name__ == "__main__":
    >     for f in (yours, simple, lazy, fancy):
    >         if not test(f):
    >             print "%s failed"%f.__name__
    > 
    >
    > The python module `timeit` is handy for testing speed:
    >
    > python -mtimeit -s"from hamdist import *" "test(yours)"
    > 10000 loops, best of 3: 95.1 usec per loop
    >
    > python -mtimeit -s"from hamdist import *" "test(simple)"
    > 10000 loops, best of 3: 65.3 usec per loop
    >
    > python -mtimeit -s"from hamdist import *" "test(lazy)"
    > 10000 loops, best of 3: 59.8 usec per loop
    >
    > python -mtimeit -s"from hamdist import *" "test(fancy)"
    > 10000 loops, best of 3: 77.2 usec per loop
    >
    > Even the ridiculous `fancy` version beat the recursive version.
    >
    > Matt
     
    godavemon, Jun 20, 2008
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Niv

    Hamming distance

    Niv, Jan 19, 2006, in forum: VHDL
    Replies:
    9
    Views:
    3,750
    adyshon
    Jan 19, 2011
  2. LabWINC

    Low Pass Hamming filter design

    LabWINC, Mar 30, 2006, in forum: Python
    Replies:
    2
    Views:
    1,410
    Robert Kern
    Mar 31, 2006
  3. jaysome

    C Unleashed Chapter 18: Hamming Codes

    jaysome, Dec 18, 2007, in forum: C Programming
    Replies:
    2
    Views:
    496
    Richard Heathfield
    Dec 18, 2007
  4. Hamming Distance

    , Mar 16, 2010, in forum: Java
    Replies:
    7
    Views:
    831
    Roedy Green
    Mar 17, 2010
  5. Martin Hansen

    Hamming Distance

    Martin Hansen, Mar 7, 2011, in forum: Ruby
    Replies:
    4
    Views:
    426
Loading...

Share This Page