# Hamming Distance

Discussion in 'Python' started by godavemon, Jun 20, 2008.

1. ### godavemonGuest

I need to calculate the Hamming Distance of two integers. The hamming
distance is the number of bits in two integers that don't match. I
thought there'd be a function in math or scipy but i haven't been able
to find one. This is my function but it seems like there should be a
faster way. I do this computation many times and speed up is
important.

def hamdist( a, b , bits = 32):
def _hamdist( x, bits):
if bits:
return (x & 1) + _hamdist(x >> 1, bits-1)
return x & 1
return _hamdist( a ^ b, bits)

Another alternative would be to convert the XOR to a binary string and
count the # of 1's.

Which would be fastest? Are there better alternatives?

Thanks!

godavemon, Jun 20, 2008

2. ### Raymond HettingerGuest

On Jun 19, 4:27 pm, godavemon <> wrote:
> I need to calculate the Hamming Distance of two integers.  The hamming
> distance is the number of bits in two integers that don't match.  I
> thought there'd be a function in math or scipy but i haven't been able
> to find one.  This is my function but it seems like there should be a
> faster way.  I do this computation many times and speed up is
> important.

The simplest speed-up is to break the inputs into n-length blocks and
then look them up in an n-by-n table.

(the latter is *very* expensive in Python).

The fastest way is to write a C function or use Pyrex.

Raymond

Raymond Hettinger, Jun 20, 2008

3. ### John MachinGuest

On Jun 20, 9:27 am, godavemon <> wrote:
> I need to calculate the Hamming Distance of two integers. The hamming
> distance is the number of bits in two integers that don't match. I
> thought there'd be a function in math or scipy but i haven't been able
> to find one. This is my function but it seems like there should be a
> faster way. I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
> def _hamdist( x, bits):
> if bits:
> return (x & 1) + _hamdist(x >> 1, bits-1)
> return x & 1
> return _hamdist( a ^ b, bits)
>
> Another alternative would be to convert the XOR to a binary string and

Isn't it a "binary string" already?

> count the # of 1's.

My guess is that counting the 1-bits in (a ^ b) would be hard to beat,
and that a recursive function is just not in the race.

>
> Which would be fastest?

Implement both and time them.

> Are there better alternatives?

I doubt it.

BTW one presumes that your integers are non-negative ...

HTH

Cheers,
John

John Machin, Jun 20, 2008
4. ### MensanatorGuest

On Jun 19, 6:27 pm, godavemon <> wrote:
> I need to calculate the Hamming Distance of two integers.  The hamming
> distance is the number of bits in two integers that don't match.  I
> thought there'd be a function in math or scipy but i haven't been able
> to find one.  This is my function but it seems like there should be a
> faster way.  I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
>     def _hamdist( x, bits):
>         if bits:
>             return (x & 1) + _hamdist(x >> 1, bits-1)
>         return x & 1
>     return _hamdist( a ^ b, bits)
>
> Another alternative would be to convert the XOR to a binary string and
> count the # of 1's.
>
> Which would be fastest?  Are there better alternatives?

Yep, use the gmpy module.

>>> import gmpy
>>> help(gmpy.hamdist)

Help on built-in function hamdist in module gmpy:
hamdist(...)
hamdist(x,y): returns the Hamming distance (number of bit-
positions
where the bits differ) between x and y. x and y must be mpz, or
else
get coerced to mpz.

>>> gmpy.hamdist(15,6)

2
>>> gmpy.hamdist(2**177149,10**53330)

61877
>>> gmpy.hamdist(2**177149-1,10**53330)

115285

>
> Thanks!

Mensanator, Jun 20, 2008
5. ### Robert KernGuest

godavemon wrote:
> I need to calculate the Hamming Distance of two integers. The hamming
> distance is the number of bits in two integers that don't match. I
> thought there'd be a function in math or scipy but i haven't been able
> to find one. This is my function but it seems like there should be a
> faster way. I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
> def _hamdist( x, bits):
> if bits:
> return (x & 1) + _hamdist(x >> 1, bits-1)
> return x & 1
> return _hamdist( a ^ b, bits)
>
>
> Another alternative would be to convert the XOR to a binary string and
> count the # of 1's.
>
> Which would be fastest? Are there better alternatives?

I think this works:

In [26]: hexbits = {'0': 0,
....: '1': 1,
....: '2': 1,
....: '3': 2,
....: '4': 1,
....: '5': 2,
....: '6': 2,
....: '7': 3,
....: '8': 1,
....: '9': 2,
....: 'A': 2,
....: 'B': 3,
....: 'C': 2,
....: 'D': 3,
....: 'E': 3,
....: 'F': 4}

In [28]: def hamming(a, b):
....: return sum(hexbits[h] for h in hex(a^b)[2:])
....:

In [29]: hamming(1,1)
Out[29]: 0

In [30]: hamming(1,2)
Out[30]: 2

In [31]: hamming(2,2)
Out[31]: 0

In [32]: hamming(2,3)
Out[32]: 1

This might be a wee faster, I haven't timed it:

sum(map(h.get, hex(a^b)[2:]))

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
an underlying truth."
-- Umberto Eco

Robert Kern, Jun 20, 2008
6. ### MatimusGuest

On Jun 19, 4:27 pm, godavemon <> wrote:
> I need to calculate the Hamming Distance of two integers.  The hamming
> distance is the number of bits in two integers that don't match.  I
> thought there'd be a function in math or scipy but i haven't been able
> to find one.  This is my function but it seems like there should be a
> faster way.  I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
>     def _hamdist( x, bits):
>         if bits:
>             return (x & 1) + _hamdist(x >> 1, bits-1)
>         return x & 1
>     return _hamdist( a ^ b, bits)
>
> Another alternative would be to convert the XOR to a binary string and
> count the # of 1's.
>
> Which would be fastest?  Are there better alternatives?
>
> Thanks!

I see no good reason to use recursion for this type of thing. Here are
some of my attempts:

Code (Text):

from math import log

def yours(a, b , bits = 32):
def _hamdist( x, bits):
if bits:
return (x & 1) + _hamdist(x >> 1, bits-1)
return x & 1
return _hamdist(a ^ b, bits)

def simple(a, b, bits=32):
x = a ^ b
return sum((x >> i & 1) for i in xrange(bits))

def lazy(a, b, bits=32):
x = (a ^ b) & ((1 << bits) - 1)
tot = 0
while x:
tot += x & 1
x >>= 1

def fancy(a, b, bits=32):
x = (a ^ b) & ((1 << bits) - 1)
tot = 0
while x:
tot += 1
x ^= 1 << int(log(x, 2))

test_vals = (
((0xffffffff, 0), 32),
((0,0), 0),
((1,0), 1),
((0x80000000, 0), 1),
((0x55555555, 0), 16)
)

def test(f):
test_vals = (
((0xffffffff, 0), 32), # ALL
((0,0), 0), # None
((1,0), 1), # First
((0x80000000, 0), 1), # Last
((0x55555555, 0), 16), # Every Other
((0xffff, 0), 16), # First Half
((0xffff0000, 0), 16), # Last Half
)
for i, (args, exp) in enumerate(test_vals):
if f(*args) != exp:
return 0
return 1

if __name__ == "__main__":
for f in (yours, simple, lazy, fancy):
if not test(f):
print "%s failed"%f.__name__

The python module `timeit` is handy for testing speed:

python -mtimeit -s"from hamdist import *" "test(yours)"
10000 loops, best of 3: 95.1 usec per loop

python -mtimeit -s"from hamdist import *" "test(simple)"
10000 loops, best of 3: 65.3 usec per loop

python -mtimeit -s"from hamdist import *" "test(lazy)"
10000 loops, best of 3: 59.8 usec per loop

python -mtimeit -s"from hamdist import *" "test(fancy)"
10000 loops, best of 3: 77.2 usec per loop

Even the ridiculous `fancy` version beat the recursive version.

Matt

Matimus, Jun 20, 2008
7. ### Raymond HettingerGuest

Non-recursive, 8-bit block table lookup version:

def ham(a, b, ht=[hamdist(a,0) for a in range(256)]):
x = a ^ b
dist = 0
while x:
dist += ht[x & 255]
x >>= 8
return dist

Raymond

Raymond Hettinger, Jun 20, 2008
8. ### godavemonGuest

Awesome! Thanks a lot.

On Jun 19, 5:00 pm, Mensanator <> wrote:
> On Jun 19, 6:27 pm, godavemon <> wrote:
>
>
>
> > I need to calculate the Hamming Distance of two integers. The hamming
> > distance is the number of bits in two integers that don't match. I
> > thought there'd be a function in math or scipy but i haven't been able
> > to find one. This is my function but it seems like there should be a
> > faster way. I do this computation many times and speed up is
> > important.

>
> > def hamdist( a, b , bits = 32):
> > def _hamdist( x, bits):
> > if bits:
> > return (x & 1) + _hamdist(x >> 1, bits-1)
> > return x & 1
> > return _hamdist( a ^ b, bits)

>
> > Another alternative would be to convert the XOR to a binary string and
> > count the # of 1's.

>
> > Which would be fastest? Are there better alternatives?

>
> Yep, use the gmpy module.
>
> >>> import gmpy
> >>> help(gmpy.hamdist)

>
> Help on built-in function hamdist in module gmpy:
> hamdist(...)
> hamdist(x,y): returns the Hamming distance (number of bit-
> positions
> where the bits differ) between x and y. x and y must be mpz, or
> else
> get coerced to mpz.
>
> >>> gmpy.hamdist(15,6)

> 2
> >>> gmpy.hamdist(2**177149,10**53330)

> 61877
> >>> gmpy.hamdist(2**177149-1,10**53330)

>
> 115285
>
>
>
> > Thanks!

godavemon, Jun 20, 2008
9. ### godavemonGuest

Great thanks!

On Jun 19, 5:37 pm, Matimus <> wrote:
> On Jun 19, 4:27 pm, godavemon <> wrote:
>
>
>
> > I need to calculate the Hamming Distance of two integers. The hamming
> > distance is the number of bits in two integers that don't match. I
> > thought there'd be a function in math or scipy but i haven't been able
> > to find one. This is my function but it seems like there should be a
> > faster way. I do this computation many times and speed up is
> > important.

>
> > def hamdist( a, b , bits = 32):
> > def _hamdist( x, bits):
> > if bits:
> > return (x & 1) + _hamdist(x >> 1, bits-1)
> > return x & 1
> > return _hamdist( a ^ b, bits)

>
> > Another alternative would be to convert the XOR to a binary string and
> > count the # of 1's.

>
> > Which would be fastest? Are there better alternatives?

>
> > Thanks!

>
> I see no good reason to use recursion for this type of thing. Here are
> some of my attempts:
>
>
Code (Text):

> from math import log
>
> def yours(a, b , bits = 32):
>      def _hamdist( x, bits):
>          if bits:
>              return (x & 1) + _hamdist(x >> 1, bits-1)
>          return x & 1
>      return _hamdist(a ^ b, bits)
>
> def simple(a, b, bits=32):
>     x = a ^ b
>     return sum((x >> i & 1) for i in xrange(bits))
>
> def lazy(a, b, bits=32):
>     x = (a ^ b) & ((1 << bits) - 1)
>     tot = 0
>     while x:
>         tot += x & 1
>         x >>= 1
>
> def fancy(a, b, bits=32):
>     x = (a ^ b) & ((1 << bits) - 1)
>     tot = 0
>     while x:
>         tot += 1
>         x ^= 1 << int(log(x, 2))
>
> test_vals = (
>         ((0xffffffff, 0), 32),
>         ((0,0), 0),
>         ((1,0), 1),
>         ((0x80000000, 0), 1),
>         ((0x55555555, 0), 16)
>         )
>
> def test(f):
>     test_vals = (
>             ((0xffffffff, 0), 32), # ALL
>             ((0,0), 0), # None
>             ((1,0), 1), # First
>             ((0x80000000, 0), 1), # Last
>             ((0x55555555, 0), 16), # Every Other
>             ((0xffff, 0), 16), # First Half
>             ((0xffff0000, 0), 16), # Last Half
>             )
>     for i, (args, exp) in enumerate(test_vals):
>         if f(*args) != exp:
>             return 0
>     return 1
>
> if __name__ == "__main__":
>     for f in (yours, simple, lazy, fancy):
>         if not test(f):
>             print "%s failed"%f.__name__
>
>
> The python module `timeit` is handy for testing speed:
>
> python -mtimeit -s"from hamdist import *" "test(yours)"
> 10000 loops, best of 3: 95.1 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(simple)"
> 10000 loops, best of 3: 65.3 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(lazy)"
> 10000 loops, best of 3: 59.8 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(fancy)"
> 10000 loops, best of 3: 77.2 usec per loop
>
> Even the ridiculous `fancy` version beat the recursive version.
>
> Matt

godavemon, Jun 20, 2008