unsigned 32 bit arithmetic type?

Robin Becker

Hi, just trying to avoid wheel reinvention. I have need of an unsigned 32 bit
arithmetic type to carry out a checksum operation and wondered if anyone had
already defined such a beast.

Our current code works on 32-bit CPUs, but is failing with 64-bit
comparisons; it's clearly wrong, as we are comparing a number with a negated
number; the bits might drop off in 32 bits, but not in 64.
 

Martin v. Löwis

Robin said:
Hi, just trying to avoid wheel reinvention. I have need of an unsigned
32 bit arithmetic type to carry out a checksum operation and wondered if
anyone had already defined such a beast.

Our current code works on 32-bit CPUs, but is failing with 64-bit
comparisons; it's clearly wrong, as we are comparing a number with a
negated number; the bits might drop off in 32 bits, but not in 64.

Not sure what operations you are doing: In Python, bits never drop off
(at least not in recent versions).

If you need to drop bits, you need to do so explicitly, by using the
bit mask operations. I could tell you more if you'd tell us what
the specific operations are.
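For instance, a tiny sketch of the explicit masking Martin describes (the name `u32` is illustrative, not from any library):

```python
MASK32 = 0xFFFFFFFF  # low 32 bits

def u32(n):
    # emulate a 32-bit unsigned register: Python ints never overflow,
    # so the wraparound must be done by hand with a mask
    return n & MASK32

assert u32(0xFFFFFFFF + 1) == 0    # wraps to zero, like C unsigned int
assert u32(-1) == 0xFFFFFFFF       # the two's-complement view of -1
```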

Regards,
Martin
 

Robin Becker

Martin said:
Not sure what operations you are doing: In Python, bits never drop off
(at least not in recent versions).

If you need to drop bits, you need to do so explicitly, by using the
bit mask operations. I could tell you more if you'd tell us what
the specific operations are.


This code is in a contribution to the reportlab toolkit that handles TTF fonts.
The fonts contain checksums computed using 32-bit arithmetic. The original
C definition is as follows:

ULONG CalcTableChecksum(ULONG *Table, ULONG Length)
{
    ULONG Sum = 0L;
    ULONG *EndPtr = Table + ((Length + 3) & ~3) / sizeof(ULONG);

    while (Table < EndPtr)
        Sum += *Table++;
    return Sum;
}

so effectively we're doing only additions and letting bits roll off the end.

Of course the actual semantics is dependent on what C unsigned arithmetic does
so we're relying on that being the same everywhere.

This algorithm was pretty simple in Python until 2.3, when shifts over the end
of ints started going wrong. For some reason we didn't do the obvious thing and
do everything in longs, masking off the upper bits. Instead (probably
my fault) we seem to have accumulated code like

from struct import pack, unpack
import sys

def _L2U32(L):
    '''convert a long to u32'''
    return unpack('l', pack('L', L))[0]

if sys.hexversion >= 0x02030000:
    def add32(x, y):
        "Calculate (x + y) modulo 2**32"
        return _L2U32((long(x) + y) & 0xffffffffL)
else:
    def add32(x, y):
        "Calculate (x + y) modulo 2**32"
        lo = (x & 0xFFFF) + (y & 0xFFFF)
        hi = (x >> 16) + (y >> 16) + (lo >> 16)
        return (hi << 16) | (lo & 0xFFFF)

def calcChecksum(data):
    """Calculates TTF-style checksums"""
    if len(data) & 3: data = data + (4 - (len(data) & 3)) * "\0"
    sum = 0
    for n in unpack(">%dl" % (len(data) >> 2), data):
        sum = add32(sum, n)
    return sum

and also silly stuff like

def testAdd32(self):
    "Test add32"
    self.assertEquals(add32(10, -6), 4)
    self.assertEquals(add32(6, -10), -4)
    self.assertEquals(add32(_L2U32(0x80000000L), -1), 0x7FFFFFFF)
    self.assertEquals(add32(0x7FFFFFFF, 1), _L2U32(0x80000000L))

def testChecksum(self):
    "Test calcChecksum function"
    self.assertEquals(calcChecksum(""), 0)
    self.assertEquals(calcChecksum("\1"), 0x01000000)
    self.assertEquals(calcChecksum("\x01\x02\x03\x04\x10\x20\x30\x40"), 0x11223344)
    self.assertEquals(calcChecksum("\x81"), _L2U32(0x81000000L))

While it might be reasonable to do testing, the tests themselves aren't very
sensible, e.g. what is -6 doing in a u32 test? This stuff just about works on a
32-bit machine, but is failing miserably on a 64-bit AMD. As far as I can see I
just need to use masked longs throughout.

In a C extension I can still do the computation efficiently on a 32-bit machine,
but I need to do masking for a 64-bit machine.
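A minimal sketch of that masked-longs approach (written for Python 3 bytes with unsigned unpacking, so the `_L2U32` round-trip is no longer needed; names are illustrative, not the reportlab API):

```python
from struct import unpack

MASK32 = 0xFFFFFFFF

def calcChecksum(data):
    """TTF-style checksum: pad to a 4-byte boundary, then sum
    big-endian unsigned 32-bit words modulo 2**32."""
    if len(data) & 3:
        data += b"\0" * (4 - (len(data) & 3))
    total = 0
    # ">I" reads each word as an unsigned 32-bit big-endian integer
    for word in unpack(">%dI" % (len(data) >> 2), data):
        total = (total + word) & MASK32
    return total

assert calcChecksum(b"") == 0
assert calcChecksum(b"\x01") == 0x01000000
assert calcChecksum(b"\x01\x02\x03\x04\x10\x20\x30\x40") == 0x11223344
assert calcChecksum(b"\x81") == 0x81000000  # positive, no sign trickery
```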
 

Martin v. Löwis

Robin said:
Of course the actual semantics is dependent on what C unsigned
arithmetic does so we're relying on that being the same everywhere.

Assuming that ULONG has the same width on all systems, the outcome
is actually mandated by the C standard: unsigned arithmetic is
defined to operate modulo (max_uint+1) (even if that is not a power
of two).
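That modulo behaviour for a 32-bit ULONG can be checked directly in Python (a sketch, assuming ULONG is exactly 32 bits wide):

```python
M = 2 ** 32  # the modulus mandated for 32-bit unsigned arithmetic

# C unsigned overflow is defined to wrap modulo M
assert (0xFFFFFFFF + 1) % M == 0
assert (0x80000000 + 0x80000000) % M == 0
```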
This algorithm was pretty simple in Python until 2.3 when shifts over
the end of ints started going wrong.

Actually, they start going *right* :) Addition of two positive numbers
never gives a negative result, in mathematics.

While it might be reasonable to do testing, the tests themselves aren't
very sensible, e.g. what is -6 doing in a u32 test? This stuff just
about works on a 32-bit machine, but is failing miserably on a 64-bit
AMD. As far as I can see I just need to use masked longs throughout.

Exactly.

In a C extension I can still do the computation efficiently on a 32-bit
machine, but I need to do masking for a 64-bit machine.

Well, no. You just need to find a 32-bit unsigned integer type on the
64-bit machine. Typically, "unsigned int" should work fine (with
only the Cray being a notable exception, AFAIK). IOW, replace ULONG
with uint32_t wherever you really mean an unsigned 32-bit type,
then use stdint.h where available, else define it to unsigned int
(with a build-time or run-time test whether sizeof(unsigned int)==4).

Regards,
Martin
 

sturlamolden

Is this what you want?

import numpy
def CalcTableChecksum(Table, Length=None):
    tmp = numpy.array(Table, dtype=numpy.uint32)
    if Length is None: Length = tmp.size
    endptr = ((Length + 3) & ~3) / 4
    return (tmp[0:endptr]).sum()





>>> import numpy as nx
>>> type(nx.array([1, 2, 3], dtype=nx.uint32)[0])


 

Robin Becker

sturlamolden said:
import numpy
def CalcTableChecksum(Table, Length=None):
    tmp = numpy.array(Table, dtype=numpy.uint32)
    if Length is None: Length = tmp.size
    endptr = ((Length + 3) & ~3) / 4
    return (tmp[0:endptr]).sum()
it's probably wonderful, but I don't think I can ask people to add numpy to the
list of requirements for reportlab :)

I used to love its predecessor Numeric, but it was quite large.
 

sturlamolden

Robin said:
it's probably wonderful, but I don't think I can ask people to add numpy to the
list of requirements for reportlab :)

Maybe NumPy will make it into the core Python tree one day. At some point
Python users other than die-hard scientists and mathematicians will
realise that for and while loops are the root of all evil when doing
CPU-bound operations in an interpreted language. Array slicing and
vectorised statements can be faster by astronomical proportions.
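In the checksum setting above, the same idea can be sketched with NumPy (assuming NumPy is available; `checksum_loop`/`checksum_vec` are illustrative names, not reportlab APIs — note the sum is accumulated in uint64 and masked, since NumPy widens the accumulator for small integer dtypes):

```python
import numpy as np

MASK32 = 0xFFFFFFFF

def checksum_loop(words):
    # interpreter-level loop: one masked add per word
    s = 0
    for w in words:
        s = (s + w) & MASK32
    return s

def checksum_vec(words):
    # one vectorised sum executed in C, then a single mask to 32 bits
    return int(np.asarray(words, dtype=np.uint64).sum()) & MASK32

words = [0xFFFFFFFF, 1, 2]
assert checksum_loop(words) == checksum_vec(words) == 2
```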


Here is one example: http://tinyurl.com/y79zhc

This statement required twenty seconds to execute:

dim = size(infocbcr);

image = zeros(dim(1), dim(2));

for i = 1:dim(1)
    for j = 1:dim(2)
        cb = double(infocbcr(i,j,2));
        cr = double(infocbcr(i,j,3));
        x = [(cb-media_b); (cr-media_r)];
        %this gives a mult of 1*2 * 2*2 * 2*1
        image(i,j) = exp(-0.5* x'*inv(brcov)* x);
    end
end

could be replaced with an equivalent condensed statement that only
required a fraction of a second:

image = reshape(exp(-0.5*sum(((chol(brcov)')\ ...
((reshape(double(infocbcr:),:,2:3)),dim(1)*dim(2),2)')...
-repmat([media_b;media_r],1,dim(1)*dim(2)))).^2)'),dim(1),dim(2));

This was Matlab, but the same holds for Python and NumPy. The overhead
in the first code snippet comes from calling the interpreter inside a
tight loop. That is why loops are the root of evilness when doing CPU-bound
tasks in an interpreted language. I would think that 9 out of 10
tasks most Python users think require a C extension are actually more
easily solved with NumPy. This is old knowledge from the Matlab
community: even if you think you need a "MEX file" (that is, a C
extension for Matlab), you probably don't. Vectorize and it will be
fast enough.
 

Robin Becker

sturlamolden said:
.........
This was Matlab, but the same holds for Python and NumPy. The overhead
in the first code snippet comes from calling the interpreter inside a
tight loop. That is why loops are the root of evilness when doing CPU-bound
tasks in an interpreted language. I would think that 9 out of 10
tasks most Python users think require a C extension are actually more
easily solved with NumPy. This is old knowledge from the Matlab
community: even if you think you need a "MEX file" (that is, a C
extension for Matlab), you probably don't. Vectorize and it will be
fast enough.

I think you're preaching to the converted. The very first serious thing I did in
Python involved a generational accounting model calculation that was translated
from Matlab into Numeric/Python. It ran about 10 times faster than Matlab and
about 5 times faster than a Matlab compiler.
 
