Checksum: Is this right or wrong ?

A

Abby

I need to do 8 bits 2's complement checksum, so I wrote code in order
to see if the checksum calculation is correct.

===========================================================================
#include <stdio.h>

int main(){

char data[4];
char resp;
data[0] = 0x01;
data[1] = 0x0E;
data[2] = 0x09;
data[3] = 0x0F;
data[4] = data[0] + data[1] + data[2] + data[3];
resp = data[4];
printf ("Checksum before Invert = %2.2X\n", resp);
resp = ~data[4];
printf ("Checksum after Invert = %2.2X\n", resp);
resp = resp + 1;
printf ("Checksum after Invert + 1 = %2.2X\n", resp);

return 0;

}

===========================================================================

The result I got from above code is

Checksum before Invert = 27
Checksum after Invert = FFFFFFD8
Checksum after Invert + 1 = FFFFFFD9

The correct result I need is D9. I don't understand why it said
FFFFFFD9 instead of just only D9. Please let me know what my problem
is here. Thanks a lot.
 
M

Mike Wahler

Abby said:
I need to do 8 bits 2's complement checksum, so I wrote code in order
to see if the checksum calculation is correct.
===========================================================================

It appears that your implementation's type 'char' is
signed (whether it is or not is implementation-defined).
#include <stdio.h>

int main(){

char data[4];

unsigned char data[4];
char resp;

unsigned char resp;

-Mike
 
C

Chris Torek

The operation of invert & add 1 is exactly what negating a number
does. So, you can actually combine the steps by doing something like
checksum = -checksum;

My understanding is that you can only do this if the data type is
signed. However, with GCC/Windows, it seems to work fine with an
unsigned variable. Not sure what the standard says on this. Help,
anyone?

In C, when x has type unsigned int, -x is just the "mathematically
correct" value of UINT_MAX + 1 - x.

Narrow types like "unsigned char" and "unsigned short" are widened
under "the usual arithmetic conversions", and -- in a fit of
stupidity -- these narrow unsigned types generally widen to *signed*
int, so using -x is in some ways dangerous, but probably no worse
than ~x + 1U in practice. You can avoid all possibilities of
undefined behavior by computing the sum in "unsigned int", though.
This is the code I used:

#include <stdio.h>

int main (void)
{
char data[4];
unsigned char a;
unsigned char b;
data[0] = 0x01;
data[1] = 0x0E;
data[2] = 0x09;
data[3] = 0x0F;

a = data[0] + data[1] + data[2] + data[3];

Note that the right hand side of the "=" operation consists
exclusively of signed items, and signed arithmetic can potentially
overflow. Changing the "data" array to "unsigned char" helps a
little bit but still leaves the arithmetic itself signed if unsigned
char widens to signed int, i.e., if UCHAR_MAX <= INT_MAX. (Whether
UCHAR_MAX <= INT_MAX is an implementation-specific detail, although
implementations on which it is false -- where UCHAR_MAX is say
65535, and INT_MAX is 32767 -- are rare. The X3J11 committee folks
had some specific situations with "surprising results" in mind when
they made up this particular widening rule, but now those situations
even depend on whether UCHAR_MAX and USHRT_MAX exceed INT_MAX,
making them unpredictable instead of "predictably surprising".
The *sane* choice, of having unsigned char widen to unsigned int
in all cases, would have been much more reasonable. Still, we
often must work with what we are given.)

In this particular case, the four values being summed are 1, 14, 9,
and 15, all of which are comfortably positive even in "char" and
quite definitely positive in "int"; the result is 39, which is
definitely smaller than all of CHAR_MAX, UCHAR_MAX, and INT_MAX.
(CHAR_MAX must be at least 127, UCHAR_MAX must be at least 255,
and INT_MAX must be at least 32767, in every C implementation.)
So this sets "a" to 39, stored in an object of type "unsigned char".
printf ("Before Invert = %2X\n", a);
a = -a;

Here "-a" widens "a" to int (if UCHAR_MAX <= INT_MAX) or to
unsigned int (if UCHAR_MAX > INT_MAX), and in either case, the
value continues to be 39. Then -a is either -39 (if the type
is signed int) or UINT_MAX + 1 - 39 (if the type is unsigned
int).

Consider the latter case for a moment; suppose UINT_MAX is 65535
-- i.e., we have 2-octet "char"s and 2-octet "int"s and sizeof(int)==1.
Then -(unsigned int)39, or -39U, is 65536 - 39, or 65497U, or
0xffd9U. In fact, we are guaranteed that UINT_MAX is one less than
some power of two, and is at least 65535, so the possible values
are 0xffd9U, 0x1ffd9U, 0x3ffd9U, 0x7ffd9U, 0xfffd9U, 0x1fffd9U,
0x3fffd9U, .... The key thing to note here is that *every bit
pattern ends in "d9"*. Mathematically, we can say that every one
of these "is congruent to 0xd9 mod 256". This is an absolutely
*wonderful* result, as it guarantees that we can get the answer we
want.

On the other hand, perhaps we have a more typical implementation
where UCHAR_MAX < INT_MAX, so that "-a" has type signed int. In
this case, becase we started with the known result of 39, the value
is definitely -39. Storing this back in a variable of type "unsigned
char" sets that variable to UCHAR_MAX + 1 - 39. This is quite
similar to what we just saw with the unsigned ints, except that
UCHAR_MAX is allowed to be as small as 255, so that the value stored
back into "a" is 0xd9, or perhaps 0x1d9, or possibly 0x3d9, or
0x7d9, 0xfd9, 0x1fd9, 3fd9, 7fd9, ffd9, 1ffd9, etc., ad nauseam.
The same "wonderful thing" has happened: the value is always
congruent to 0xd9 mod 256.

Thus, in this case, "a = -a" is fine, and the low 8 bits of the
result are always what we wanted. But note that all of this depends
on the assumption that "a" is initially 39. What if it has some
other value?

Suppose in particular that "a" had the value 32768 (remember that
"a" is "unsigned char" and UCHAR_MAX might be 65535, as is the case
on some C systems on digital signal processor CPUs). Suppose
further that INT_MAX is 32767 and UINT_MAX is 65535. Then unsigned
char widens to unsigned int, and we are OK. This particular
implementation forces the "unsigned widens to unsigned" case, so
that the error made back in the 1980s -- choosing the ridiculous
"unsigned widens to signed except when it doesn't" rule -- does
not affect us. Only if UCHAR_MAX is (say) 32767 will this system
have "unsigned char widens to signed int", and in that case, -a
can be no greater in magnitude than -32767, which still fits
(INT_MIN must also be at least 32767 in magnitude).

So where *can* things go wrong? As I implied earlier, the real
problem lies in the summation of a "data" array, which in
"real code" is probably going to do something like:

unsigned char data[N], sum;
int i;
... /* fill in the "data" array */
sum = 0;
for (i = 0; i < N; i++)
sum += data;

If sum is "unsigned char" and data is also "unsigned char", but
UCHAR_MAX is 32767 while INT_MAX is also 32767, then the arithmetic
happens in *signed* int, and we may attempt to compute (signed
int)32767 + (signed int)32767. This is allowed to produce a runtime
"overflow" trap and cause the code to abort. If we want to be
100% paranoid, we must write:

sum = (unsigned int)sum + (unsigned int)data;

which forces the arithmetic to be done in "unsigned" fashion,
with its nice mathematical properties.

In practice, real implementations do not have this problem, because
UCHAR_MAX is almost always 255 and INT_MAX guaranteed to be at
least 32767. In this case, the unsigned char values widen to signed
int, but the sum itself cannot exceed 510 (255+255), which certainly
fits in an int. Storing the result back into the "unsigned char"
in "sum" reduces it mod 256, making everything work on the next
trip through the loop. Things can only go wrong when UCHAR_MAX
(a) does not exceed INT_MAX, yet (b) is sufficiently close to
INT_MAX so that UCHAR_MAX + UCHAR_MAX > INT_MAX. Even then, the
implementation must do something unusual on integer overflow.

If, back in the 1980s, the X3J11 committee had used the sane rules
that the Bell Labs compilers used (i.e., "narrow unsigned types
widen to unsigned int"), even this remote possibility would be
ruled out. And as long as you only ever use 2s complement systems
that ignore signed integer overflow, none of this matters anyway.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,008
Latest member
obedient dusk

Latest Threads

Top