Incorrect index computation

Boon · Mar 27, 2009

Hello everyone,

I've run across the following (IMHO incorrect) code.

const char TAB[96];
typedef unsigned char uint8;
static void foo(char *buf)
{
uint8 i=0;

while ((i < MAX_DIGITS) && (buf != 0))
{
if ((uint8)buf >= 0xA0)
buf = TAB[buf - 0xa0];
i++;
}
}

I have a problem with the index computation.

(Please correct me wherever I am wrong.)

NB : on my platform, plain char is signed, char is 8-bits wide.

Suppose buf = 0xCE, then (uint8)buf is indeed >= 0xA0

What is the type and value of the expression buf - 0xa0 ?
It might depend on my platform's representation of signed numbers?
(sign-and-magnitude, 1's complement, 2's complement)
http://en.wikipedia.org/wiki/Signed_number_representations

Assuming 2's complement, 0xCE represents -50
I think buf is promoted to an int
0xa0 is of type int (??) and has value 160
Thus the expression has type int and value -210, right?

Thus the code, in fact, performs TAB[-210] which means we've erred
into UB-land, right?

Thanks for any clarification.

Regards.

Eric Sosman · Mar 27, 2009

Boon said:
Hello everyone,

I've run across the following (IMHO incorrect) code.

const char TAB[96];
typedef unsigned char uint8;
static void foo(char *buf)
{
uint8 i=0;

while ((i < MAX_DIGITS) && (buf != 0))
{
if ((uint8)buf >= 0xA0)
buf = TAB[buf - 0xa0];
i++;
}
}

I have a problem with the index computation.

(Please correct me wherever I am wrong.)

NB : on my platform, plain char is signed, char is 8-bits wide.

Suppose buf = 0xCE, then (uint8)buf is indeed >= 0xA0

What is the type and value of the expression buf - 0xa0 ?
It might depend on my platform's representation of signed numbers?
(sign-and-magnitude, 1's complement, 2's complement)
http://en.wikipedia.org/wiki/Signed_number_representations

Assuming 2's complement, 0xCE represents -50
I think buf is promoted to an int
0xa0 is of type int (??) and has value 160
Thus the expression has type int and value -210, right?

Thus the code, in fact, performs TAB[-210] which means we've erred
into UB-land, right?

I think you've analyzed the problem correctly: On a system
where char is signed (and narrower than int), a "large" char
value will become a negative int on promotion, and this will
lead to a negative array index. Another cast fixes it:

buf = TAB[ (uint8)buf - 0xA0 ];

The issue of different schemes for representing negative
integers is, I think, a red herring. What matters is the value
of the promoted char: If it's less than 160, there's trouble
no matter what the machine is like.

Boon · Mar 27, 2009

Eric said:
Boon said:

I've run across the following (IMHO incorrect) code.

const char TAB[96];
typedef unsigned char uint8;
static void foo(char *buf)
{
uint8 i=0;

while ((i < MAX_DIGITS) && (buf != 0))
{
if ((uint8)buf >= 0xA0)
buf = TAB[buf - 0xa0];
i++;
}
}

I have a problem with the index computation.

(Please correct me wherever I am wrong.)

NB : on my platform, plain char is signed, char is 8-bits wide.

Suppose buf = 0xCE, then (uint8)buf is indeed >= 0xA0

What is the type and value of the expression buf - 0xa0 ?
It might depend on my platform's representation of signed numbers?
(sign-and-magnitude, 1's complement, 2's complement)
http://en.wikipedia.org/wiki/Signed_number_representations

Assuming 2's complement, 0xCE represents -50
I think buf is promoted to an int
0xa0 is of type int (??) and has value 160
Thus the expression has type int and value -210, right?

Thus the code, in fact, performs TAB[-210] which means we've erred
into UB-land, right?

Click to expand...

I think you've analyzed the problem correctly: On a system
where char is signed (and narrower than int), a "large" char
value will become a negative int on promotion, and this will
lead to a negative array index. Another cast fixes it:

buf = TAB[ (uint8)buf - 0xA0 ];

This is indeed the obvious fix. But I was looking for a solution where I
don't have to cast buf every time.

To me, casting means lying to the compiler

The issue of different schemes for representing negative
integers is, I think, a red herring. What matters is the value
of the promoted char: If it's less than 160, there's trouble
no matter what the machine is like.

Click to expand...

Here's what the code is supposed to do.

By the time we reach the function, buf contains only legal (printable)
ISO/IEC 8859-1 codes, that is
(0x20 <= c <= 0x7E) or (0xA0 <= c <= 0xFF)
http://en.wikipedia.org/wiki/ISO-8859-1#Codepage_layout

Most characters in the second range are not printable on my puny LCD.
Therefore these are mapped into characters from the first range.

In my opinion, foo's parameter should not be a 'char *' because we are
not dealing with "C strings". Instead, foo's parameter should be an
'unsigned char *' because buf really is an array of 8-bit "codes" which
are inherently unsigned quantities.

Do you agree?

Regards.

Eric Sosman · Mar 28, 2009

pete said:
Eric said:

[...]
I think you've analyzed the problem correctly: On a system
where char is signed (and narrower than int), a "large" char
value will become a negative int on promotion, and this will
lead to a negative array index. Another cast fixes it:

buf = TAB[ (uint8)buf - 0xA0 ];

The issue of different schemes for representing negative
integers is, I think, a red herring. What matters is the value
of the promoted char: If it's less than 160, there's trouble
no matter what the machine is like.

Click to expand...

I can't follow what you wrote.

Integer values can't become negative from promotion.

Correct. But a char with a negative value will promote
to an int with a negative value, and then there'll be the
trouble you discovered. It doesn't matter whether the
negative char has a two's complement, ones' complement, or
signed magnitude representation: Negatives promote to
negatives regardless.

If the above (uint8) cast changes anything,
then it's because (buf) is originally a negative char value.

Click to expand...

That's what I meant by "large," with the quotation
marks around the "large." The cast will leave all char
values in 0 .. CHAR_MAX unchanged, even if char is unsigned.
Negative char values (if there are any) will be "reflected"
by adding UCHAR_MAX+1, producing a non-negative result.

Two's Complement, Serialization, etc.	10	Sep 9, 2009
unsigned char	13	Oct 24, 2006
printing bits ... the right way	2	Apr 1, 2010
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Mar 1, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Jan 12, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Mar 15, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Dec 15, 2007
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Feb 1, 2008

Incorrect index computation

Boon

Eric Sosman

Boon

Eric Sosman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads