Incorrect index computation

B

Boon

Hello everyone,

I've run across the following (IMHO incorrect) code.

const char TAB[96];
typedef unsigned char uint8;
static void foo(char *buf)
{
uint8 i=0;

while ((i < MAX_DIGITS) && (buf != 0))
{
if ((uint8)buf >= 0xA0)
buf = TAB[buf - 0xa0];
i++;
}
}

I have a problem with the index computation.

(Please correct me wherever I am wrong.)

NB : on my platform, plain char is signed, char is 8-bits wide.

Suppose buf = 0xCE, then (uint8)buf is indeed >= 0xA0

What is the type and value of the expression buf - 0xa0 ?
It might depend on my platform's representation of signed numbers?
(sign-and-magnitude, 1's complement, 2's complement)
http://en.wikipedia.org/wiki/Signed_number_representations

Assuming 2's complement, 0xCE represents -50
I think buf is promoted to an int
0xa0 is of type int (??) and has value 160
Thus the expression has type int and value -210, right?

Thus the code, in fact, performs TAB[-210] which means we've erred
into UB-land, right?

Thanks for any clarification.

Regards.
 
E

Eric Sosman

Boon said:
Hello everyone,

I've run across the following (IMHO incorrect) code.

const char TAB[96];
typedef unsigned char uint8;
static void foo(char *buf)
{
uint8 i=0;

while ((i < MAX_DIGITS) && (buf != 0))
{
if ((uint8)buf >= 0xA0)
buf = TAB[buf - 0xa0];
i++;
}
}

I have a problem with the index computation.

(Please correct me wherever I am wrong.)

NB : on my platform, plain char is signed, char is 8-bits wide.

Suppose buf = 0xCE, then (uint8)buf is indeed >= 0xA0

What is the type and value of the expression buf - 0xa0 ?
It might depend on my platform's representation of signed numbers?
(sign-and-magnitude, 1's complement, 2's complement)
http://en.wikipedia.org/wiki/Signed_number_representations

Assuming 2's complement, 0xCE represents -50
I think buf is promoted to an int
0xa0 is of type int (??) and has value 160
Thus the expression has type int and value -210, right?

Thus the code, in fact, performs TAB[-210] which means we've erred
into UB-land, right?


I think you've analyzed the problem correctly: On a system
where char is signed (and narrower than int), a "large" char
value will become a negative int on promotion, and this will
lead to a negative array index. Another cast fixes it:

buf = TAB[ (uint8)buf - 0xA0 ];

The issue of different schemes for representing negative
integers is, I think, a red herring. What matters is the value
of the promoted char: If it's less than 160, there's trouble
no matter what the machine is like.
 
B

Boon

Eric said:
Boon said:
I've run across the following (IMHO incorrect) code.

const char TAB[96];
typedef unsigned char uint8;
static void foo(char *buf)
{
uint8 i=0;

while ((i < MAX_DIGITS) && (buf != 0))
{
if ((uint8)buf >= 0xA0)
buf = TAB[buf - 0xa0];
i++;
}
}

I have a problem with the index computation.

(Please correct me wherever I am wrong.)

NB : on my platform, plain char is signed, char is 8-bits wide.

Suppose buf = 0xCE, then (uint8)buf is indeed >= 0xA0

What is the type and value of the expression buf - 0xa0 ?
It might depend on my platform's representation of signed numbers?
(sign-and-magnitude, 1's complement, 2's complement)
http://en.wikipedia.org/wiki/Signed_number_representations

Assuming 2's complement, 0xCE represents -50
I think buf is promoted to an int
0xa0 is of type int (??) and has value 160
Thus the expression has type int and value -210, right?

Thus the code, in fact, performs TAB[-210] which means we've erred
into UB-land, right?


I think you've analyzed the problem correctly: On a system
where char is signed (and narrower than int), a "large" char
value will become a negative int on promotion, and this will
lead to a negative array index. Another cast fixes it:

buf = TAB[ (uint8)buf - 0xA0 ];


This is indeed the obvious fix. But I was looking for a solution where I
don't have to cast buf every time.

To me, casting means lying to the compiler :)
The issue of different schemes for representing negative
integers is, I think, a red herring. What matters is the value
of the promoted char: If it's less than 160, there's trouble
no matter what the machine is like.

Here's what the code is supposed to do.

By the time we reach the function, buf contains only legal (printable)
ISO/IEC 8859-1 codes, that is
(0x20 <= c <= 0x7E) or (0xA0 <= c <= 0xFF)
http://en.wikipedia.org/wiki/ISO-8859-1#Codepage_layout

Most characters in the second range are not printable on my puny LCD.
Therefore these are mapped into characters from the first range.

In my opinion, foo's parameter should not be a 'char *' because we are
not dealing with "C strings". Instead, foo's parameter should be an
'unsigned char *' because buf really is an array of 8-bit "codes" which
are inherently unsigned quantities.

Do you agree?

Regards.
 
E

Eric Sosman

pete said:
Eric said:
[...]
I think you've analyzed the problem correctly: On a system
where char is signed (and narrower than int), a "large" char
value will become a negative int on promotion, and this will
lead to a negative array index. Another cast fixes it:

buf = TAB[ (uint8)buf - 0xA0 ];

The issue of different schemes for representing negative
integers is, I think, a red herring. What matters is the value
of the promoted char: If it's less than 160, there's trouble
no matter what the machine is like.


I can't follow what you wrote.

Integer values can't become negative from promotion.


Correct. But a char with a negative value will promote
to an int with a negative value, and then there'll be the
trouble you discovered. It doesn't matter whether the
negative char has a two's complement, ones' complement, or
signed magnitude representation: Negatives promote to
negatives regardless.
If the above (uint8) cast changes anything,
then it's because (buf) is originally a negative char value.


That's what I meant by "large," with the quotation
marks around the "large." The cast will leave all char
values in 0 .. CHAR_MAX unchanged, even if char is unsigned.
Negative char values (if there are any) will be "reflected"
by adding UCHAR_MAX+1, producing a non-negative result.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,276
Latest member
Sawatmakal

Latest Threads

Top