Consider the following program:
#include <string.h>
#include <stdlib.h>

const char cc[] = "hello";

int f(const unsigned char * p)
{
    unsigned n = 0;
    for(; n < strlen(cc); ++n)
    {
        if( *(p+n) != (unsigned char)cc[n] )
        {
            return EXIT_FAILURE;
        }
    }
    return EXIT_SUCCESS;
}

int main(void)
{
    return f( (unsigned char*)cc );
}
Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?
I'm referring to C99 TC2 here. I've been looking through the standard
for a while, and I thought I was able to prove that your conversion
was valid, except I'm hung up on one thing. Here's what I've got so
far:
6.2.6.2/1 states: "For unsigned integer types other than unsigned
char, the bits of the object representation shall be divided into two
groups: value bits and padding bits (there need not be any of the
latter). If there are N value bits, each bit shall represent a
different power of 2 between 1 and 2^(N-1)"
I take this to mean that unsigned char can not have padding bits.
6.2.6.2/2 states: "For signed integer types, the bits of the object
representation shall be divided into three groups: value bits, padding
bits, and the sign bit. There need not be any padding bits; there
shall be exactly one sign bit. Each bit that is a value bit shall have
the same value as the same bit in the object representation of the
corresponding unsigned type (if there are M value bits in the signed
type and N in the unsigned type, then M <= N)."
On its own, this implies that signed char may have padding bits, but
that the value bits in a signed char must have the same value as the
corresponding value bits in an unsigned char. If an unsigned char can
not have padding bits, and if we assume that a signed char can not
have any continuity gaps in its value range, then a signed char can
*only* have padding bits if the values lost to the padding bits are
accounted for elsewhere. E.g. say you have a traditional 8-bit
unsigned char, and a signed char where the value '16' bit was
padding (here, each bit's value is 2^N where N is the number I've
placed in that bit's position, a '+' indicates the sign bit, a '.'
indicates padding):
unsigned char: 76543210 (8 bits total)
signed char: +65.3210 (8 bits total)
If signed char can have range gaps, then this representation is
allowable. If signed char can not have range gaps, then the only way
to compensate is to have signed char occupy more bits than an unsigned
char (placing the missing value bit in the extra position):
unsigned char: 76543210 (8 bits total)
signed char: 4+65.3210 (9 bits total)
However, this is not possible because of the constraint in 6.2.6.2/2
-- the signed char can not have any value bits that an unsigned char
does not have.
By the way, I'm not sure that signed char can't have range gaps in it.
I can't find anywhere in the standard that explicitly states that all
values representable by a given integer type must be contiguous.
However, there is strong evidence that suggests that it can't. For
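example, if the representation of a signed char was
"+65.3210" (missing 2^4 bit), then the following could not store 16
(by 6.3.1.3/3 the result would be implementation-defined, or a signal
would be raised, rather than undefined):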
signed char c = 16;
I do not think the standard intends for that behavior to be allowable.
Another strong piece of evidence is 6.2.5/3, which states:
"An object declared as type char is large enough to store any member
of the basic execution character set. If a member of the basic
execution character set is stored in a char object, its value is
guaranteed to be nonnegative."
On a system where a char is a signed char, this implies that signed
char at least can't be missing value bits that are required to
represent characters in the basic execution character set.
Anyways, this is where I'm stuck. The above is not enough to show the
conversion is always valid because of this case: Consider the case
where the 2^6 bit is missing from signed char (assume a 2's-complement
representation):
signed char: +.543210 (8 bits total)
In this case, all value bits correspond to unsigned char value bits,
and there are no continuity gaps in its range; it simply has a smaller
range than an unsigned char. On such a system, your conversion would
break. Does anybody know if this representation is possible?
So what I have so far is:
- Unsigned char can not have padding bits.
- Signed char value bits must have same values as unsigned char
value bits in corresponding location.
- Signed char *probably* can't have gaps in its range (not sure, but
seems likely).
Therefore, whether or not your conversion is safe seems to rely
entirely on whether or not "+.543210" is a valid representation of a
signed char on a system where unsigned char is "76543210". If it's
valid, then your conversion may not be defined. If it's invalid, then
it shows that signed char simply can not have padding bits, period.
Somebody else needs to fill in the missing info here, I've been
staring at it too long.