Difficulty of passing char[] to function accepting unsigned char*


Triple-DES

Consider the following program:

#include <string.h>
#include <stdlib.h>

const char cc[] = "hello";

int f(const unsigned char * p)
{
  unsigned n = 0;
  for(; n < strlen(cc); ++n)
  {
    if( *(p+n) != (unsigned char)cc[n] )
    {
      return EXIT_FAILURE;
    }
  }
  return EXIT_SUCCESS;
}

int main(void)
{

return f( (unsigned char*)cc );
}

Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?

If so, is it possible to avoid copying all of cc into an array of
unsigned char, and passing that to f (and still get the correct
behaviour)?
 

vippstar

Consider the following program: [snip]

Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?

No you're not. All characters of the basic character set have positive
values in the range (0, SCHAR_MAX] and they can correctly be
represented by char. You have another problem. cc is const, and you
cast it to non-const. I think that's UB, but I never really understood
the rule.
If so, is it possible to avoid copying all of cc into an array of
unsigned char, and passing that to f (and still get the correct
behaviour)?

You don't have to do any of that.
 

James Kuyper

Consider the following program: [snip]
....
represented by char. You have another problem. cc is const, and you
cast it to non-const. I think that's UB, but I never really understood
the rule.

The cast itself does not have UB. Since 'cc' was defined as const, any
attempt to change its contents (which would require some such cast)
would have UB. However, this particular program makes no such attempt,
so that's not a problem.
 

JC

Consider the following program: [snip]

Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?


I'm referring to C99 TC2 here. I've been looking through the standard
for a while, and I thought I was able to prove that your conversion
was valid, except I'm hung up on one thing. Here's what I've got so
far:

6.2.6.2/1 states: "For unsigned integer types other than unsigned
char, the bits of the object
representation shall be divided into two groups: value bits and
padding bits (there need
not be any of the latter). If there are N value bits, each bit shall
represent a different power of 2 between 1 and 2^(N-1)"

I take this to mean that unsigned char can not have padding bits.

6.2.6.2/2 states: "For signed integer types, the bits of the object
representation shall be divided into three groups: value bits, padding
bits, and the sign bit. There need not be any padding bits; there
shall be exactly one sign bit. Each bit that is a value bit shall have
the same value as the same bit in the object representation of the
corresponding unsigned type (if there are M value bits in the signed
type and N in the unsigned type, then M ≤ N)."

On its own, this implies that signed char may have padding bits, but
that the value bits in a signed char must have the same value as the
corresponding value bits in an unsigned char. If an unsigned char can
not have padding bits, this implies that a signed char can *only* have
padding bits if the missing values from the padding bits are accounted
for elsewhere under the assumption that a signed char can not have any
continuity gaps in its value range. E.g. say you have a traditional 8-
bit unsigned char, and a signed char where the value '16' bit was
padding (here, each bit's value is 2^N where N is the number I've
placed in that bit's position, a '+' indicates the sign bit, a '.'
indicates padding):

unsigned char: 76543210 (8 bits total)
signed char: +65.3210 (8 bits total)

If signed char can have range gaps, then this representation is
allowable. If signed char can not have range gaps, then the only way
to compensate is to have signed char occupy more bits than an unsigned
char (placing the missing value bit in the extra position):

unsigned char: 76543210 (8 bits total)
signed char: 4+65.3210 (9 bits total)

However, this is not possible because of the constraint in 6.2.6.2/2
-- the signed char can not have any value bits that an unsigned char
does not have.

By the way, I'm not sure that signed char can't have range gaps in it.
I can't find anywhere in the standard that explicitly states that all
values representable by a given integer type must be contiguous.
However, there is strong evidence that suggests that it can't. For
example, if the representation of a signed char was
"+65.3210" (missing 2^4 bit), then the following would be undefined:

signed char c = 16;

I do not think the standard intends for that behavior to be allowable.
Another strong piece of evidence is 6.2.5/3, which states:

"An object declared as type char is large enough to store any member
of the basic execution character set. If a member of the basic
execution character set is stored in a char object, its value is
guaranteed to be nonnegative."

On a system where a char is a signed char, this implies that signed
char at least can't be missing value bits that are required to
represent characters in the basic execution character set.

Anyways, this is where I'm stuck. The above is not enough to show the
conversion is always valid because of this case: Consider the case
where the 2^6 bit is missing from signed char (assume a 2's-complement
representation):

signed char: +.543210 (8 bits total)

In this case, all value bits correspond to unsigned char value bits,
and there are no continuity gaps in its range, it simply has a smaller
range than an unsigned char. On such a system, your conversion would
break. Does anybody know if this representation is possible?


So what I have so far is:

- Unsigned char can not have padding bits.
- Signed char value bits must have same values as unsigned char
value bits in corresponding location.
- Signed char *probably* can't have gaps in its range (not sure, but
seems likely).

Therefore, whether or not your conversion is safe seems to rely
entirely on whether or not "+.543210" is a valid representation of a
signed char on a system where unsigned char is "76543210". If it's
valid, then your conversion may not be defined. If it's invalid, then
it shows that signed char simply can not have padding bits, period.

Somebody else needs to fill in the missing info here, I've been
staring at it too long.


Jason
 

JC

[snip]
6.2.6.2/1 states: "For unsigned integer types other than unsigned
char, the bits of the object
representation shall be divided into two groups: value bits and
padding bits (there need
not be any of the latter). If there are N value bits, each bit shall
represent a different power of 2 between 1 and 2^(N-1)"
[snip]

  unsigned char: 76543210 (8 bits total)
  signed char: 4+65.3210 (9 bits total)

However, this is not possible because of the constraint in 6.2.6.2/2
-- the signed char can not have any value bits that an unsigned char
does not have.

Oh, by the way, there's a reason I quoted all of 6.2.6.2/1 (the part
about "each bit shall represent a different power of 2") that got lost
in an edit.

That implies that all bits of an unsigned char must have a unique
value, so in the above example, it's also not possible for unsigned
char to have a duplicate 2^4 bit, i.e. this is not valid:

signed char: 4+65.3210 (9 bits total)
unsigned char: 476543210 (9 bits total, duplicate 2^4 bit)

Since that does satisfy the other constraint that the corresponding
value bits have the same value.

Jason
 

JC

Triple-DES said:
Consider the following program: [snip]
Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?

No.
The letters of "hello" all have positive values.

This does not matter. If a signed char had a padding bit, the value of
that bit would not affect its value. When accessed via a pointer cast,
the concern is that the padding bit could map to a value bit of an
unsigned char, and thus produce incorrect values when accessed as an
unsigned char. The question is: Can signed char have padding bits in
positions that correspond to value bits in an unsigned char?

Change
     int f(const unsigned char * p)
to
     int f(const char *p)

The function f() is meant to test an assumption. Changing it, of
course, defeats the purpose of the test. If "assert(i < 0)" fails,
removing the assert() is not a solution to the problem.


Jason
 

JC

Consider the following program: [snip]
Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?

No you're not. All characters of the basic character set have positive
values in the range (0, SCHAR_MAX] and they can correctly be
represented by char.

But can signed char have padding bits in positions that are value bits
in an unsigned char? If so, then those padding bits don't affect the
value of the signed char but would undesirably affect the value when
mapped to an unsigned char via a pointer cast.


Jason
 

vippstar

Consider the following program: [snip]
Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?
No you're not. All characters of the basic character set have positive
values in the range (0, SCHAR_MAX] and they can correctly be
represented by char.

But can signed char have padding bits in positions that are value bits
in an unsigned char? If so, then those padding bits don't affect the
value of the signed char but would undesirably affect the value when
mapped to an unsigned char via a pointer cast.

A cast is not a reinterpretation of the object representation.
 

JC

Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?
No you're not. All characters of the basic character set have positive
values in the range (0, SCHAR_MAX] and they can correctly be
represented by char.
But can signed char have padding bits in positions that are value bits
in an unsigned char? If so, then those padding bits don't affect the
value of the signed char but would undesirably affect the value when
mapped to an unsigned char via a pointer cast.

A cast is not a reinterpretation of the object representation.

It seems you misread his example. The following is, in fact, a
reinterpretation of the object representation:

signed char c = 0;
unsigned char u = *(unsigned char *)&c;

What you are referring to is just a cast of the value:

signed char c = 0;
unsigned char u = (unsigned char)c;

That's distinctly different than what Triple-DES was asking about.

Jason
 

vippstar

On Jan 6, 8:34 am, (e-mail address removed) wrote:
Consider the following program: [snip]
Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?
No you're not. All characters of the basic character set have positive
values in the range (0, SCHAR_MAX] and they can correctly be
represented by char.
But can signed char have padding bits in positions that are value bits
in an unsigned char? If so, then those padding bits don't affect the
value of the signed char but would undesirably affect the value when
mapped to an unsigned char via a pointer cast.

A cast is not a reinterpretation of the object representation.

Hmm, I'm sorry, I reread what you said and it seems I misunderstood it
the first time. I hadn't thought of that. I did have doubts about
'signed char' being able to have padding bits, though.
I think you're right: signed char can't have padding bits.
 

JC

On Jan 6, 8:34 am, (e-mail address removed) wrote:
Consider the following program: [snip]
Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?
No you're not. All characters of the basic character set have positive
values in the range (0, SCHAR_MAX] and they can correctly be
represented by char.
But can signed char have padding bits in positions that are value bits
in an unsigned char? If so, then those padding bits don't affect the
value of the signed char but would undesirably affect the value when
mapped to an unsigned char via a pointer cast.
A cast is not a reinterpretation of the object representation.
Hmm I'm sorry I reread what you said and I misunderstood the first
time it seems.
I hadn't thought of that. I had doubts about 'signed char' being able
to have padding bits though.
I think you're right, signed char can't have padding bits.

Since C99, all signed integer types, including signed char, may have
padding bits and trap representations.

Are you sure signed char can have padding bits?

In any case, the standard at least seems to imply a constraint on
which bits may be padding bits in a signed char -- if bit position N
is a padding bit in a signed char, and value V is the value of bit N
in an unsigned char, then all bit positions with value > V in an
unsigned char must also be padding bits in a signed char.

This is a consequence of a few things the standard defines:

- Unsigned char does not have padding bits (C99 6.2.6.2/1).
- All value bits in signed char have same value as corresponding bit
position in unsigned char (C99 6.2.6.2/2).
- Integer value ranges do not have continuity gaps (not explicitly
stated but there is strong evidence for this, see my initial reply to
Triple-DES).

That means that whether or not signed char can have padding bits
depends entirely on whether or not the total number of non-padding
bits in a signed char (sign bit + value bits) must equal the total
number of non-padding bits in an unsigned char (value bits). That's
the part I'm still not clear on.

Does the standard mandate (or otherwise indirectly constrain) that
signed char and unsigned char have the same number of non-padding bits
(in other words, on a two's-complement implementation, that
SCHAR_MIN + UCHAR_MAX == SCHAR_MAX)? If so, Triple-DES's conversion is
guaranteed to succeed. If not, then the conversion may fail (on
systems with signed chars with padding bits).


Jason
 

Triple-DES

Does the standard mandate (or otherwise indirectly constrain) that
signed char and unsigned char have the same number of non-padding bits
(in other words, that SCHAR_MIN + UCHAR_MAX == SCHAR_MAX)? If so,
Triple-DES's conversion is guaranteed to succeed. If not, then the
conversion may fail (on systems with signed chars with padding bits).

Hi, see my answer elsewhere for an example where the conversion
fails.
 

Triple-DES

[snip lengthy discussion of padding bits for (signed) char]
So what I have so far is:

  - Unsigned char can not have padding bits.
  - Signed char value bits must have same values as unsigned char
value bits in corresponding location.
  - Signed char *probably* can't have gaps in its range (not sure, but
seems likely).

Consider a system where CHAR_BIT is 16. Per 5.2.4.2.1/2 UCHAR_MAX is
65535. But SCHAR_MIN and SCHAR_MAX could still be -127 / 127, leaving
room for 8 padding bits.

In this case, if you read this value into an unsigned char using e.g a
pointer conversion or memcpy, the behaviour is still well-defined
according to 6.2.6.1/5, but the resulting unsigned char will hold an
implementation-defined value.

On the other hand, explicitly or implicitly converting the char value
to unsigned char correctly yields the corresponding unsigned value.
 

Tim Rentsch

JC said:
Triple-DES said:
Consider the following program: [snip]
Am I correct to assume that this program may fail on an implementation
where plain char is signed and has padding bits?

No.
The letters of "hello" all have positive values.

This does not matter. If a signed char had a padding bit, the value of
that bit would not affect its value. When accessed via a pointer cast,
the concern is that the padding bit could map to a value bit of an
unsigned char, and thus produce incorrect values when accessed as an
unsigned char. The question is: Can signed char have padding bits in
positions that correspond to value bits in an unsigned char?

Do you have a copy of the Standard? If you don't, try googling for
n1256, which should get you pretty easily to the latest revision of
C99 standard. Download it now! :)

In answer to your question - the answer is yes, but that's not the end
of the story. There is this sentence in 6.2.6.2 p 5, on the
representation of integer types -

A valid (non-trap) object representation of a signed integer type
where the sign bit is zero is a valid object representation of the
corresponding unsigned type, and shall represent the same value.

So if (signed char) has padding bits (and by definition they must be
value bits in (unsigned char), which has no padding bits), then the
values of the padding bits must be zero in any valid (signed char)
value where the sign bit is zero.
 

Tim Rentsch

JC said:
Consider the following program: [snip program and lengthy discussion of
padding bits for (signed) char]

Again, what you're missing is 6.2.6.2 p 5, which guarantees that a
(signed char) object can be interpreted as an (unsigned char) object
as long as the (signed char) value doesn't have the sign bit set.
 

s.dhilipkumar

JC said:
Consider the following program: [snip program and lengthy discussion of
padding bits for (signed) char]

Again, what you're missing is 6.2.6.2 p 5, which guarantees that a
(signed char) object can be interpreted as an (unsigned char) object
as long as the (signed char) value doesn't have the sign bit set.

To be on the safe side I would do something like this if performance
is not an issue. Isn't this compiler/linker independent?

#include <stdio.h>   /* needed for sprintf */
#include <string.h>
#include <stdlib.h>

const char cc[] = "hello";

int f(const unsigned char * p)
{
  unsigned n = 0;
  unsigned char nc[2] = {0};
  for(; n < strlen(cc); n++)
  {
    /* was sprintf(&(nc), ...): &nc has the wrong pointer type */
    sprintf((char *)nc, "%c", cc[n]);

    if( *(p + n) != nc[0] )
    {
      return EXIT_FAILURE;
    }
  }
  return EXIT_SUCCESS;
}

int main(void)
{
  return f( (unsigned char*)cc );
}
 

Triple-DES

In answer to your question - the answer is yes, but that's not the end
of the story.  There is this sentence in 6.2.6.2 p 5, on the
representation of integer types -

    A valid (non-trap) object representation of a signed integer type
    where the sign bit is zero is a valid object representation of the
    corresponding unsigned type, and shall represent the same value.

So if (signed char) has padding bits (and by definition they must be
value bits in (unsigned char), which has no padding bits), then the
values of the padding bits must be zero in any valid (signed char)
value where the sign bit is zero.

You're correct. This implies that given a valid non-negative char c,
it is always true that:
(unsigned char)c == *(unsigned char*)&c

But is it really guaranteed that all the members of the basic
execution character set are nonnegative? In C++ this is explicitly
stated in 2.2/3, but I couldn't find such a guarantee in the C
standard.
 

James Kuyper

Triple-DES wrote:
....
But is it really guaranteed that all the members of the basic
execution character set be non-negative? In C++ this is explicitly
stated in 2.2/3, but I couldn't find such a guarantee in the C
standard.

6.2.5p3: "If a member of the basic execution character set is stored in
a char object, its value is guaranteed to be nonnegative."
 

Tim Rentsch

Han from China - Master Troll said:
Tim said:
In answer to your question - the answer is yes, but that's not the end
of the story. There is this sentence in 6.2.6.2 p 5, on the
representation of integer types -

A valid (non-trap) object representation of a signed integer type
where the sign bit is zero is a valid object representation of the
corresponding unsigned type, and shall represent the same value.

[Paraphrasing - may signed types have "holes" in the values
that they hold, with padding bits in the signed types
corresponding to "small" value bits in the unsigned types.]

Depending on how the question is meant, the answer is either "No" or
"Maybe, but at least mostly No".

If the question is meant in a "does this make sense?" way, it seems
clear the answer is No. It's easy to find text in the Standard that
suggests contiguous ranges are required, for example the word
"overflow" which is used in several places; if the authors had
anticipated non-contiguous ranges then different language would have
been used in all these places.

If the question is meant in a "what happens if statements are read
literally, or in more of a language lawyer mode?" way, the answer is
No for any signed type no bigger than int. This happens because of
the guaranteed relationships between value ranges of types with
different conversion ranks, and because bit-fields don't have padding
bits. With bit-fields being legal up to the width of an int, int and
all signed types larger than int must have all the values that fit in
an unsigned integer type of (width - 1) bits.

I haven't found any specific language that would rule out
discontiguous ranges above INT_MAX for signed types larger than int.
I have a vague memory at some point in the past of finding a statement
or statements in the Standard that clearly ruled out discontiguous
ranges for integer types, including types larger than int, but
I haven't been able to find it (again?).
 
