char * signedness

Pietro Cerutti

Hi group,
is it always safe to pass unsigned char * variables as parameters to
functions accepting char * arguments?

For instance, I have to compare two unsigned char * strings.
Can I safely use strcmp? Do I need to cast the two strings to char *?

Thank you
 
Harald van Dĳk

Pietro said:
Hi group,
is it always safe to pass unsigned char * variables as parameters to
functions accepting char * arguments?

For the standard library functions, yes, because while they take char *
arguments, they convert it to unsigned char * internally anyway.

For instance, I have to compare two unsigned char * strings.
Can I safely use strcmp?

Yes.

Do I need to cast the two strings to char *?

You need to convert them to char *. You do not necessarily need a cast for
that; you could use an implicit conversion from unsigned char * to void *,
and then another implicit conversion from void * to char *. In this case, a
cast would be a good idea though.
 
Pietro Cerutti

Harald said:
For the standard library functions, yes, because while they take char *
arguments, they convert it to unsigned char * internally anyway.

This is not true for the implementation of strncmp on my system, which is:

/*** BEGIN STRNCMP ON FREEBSD ***/
int
strncmp(s1, s2, n)
    const char *s1, *s2;
    size_t n;
{

    if (n == 0)
        return (0);
    do {
        if (*s1 != *s2++)
            return (*(const unsigned char *)s1 -
                *(const unsigned char *)(s2 - 1));
        if (*s1++ == 0)
            break;
    } while (--n != 0);
    return (0);
}
/*** END STRNCMP ON FREEBSD ***/

I think I'm missing something about chars and/or implicit conversions.

Could you please explain the output of the following program to me?
The two chars c[0] and d[0] have different values (220 and -36), are not
equal (the comparison operator returns 0) but the two strings c and d
are equal to strncmp (which returns 0) and represent the same string to
printf ("ü").

/*** BEGIN DUMMY TEST PROGRAM ***/
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char c[2];
    char d[2];

    c[0] = 220; c[1] = '\0';
    d[0] = c[0]; d[1] = '\0';

    printf("c is %s\n", c);
    printf("d is %s\n", d);
    printf("c[0] is %02x\n", c[0]);
    printf("d[0] is %02x\n", d[0]);
    printf("c[0] == d[0] is %d\n", (c[0] == d[0]));
    printf("strncmp(c, d, 1) is %d\n", strncmp(c, d, 1));

    return(0);
}
/*** END DUMMY TEST PROGRAM ***/


Thank you!
 
Peter Nilsson

Pietro Cerutti said:
This is not true for the implementation of strncmp on
my system, which is:

Yes it is, under the 'as if' rule.

/*** BEGIN STRNCMP ON FREEBSD ***/
int
strncmp(s1, s2, n)
    const char *s1, *s2;
    size_t n;
{

    if (n == 0)
        return (0);
    do {
        if (*s1 != *s2++)

On systems where plain char is signed but unpadded,
this will find differences irrespective of whether
the bytes are treated as signed or unsigned char.

            return (*(const unsigned char *)s1 -
                *(const unsigned char *)(s2 - 1));

Here the unsigned char rule is applied explicitly, as
required by the language specification. Note that
on your system, unsigned char promotes to int, which
allows for negative results.

        if (*s1++ == 0)
            break;
    } while (--n != 0);
    return (0);
}
/*** END STRNCMP ON FREEBSD ***/

I think I'm missing something about chars and/or implicit
conversions.

The problem is that plain char can be signed or unsigned.
Character codings are all non-negative, but char is only
required to be able to store positive values for characters
in the basic execution character set. So characters in the
extended character set may be negative.

Could you please explain the output of the following
program to me? The two chars c[0] and d[0] have different
values (220 and -36), are not equal (the comparison
operator returns 0) but the two strings c and d are equal
to strncmp (which returns 0) and represent the same string
to printf ("ü").

/*** BEGIN DUMMY TEST PROGRAM ***/
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char c[2];
    char d[2];

    c[0] = 220; c[1] = '\0';
    d[0] = c[0]; d[1] = '\0';

If plain char is signed (and 8 bits wide) on your system, this
will put an implementation-defined value into d[0], most
likely 220 - 256 == -36. The representation of -36 in
two's complement is the same as the representation of 220
in pure binary notation of an unsigned char.

    printf("c is %s\n", c);
    printf("d is %s\n", d);

For the reason above, these should print the same thing.
[Note that assuming character codings will make your code
non-portable.]

    printf("c[0] is %02x\n", c[0]);
    printf("d[0] is %02x\n", d[0]);
    printf("c[0] == d[0] is %d\n", (c[0] == d[0]));

Both char and unsigned char values will promote to int,
which is capable of supporting the full range of both
character types. Hence, -36 is not the same value as 220.

    printf("strncmp(c, d, 1) is %d\n", strncmp(c, d, 1));

Here you are using a function which _must_ compare the
unsigned char values of the character representation.
Not surprisingly, 220 is the same as 220.
 
Pietro Cerutti

Thank you for the exhaustive explanation!

Regards,
 
CBFalconer

Pietro said:
.... snip ...

/*** BEGIN STRNCMP ON FREEBSD ***/
int
strncmp(s1, s2, n)
const char *s1, *s2;
size_t n;
{
.... snip ...

Could you please explain the output of the following program to
me? The two chars c[0] and d[0] have different values (220 and
-36), are not equal (the comparison operator returns 0) but the
two strings c and d are equal to strncmp (which returns 0) and
represent the same string to printf ("ü").

Of course not. Your test program declares one as unsigned char
and the other as plain char, which is signed on your system. The same
bit pattern represents both values (at least in 2's complement). The
FreeBSD implementation does not have a proper prototype (it uses an
old-fashioned K&R I header), so all arguments are passed in as received
and then treated as "const char *". This makes them equal.

There are three char types: plain, signed, and unsigned. Plain
"char" has the same range, representation, and behavior as one of the
other two, but you don't know which without examining your compiler's
documentation.
 
ranjeet.gupta

Harald said:
For the standard library functions, yes, because while they take char *
arguments, they convert it to unsigned char * internally anyway.

I don't agree with Harald van Dĳk. A long time back I had a similar sort of
query. Please refer to the link below and follow the thread, as it will
help you get some insight into the behaviour of unsigned and signed
values.

http://groups.google.co.in/group/alt.comp.lang.learn.c-c++/browse_thread/thread/6b06d071ddda12bc/b6aba0a74dff26a0?lnk=st&q=&rnum=9&hl=en#b6aba0a74dff26a0

Look for the explanation given by BARAT and KARL

HTH
~Ranjeet Gupta
 
Joe Wright

Pietro said:
Hi group,
is it always safe to pass unsigned char * variables as parameters to
functions accepting char * arguments?

For instance, I have to compare two unsigned char * strings.
Can I safely use strcmp? Do I need to cast the two strings to char *?

Thank you

Given C89 and prototypes:

int strcmp(const char *_s1, const char *_s2);

Your unsigned char * arguments will be coerced automatically to the type
required.
 
Ben Bacarisse

I don't agree with Harald van Dĳk. A long time back I had a similar sort of
query. Please refer to the link below and follow the thread, as it will
help you get some insight into the behaviour of unsigned and signed
values.

http://groups.google.co.in/group/alt.comp.lang.learn.c-c++/browse_thread/thread/6b06d071ddda12bc/b6aba0a74dff26a0?lnk=st&q=&rnum=9&hl=en#b6aba0a74dff26a0

I see nothing there that has a bearing on this thread. You were
asking about signed representations and got the usual mix of correct
and incorrect replies.

Look for the explanation given by BARAT and KARL

I could not find anything by BARAT, but Karl misled you (at least as
far as C is concerned) by suggesting that a left shift of a signed
integer with negative value was well-defined.
 
Christopher Benson-Manica

For the standard library functions, yes, because while they take char *
arguments, they convert it to unsigned char * internally anyway.

If by "the standard library functions", you mean strcmp() and
strncmp(), then yes, by 7.21.4. If you intended that statement to
include the rest of the str*() functions, then I would like to see
C&V, as I was not able to locate any text that suggests that any of
the other str*() functions interpret their arguments as unsigned char
*.
 
Harald van Dĳk

Christopher said:
If by "the standard library functions", you mean strcmp() and
strncmp(), then yes, by 7.21.4. If you intended that statement to
include the rest of the str*() functions, then I would like to see
C&V, as I was not able to locate any text that suggests that any of
the other str*() functions interpret their arguments as unsigned char
*.

7.21.1p3 (from n1124; it might have been added even after C99):
"For all functions in this subclause, each character shall be interpreted as
if it had the type unsigned char (and therefore every possible object
representation is valid and has a different value)."
 
Christopher Benson-Manica

Harald van Dĳk said:
7.21.1p3 (from n1124; it might have been added even after C99):
"For all functions in this subclause, each character shall be interpreted as
if it had the type unsigned char (and therefore every possible object
representation is valid and has a different value)."

Thanks. That text is indeed not present in n869, and it's nice to see
that the issue was (eventually) addressed. As long as OP isn't
running on a C89 DS9K implementation, all would seem likely to be well.
 
CryptiqueGuy

Pietro Cerutti wrote:

... snip ...
/*** BEGIN STRNCMP ON FREEBSD ***/
int
strncmp(s1, s2, n)
const char *s1, *s2;
size_t n;
{

... snip ...
Could you please explain the output of the following program to
me? The two chars c[0] and d[0] have different values (220 and
-36), are not equal (the comparison operator returns 0) but the
two strings c and d are equal to strncmp (which returns 0) and
represent the same string to printf ("ü").

Of course not. Your test program classifies one as unsigned char,
and the other as signed char. The same bit pattern represents both
(at least in 2's complement). The freebsd implementation does not
have a proper prototype (uses old fashioned K&R I header),
so all arguments are passed in as received, and then treated as "const
char *". This makes them equal.

I thought that passing unsigned char * where char * is expected is UB
when the function has an old K&R-style definition.
A signed char may have trap representations, and a pointer to an object
holding such a representation might be passed as an unsigned char *.
Dereferencing it through a char * pointer then produces UB.

IMHO, as per the standard, calling this implementation of strncmp() with
unsigned char * arguments is UB.

If it isn't UB, please cite the relevant words of the standard
that make the behavior well-defined.
 
pete

Keith said:
Yes, that paragraph is new in n1124.
It was added by TC 2, in response
to DR 274, <http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_274.htm>.

Then I would suggest changing the description of strchr
so that the value of the c parameter is converted to
(unsigned char) instead of (char).

Another problem, a situation where the standard
can't possibly mean what it says,
is that the rules concerning rank
prevent char from being signed.

N1124.pdf

6.3.1 Arithmetic operands
6.3.1.1 Boolean, characters, and integers
1 Every integer type has an integer conversion rank
defined as follows:
— No two signed integer types shall have the same rank,
even if they have the same representation.

— The rank of char shall equal the rank of signed char
and unsigned char.
 
Harald van Dĳk

pete said:
Then I would suggest changing the description of strchr
so that the value of the c parameter is converted to
(unsigned char) instead of (char).

That's an interesting find. You may be right that there's a problem here.

Another problem, a situation where the standard
can't possibly mean what it says,
is that the rules concerning rank
prevent char from being signed.

N1124.pdf

6.3.1 Arithmetic operands
6.3.1.1 Boolean, characters, and integers
1 Every integer type has an integer conversion rank
defined as follows:
— No two signed integer types shall have the same rank,
even if they have the same representation.

— The rank of char shall equal the rank of signed char
and unsigned char.

Plain char may be signed, and an integer type, but it is never a signed
integer type, because signed integer type has a specific definition which
doesn't include plain char, regardless of its signedness. See 6.2.5p4.
 
Harald van Dĳk

Harald said:
Plain char may be signed, and an integer type, but it is never a signed
integer type, because signed integer type has a specific definition which
doesn't include plain char, regardless of its signedness. See 6.2.5p4.

Sorry, it appears that it isn't an integer type, for the same reason that it
isn't a signed integer type: integer type also has a specific definition
that doesn't include plain char.
 
pete

Harald said:
Sorry, it appears that it isn't an integer type,
for the same reason that it
isn't a signed integer type:
integer type also has a specific definition
that doesn't include plain char.

Thank you.
I see now that char is one of the "basic types"
and distinct from the signed and unsigned integer types.
 
