Should I use "char" or "unsigned char" for strings?


John Devereux

Hi,

I would like some advice on whether I should be using plain "chars"
for strings. I have instead been using "unsigned char" in my code (for
embedded systems). In general the strings contain ASCII characters in
the 0-127 range, although I had thought that I might want to use the
128-255 range for special symbols or foreign character codes.

This all worked OK for a long time, but a recent update to the
compiler on my system has resulted in a lot of errors such as:

"pointer targets in passing argument 1 of 'strcpy' differ in
signedness"

Basically, the compiler is now protesting about me passing strings of
"unsigned char" to standard library functions that expect "char"
(which seems to be most of them).

I can rewrite my code to use plain chars. Or, I can cast the string
pointers in the standard library function calls. Both of these will
need quite a lot of (fairly trivial) changes. Or I expect I can turn
the warnings off.
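For comparison, the "cast the pointers" option might look roughly like the sketch below, which confines the casts to one wrapper instead of scattering them through every call. The name ustrcpy is hypothetical (it is not a standard function); the replies below recommend switching to plain char instead.

```c
#include <string.h>

/* Hypothetical wrapper (not part of the standard library) illustrating
   the "cast the pointers" option: callers keep their unsigned char
   strings, and the casts to char * are confined to this one function.
   Converting between unsigned char * and char * is allowed, and strcpy
   copies the bytes unchanged. */
static unsigned char *ustrcpy(unsigned char *dst, const unsigned char *src)
{
    return (unsigned char *)strcpy((char *)dst, (const char *)src);
}
```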

(I would have thought this topic had been beaten to death, but I did not
see anything in the FAQ!)

Thanks,
 

Emmanuel Delahaye

John Devereux wrote on 28/03/05 :
> I would like some advice on whether I should be using plain "chars"
> for strings. I have instead been using "unsigned char" in my code (for
> embedded systems). In general the strings contain ASCII characters in
> the 0-127 range, although I had thought that I might want to use the
> 128-255 range for special symbols or foreign character codes.

Stick to char for strings. You could activate some 'make char unsigned'
option if you need a 0-255 range. But AFAIK, it's not necessary. Values
128..255 are encoded -1..-127 on most machines.

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

"Clearly your code does not meet the original spec."
"You are sentenced to 30 lashes with a wet noodle."
-- Jerry Coffin in a.l.c.c++
 

Eric Sosman

Emmanuel said:
> John Devereux wrote on 28/03/05 :
>> [...]
>
> Stick to char for strings. You could activate some 'make char unsigned'
> option if you need a 0-255 range. But AFAIK, it's not necessary. Values
> 128..255 are encoded -1..-127 on most machines.

ITYM -128..-1 -- but the advice is sound.
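A small sketch of the correction above, assuming a typical machine with eight-bit char: whether plain char is signed or not, converting it to unsigned char recovers the 0..255 code, so 128..255 (stored as -128..-1 when char is signed) come back out intact. The helper name char_code is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: returns the 0..UCHAR_MAX code of a plain char,
   whether char is signed or unsigned on this implementation. With a
   signed eight-bit char on a two's-complement machine, codes 128..255
   are stored as -128..-1; converting to unsigned char maps them back. */
static unsigned int char_code(char c)
{
    return (unsigned char)c;
}
```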

The possible signedness of `char' is, IMHO, one of
the nagging infelicities of C. It's an imperfection we
simply have to live with, and attempts to get around it
by type-punning with `unsigned char' aren't satisfactory.
As Emmanuel says, use plain `char' when dealing with
characters -- but when using the <ctype.h> functions,
take care to cast where needed:

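#include <ctype.h>

/* Returns a pointer to the first non-whitespace character of string. */
const char *skip_whitespace(const char *string) {
    /* Convert to unsigned char first: isspace() is only defined for
       EOF and for values representable as unsigned char. */
    while (isspace((unsigned char)*string))
        ++string;
    return string;
}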

Despite appearances, the cast is required if there's any
chance at all of "extended" characters in the strings. I
can't think of any other standard library functions that
require such ugliness, so if you switch to plain `char'
strings there shouldn't be too many places where you need
to insert casts.
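The same pattern applies to any loop over a plain-char string that calls a <ctype.h> function. Here is a minimal sketch, assuming the default "C" locale; the helper count_spaces is hypothetical. The extended character \xe9 is stored as a negative char wherever char is signed, and the cast keeps it from reaching isspace() as a negative int.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the whitespace characters in a plain-char
   string. Each char is converted to unsigned char before the call, so an
   "extended" character stored as a negative char never reaches isspace()
   as a negative int (which would be undefined behavior). */
static size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isspace((unsigned char)*s))
            ++n;
    }
    return n;
}
```

With this, count_spaces("caf\xe9 au lait") counts the two spaces and treats the accented character as ordinary text.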
 

John Devereux

Eric Sosman said:
> ITYM -128..-1 -- but the advice is sound.

OK, should cover any machine I am likely to encounter.
> As Emmanuel says, use plain `char' when dealing with
> characters -- but when using the <ctype.h> functions,
> take care to cast where needed:
> [...]

Great, I rarely use these anyway.

What about conversion to and from "int", I wonder? Some of my
functions process a string character by character, calling another
function with that character. This will presumably get promoted to an
int, right? And then probably converted back to a "char" again in the
function. As I understand it, an in-range negative "int" is guaranteed
to get converted to the same negative "char" value. So we should be
OK.
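The round trip described here can be sketched as follows. The names process_char and round_trip_ok are hypothetical, and the sketch assumes int can represent every char value (true unless char is unsigned and as wide as int). Under that assumption, char to int preserves the value, and converting the in-range int back to char yields the original char.

```c
/* Hypothetical per-character function that takes an int, as in the
   scenario described above. */
static int process_char(int ch)
{
    return ch;
}

/* Returns 1 if every character of s survives being passed as an int and
   converted back to char, 0 otherwise. Assumes int can represent every
   char value (true unless char is unsigned and as wide as int). */
static int round_trip_ok(const char *s)
{
    for (; *s != '\0'; ++s) {
        char back = (char)process_char(*s);   /* char -> int -> char */
        if (back != *s)
            return 0;
    }
    return 1;
}
```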
 

Eric Sosman

John said:
> [...]
> What about conversion to and from an "int" I wonder? Some of my
> functions process a string character by character, calling another
> function with that character. This will presumably get promoted to an
> int, right? And then probably converted back to a "char" again in the
> function. As I understand it, an in-range negative "int" is guaranteed
> to get converted to the same negative "char" value. So we should be
> OK.

There are three cases:

If the function is prototyped to take a `char' argument,
the `char' value you provide is passed to the function without
conversion or promotion, and received just as you passed it.
There may be behind-the-scenes magic involved (e.g., passing
an eight-bit value in a 32-bit register), but the effect must
be "as if" nothing happens.

If the `char' you provide is passed to an old-style
function (no prototype) that expects a `char' argument, the
provided value is promoted, passed, and then "demoted" upon
receipt. Again, the value arrives unscathed even though the
representation may change on "exotic" hardware: if you provide
a negative zero the function might receive a positive zero,
but it will in any case receive a zero.

If the `char' argument corresponds to part of the `...'
of a variable-argument function, the value is promoted just
as for prototypeless functions. In this case, though, you
actually need to know the promoted type when you fetch the
argument: `va_arg(ap, char)' is incorrect. A `char' will
promote to `int' if `int' can represent all possible values
a `char' might have, or to `unsigned int' otherwise. From
your earlier posts it appears you're assuming an eight-bit
`char' (values between -128 and 255), which fits comfortably
in the range of `int' (at least -32767..32767, perhaps wider).
Some systems, though, have sizeof(int)==sizeof(char)==1, and
if `char' is unsigned on such a system it will promote to
`unsigned int' instead of `int'.
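To illustrate the third case, here is a minimal sketch of a variadic function that receives char arguments. The name collect_chars is hypothetical, and the sketch assumes the common situation where char promotes to int (not the sizeof(int)==1, unsigned-char system mentioned above). The arguments are fetched as int and converted back to char.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: stores `count` char arguments passed through the
   `...` into buf, adds a terminating '\0', and returns buf (which must
   hold at least count + 1 chars). Each char argument undergoes the
   default argument promotions, so it is fetched with va_arg(ap, int);
   va_arg(ap, char) would be incorrect. Assumes char promotes to int. */
static char *collect_chars(char *buf, size_t count, ...)
{
    va_list ap;
    va_start(ap, count);
    for (size_t i = 0; i < count; ++i)
        buf[i] = (char)va_arg(ap, int);
    va_end(ap);
    buf[count] = '\0';
    return buf;
}
```

A negative char passed this way arrives as a negative int and converts back to the same char value, matching the behavior described for prototypeless calls.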
 
