difference between unsigned char and char in c


ravinderthakur

hi all experts,


Can anybody explain to me the difference between
unsigned char and char in the c/c++ language?

Specifically, how does this affect C library functions such as
strcat, strtok, etc., and their implementation? How does the compiler
treat them, and in which scenarios is one preferred over the other?


thanks
rt
 

Jack Klein

> hi all experts,
>
> Can anybody explain to me the difference between
> unsigned char and char in the c/c++ language?

Can't really help you there. There is no such thing as "c/c++"
language. There is C, which we discuss here, and C++, which we do
not. The following is about the unsigned char type in C. If you want
to know about C++, ask in comp.lang.c++.

> Specifically, how does this affect C library functions such as
> strcat, strtok, etc., and their implementation? How does the compiler
> treat them, and in which scenarios is one preferred over the other?

C defines three character types: signed char, unsigned char, and
"plain" char, defined without either the signed or unsigned keyword.

C requires that "plain" char have the same range and representation as
either signed char or unsigned char, but it is implementation-defined
as to which.

There are historical reasons for this. In the early days of C, long
before the standardization by ANSI and ISO, there was just plain char.
As C compilers were implemented on different platforms with different
types of processors, the implementers tended to use whatever was most
efficient on that particular processor. Some made char signed, some
made it unsigned.

As the language evolved it became useful to have both signed and
unsigned character types. Signed chars could hold small numeric
values between -127 and 127 and save space. Unsigned chars serve as
C's raw data type: any memory accessible to a program can be examined
as an array of unsigned chars.

When it came time to standardize the language, the committee had a
mandate to avoid as much as possible making changes that would cause
existing working code to fail. If the standard said that plain char
always had to be signed or unsigned, it would break some code on one
type of implementation or the other.

So the solution was to have three types of char, even though on every
implementation plain char has the same range and representation as one
of the other two. It nevertheless remains a distinct type.

Use signed char when you need to hold small numbers that might have
negative values, if they will always be in the range -127 to 127. Use
unsigned char when you want access to the raw bits in memory, or when
you want to hold small numbers that will never be negative, in the
range 0 to 255.

And use plain char when you are dealing with ordinary text and
strings. All C library functions that accept pointers to strings
require pointers to char, not pointers to signed or unsigned char.
 

Charlie Gordon

Jack Klein said:
>> hi all experts,
>>
>> Can anybody explain to me the difference between
>> unsigned char and char in the c/c++ language?
[good stuff snipped]
>
> And use plain char when you are dealing with ordinary text and
> strings. All C library functions that accept pointers to strings
> require pointers to char, not pointers to signed or unsigned char.

Thank you, Jack, for this excellent overview of char and its variants.

I just want to add some remarks about the Standard C library:

Among the standard library functions that deal with chars, very few are
affected by the signedness issue. Yet some very common functions behave in such
a way that sign extension of plain char becomes an issue.

First, consider getc(), fgetc(), getchar(), and ungetc(), as well as
fgetchar(), a non-standard extension found on some compilers.

These functions return, or take as a parameter, an int that can hold all the
values of unsigned char as well as the value EOF, a macro that expands to a
negative integer constant expression of type int, usually (-1). If you compare
these values with a variable of type char, or with a character constant, on an
architecture where plain char is signed, you will not get the intended behaviour
for char values that are negative:

    int c = getc(fp);
    if (c == '\200') {
        // always false if char is signed by default
    }

    char ch = '\377';
    if (c == ch) {
        // may be true at end of file, but not for occurrences of
        // UCHAR_MAX in the stream.
    }

    ungetc(ch, fp);  // will not push back anything.

Another family of functions from the C library has a similar issue: isalpha()
and its friends from <ctype.h> take an int argument whose value must be
representable as an unsigned char or equal to EOF. Yet most programmers pass
plain char values, promoted to int, without any extra care:

    char *s = "\200xxx";
    if (isalpha(*s)) {
        // too bad, undefined behaviour for negative char values!
    }

The GNU C library's implementation of <ctype.h> bends over backwards to accept
negative char values as well, but it still cannot distinguish between '\377'
and EOF: when char is signed, '\377' has the value -1, the same as EOF.

If you program in an international environment, where characters routinely
fall outside the ASCII range, this becomes a nagging issue.
Even in the general case, it can cause hard-to-find bugs in otherwise
innocent-looking parsers.

Personally, I always recommend forcing the compiler to treat plain char as
unsigned char whenever possible, as most modern tools make this selectable (GCC's
-funsigned-char option, for example). But this doesn't cure the problem; it only
lessens the portability issue on known targets.

For consistency with the standard library semantics, char should be unsigned by
default, but experience shows that the default tends to be just the opposite.

Chqrlie.
 
