Char difference between C90 and C99

Jason Curl

Dear C people,

C90 doesn't specify whether 'char' is 'signed char' or 'unsigned char',
leaving it to the implementation of the compiler. Has this changed for
C99, or is it still the same?

Thanks,
Jason.
 
Keith Thompson

Jason Curl said:
C90 doesn't specify whether 'char' is 'signed char' or 'unsigned
char', leaving it to the implementation of the compiler. Has this
changed for C99, or is it still the same?

Still the same.
 
Lawrence Kirby

Dear C people,

C90 doesn't specify whether 'char' is 'signed char' or 'unsigned char',

It says that char has the same range, representation and behaviour as
either signed char or unsigned char. However it is still a distinct type
from both. That means for example

char *p1 = NULL;
signed char *p2 = p1;

is invalid even on implementations where char has a signed representation.
leaving it to the implementation of the compiler. Has this changed for
C99, or is it still the same?

Yes, it is the same.

Lawrence
 
Stephen Mayes

Lawrence Kirby said:
It says that char has the same range, representation and behaviour as
either signed char or unsigned char. However it is still a distinct type
from both. That means for example

char *p1 = NULL;
signed char *p2 = p1;

is invalid even on implementations where char has a signed representation.

I don't understand what "invalid" means here.
Would a cast make it valid?
i.e.
char *p1 = NULL;
signed char *p2 = (signed char*)p1;
 
Lawrence Kirby

I don't understand what "invalid" means here.

The code above violates a constraint in the standard which requires a
diagnostic from the compiler.
Would a cast make it valid?

Possibly. However that's digressing from the point I was making. If char
was allowed to be the *same* type as signed char then on implementations
where this was so the code above would be valid. However C says that the
types are always different even if they use the same representation.

The situation is different to that of int vs. signed int which are the
same type. It is closer to int vs. long which are different types even on
implementations that use the same representation for both.
i.e.
char *p1 = NULL;
signed char *p2 = (signed char*)p1;

That's valid because you can always cast a null pointer to any other
pointer type.
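
To make that concrete, here is a small translation unit (a sketch only;
the names follow the example above). The first assignment, left commented
out, would require a diagnostic if enabled; the cast form compiles cleanly:

#include <stddef.h>

int main(void)
{
    char *p1 = NULL;

    /* Constraint violation: char * and signed char * are incompatible
       pointer types, even where char has a signed representation, so a
       diagnostic would be required. */
    /* signed char *p2 = p1; */

    /* Fine: an explicit cast converts the (null) pointer. */
    signed char *p3 = (signed char *)p1;

    return p3 == NULL ? 0 : 1;
}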

Lawrence
 
Malcolm

Jason Curl said:
C90 doesn't specify whether 'char' is 'signed char' or 'unsigned char',
leaving it to the implementation of the compiler. Has this changed for
C99, or is it still the same?
char may be signed or unsigned, but this is a hangover from K and R and
shouldn't be of any interest to you. char holds a human-language character,
unsigned char an arbitrary byte, signed char is of limited use but you might
occasionally want a small signed integer.

As a concession to efficiency, characters '0'-'9' are always consecutive and
you might want to exploit this fact to convert numbers to and from text.
This is the only occasion in which the machine representation of a
particular character should normally affect your program.
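
For example, because the digit characters are guaranteed to be
consecutive, conversions like the following are portable (a minimal
sketch):

#include <stdio.h>

int main(void)
{
    char digit = '7';
    int value = digit - '0';   /* text -> number: always 7 */
    char back = '0' + 5;       /* number -> text: always '5' */

    printf("%d %c\n", value, back);
    return 0;
}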
 
Jason Curl

Malcolm said:
char may be signed or unsigned, but this is a hangover from K and R and

That's what I originally thought. But there was another post that
suggested the defined range of char is 7 bits (0-127), and in newer
standards is different to 'signed char' and 'unsigned char'. Which
standard that was, I couldn't tell from the reply post. Thinking about it
more, I don't quite believe it, as that would mean char is specified to be
7 bits rather than 8.
shouldn't be of any interest to you. char holds a human-language character,
unsigned char an arbitrary byte, signed char is of limited use but you might
occasionally want a small signed integer.

I'm interested insofar as I'm getting a diagnostic from a compiler for
a small piece of code like:

01 #include <stdio.h>
02 int ctest(char s)
03 {
04     if (s < 0 || s >= 128) {
05         printf("Out of range\n");
06         return 0;
07     } else {
08         printf("In range\n");
09         return 1;
10     }
11 }

file.c:4: warning: comparison is always false due to limited range of
data type

and I want to know if the compiler is generating a useless diagnostic in
this case (and I ignore it, or figure out how to turn it off), or if it
is a problem with my code.

So, since 'char' may be either signed or unsigned, line 4 should stay as
it is. I assume the particular compiler I'm using treats 'char' as
'signed char'.
As a concession to efficiency, characters '0'-'9' are always consecutive and
you might want to exploit this fact to convert numbers to and from text.
This is the only occasion in which the machine representation of a
particular character should normally affect your program.

Not quite. I don't actually care about the text itself as humans don't
need to read it, and I'm treating it literally as a string of random
values. The range of the values concerns me as I'm using an array of
structures that's indexed by the character value itself. I'm defining
that array to be 128 elements and check that the 'char' is in the range
of 0 to 127 (for a POSIX system) before using it to index that array.

And of course, the actual size of a char is defined to be 8 bits (or
more, but 8 in POSIX), so I interpret this as meaning I must check the
8th bit somehow.

It seems a bit of a waste of CPU cycles and code clarity to take the
value, cast it to int (or assign it to an int), make the comparison, and
then index the array with the int just to avoid the warning.
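
For what it's worth, a rough sketch of the kind of table lookup described
above (the struct and the names are invented for illustration), doing the
widen-to-int dance:

#include <stdio.h>

/* Hypothetical dispatch table: 128 entries, one per value 0..127. */
struct action {
    const char *name;
};

static struct action table[128];

/* Widen to int, range-check, then index. */
static const struct action *lookup(char c)
{
    int i = c;                  /* may be negative if plain char is signed */
    if (i < 0 || i >= 128) {
        return NULL;            /* byte falls outside the table */
    }
    return &table[i];
}

int main(void)
{
    printf("%s\n", lookup('A') != NULL ? "in range" : "out of range");
    return 0;
}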

Thanks,
Jason.
 
Lawrence Kirby

That's what I originally thought. But there was another post that
suggested the defined range of char is 7 bits (0-127),

That may be a matter of interpretation. However plain char is implemented,
it must always be able to represent values in the range 0 to 127. That is
the intersection of the minimum ranges of signed char (-127 to 127) and
unsigned char (0 to 255). But on an implementation it always has the same
range as either signed char or unsigned char on that implementation, so it
will always be able to represent some values outside the range 0 to 127.

In other words 0 to 127 are the only values you can *portably* store in a
char.
and in newer
standards is different to 'signed char' and 'unsigned char'.

Standard C has always made char a distinct type from signed char and
unsigned char, but it has always been required to use the representation
of one or the other.

....
I'm interested insofar as I'm getting a diagnostic from a compiler for
a small piece of code like:

01 #include <stdio.h>
02 int ctest(char s)
03 {
04     if (s < 0 || s >= 128) {
05         printf("Out of range\n");
06         return 0;
07     } else {
08         printf("In range\n");
09         return 1;
10     }
11 }

file.c:4: warning: comparison is always false due to limited range of
data type

and I want to know if the compiler is generating a useless diagnostic in
this case (and I ignore it, or figure out how to turn it off), or if it
is a problem with my code.

So, since 'char' may be either signed or unsigned, line 4 should stay as
it is. I assume the particular compiler I'm using treats 'char' as
'signed char'.

It probably has CHAR_MAX as 127 then. In that case the test s >= 128 must
always be false. The compiler is correct in what it is saying for that
implementation, but the code is also correct in terms of portability. For
implementations where char is unsigned it will be the s < 0 test that is
"always false". You could replace the whole test with:

if ((unsigned char)s >= 128) {

Because of the unique properties of unsigned char you can be sure that the
cast will convert any negative value of s to a value >= 128.
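
Plugged into your example, the whole function might then look like this
(just a sketch, keeping the original function name):

#include <stdio.h>

int ctest(char s)
{
    /* Converting to unsigned char maps any negative value to >= 128,
       so one comparison covers both the signed and unsigned cases. */
    if ((unsigned char)s >= 128) {
        printf("Out of range\n");
        return 0;
    } else {
        printf("In range\n");
        return 1;
    }
}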

Lawrence
 
Malcolm

Jason Curl said:
01 #include <stdio.h>
02 int ctest(char s)
03 {
04     if (s < 0 || s >= 128) {
05         printf("Out of range\n");
06         return 0;
07     } else {
08         printf("In range\n");
09         return 1;
10     }
11 }

file.c:4: warning: comparison is always false due to limited range of data
type

and I want to know if the compiler is generating a useless diagnostic in
this case (and I ignore it, or figure out how to turn it off), or if it is
a problem with my code.
It's a sort of useless diagnostic. What it is complaining about is that char
can never reach 128 (assuming signed 8-bit chars). It may also dislike
comparing an unsigned value for less than zero (assuming unsigned chars). It
doesn't have the intelligence to realise that the test is potentially
meaningful on a different machine.

The way I would get rid of this is to convert s to an int. Make the function
take an integer as an argument. This is traditional in C; fgetc() and
fputc() work this way, for instance.
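
A sketch of that approach (the function keeps the poster's name; the
caller is assumed to pass a value in the unsigned char range, as fgetc()
returns):

#include <stdio.h>

/* Takes an int, the way fgetc()/fputc() do.  Callers are expected to
   pass values obtained from fgetc() or via (unsigned char)c. */
int ctest(int s)
{
    if (s < 0 || s >= 128) {
        printf("Out of range\n");
        return 0;
    } else {
        printf("In range\n");
        return 1;
    }
}
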
Not quite. I don't actually care about the text itself as humans don't
need to read it, and I'm treating it literally as a string of random
values. The range of the values concerns me as I'm using an array of
structures that's indexed by the character value itself. I'm defining that
array to be 128 elements and check that the 'char' is in the range of 0 to
127 (for a POSIX system) before using it to index that array.

And of course, the actual size of a char is defined to be 8 bits (or more,
but 8 in POSIX), so I interpret this as meaning I must check the 8th bit
somehow.
Using a char for indexing is possibly one of the exceptions to the rule that
the machine representation is irrelevant. Are you sure you want 128 buckets
indexed by ascii value and that the program might not be clearer, if a
little slower, if you indexed alphabetically on alphanumeric values?
Remember the program may break if you move to a non-ascii system.
It seems a bit of a waste of CPU cycles and code clarity to take the value,
cast it to int (or assign it to an int), make the comparison, and then
index the array with the int just to avoid the warning.
Probably no extra code will be generated for that. The character is held in
an integer-width register, and when treated as a char variable the top bits
are simply ignored.
 
Old Wolf

Malcolm said:
The way I would get rid of this is to convert s to an int. Make
the function take an integer as an argument. This is traditional
in C; fgetc() and fputc() work this way, for instance.

It's not quite as trivial as that: fgetc and fputc always work with
unsigned char values (i.e. 0 to UCHAR_MAX). It's not a
great interface IMHO. You would have to document whether the
function was meant to receive an int that came from a signed char,
or from an unsigned char (or make your function work with both).
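
For instance, a caller holding a plain char would normally push the value
through unsigned char first, fgetc-style (a sketch with invented names):

#include <stdio.h>

/* Expects a value in the range 0..UCHAR_MAX. */
static int in_table_range(int s)
{
    return s >= 0 && s < 128;
}

int main(void)
{
    char c = 'A';
    /* The caller converts first, so the convention is explicit. */
    printf("%d\n", in_table_range((unsigned char)c));
    return 0;
}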
 
Jason Curl

Malcolm said:
It's a sort of useless diagnostic. What it is complaining about is that char
can never reach 128 (assuming signed 8-bit chars). It may also dislike
comparing an unsigned value for less than zero (assuming unsigned chars). It
doesn't have the intelligence to realise that the test is potentially
meaningful on a different machine.

The way I would get rid of this is to convert s to an int. Make the function
take an integer as an argument. This is traditional in C; fgetc() and
fputc() work this way, for instance.


Using a char for indexing is possibly one of the exceptions to the rule that
the machine representation is irrelevant. Are you sure you want 128 buckets
indexed by ascii value and that the program might not be clearer, if a
little slower, if you indexed alphabetically on alphanumeric values?
Remember the program may break if you move to a non-ascii system.

Thanks for your input. You're right that it will break if it moves away
from ASCII systems (e.g. EBCDIC is spread out across 8 bits). I have to
find out though if POSIX allows for different char sets.

<OT>
The actual characters themselves are byte sequences from a terminal
(e.g. VT100). I then receive a sequence of bytes and use this method to
determine what should be done when a key is pressed. Hence for non-ASCII
systems this would be a different character string - the byte sequences
remain the same but the representation in the C source would have to change.
 
pete

Jason said:
That's what I originally thought. But there was another post that
suggested the defined range of char is 7 bits (0-127),

That's the minimum guaranteed range of char.
It's the intersection of the minimum guaranteed ranges
of signed char and unsigned char.
and in newer
standards is different to 'signed char' and 'unsigned char'.

There are three types of char:
1 char
2 signed char
3 unsigned char
Those are three different types.
01 #include <stdio.h>
02 int ctest(char s)

Use
int ctest(int s);
instead.
 
