Eric Sosman
jacob said: Alan Curry wrote:
I assume characters are codes from one to 255. This is a bad assumption
maybe, in some kind of weird logic when you assign a sign to a character
code.
It's certainly a bad assumption on machines where `char'
runs from -128 to 127 ...
There is a well-established confusion in C between characters (that are
encoded as integers) and integer VALUES.
A character -- loosely, a glyph like 'A' -- is not something
computers nowadays can represent directly in their memories.
Unable to store an actual 'A', they instead store a number like
65 or 193, and say "When thought of as a character, the value
refers to the 65th/193rd entry in a list of glyphs." The members
of that list and the order in which they appear are a matter of
convention, nothing more.
It's not really different from the convention that "zero is
false, anything else is true." Some other languages use other
conventions, like "even values are false, odds are true." Neither
scheme is inherently more "right" or "wrong" than the other; it's
just a matter of convention, of a correspondence between the
notions one wants to represent and the numbers that are all the
computer can store internally.
What I'm getting at is that there is (or need be) no confusion
between storing a character and storing a number: The computer always
does the latter and never does the former. When we talk about
"storing a character," it's just a convenient verbal shorthand for
"storing the number that represents a character." And the data type
C uses for this purpose is `char'. Some awkwardnesses stem from this
choice, mostly having to do with the library, and getting the library
to work nicely sometimes involves converting the numbers to and
from other types -- see getchar() or isalpha(), for instance. But
when you want to store character codes, use `char'. Use `unsigned
char' or `signed char' when you want to store small numbers that
are *not* to be thought of as characters.
I prefer not to use any sign in the characters, and treat 152 as character
code 152 and not as -104. Stupid me, I know.
152 is not a character; it is a number. In one popular
encoding scheme (EBCDIC) it corresponds to the character 'q', by virtue
of one of those conventional correspondences. If you want a 'q',
use a `char' and store 'q' in it. If you want the number 152
in a small space, use an `unsigned char' -- but don't think of
it as a character, because it isn't one.
Besides, when I convert it into a bigger type, I would like to get
152, and not 4294967192.
Much depends on the type to which you are converting, and
on why you are performing the conversion.
Since size_t is unsigned, converting to unsigned is a fairly common
operation.
It sounds very much as if you are dealing with "raw" numbers,
not with numbers that correspond to characters. If so, it's
quite strange that you are using strcmp() on assemblages of these
numbers, because strcmp() isn't well-suited to the task.
Writing software is difficult enough without having to bother with the sign
of characters or the sex of angels, or the number of demons you can fit in a
pin's head.
A little thought about the artificiality of number-to-glyph
correspondences will remove much of the difficulty.