Meaning of unsigned char

sarathy

Hi,
I need clarification regarding signed characters in the C
language.

In C, char is 1 byte. So

1. Unsigned char

[0 to 127] - ASCII CHARACTER SET
[128 to 255] - EXTENDED CHARACTER SET

2. Signed char
[-128 to 0] - ?????
[0 to 127] - ASCII CHARACTER SET

What does it mean for a character to be signed? Is there any
difference between signed and unsigned char? If yes, how do they
differ? If no, then why didn't the C standard restrict the char data
type to unsigned alone?

I have been referring to various articles, but in vain. Any help would
be greatly appreciated.

Regards,
Sarathy
 
Chris Dollin

sarathy said:
Hi,
I need clarification regarding signed characters in the C
language.

In C, char is 1 byte. So

In C, characters and bytes are /the same thing/, and need not be only
8 bits wide.
1. Unsigned char

[0 to 127] - ASCII CHARACTER SET

C doesn't require ASCII.
[128 to 255] - EXTENDED CHARACTER SET

C doesn't know anything about "the extended character set".
(Not in the C locale, anyway.)
2. Signed char
[-128 to 0] - ?????
[0 to 127] - ASCII CHARACTER SET

Mostly dittos.

What C /does/ say is that the C-defined characters are positive,
if I recall correctly.
What does it mean for a character to be signed ?

A value of type `signed char` may be negative. A value of type
`unsigned char` cannot be negative. The type `char` is equivalent
to one of them (but is not the /same type/).
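
To illustrate the "not the /same type/" point, here is a minimal sketch, assuming a C11 compiler (it relies on _Generic). The enum and macro names are made up for this example. The selection only compiles because char, signed char and unsigned char are three distinct types; a _Generic that lists two compatible types is rejected.

```c
/* Hypothetical names, for illustration only. Requires C11 for _Generic. */
enum char_kind {
    KIND_PLAIN_CHAR,
    KIND_SIGNED_CHAR,
    KIND_UNSIGNED_CHAR,
    KIND_OTHER
};

/* Classify an expression by its type, not by its value. */
#define CHAR_KIND(x) _Generic((x),            \
    char:          KIND_PLAIN_CHAR,           \
    signed char:   KIND_SIGNED_CHAR,          \
    unsigned char: KIND_UNSIGNED_CHAR,        \
    default:       KIND_OTHER)
```

CHAR_KIND((char)0) selects KIND_PLAIN_CHAR whichever of the other two types plain char happens to behave like. A character constant such as 'a' has type int in C, so it falls through to KIND_OTHER.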
Is there any difference between signed and unsigned char ?

Yes, as above.
If Yes, how do they differ.
If No, then why didnt the C standard restrict the char data
type to unsigned alone.

Because when C was standardised, some implementations had
char being signed, and some had char being unsigned. So
char was left ambiguous and the two other flavours were
introduced to allow the programmer to be explicit.
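
Since the choice is left to the implementation, a program can ask which way it went. A small sketch using the limits from <limits.h>; the function name is invented for the example.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
 * CHAR_MIN is 0 when char is unsigned and equals SCHAR_MIN when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Code that depends on one answer or the other is better off spelling out `signed char` or `unsigned char` explicitly.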
 
Frederick Gotham

sarathy posted:
Hi,
I need clarification regarding signed characters in the C
language.

In C, char is 1 byte.


Yes it is. The number of bits in a byte can be determined from the CHAR_BIT
macro.

So

1. Unsigned char

[0 to 127] - ASCII CHARACTER SET
[128 to 255] - EXTENDED CHARACTER SET


All C cares about is this:

(1) sizeof(char unsigned) == 1

(2) Minimum range: 0 through 255

2. Signed char
[-128 to 0] - ?????
[0 to 127] - ASCII CHARACTER SET


All C cares about is this:

(1) sizeof(char signed) == 1

(2) Minimum range: -127 through 127

What does it mean for a character to be signed?


It can store a negative value.

Is there any difference between signed and unsigned char ?


They're both integer types, just like "int", "short" and "long". See their
minimum range above.

If Yes, how do they differ.


They differ in their signedness, (and consequently, their range).

If No, then why didnt the C standard restrict the char data
type to unsigned alone.


The type used for storing characters is "char". It is implementation
defined as to whether a plain char is signed or unsigned. There's no good
reason to use a plain char for arithmetic or storing numbers.
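
A rough sketch of what goes wrong when a plain char is used to hold a number: the same stored byte widens to a different int depending on the implementation. The function names are invented for the example, and memcpy is used so that only the bit pattern is copied, without relying on an out-of-range conversion.

```c
#include <string.h>

/* Widen a byte to int after storing it in a plain char.
 * The result depends on whether plain char is signed here. */
int widen_via_plain_char(unsigned char byte)
{
    char c;
    memcpy(&c, &byte, 1);   /* copy the bit pattern unchanged */
    return c;
}

/* Widen the same byte through unsigned char: always 0..UCHAR_MAX. */
int widen_via_unsigned_char(unsigned char byte)
{
    return byte;
}
```

For a byte value of 200, the unsigned version always gives 200. The plain char version gives 200 where char is unsigned and a negative number where char is signed. Values in the range 0 to 127 come out the same either way.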

"unsigned char" is an unsigned integer type.

"signed char" is a signed integer type.

It seems to me that you're confused by its name. Yes, the fact that it's
called "char" suggests that it might not just be a simple integer type, but
it is.
 
Keith Thompson

sarathy said:
I need clarification regarding signed characters in the C
language.

In C, char is 1 byte. So

1. Unsigned char

[0 to 127] - ASCII CHARACTER SET
[128 to 255] - EXTENDED CHARACTER SET

2. Signed char
[-128 to 0] - ?????
[0 to 127] - ASCII CHARACTER SET

What does it mean for a character to be signed ? Is there any
difference between signed and unsigned char ? If Yes, how do they
differ. If No, then why didnt the C standard restrict the char data
type to unsigned alone.

A char is one byte by definition, specifically, by the C standard's
definition of the word "byte". This doesn't necessarily match the way
the word "byte" is used in other contexts.

The number of bits in a byte is specified by the constant CHAR_BIT in
<limits.h>. It must be at least 8, but it can be larger. (You're not
likely to run into systems with CHAR_BIT > 8, unless you work with
embedded systems, particularly DSPs, but you still shouldn't assume
that CHAR_BIT==8.)
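
A short sketch of writing code that does not hard-wire 8 bits. It assumes a C11 compiler for _Static_assert, and the macro name STORAGE_BITS is made up for the example.

```c
#include <limits.h>

/* sizeof counts bytes in the C sense, so sizeof(char) is 1 by definition. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
_Static_assert(CHAR_BIT >= 8, "the standard requires at least 8 bits per byte");

/* Number of bits of storage occupied by a type (hypothetical helper macro). */
#define STORAGE_BITS(type) (sizeof(type) * CHAR_BIT)
```

STORAGE_BITS(char) is CHAR_BIT on every implementation, whether that is 8 or something larger.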

char, signed char, and unsigned char are all integer types, capable of
representing integer values within certain ranges. For signed char,
the range is at least -127 to +127. For unsigned char, it's at least
0 to 255. The range of char matches one of those ranges.
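
These guarantees can be checked against the macros in <limits.h>. A minimal sketch, with function names invented for the example:

```c
#include <limits.h>

/* Returns 1 if the implementation provides at least the minimum ranges. */
int ranges_meet_minimums(void)
{
    return SCHAR_MIN <= -127 && SCHAR_MAX >= 127 && UCHAR_MAX >= 255;
}

/* Returns 1 if the range of plain char matches exactly one of the
 * signed char or unsigned char ranges. */
int char_matches_one_range(void)
{
    int like_signed   = (CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX);
    int like_unsigned = (CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX);
    return like_signed != like_unsigned;
}
```

Both functions should return 1 on any conforming implementation; the actual limits may be wider than the minimums.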

These numeric types can be (and very often are) used to store character
values, sometimes ASCII, sometimes some other encoding. If the
encoding specifies the meanings of characters 0 to 127, as ASCII does,
then those numeric char values will correspond to those characters.
Numeric values outside that range may not have any specified meaning
as characters, but they're still perfectly valid as numeric values.
 
Keith Thompson

Frederick Gotham said:
All C cares about is this:

(1) sizeof(char unsigned) == 1

(2) Minimum range: 0 through 255 [...]
All C cares about is this:

(1) sizeof(char signed) == 1

(2) Minimum range: -127 through 127

A note to the original poster: "char unsigned" and "char signed" are
just perverse ways of writing "unsigned char" and "signed char",
respectively. Both forms are allowed by the grammar, but hardly
anyone uses the forms Frederick insists on using. You should
understand what they mean, but don't expect to see them very often.
I strongly recommend not using them in your own code.
 
Frederick Gotham

Keith Thompson posted:
A note to the original poster: "char unsigned" and "char signed" are
just perverse ways of writing "unsigned char" and "signed char",
respectively.

A note to the original poster: The C language has a less-than-strict syntax
when it comes to word ordering. A lot of the time, the programmer has a
choice over what order to put the words in; for instance, all of the
following definitions are equivalent:

const unsigned long int i = 5;
int long unsigned i const = 5;
cont int long unsigned i = 5;
int unsigned const long i = 5;

You will find that many C programmers (Keith Thompson included) prefer to
place "unsigned" before the size-specifier, rather than after, i.e.:

unsigned char

I myself prefer a different form:

char unsigned

The original poster should note that some programmers may deem as
"perverse", styles which are different from their own -- not unlike how a
heterosexual might deem homosexuality to be perverse.

The original poster should note that any human being who doesn't suffer
from severe mental retardation should be able to understand that "char
unsigned" and "unsigned char" are equivalent.

I suggest to the original poster that he or she write their definitions in
whatever word order they like best.

Both forms are allowed by the grammar, but hardly
anyone uses the forms Frederick insists on using. You should
understand what they mean, but don't expect to see them very often.
I strongly recommend not using them in your own code.


Better yet, I strongly recommend that you realise that word order is at the
programmer's discretion in C.
 
Keith Thompson

Frederick Gotham said:
The original poster should note that any human being who doesn't suffer
from severe mental retardation should be able to understand that "char
unsigned" and "unsigned char" are equivalent.

Yes, they're equivalent, and yes, any C programmer should know that.
But it's entirely possible that a relatively inexperienced C
programmer might not know that, which is why I chose to explain it to
the OP. Do you have some objection to that?

C also doesn't have any requirements for indentation or brace
placement, but it's very easy to write nearly incomprehensible code by
abusing this flexibility.

"unsigned char" the form conventional is, and no good reason is there
not to it use.
I suggest to the original poster that he or she write their definitions in
whatever word order they like best.

I suggest ignoring anyone who claims that legibility isn't important.
I also suggest ignoring anyone who drags insulting phrases like
"severe mental retardation" into a technical discussion.
 
Ian Collins

Frederick said:
Keith Thompson posted:




A note to the original poster: The C language has a less-than-strict syntax
when it comes to word ordering. A lot of the time, the programmer has
choice over what order to put the words in; for instance, all of the
following definitions are equivalent:

const unsigned long int i = 5;
int long unsigned i const = 5;
cont int long unsigned i = 5;
int unsigned const long i = 5;

You will find that many C programmers (Keith Thompson included) prefer to
place "unsigned" before the size-specifier,
Make that most.
 
Frederick Gotham

Keith Thompson posted:
Yes, they're equivalent, and yes, any C programmer should know that.
But it's entirely possible that a relatively inexperienced C
programmer might not know that, which is why I chose to explain it to
the OP. Do you have some objection to that?


No. I would point out though that your use of "perverse" was unsuitable.

"unsigned char" the form conventional is, and no good reason is there
not to it use.


We'll just have to agree to disagree on this one, Keith, because I don't
think it makes a hell of a lot of difference either way whether one writes
"char unsigned" or "unsigned char". I just find the former to be more
intuitive.
 
Robert Gamble

Frederick said:
Keith Thompson posted:


A note to the original poster: The C language has a less-than-strict syntax
when it comes to word ordering. A lot of the time, the programmer has
choice over what order to put the words in; for instance, all of the
following definitions are equivalent:

const unsigned long int i = 5;
int long unsigned i const = 5;
cont int long unsigned i = 5;
int unsigned const long i = 5;

No actually, the second one is a syntax error.
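
A sketch of where the line falls, assuming a C11 compiler (for _Generic); the variable and function names are made up. Type specifiers and qualifiers can be permuted freely before the declarator, but a type qualifier cannot appear after the declared identifier.

```c
/* Three spellings of the same type: const unsigned long. */
const unsigned long int a = 5;
int unsigned const long b = 5;
long const int unsigned c = 5;

/* int long unsigned d const = 5;   rejected: a qualifier cannot
 *                                  follow the declared identifier */

/* Returns 1 if a, b and c all have type const unsigned long. */
int all_same_type(void)
{
    return _Generic(&a, const unsigned long *: 1, default: 0)
        && _Generic(&b, const unsigned long *: 1, default: 0)
        && _Generic(&c, const unsigned long *: 1, default: 0);
}
```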
You will find that many C programmers (Keith Thompson included) prefer to
place "unsigned" before the size-specifier, rather than after, i.e.:

unsigned char

I myself prefer a different form:

char unsigned

The vast majority of programmers prefer the first form, and with good
reason: when you start mixing things up in unusual and unintuitive ways,
it becomes difficult to read and more likely that someone will make a
mistake while doing so (such as the one you made above). Additionally,
the placement of qualifiers in more complicated declarations (such as
pointer variables) often *is* significant, and someone who just sprinkles
them around willy-nilly without knowing any better is likely to suffer
the unintended consequences of doing so.
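
A small sketch of the pointer case, with invented names. Moving const to the other side of the asterisk changes what is protected, while swapping const and char on the same side changes nothing.

```c
static char first[]  = "abc";
static char second[] = "xyz";

/* Returns 1 if the permitted operations behaved as the comments describe. */
int qualifier_placement_demo(void)
{
    const char *to_const = first;   /* pointer to const char            */
    char const *same     = first;   /* identical type, different order  */
    char *const fixed    = first;   /* const pointer to (mutable) char  */

    to_const = second;      /* OK: the pointer itself is not const      */
    /* *to_const = 'X';        error: the pointee is const               */

    *fixed = 'A';           /* OK: the pointee is not const             */
    /* fixed = second;         error: the pointer itself is const        */

    return *to_const == 'x' && *same == 'A' && first[0] == 'A';
}
```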
The original poster should note that some programmers may deem as
"perverse", styles which are different from their own -- not unlike how a
heterosexual might deem homosexuality to be perverse.

That's not even worth a response.
The original poster should note that any human being who doesn't suffer
from severe mental retardation should be able to understand that "char
unsigned" and "unsigned char" are equivalent.

You don't take criticism very well, do you? It is a shame that someone
who was starting to build up a good amount of respect for themselves in
this group would be willing to so quickly throw it away by resorting to
immature, child-like, personal attacks on well-respected regulars for no
good reason. You have been here long enough to know that such a
response wouldn't garner you any support or sympathy. Hopefully this
can be chalked up to you having a bad day but I hope you realize that
your response was asinine and uncalled for. I suggest you think twice
about posting such nonsense here again if you want to continue being
taken seriously.

Robert Gamble
 
Frederick Gotham

Robert Gamble posted:
The vast majority of programmers prefer the first form and with good
reason, when you start mixing things up in unusual and unintuitive ways
it makes it difficult to read and more likely that someone will make a
mistake while doing so (such as the one you made above).


The mistake I made was a result of writing quickly and sloppily.

Similarly, from time to time, you'll see me write "their" instead of
"they're", or "its" instead of "it's", or "mens" instead of "men's", even
though I know full well how they should be used.

Additionally, the placement of qualifiers in more complicated
declarations (such as pointer variables) often *is* significant and
someone who just sprinkles them around willy-nilly without knowing any
better is likely to suffer the unintended consequences of doing so.


I have discussed this elsethread. I put them in a particular order. I like
to place emphasis on the type by putting it at the start of a line,
perhaps:

static inline
int const
(*const *Func(void))[12]
{

}

Or, on one line:

int const static inline (*const *Func(void))[12] { }

Some people complain that this mixes the return type with keywords such as
"static" and "inline", but I consider the asterisk to be part of the name
of the function. This is reflected by:

int *p1, *p2;

Recently, I've developed a preference for the multi-line form which I show above
(which places the "static" and "inline" on the preceding line).
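
For readers decoding the declarator of Func above: it is a function taking no arguments and returning a pointer to a const pointer to an array of 12 const int. A sketch that spells this out with typedefs follows. Only the name Func comes from the post; the typedef names and the sample data are invented, and "static" and "inline" are left out to keep the example short.

```c
/* Hypothetical typedef names, for illustration only. */
typedef int const Row[12];     /* array of 12 const int            */
typedef Row *const RowPtr;     /* const pointer to such an array   */

/* The original spelling and the typedef spelling declare the same
 * function; the compiler accepts both because the types are compatible. */
int const (*const *Func(void))[12];
RowPtr *Func(void);

/* Sample data, invented for the example. */
static Row table = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
static RowPtr table_ptr = &table;

RowPtr *Func(void)
{
    return &table_ptr;
}
```

With this definition, (**Func())[3] reads element 3 of the array.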

That's not even worth a response.


Labeling it as "perverse" because it clashes with other people's styles
borders on fascist ideals not unlike one expressing and asserting one's
homophobia.

Kieth Thompson likes to write "unsigned char"... great! Would you not
consider it fascist that he labels any other perfectly conforming way of
doing it as "perverse"?

You don't take criticism very well do you?


Not when people are trying to persuade me that there's an inherent flaw in
writing "char unsigned". Both from reading the Standard, and from my own
programming experience, I have not been convinced that "char unsigned" is
bad style.

It is a shame that someone who was starting to build up a good amount of
respect for themselves in this group would be willing to so quickly
throw it away by resorting to immature, child-like, personal attacks on
well-respected regulars for no good reason.


Firstly, I made no personal attack.

Secondly, respect is fleeting on Usenet. I'm usually accepted quite
graciously at the beginning on newsgroups such as this one, but then when I
don't immediately submit to fascism such as "Don't write it that way, it
confuses us", or "Use signed integer types, not unsigned", the transparency
of any perceived respect becomes apparent.

Two points to make:

(1) I like to write "char unsigned".
(2) I prefer to use unsigned integer types where possible.

I am respected until I choose to defend my way of doing things, and then
blatantly disrespected when I don't immediately succumb to the pressure. It
doesn't take much thought to realise that the concept of respect is a bit
wishy-washy here.

I don't seek respect from this group. I seek interesting discussion. I seek
to further my own skill in C. It seems that usage of "char unsigned" breeds
contempt around here. That's unfortunate. Perhaps if I write it enough,
people will see that, as far as the International Standard is concerned,
it's a perfectly conforming, variant word ordering of "unsigned char".

You have been here long enough to know that such a response wouldn't
garner you any support or sympathy.


It seems to be the only way to get through to some people. Without going
off-topic, I'll give a quick summary of a little discussion which took
place over on comp.lang.c++: A person posted seeking advice on using
arrays. I cordially posted advice on using arrays. Regulars were quick to
reply that "arrays are dangerous", and that the original poster should use
the Standard Library's "vector" facility. I responded that arrays are not
dangerous, unless they're used by not-so-apt programmers. Sometimes I have
to use less-euphemistic terms to get the point across.

As an aside, I wasn't aware that my use of "retardation" would cause such a
flurry. If the group would rather that I not use it in such a context, I'll
gladly oblige.

Hopefully this can be chalked up to you having a bad day but I hope you
realize that your response was asinine and uncalled for.


I don't believe so. Perhaps I should start a new thread expressing my point
of view.

I suggest you think twice about posting such nonsense here again if you
want to continue being taken seriously.


It appears I have lost that privilege since I failed to succumb to the
group's pressure to write "unsigned char". Who would have thought that such
a simple thing would devalue my worth as a human being?
 
Keith Thompson

Frederick Gotham said:
Labeling it as "perverse" because it clashes with other people's styles
borders on fascist ideals not unlike one expressing and asserting one's
homophobia.

Kieth Thompson likes to write "unsigned char"... great! Would you not
consider it fascist that he labels any other perfectly conforming way of
doing it as "perverse"?

Fascist? Sheesh, get a grip! (And my name is Keith, not Kieth.)

It's possible that you've confused the word "perverse" with
"perverted". If so, your gross overreaction is almost understandable.

Here's the definition of "perverse" from the Merriam Webster Dictionary:

1 a : turned away from what is right or good : corrupt b :
improper, incorrect c : contrary to the evidence or the direction
of the judge on a point of law <perverse verdict>
2 a : obstinate in opposing what is right, reasonable, or accepted
: wrongheaded b : arising from or indicative of stubbornness or
obstinacy
3 : marked by peevishness or petulance : cranky
synonyms: see contrary

Not all of these apply, but in my opinion several of them do. Writing
"char unsigned" is not incorrect; it is merely perverse.

And here's the definition of "fascism" from the same dictionary:

1 often capitalized : a political philosophy, movement, or regime
(as that of the Fascisti) that exalts nation and often race above
the individual and that stands for a centralized autocratic
government headed by a dictatorial leader, severe economic and
social regimentation, and forcible suppression of opposition
2 : a tendency toward or actual exercise of strong autocratic or
dictatorial control <early instances of army fascism and brutality
J. W. Aldridge>

How this applies to what I did (expressing my opinion in a public
forum) I can't imagine. (In case you were wondering, freedom of
expression does not include the right not to be disagreed with.)
 
Robert Gamble

Frederick said:
Robert Gamble posted:



Labeling it as "perverse" because it clashes with other people's styles
borders on fascist ideals not unlike one expressing and asserting one's
homophobia.

Google (via Princeton) defines perverse as "marked by a disposition to
oppose and contradict" and "contrary", "obstinate", "wayward". So
Keith has a strong opinion of your style, as I suspect most programmers
would as well, but, unlike you, he provided sound rationale for his
opinion and didn't attack you for disagreeing with him the way you did.
It should also be noted that others have objected to your "style" in
the past so it can't come as a surprise to you that your insistence on
using it here would evoke a response, especially given the value of
clear and concise code in this group.
Kieth Thompson likes to write "unsigned char"... great! Would you not
consider it fascist that he labels any other perfectly conforming way of
doing it as "perverse"?

No, I can't begin to imagine how you could justify calling that fascist
but please don't feel it necessary to try to explain it to me.
Not when people are trying to persuade me that there's an inherent flaw in
writing "char unsigned". Both from reading the Standard, and from my own
programming experience, I have not been convinced that "char unsigned" is
bad style.



Firstly, I made no personal attack.

Secondly, respect is fleeting on Usenet. I'm usually accepted quite
graciously at the beginning on newsgroups such as this one, but then when I
don't immediately submit to fascism such as "Don't write it that way, it
confuses us", or "Use signed integer types, not unsigned", the transparency
of any perceived respect becomes apparent.

Two points to make:

(1) I like to write "char unsigned".
(2) I prefer to use unsigned integer types where possible.

I am respected until I choose to defend my way of doing things, and then
blatantly disrespected when I don't immediately succumb to the pressure. It
doesn't take much thought to realise that the concept of respect is a bit
wishy-washy here.

But you didn't defend your way of doing things, you attacked Keith for
presenting a valid criticism (for which he provided sound reasoning),
lost your cool, and compared opinionated coding style differences to
homophobia. I personally don't share your preference and I don't think
you should expect to get very far convincing others to accept it when
you haven't even been able to defend it but that's not what I objected
to. I think that you greatly over-reacted to legitimate criticism.
I don't seek respect from this group. I seek interesting discussion. I seek
to further my own skill in C. It seems that usage of "char unsigned" breeds
contempt around here. That's unfortunate.

The fact that you continue to use a style that you know many people
find unclear when you haven't been able to successfully defend your way
is annoying. What breeds contempt is you using terms like "severe
mental retardation", and calling people fascists for criticizing you
instead of trying to defend your style on its merits.
Perhaps if I write it enough,
people will see that, as far as the International Standard is concerned,
it's a perfectly conforming, variant word ordering of "unsigned char".

Nobody is arguing that it isn't perfectly acceptable from a Standard
perspective but there are plenty of legal constructions that many
people would find repugnant.
It seems to be the only way to get through to some people. Without going
off-topic, I'll give a quick summary of a little discussion which took
place over on comp.lang.c++: A person posted seeking advice on using
arrays. I cordially posted advice on using arrays. Regulars were quick to
reply that "arrays are dangerous", and that the original poster should use
the Standard Library's "vector" facility. I responded that arrays are not
dangerous, unless they're used by not-so-apt programmers. Sometimes I have
to use less-euphemistic terms to get the point across.

As an aside, I wasn't aware that my use of "retardation" would cause such a
flurry. If the group would rather that I not use it in such a context, I'll
gladly oblige.

Most people don't take kindly to being referred to as having "severe
mental retardation", go figure.
I don't believe so. Perhaps I should start a new thread expressing my point
of view.

No, please don't.
It appears I have lost that privilege since I failed to succumb to the
group's pressure to write "unsigned char". Who would have thought that such
a simple thing would devalue my worth as a human being?

Oh come now, don't be so dramatic. Nobody is devalueing (devaluing?
how the heck do you spell that?) your worth as a human being and I
don't think that even your insistence on using "char unsigned" over
"unsigned char" is going to cause more than minor irritation. It was
the way you responded to valid criticism that was objectionable. I was
taken aback by your response; it is unlike you from what I have seen so
far (which has been quite positive overall), which is why I responded
the way I did.

Robert Gamble
 
ena8t8si

Jack said:
...actually, make that all but the pompous asses.

Hey wait! Are you saying that because I write
unsigned char and not char unsigned that I'm
not a pompous ass? I resemble that remark!
 
Thomas J. Gritzan

Frederick said:
Keith Thompson posted:


We'll just have to agree to disagree on this one, Keith, because I don't
think it makes a hell of a lot of difference either way whether one writes
"char unsigned" or "unsigned char". I just find the former to be more
intuitive.

"unsigned char" matches English grammar exactly. I wonder what you find
more intuitive about "char unsigned".

Reading this thread, the term "enfant terrible" comes to mind. In French,
the word order is this way.
 
jaysome

Yes, they're equivalent, and yes, any C programmer should know that.
But it's entirely possible that a relatively inexperienced C
programmer might not know that, which is why I chose to explain it to
the OP. Do you have some objection to that?

C also doesn't have any requirements for indentation or brace
placement, but it's very easy to write nearly incomprehensible code by
abusing this flexibility.

"unsigned char" the form conventional is, and no good reason is there
not to it use.

One reason to follow this convention is that that's what the C
Standard uses. The latest Standard--C99--has 33 matches for "unsigned
char" and 0 matches for "char unsigned". The C89 Standard is similar
in that respect.

Another reason to follow this convention is that that's what
implementors--most likely heavily influenced by the C Standard--use.

Another reason to follow this convention is that, one would hope,
that's what those who write code that conforms to the C Standard use.
 
