unsigned char ---- a special type of integer

Zahid Faizal

All this time I mindlessly thought that unsigned char is just like
other unsigned members of the integer family (integer, short, long,
long long) with a more constrained range of values ---- from 0 to
255. That is what I had read somewhere. Imagine my surprise when I
realized that the way unsigned char is read from stdin or a file is
completely different from other entities in the integer family! I did
not expect that in the case of unsigned char, the value that would be
assigned to a variable would be its ASCII equivalent. I knew that
this is the behavior for char, but I did not expect unsigned char to
do that. I was badly bitten by this revelation today.

Kindly see the source snippet below, where I was able to recreate the
problem. MY APOLOGIES TO comp.lang.c READERS THAT THIS SAMPLE IS C++,
but the issue that I am describing pertains to C as well.

Thanks,
Zahid



////////////////////////
#include <iostream>

using namespace std;

int
main()
{
    unsigned char first;
    unsigned short second;

    unsigned int firstInt, secondInt;

    cout << "\nEnter first value: ";
    cin >> first;

    firstInt = first;

    cout << "\nEnter second value: ";
    cin >> second;

    secondInt = second;

    cout << "Your values are " << firstInt << " and " << secondInt << endl;
}

////////////////////////


I entered 0 and 0 and the output was as follows:
Your values are 48 and 0
 
Richard Heathfield

Zahid Faizal said:
> All this time I mindlessly thought that unsigned char is just like
> other unsigned members of the integer family (integer, short, long,
> long long) with a more constrained range of values

It is.

> ---- from 0 to 255.

No, from 0 to UCHAR_MAX, which must be at least 255 but which can be
greater.

> That is what I had read somewhere. Imagine my surprise when I
> realized that the way unsigned char is read from stdin or a file is
> completely different from other entities in the integer family!

Not particularly.

> I did
> not expect that in the case of unsigned char, the value that would be
> assigned to a variable would be its ASCII equivalent.

Not at all. What is assigned to the object is the value of the byte read
from the stream. This has nothing to do with ASCII, except by accident
on systems that happen to use ASCII.

> I knew that this is the behavior for char,

It isn't; char doesn't have anything to do with ASCII either, except by
accident on systems that happen to use ASCII. What happens when you
read a character from a stream using (say) fgetc is this:

1) one byte is read from the stream;
2) assuming that operation succeeded, the byte value is then interpreted
as if it were an unsigned char;
3) the value is converted into an int;
4) the value is returned to you for processing.

If you then decide to store it in, and interpret it as, a char rather
than an unsigned char, well, that's up to you.

If you use some other standard library function for reading several
bytes of data from the stream rather than one - e.g. fread or fscanf -
it behaves as if making successive calls to fgetc, so there's no real
difference there in terms of integer type conversions.

> but I did not expect unsigned char to
> do that. I was badly bitten by this revelation today.
>
> Kindly see the source snippet below, where I was able to recreate the
> problem. MY APOLOGIES TO comp.lang.c READERS THAT THIS SAMPLE IS C++,
> but the issue that I am describing pertains to C as well.

What issue? I see no issue here. I certainly see no C issue.
 
terminator

The 'char' keyword is used to tell the compiler that we are going to
store character values in it. Every character has a complex graphical
look (actually more than that, considering different fonts), but we need
to give a code number to every character so that we can store the
character on a digital machine. In C(++), contrary to many other
programming languages, you do not need any special keyword to get the
code associated with the 'char', simply because this code is what is
actually stored in memory. 'char' is usually the same size as the
smallest word (integer type) that a machine knows. Therefore, it can
be treated as a very small integer and you can mark a char - just like
any other integer type - as 'signed' or 'unsigned', and if you do not
specify either, then the compiler defaults to 'signed'.

Regards,
FM.
 
Rolf Magnus

Zahid said:
> All this time I mindlessly thought that unsigned char is just like
> other unsigned members of the integer family

It is.

> Imagine my surprise when I realized that the way unsigned char is read
> from stdin or a file is completely different from other entities in the
> integer family!

Well, that's due to an overloaded version of the C++ stream input operator.

> MY APOLOGIES TO comp.lang.c READERS THAT THIS SAMPLE IS C++, but the
> issue that I am describing pertains to C as well.

Actually, it doesn't.
 
Walter Roberson

> 'char' is usually the same size as the
> smallest word(integer type) that a machine knows . therefore, it can
> be treated as a very small integer and you can mark a char - just like
> any other integer type - as 'signed' or 'unsigned' and if you do not
> specify either, then compiler defaults to 'signed'.

Not quite correct: for any particular C compiler, char will be
either signed or unsigned. The C standards do *not* require
compilers to default char to signed. Indeed, in some character
sets, it would be disallowed:

C89 3.1.2.5 Types

An object declared as type char is large enough to store any
member of the basic execution character set. If a member of
the required source character set enumerated in 2.2.1 is stored
in a char object, its value is guaranteed to be positive.
If other quantities are stored in a char object, the behavior
is implementation-defined; the values are treated as either
signed or nonnegative integers.


In 2.2.1, the source character set is defined as:
+ the 26 uppercase letters of the English alphabet
+ the 26 lowercase letters of the English alphabet
+ the 10 decimal digits
+ the following 29 graphic characters:
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
+ the space character and control characters representing horizontal
tab, vertical tab, and form feed.

In EBCDIC, the lower case letters start at (decimal) 129 and
the upper case letters from (decimal) 193. Because of the 3.1.2.5
requirement that these source characters will have a positive value,
if the EBCDIC system has CHAR_BIT of 8 (as would be most likely,
since EBCDIC is an 8 bit code), then unmarked char would have
to be unsigned.
 
Default User

terminator wrote:

> the 'char' key word is used to tell the compiler that we are going to
> store character values in it.

Maybe, maybe not. I just created a project that used lots of chars
without storing any character data in them. That's because I was
working with ARINC 615 datawords. These words have fields of 8 bits or
less within them that represent integer values, so it's natural to use
char types when constructing and deconstructing the words.




Brian
 
Army1987

> All this time I mindlessly thought that unsigned char is just like
> other unsigned members of the integer family (integer, short, long,
> long long) with a more constrained range of values ---- from 0 to
> 255. That is what I had read somewhere. Imagine my surprise when I
> realized that the way unsigned char is read from stdin or a file is
> completely different from other entities in the integer family! I did
> not expect that in the case of unsigned char, the value that would be
> assigned to a variable would be its ASCII equivalent. I knew that
> this is the behavior for char, but I did not expect unsigned char to
> do that. I was badly bitten by this revelation today.
>
> Kindly see the source snippet below, where I was able to recreate the
> problem. MY APOLOGIES TO comp.lang.c READERS THAT THIS SAMPLE IS C++,
> but the issue that I am describing pertains to C as well.

It doesn't. The C++ code behaves this way only because >> is overloaded,
so the operation it performs depends on the type of the right operand,
too.
In C, first = getchar() and second = getchar() would do the same
thing (except for what the value EOF would get converted to).
 
Fred Kleinschmidt

Zahid Faizal said:
> All this time I mindlessly thought that unsigned char is just like
> other unsigned members of the integer family (integer, short, long,
> long long) with a more constrained range of values ---- from 0 to
> 255. That is what I had read somewhere. Imagine my surprise when I
> realized that the way unsigned char is read from stdin or a file is
> completely different from other entities in the integer family! I did
> not expect that in the case of unsigned char, the value that would be
> assigned to a variable would be its ASCII equivalent. I knew that
> this is the behavior for char, but I did not expect unsigned char to
> do that. I was badly bitten by this revelation today.
>
> Kindly see the source snippet below, where I was able to recreate the
> problem. MY APOLOGIES TO comp.lang.c READERS THAT THIS SAMPLE IS C++,
> but the issue that I am describing pertains to C as well.
>
> Thanks,
> Zahid
>
> ////////////////////////
> #include <iostream>
>
> using namespace std;
>
> int
> main()
> {
> unsigned char first;
> unsigned short second;
> unsigned int firstInt, secondInt;
> cout << "\nEnter first value: ";
> cin >> first;
> firstInt = first;

Why would you ever think that the above would interpret the
input as an integer? You told cin that its argument is an
unsigned char, so it reads stdin as a char. Then you convert
it to an int.
What would you expect from this:
unsigned char first ='0';
int firstInt = first;

Surely you would not expect firstInt to have a value of zero,
unless the ASCII code for the character zero was zero
(it is not - it is 48)

In addition, this is NOT relevant to C.
C has no "cin".
In C, you would have used scanf, and the format
specifier would have told scanf how to interpret the input.
If you said %c, would you expect it to read it as
the integer zero? Or would you have expected it to
read it as the character zero?
 
santosh

terminator said:
> the 'char' key word is used to tell the compiler that we are going to
> store character values in it.

Not necessarily. In C a char is simply a small integer and is quite capable
of holding an arbitrary integer value. The value need not be a character
code, though that is the most common case.

In the case of storing an arbitrary integer value, it's better to explicitly
specify the signedness of the object, since a plain char can be either
signed or unsigned, depending on the implementation.

> 'char' is usually the same size as the
> smallest word(integer type) that a machine knows . therefore, it can
> be treated as a very small integer and you can mark a char - just like
> any other integer type - as 'signed' or 'unsigned' and if you do not
> specify either, then compiler defaults to 'signed'.

No, it does not default to signed char. A plain char can be either signed or
unsigned depending on the implementation. A char type is distinct from
signed char and unsigned char, though for any particular instance a char
object is always either signed or unsigned.
 
santosh

Zahid said:
> All this time I mindlessly thought that unsigned char is just like
> other unsigned members of the integer family (integer, short, long,
> long long)

It is like the other unsigned integer types.

> with a more constrained range of values ---- from 0 to
> 255.

No, it's from 0 to UCHAR_MAX which is defined in limits.h. This is often 255
on PCs, but could be something else for other architectures.

> That is what I had read somewhere. Imagine my surprise when I
> realized that the way unsigned char is read from stdin or a file is
> completely different from other entities in the integer family!

It is not.

> I did
> not expect that in the case of unsigned char, the value that would be
> assigned to a variable would be its ASCII equivalent.

C is independent of ASCII or another character code. When you assign a
character read from stdin or file to an unsigned char object, the
character's code in the execution character set is assigned to it. This
need not be an ASCII value.

However since all three char types are actually just small integers, you can
also store any arbitrary integer value into the corresponding objects.

> I knew that
> this is the behavior for char, but I did not expect unsigned char to
> do that. I was badly bitten by this revelation today.

It's neither the behaviour for char nor unsigned char. It's something to do
with your C++ environment.

> Kindly see the source snippet below, where I was able to recreate the
> problem. MY APOLOGIES TO comp.lang.c READERS THAT THIS SAMPLE IS C++,
> but the issue that I am describing pertains to C as well.

It doesn't. It's exclusive to your C++ code. There's no such problem at all,
as you imagine.

[snip]
 
Generic Usenet Account

> Well, that's due to an overloaded version of the C++ stream input operator.

> Actually, it doesn't.

I must admit that I was sceptical of Rolf's claim that this issue does
not pertain to the stdio library used in C. However, Rolf is exactly
right, as the following code snippet (modified version of OP's code
snippet) shows.

Are there any other hidden pitfalls with switching from stdio to
stream libraries?

Song

/****************/

#include <stdio.h>

int
main()
{
    unsigned char first;
    unsigned short second;

    unsigned int firstInt, secondInt;

    printf("\nEnter first value: ");
    scanf("%uc", &first);

    firstInt = first;

    printf("\nEnter second value: ");
    scanf("%uhd", &second);

    secondInt = second;

    printf("Your values are %d and %d\n", firstInt, secondInt);
}
 
Army1987

> I must admit that I was sceptical of Rolf's claim that this issue does
> not pertain to the stdio library used in C. However, Rolf is exactly
> right, as the following code snippet (modified version of OP's code
> snippet) shows.
>
> Are there any other hidden pitfalls with using switching from stdio to
> stream libraries?
>
> Song
>
> /****************/
>
> #include <stdio.h>
>
> int
> main()
> {
> unsigned char first;
> unsigned short second;
>
> unsigned int firstInt, secondInt;
>
> printf("\nEnter first value: ");
> scanf("%uc", &first);

What matters is the meaning of %c: it stores the character itself.
If you used "%hhu" (in C99) it would read a decimal number and store
its value, not the value of a character.
Also, there is no modifier u in standard C.

> firstInt = first;
>
> printf("\nEnter second value: ");
> scanf("%uhd", &second);

You meant "%hu"?

> secondInt = second;
>
> printf("Your values are %d and %d\n", firstInt, secondInt);
> }

Try this:

#include <stdio.h>
int main(void)
{
    unsigned int a = 'A';
    unsigned int b = 65;
    unsigned char c = 'A';
    unsigned char d = 65;
    printf("%u %c\n", a, a);
    printf("%u %c\n", b, b);
    printf("%u %c\n", c, c);
    printf("%u %c\n", d, d);
    return 0;
}
 
Martin Ambuhl

Zahid said:
> All this time I mindlessly thought that unsigned char is just like
> other unsigned members of the integer family (integer, short, long,
> long long) with a more constrained range of values ---- from 0 to
> 255. That is what I had read somewhere. Imagine my surprise when I
> realized that the way unsigned char is read from stdin or a file is
> completely different from other entities in the integer family! I did
> not expect that in the case of unsigned char, the value that would be
> assigned to a variable would be its ASCII equivalent. I knew that
> this is the behavior for char, but I did not expect unsigned char to
> do that. I was badly bitten by this revelation today.
>
> Kindly see the source snippet below, where I was able to recreate the
> problem. MY APOLOGIES TO comp.lang.c READERS THAT THIS SAMPLE IS C++,
> but the issue that I am describing pertains to C as well.

Your problem is C++ specific. It is a result of the C++ <iostream>
functions trying to figure out what you mean to do with
cin >> whatever;
This is a price you pay for overloading.

In C you do not have this problem, since reading a char as an integer
value uses specifiers for integer values (%d, %i, %o, %x, %u, with
whatever modifiers are appropriate).

The C++ functions assume that reading a char is equivalent to using the
"%c" specifier, which is incorrect.

Since your problem is entirely with the assumptions C++ forces on you,
and has nothing at all to do with C, it was inappropriate to post to
comp.lang.c. I have removed it from the Follow-ups.

Nor is it at all clear why in the world you should think comp.sources.d
should give a flip. It, too, has been removed from the Follow-ups.
Your crossposting to irrelevant newsgroups is dangerously close to
newsgroup abuse.
 
BobR

Zahid Faizal said:
> All this time I mindlessly thought that unsigned char is just like
> other unsigned members of the integer family (integer, short, long,
> long long) with a more constrained range of values ---- from 0 to
> 255. [snip]
> ////////////////////////
> #include <iostream>
> using namespace std;
>
> int main(){
> unsigned char first;
> unsigned short second;
> unsigned int firstInt, secondInt;
> cout << "\nEnter first value: ";
> cin >> first;
> firstInt = first;
> cout << "\nEnter second value: ";
> cin >> second;
> secondInt = second;
> cout << "Your values are " << firstInt << " and " << secondInt <<
> endl;
> }
> ////////////////////////
> I entered 0 and 0 and the output was as follows:
> Your values are 48 and 0

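// C++: list every printable character code
#include <cctype>   // std::isprint
#include <climits>  // UCHAR_MAX
#include <iostream>

int main(){
    typedef unsigned char Uchar;
    for( int a(0); a <= UCHAR_MAX; ++a ){ // <= so UCHAR_MAX itself is tested
        if( std::isprint( a ) ){
            std::cout<<"int="<<a
                     <<" hex="<<std::hex<<a<<std::dec
                     <<" char="<<Uchar(a);
            if( (a >= '0') && (a <= '9') ){
                // low nibble gives the digit's value in ASCII-like encodings
                std::cout<<" int cnv="<<( a & 0xF );
            }
            std::cout<<std::endl;
        } // if()
    } // for(a)
}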

Look through the output for "48" in the first column.
 
Greg Herlihy

> Not quite correct: for any particular C compiler, char will be
> either signed or unsigned. The C standards do *not* require
> compilers to default char to signed. Indeed, in some character
> sets, it would be disallowed:

This correction still leaves the impression that the "char" type in C++
denotes either the "unsigned char" or "signed char" type - the actual
selection depending on the implementation. Now, although it is true that the
value of a "char" type may be signed or unsigned (depending on the
implementation) it is also the case that char is not "just like" the other
integer types. Whereas an "int" and a "signed int" do denote the same type,
"char" is never the same type as "signed char" and never the same type as
"unsigned char" - under any implementation.

As §3.9.1/1 from the C++ Standard states:

"Plain char, signed char, and unsigned char are three distinct types."

Greg
 
terminator

> Not necessarily. In C a char is simply a small integer and is quite capable
> of holding an arbitrary integer value. The value need not be a character
> code, though that is the most common case.
>
> In the case of storing an arbitrary integer value, it's better to explicitly
> specify the signed'ness of the object, since a plain char can be either
> signed or unsigned, depending on the implementation.
>
> No, it does not default to signed char. A plain char can be either signed or
> unsigned depending on the implementation. A char type is distinct from
> signed char and unsigned char, though for any particular instance a char
> object is always either signed or unsigned.

There is a default for all integer types, and for char too. I think the
default is signed for the other integer types, but I was in doubt about
char. However, the compiler has to default to either signed or unsigned.

Thanks, everyone, for the clarification,
FM.
 
terminator

> This correction still leaves the impression that the "char" type in C++
> denotes either the "unsigned char" or "signed char" type - the actual
> selection depending on the implementation. Now, although it is true that the
> value of a "char" type may be signed or unsigned (depending on the
> implementation) it is also the case that char is not "just like" the other
> integer types. Whereas an "int" and a "signed int" do denote the same type,
> "char" is never the same type as "signed char" and never the same type as
> "unsigned char" - under any implementation.
>
> As §3.9.1/1 from the C++ Standard states:
>
> "Plain char, signed char, and unsigned char are three distinct types."
>
> Greg

And what is that supposed to mean? How is plain char different from the
other two?

thanks,
FM.
 
Bo Persson

terminator wrote:
:: On 8/3/07 9:21 AM, in article
:: [email protected], "Walter
::
::: In article <[email protected]>,
:::: 'char' is usually the same size as the
:::: smallest word(integer type) that a machine knows . therefore, it
:::: can be treated as a very small integer and you can mark a char -
:::: just like any other integer type - as 'signed' or 'unsigned' and
:::: if you do not specify either, then compiler defaults to 'signed'.
::
::: Not quite correct: for any particular C compiler, char will be
::: either signed or unsigned. The C standards do *not* require
::: compilers to default char to signed. Indeed, in some character
::: sets, it would be disallowed:
::
:: This correction still leaves the impression that the "char" type
:: in C++ denotes either the "unsigned char" or "signed char" type -
:: the actual selection depending on the implementation. Now,
:: although it is true that the value of a "char" type may be signed
:: or unsigned (depending on the implementation) it is also the case
:: that char is not "just like" the other integer types. Whereas an
:: "int" and a "signed int" do denote the same type, "char" is never
:: the same type as "signed char" and never the same type as
:: "unsigned char" - under any implementation.
::
:: As §3.9.1/1 from the C++ Standard states:
::
:: "Plain char, signed char, and unsigned char are three distinct
:: types."
::
:: Greg
:
: and what is it supposed to mean?how is plain char different from the
: other two?
:

It's a distinct type. You can overload on it, for example:

void f(char);
void f(unsigned char);
void f(signed char);

declares three different functions. This is not true for the other
integer types, which are always signed, unless you specify unsigned.

On a particular compiler char is either signed or unsigned (or perhaps
selectable with a compiler option). Whichever it is, other
requirements make it behave the same as either signed char or unsigned
char (same value range, same representation), but it is still a
distinct type.


Bo Persson
 
Barry

> and what is it supposed to mean?how is plain char different from the
> other two?
>
> thanks,
> FM.

long and int may both be 32 bits,
which does *NOT* mean they are the same type.

I think you can understand the case in this way.
 
terminator

> long and int maybe both 32 bits

"May be" differs from "certainly are". There is no guarantee that long is
the same size as int; there is only an equal-or-greater-than restriction. And
I am unhappy to see that, while there is no standard type for 64-bit
ints on the x86 family, we are using long with the same size as
int (32 bits).
The char types seem to be the same size everywhere.

> It's a distinct type. You can overload on it, for example:
>
> void f(char);
> void f(unsigned char);
> void f(signed char);
>
> declares three different functions. This is not true for the other
> integer types, which are always signed, unless you specify unsigned.
>
> On a particular compiler char is either signed or unsigned (or perhaps
> selectable with a compiler option). Whichever it is, other
> requirements make it behave the same as either signed char or unsigned
> char (same value range, same representation), but it is still a
> distinct type.

So, is it just an attempt to be economical with the number of keywords, so
as not to declare a separate type for bytes?
Do they have the same size on different platforms, or is there no
standard?

yours,
FM.
 
