unsigned char ---- a special type of integer


BobR

terminator said:
It is the default for all integers, and for char too. I think the default is
signed for the other integer types, but I had doubts about char. However,
the compiler should default to either signed or unsigned.

Thanks, everyone, for the clarification,
FM.

#include <iostream>
#include <limits>
#include <cstddef>

int main()
{
    std::size_t CharMax( std::numeric_limits<char>::max() );
    std::size_t UCharMax( std::numeric_limits<unsigned char>::max() );
    std::cout << "This system is currently using";
    if( CharMax == UCharMax ){
        std::cout << " an 'unsigned' type for type 'char'." << std::endl;
    }
    else{
        std::cout << " a 'signed' type for type 'char'." << std::endl;
    }
}
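
A shorter check is also possible, since std::numeric_limits<char>::is_signed
reports the signedness directly. A minimal sketch, using the same <limits>
header:

#include <iostream>
#include <limits>

int main()
{
    if( std::numeric_limits<char>::is_signed )
        std::cout << "'char' is a signed type here." << std::endl;
    else
        std::cout << "'char' is an unsigned type here." << std::endl;
}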
 

Bo Persson

terminator wrote:
::: long and int maybe both 32 bits
::
:: 'Maybe' differs from 'certainly are'. For long there is no guarantee
:: to be the same size as int; there is an equal-or-greater restriction.
:: And I am unhappy to see that, while there is no standard type for
:: 64-bit ints on the x86 family, we are using long with the same size
:: as int (32 bits).

Type long has always been 32 bits in Windows, even back when int was
only 16 bits. Obviously MS believes that it is not a good time to
change that now, even when going to 64 bits.

They didn't ask me for my opinion though. :-(


On other x86 systems, like 64-bit *nix, we see 64-bit longs.


:: char types seem to be of the same size everywhere.
::
::: It's a distinct type. You can overload on it, for example:
:::
::: void f(char);
::: void f(unsigned char);
::: void f(signed char);
:::
::: declares three different functions. This is not true for the other
::: integer types, which are always signed, unless you specify
::: unsigned.
:::
::: On a particular compiler char is either signed or unsigned (or
::: perhaps selectable with a compiler option). Whichever it is, other
::: requirements make it behave the same as either signed char or
::: unsigned char (same value range, same representation), but it is
::: still a distinct type.
::
:: So, is it just an attempt to be economical in the number of
:: keywords, in order not to declare a type for bytes?
:: Do they have the same size on different platforms, or is there no
:: standard?

Not everyone agrees that a byte is an octet, even though very popular
hardware has had it that way for a long time.

Some machines are not byte addressed, but word addressed. I have seen
character sizes not only 8 bits, but also 9, 16, and 32 bits. There
are possibly others as well.

Here's a series of mainframes compatible with what has been around for
40+ years, with 9 bit bytes and 36 bit ints:

http://www.unisys.com/products/mainframes/os__2200__mainframes/model__specifications.htm


When designing a language, it is important not to make the spec so
narrow that it just by accident makes it impossible to implement the
language on some machines.


Bo Persson
 

BobR

terminator said:
'Maybe' differs from 'certainly are'. For long there is no guarantee to be
the same size as int; there is an equal-or-greater restriction. And I am
unhappy to see that, while there is no standard type for 64-bit ints on
the x86 family, we are using long with the same size as int (32 bits).
char types seem to be of the same size everywhere.

Define 'size'. If you use 'sizeof', it will return '1' for all three char
types, BUT look at 'CHAR_BIT' to find out what '1' means.

std::cout << CHAR_BIT << std::endl; // needs <iostream> and <climits>
// CHAR_BIT == 8 on my sys.

For a 'signed char', the MSB stores the sign; the other bits (7, on my sys)
store the value.
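
To make "what '1' means" concrete, here is a small complete program, a
sketch assuming only the standard <climits> macros, that prints the limits
implied by CHAR_BIT:

#include <iostream>
#include <climits>

int main()
{
    std::cout << "CHAR_BIT  = " << CHAR_BIT  << '\n';
    std::cout << "SCHAR_MIN = " << SCHAR_MIN << '\n';
    std::cout << "SCHAR_MAX = " << SCHAR_MAX << '\n';
    std::cout << "UCHAR_MAX = " << UCHAR_MAX << '\n';
    std::cout << "CHAR_MIN  = " << CHAR_MIN  << '\n'; // 0 or SCHAR_MIN
    std::cout << "CHAR_MAX  = " << CHAR_MAX  << '\n'; // SCHAR_MAX or UCHAR_MAX
}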
So, is it just an attempt to be economical in the number of keywords, in
order not to declare a type for bytes?

No, it's a way for C++ to work on different systems (where a 'char' ('byte')
might be 12 bits, or 8 bits, or ?).
Do they have the same size on different platforms, or is there no
standard?

The standard defines the minimum (in bits); it could be more.

In another thread (where I was poking a little fun), Richard Heathfield
answered:

"In both C and, I am given to understand, C++, there are
CHAR_BIT bits in a byte, where CHAR_BIT is at least 8.
A byte is exactly big enough to store one char."
 

James Kanze

I must admit that I was sceptical of Rolf's claim that this
issue does not pertain to the stdio library used in C.
However, Rolf is exactly right, as the following code snippet
(modified version of OP's code snippet) shows.

Rolf is right, but your example doesn't show it.
Are there any other hidden pitfalls with switching from
stdio to stream libraries?

#include <stdio.h>
int
main()
{
unsigned char first;
unsigned short second;
unsigned int firstInt, secondInt;
printf("\nEnter first value: ");
scanf("%uc", &first);

This line has undefined behavior, which means that your program
can't show us anything. You tell the library to read an
unsigned int, followed by the character 'c', and you give it the
address of an unsigned char in which to store it.
firstInt = first;
printf("\nEnter second value: ");
scanf("%uhd", &second);

Same problem as above (except that you give the library the
address of an unsigned short).
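
For reference, a sketch of the snippet without the undefined behavior:
read into an unsigned int with the plain %u conversion and then assign,
which also sidesteps the C99-only %hhu length modifier mentioned later in
the thread.

#include <cstdio>

int main()
{
    unsigned int in;
    unsigned char first;
    unsigned short second;

    std::printf("Enter first value: ");
    if ( std::scanf("%u", &in) != 1 ) return 1;
    first = static_cast<unsigned char>(in);

    std::printf("Enter second value: ");
    if ( std::scanf("%u", &in) != 1 ) return 1;
    second = static_cast<unsigned short>(in);

    std::printf("Your values are %u and %u\n",
                static_cast<unsigned int>(first),
                static_cast<unsigned int>(second));
    return 0;
}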
 

Barry

Bo said:
terminator wrote:
::: long and int maybe both 32 bits
::
:: 'Maybe' differs from 'certainly are'. For long there is no guarantee
:: to be the same size as int; there is an equal-or-greater restriction.
:: And I am unhappy to see that, while there is no standard type for
:: 64-bit ints on the x86 family, we are using long with the same size
:: as int (32 bits).

Type long has always been 32 bits in Windows, even back when int was
only 16 bits. Obviously MS believes that it is not a good time to
change that now, even when going to 64 bits.

They didn't ask me for my opinion though. :-(


On other x86 systems, like 64-bit *nix, we see 64-bit longs.


:: char types seem to be of the same size everywhere.
::
::: It's a distinct type. You can overload on it, for example:
:::
::: void f(char);
::: void f(unsigned char);
::: void f(signed char);
:::
::: declares three different functions. This is not true for the other
::: integer types, which are always signed, unless you specify
::: unsigned.
:::
::: On a particular compiler char is either signed or unsigned (or
::: perhaps selectable with a compiler option). Whichever it is, other
::: requirements make it behave the same as either signed char or
::: unsigned char (same value range, same representation), but it is
::: still a distinct type.
::
:: So, is it just an attempt to be economical in the number of
:: keywords, in order not to declare a type for bytes?
:: Do they have the same size on different platforms, or is there no
:: standard?

Not everyone agrees that a byte is an octet, even though very popular
hardware has had it that way for a long time.

Some machines are not byte addressed, but word addressed. I have seen
character sizes not only 8 bits, but also 9, 16, and 32 bits. There
are possibly others as well.

Here's a series of mainframes compatible with what has been around for
40+ years, with 9 bit bytes and 36 bit ints:

http://www.unisys.com/products/mainframes/os__2200__mainframes/model__specifications.htm


When designing a language, it is important not to make the spec so
narrow that it just by accident makes it impossible to implement the
language on some machines.

what I was trying to say is that

1. *int* and *long* may have the same *functionality* on some platform,
which does not mean they share the same type information

2. *unsigned char* and *char* may have the same ......

3. different types may have the same *functionality* on some platform

the similarity between the *xchar* and *int long* is that they may overlap

Can I express the problem in this way? :)
 

santosh

Generic said:
I must admit that I was sceptical of Rolf's claim that this issue does
not pertain to the stdio library used in C. However, Rolf is exactly
right, as the following code snippet (modified version of OP's code
snippet) shows.

Are there any other hidden pitfalls with switching from stdio to
stream libraries?

Song

/****************/

#include <stdio.h>

int
main()
{
unsigned char first;
unsigned short second;

unsigned int firstInt, secondInt;

printf("\nEnter first value: ");

End output with a newline to ensure that buffers are flushed.
scanf("%uc", &first);

ITYM %hhu.
firstInt = first;

printf("\nEnter second value: ");
scanf("%uhd", &second);

Again, %hu.
secondInt = second;

printf("Your values are %d and %d\n", firstInt, secondInt);

return 0;

And what does your program prove?
 

terminator

what I was trying to say is that

1. *int* and *long* may have the same *functionality* on some platform,
which does not mean they share the same type information

2. *unsigned char* and *char* may have the same ......

3. different types may have the same *functionality* on some platform

the similarity between the *xchar* and *int long* is that they may overlap

Can I express the problem in this way? :)

I do not have any problem understanding that point. I am asking about
the differences between the char types. I want to know if they are always
of the same size (number of bits, if you mind), and I am asking about the
logic of this separation. For long/int/short everything is clear (though
the number of supported words has been increasing from one (byte) to more
than three on modern machines).

regards,
FM.
 

Bo Persson

terminator wrote:
:::
::: what I was trying to say is that
:::
::: 1. *int* and *long* may have the same *functionality* on some
::: platform, which does not mean they share the same type information
:::
::: 2. *unsigned char* and *char* may have the same ......
:::
::: 3. different types may have the same *functionality* on some
::: platform
:::
::: the similarity between the *xchar* and *int long* is that they
::: may overlap
:::
::: Can I express the problem in this way? :)
:::
::
:: I do not have any problem understanding that point. I am asking
:: about the differences between the char types. I want to know if they
:: are always of the same size (number of bits, if you mind), and I am
:: asking about the logic of this separation. For long/int/short
:: everything is clear (though the number of supported words has been
:: increasing from one (byte) to more than three on modern machines).
::

The three character types all use the same number of bits on a
specific machine. They might use a different number of bits on another
machine, but all three use the same number. And they use ALL the bits
in the 'byte' they define.

As we don't know if char is signed or not, we could say that it is
either the same as signed or unsigned char, but it varies from
compiler to compiler. Or we could say that it is neither - it is of
its own distinct type. The language designers obviously preferred
predictable overloading, at the expense of having three same size
character types.


Bo Persson
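
A minimal sketch of both points at once: the three types always have the
same size, yet remain distinct for overloading.

#include <iostream>

// Three distinct overloads; the call resolves on the static type.
const char* f(char)          { return "char"; }
const char* f(signed char)   { return "signed char"; }
const char* f(unsigned char) { return "unsigned char"; }

int main()
{
    // Same size on any one machine (sizeof(char) is 1 by definition):
    std::cout << sizeof(char) << ' ' << sizeof(signed char) << ' '
              << sizeof(unsigned char) << '\n';            // "1 1 1"

    std::cout << f('a') << '\n';                             // char
    std::cout << f(static_cast<signed char>('a')) << '\n';   // signed char
    std::cout << f(static_cast<unsigned char>('a')) << '\n'; // unsigned char
}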
 

terminator

terminator wrote:

:::
::: what I was trying to say is that
:::
::: 1. *int* and *long* may have the same *functionality* on some
::: platform, which does not mean they share the same type information
:::
::: 2. *unsigned char* and *char* may have the same ......
:::
::: 3. different types may have the same *functionality* on some
::: platform
:::
::: the similarity between the *xchar* and *int long* is that they
::: may overlap
:::
::: Can I express the problem in this way? :)
:::
::
:: I do not have any problem understanding that point. I am asking
:: about the differences between the char types. I want to know if they
:: are always of the same size (number of bits, if you mind), and I am
:: asking about the logic of this separation. For long/int/short
:: everything is clear (though the number of supported words has been
:: increasing from one (byte) to more than three on modern machines).
::

The three character types all use the same number of bits on a
specific machine. They might use a different number of bits on another
machine, but all three use the same number. And they use ALL the bits
in the 'byte' they define.

As we don't know if char is signed or not, we could say that it is
either the same as signed or unsigned char, but it varies from
compiler to compiler.

Finally, two or three, or disagreement on a specific standard?
Or do you mean implicit casting to either of the types?
Or we could say that it is neither - it is of
its own distinct type. The language designers obviously preferred
predictable overloading, at the expense of having three same size
character types.

this is reserving (stinginess in defining) keywords.

thanks for the exact reply,
FM.
 

Bo Persson

terminator wrote:
::: terminator wrote:
:::
:::::
::::: I do not have any problem understanding that point. I am asking
::::: about the differences between the char types. I want to know if
::::: they are always of the same size (number of bits, if you mind),
::::: and I am asking about the logic of this separation. For
::::: long/int/short everything is clear (though the number of supported
::::: words has been increasing from one (byte) to more than three on
::::: modern machines).
:::::
:::
::: The three character types all use the same number of bits on a
::: specific machine. They might use a different number of bits on
::: another machine, but all three use the same number. And they use
::: ALL the bits in the 'byte' they define.
:::
::: As we don't know if char is signed or not, we could say that it is
::: either the same as signed or unsigned char, but it varies from
::: compiler to compiler.
::
:: Finally, two or three, or disagreement on a specific standard?
:: Or do you mean implicit casting to either of the types?

Any of this might have been possible, but is now rather hypothetical,
as it wasn't chosen.

::
::: Or we could say that it is neither - it is of
::: its own distinct type. The language designers obviously preferred
::: predictable overloading, at the expense of having three same size
::: character types.
::
:: this is reserving (stinginess in defining) keywords.

New keywords are troublesome, as they might break existing code. For
example, doing a Google code search for 'byte' and C++ gives me more
than 500,000 hits. Making 'byte' a keyword would surely upset a lot of
the programmers supporting that code. Perhaps it is easier to live
with char meaning byte, sometimes?


Bo Persson
 

James Kanze

terminator wrote:

[...]
The three character types all use the same number of bits on a
specific machine. They might use a different number of bits on another
machine, but all three use the same number. And they use ALL the bits
in the 'byte' they define.
As we don't know if char is signed or not, we could say that it is
either the same as signed or unsigned char, but it varies from
compiler to compiler. Or we could say that it is neither - it is of
its own distinct type. The language designers obviously preferred
predictable overloading, at the expense of having three same size
character types.

Overloading certainly didn't come into consideration, since the
rule for the three types goes back to C. I think it's more a
question of history: there were originally only two character
types: char and unsigned char. Just like for int and unsigned
int, short and unsigned short, etc. Except that unlike int or
short, char could be unsigned. Two different types, in each
case. At some point (ANSI normalization?), the need for a
guaranteed signed char was felt, and the keyword signed was
added. I don't know the exact motivation for the current
situation, but I imagine that it is based around the idea that
if two "types" are guaranteed to be identical (e.g. int and
signed int), they are the same type; if they can be different on
any legal implementation (e.g. int and short, or signed char and
char), then they are different types on all implementations.
The number of distinct integral types does not depend on the
implementation. Of course, now that C, and soon C++, has added
the extended integral types, this is no longer true. But it
sounds like the most logical and rigorous solution to me. And
of course, it happens to work out particularly well in C++,
where the type distinctions are more visible because of
overloading and templates. (Note that C++ made wchar_t a true
type, from the beginning, whereas it is just a typedef in C.)
 

terminator

terminator wrote:
[...]

The three character types all use the same number of bits on a
specific machine. They might use a different number of bits on another
machine, but all three use the same number. And they use ALL the bits
in the 'byte' they define.
As we don't know if char is signed or not, we could say that it is
either the same as signed or unsigned char, but it varies from
compiler to compiler. Or we could say that it is neither - it is of
its own distinct type. The language designers obviously preferred
predictable overloading, at the expense of having three same size
character types.

Overloading certainly didn't come into consideration, since the
rule for the three types goes back to C. I think it's more a
question of history: there were originally only two character
types: char and unsigned char. Just like for int and unsigned
int, short and unsigned short, etc. Except that unlike int or
short, char could be unsigned. Two different types, in each
case. At some point (ANSI normalization?), the need for a
guaranteed signed char was felt, and the keyword signed was
added. I don't know the exact motivation for the current
situation, but I imagine that it is based around the idea that
if two "types" are guaranteed to be identical (e.g. int and
signed int), they are the same type; if they can be different on
any legal implementation (e.g. int and short, or signed char and
char), then they are different types on all implementations.
The number of distinct integral types does not depend on the
implementation. Of course, now that C, and soon C++, has added
the extended integral types, this is no longer true. But it
sounds like the most logical and rigorous solution to me. And
of course, it happens to work out particularly well in C++,
where the type distinctions are more visible because of
overloading and templates. (Note that C++ made wchar_t a true
type, from the beginning, whereas it is just a typedef in C.)

--
James Kanze (GABI Software) email:[email protected]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

More confusion. I used to believe in duality; somebody spoke about
trinity, and again you are talking about duality.
Finally, two or three? (please)
Are there two distinct types or three?
Are the (un)signed char types used somewhat as bytes, and the plain one
used normally as a character?

yours,
FM.
 

joe

More confusion. I used to believe in duality; somebody spoke about
trinity, and again you are talking about duality.
Finally, two or three? (please)
Are there two distinct types or three?
Are the (un)signed char types used somewhat as bytes, and the plain one
used normally as a character?

James was giving a history lesson, so don't let it confuse you. There
are three (3) distinct types the size of a char. They are 'signed
char', 'unsigned char', and 'char'. Each of the three can be used in
function overloading. That is,
you can have:

int f(char c);
int f(unsigned char c);
int f(signed char c);


What gets tricky is that this is not true for the other types which
have signed variants. That is, 'signed int' is the same as 'int' as
far as overload resolution goes. So, char is a special case for the
type system. James's messages give the why of this.

In general, if you want a tiny number which consumes a small amount of
space, you can use signed char and unsigned char. If you want
something to hold a character, you should use char (which should be
the appropriate type to manipulate characters).

HTH,
joe
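
A sketch of those guidelines in use (the variable names are only
illustrative):

#include <iostream>

int main()
{
    char          letter = 'A';  // character data
    signed char   tiny   = -100; // a small signed number
    unsigned char raw    = 0xFF; // a raw byte / small unsigned number

    std::cout << letter << '\n'; // prints the character 'A'

    // The char-sized integer types print as characters through <<,
    // so cast to get numeric output:
    std::cout << static_cast<int>(tiny) << '\n';         // -100
    std::cout << static_cast<unsigned int>(raw) << '\n'; // 255
}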
 

James Kanze

terminator wrote:
[...]
The three character types all use the same number of bits on a
specific machine. They might use a different number of bits on another
machine, but all three use the same number. And they use ALL the bits
in the 'byte' they define.
As we don't know if char is signed or not, we could say that it is
either the same as signed or unsigned char, but it varies from
compiler to compiler. Or we could say that it is neither - it is of
its own distinct type. The language designers obviously preferred
predictable overloading, at the expense of having three same size
character types.
Overloading certainly didn't come into consideration, since the
rule for the three types goes back to C. I think it's more a
question of history: there were originally only two character
types: char and unsigned char. Just like for int and unsigned
int, short and unsigned short, etc. Except that unlike int or
short, char could be unsigned. Two different types, in each
case. At some point (ANSI normalization?), the need for a
guaranteed signed char was felt, and the keyword signed was
added. I don't know the exact motivation for the current
situation, but I imagine that it is based around the idea that
if two "types" are guaranteed to be identical (e.g. int and
signed int), they are the same type; if they can be different on
any legal implementation (e.g. int and short, or signed char and
char), then they are different types on all implementations.
The number of distinct integral types does not depend on the
implementation. Of course, now that C, and soon C++, has added
the extended integral types, this is no longer true. But it
sounds like the most logical and rigorous solution to me. And
of course, it happens to work out particularly well in C++,
where the type distinctions are more visible because of
overloading and templates. (Note that C++ made wchar_t a true
type, from the beginning, whereas it is just a typedef in C.)
More confusion. I used to believe in duality; somebody spoke about
trinity, and again you are talking about duality.
Finally, two or three? (please)
Are there two distinct types or three?
Are the (un)signed char types used somewhat as bytes, and the plain one
used normally as a character?

There are exactly 11 standard integral types in C++98: char,
signed char, unsigned char, signed short, unsigned short, signed
int, unsigned int, signed long, unsigned long, bool and wchar_t.
Amongst those types: the signed and unsigned variants are
guaranteed to have the same size, char is guaranteed to have the
same size as signed or unsigned char, and the same
representation as one of them (but it's implementation defined
which), and wchar_t is guaranteed to have the same size and
representation as one of the other integral types (but it's
implementation defined which---although bool seems pretty
unlikely).

Historically, as I said: the keyword signed wasn't present in
the beginning, and there were only eight integral types: char,
short, int and long, and their unsigned variants. But even
then, char was special, since plain char could be either signed
or unsigned (but was a distinct type from unsigned char, even if
it was unsigned), whereas the plain short, int or long had to
use a signed representation.

Not required by the standard, but the usual convention I've seen
is to use plain char for character data, signed or unsigned char
for very small integers, and unsigned char for raw memory
(bytes). (And of course, the type of a string literal is char
const[], and the type of a character constant is char.)
 

JohnQ

"Not required by the standard, but the usual convention I've seen
is to use plain char for character data, signed or unsigned char
for very small integers, and unsigned char for raw memory
(bytes). (And of course, the type of a string literal is char
const[], and the type of a character constant is char.)"

// character data
//
// typedef char char; (just for illustration)

// raw memory
//
typedef unsigned char byte;

// small integers (not pretty)
//
typedef signed char int8; // use int8 below instead
typedef unsigned char uint8; // use uint8 below; clashes with def of 'byte'.

// preferred small integer typedefs
//
typedef __int8 int8;
typedef unsigned __int8 uint8;


Comments?

John
 

James Kanze

On Aug 8, 12:33 pm, "JohnQ" <[email protected]>
wrote:

[...]
// character data
//
// typedef char char; (just for illustration)
// raw memory
//
typedef unsigned char byte;

Depending on context, I'd generally prefer byte_t or Byte.
// small integers (not pretty)
//
typedef signed char int8; // use int8 below instead
typedef unsigned char uint8; // use uint8 below. clashes with def of
'byte'.

The 8 could be a lie. If I need exactly 8, I just use int8_t or
uint8_t. (Standard C, but also in the current C++ draft. Any
reasonably modern compiler should support them. Don't forget to
include the header they come from, <stdint.h>.)
// preferred small integer typedefs
//
typedef __int8 int8;
typedef unsigned __int8 uint8;
Comments?

Presumably, __int8 is something specifically defined by your
compiler. int8_t/uint8_t is a lot more portable.

Note that int8_t is only available if the hardware actually
supports 8-bit 2's complement integers. While sufficiently
portable for many applications, if all you want is the smallest
possible integral type, with maximum portability, then:
typedef signed char smallInt;
is better. (Don't put 8 in the name unless you're guaranteeing
8 bits.)
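
A sketch of the exact-width approach James mentions, assuming a compiler
that ships <stdint.h> (spelled <cstdint> in later C++):

#include <iostream>
#include <stdint.h> // int8_t/uint8_t, where the hardware supports them

int main()
{
    int8_t  a = -128; // exactly 8 bits, 2's complement
    uint8_t b = 255;  // exactly 8 bits, unsigned

    // These are typically character types underneath, so cast before
    // printing to get numbers rather than characters:
    std::cout << static_cast<int>(a) << ' '
              << static_cast<unsigned int>(b) << '\n'; // "-128 255"
}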
 

terminator

James was giving a history lesson, so don't let it confuse you. There
are three (3) distinct types the size of a char. They are 'signed
char', 'unsigned char', and 'char'. Each of the three can be used in
function overloading. That is,
you can have:

int f(char c);
int f(unsigned char c);
int f(signed char c);

What gets tricky is that this is not true for the other types which
have signed variants. That is, 'signed int' is the same as 'int' as
far as overload resolution goes. So, char is a special case for the
type system. James's messages give the why of this.

In general, if you want a tiny number which consumes a small amount of
space, you can use signed char and unsigned char. If you want
something to hold a character, you should use char (which should be
the appropriate type to manipulate characters).

HTH,
joe

So it's three. Very nice.

thanks a lot,
FM.
 

Rolf Magnus

What are "the extended integral types"?

I'm wondering, is it visible in C at all? I wasn't sure if char even is a
distinct type in C, since I can't think of a situation where it matters.
which), and wchar_t is guaranteed to have the same size and
representation as one of the other integral types (but it's
implementation defined which---although bool seems pretty
unlikely).

lol... yes, bool wouldn't be my first choice ;-)
(And of course, the type of a string literal is char const[], and the type
of a character constant is char.)

For the latter, I wouldn't say "of course", considering that in C, the type
of a character constant is int.
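
A sketch showing that difference: the same expression has a different
type in C and in C++.

#include <iostream>

int main()
{
    // In C++, 'a' has type char, so this prints 1.
    // In a C program the same expression has type int, so
    // sizeof 'a' would be sizeof(int) (typically 4).
    std::cout << sizeof 'a' << '\n';
}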
 

Pete Becker

What are "the extended integral types"?

"Extended integer types" are optional implementation-defined signed and
unsigned integral types with rules that fit them into the integer
conversion hierarchy. See [basic.fundamental]/2,3 and [conv.rank]/1 in
the C++0x draft (most recent is N2369).
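
One concrete example, offered as an illustration rather than anything
from the draft itself: GCC and Clang provide __int128 on 64-bit targets,
an implementation-defined type that slots into the conversion hierarchy
above long long.

#include <iostream>

int main()
{
#if defined(__SIZEOF_INT128__) // predefined by GCC/Clang when available
    __int128 big = static_cast<__int128>(1) << 100; // too wide for long long
    std::cout << sizeof(__int128) << '\n';                   // 16
    std::cout << static_cast<long long>(big >> 100) << '\n'; // 1
#else
    std::cout << "no extended 128-bit type here\n";
#endif
}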
 
