Internal representation of char == unsigned small int?

Sathyaish

When you say char in C, it internally means "an unsigned small integer
with 1-byte memory", right? More importantly, the internal
representation of char does not mean "int" as in
"machine-dependent/register-size-dependent integer, which is normally
four bytes on 32-bit processors", right?
 
Lew Pitcher

Sathyaish said:
When you say char in C, it internally means "an unsigned small integer
with 1-byte memory", right? More importantly, the internal
representation of char does not mean "int" as in
"machine-dependent/register-size-dependent integer, which is normally
four bytes on 32-bit processors", right?

Nope.

When you say char in C, you mean an object large enough to store any member of
the basic execution character set. You mean an object that is guaranteed to be
able to represent a range of unsigned values between 0 and 65535, and/or a
range of signed values between -128 and 127.

/How/ the compiler implements this object is up to the compiler. So long as it
meets the minimum requirements of a char, then any storage size is legal.

FWIW, by definition, a char takes 1 byte. However, that 1 byte /can/ be 8 or 9
or 32 or 64 or 128 or even 5000 bits wide, as required by the compiler.
/And/ an int object can have the same size as a char object (or, to put it
another way, a char object can have the same size as an int object).


--
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
 
Lew Pitcher

Lew said:
Nope.

When you say char in C, you mean an object large enough to store any member of
the basic execution character set. You mean an object that is guaranteed to be
able to represent a range of unsigned values between 0 and 65535, and/or a
range of signed values between -128 and 127.

Gakkk. When will I learn to proofread?? I meant unsigned values between 0 and 255.

/How/ the compiler implements this object is up to the compiler. So long as it
meets the minimum requirements of a char, then any storage size is legal.

FWIW, by definition, a char takes 1 byte. However, that 1 byte /can/ be 8 or 9
or 32 or 64 or 128 or even 5000 bits wide, as required by the compiler.
/And/ an int object can have the same size as a char object (or, to put it
another way, a char object can have the same size as an int object).


--
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.

 
Keith Thompson

Sathyaish said:
When you say char in C, it internally means "an unsigned small integer
with 1-byte memory", right? More importantly, the internal
representation of char does not mean "int" as in
"machine-dependent/register-size-dependent integer, which is normally
four bytes on 32-bit processors", right?

C has three distinct one-byte types: char, signed char, and unsigned
char. "Plain" char has the same representation as either signed char
or unsigned char. The minimum ranges are -127..+127 for signed char,
0..255 for unsigned char.
 
Malcolm

Sathyaish said:
When you say char in C, it internally means "an unsigned small integer
with 1-byte memory", right? More importantly, the internal
representation of char does not mean "int" as in
"machine-dependent/register-size-dependent integer, which is normally
four bytes on 32-bit processors", right?
In English, we use glyphs to represent characters. So capital A is an
upwards-pointing triangle with a raised lower edge, capital B is a straight
line with two semicircles, and so on.

This is a good system for pencil and paper, but trying to store such shapes
directly on a computer would be very wasteful. So instead we use a code - 10
means A, 11 means B, 12 means C, and so on.

Usually this code will be ASCII, and usually characters will occupy 8 bits.
However, you normally don't have to worry about this. C abstracts the
representation and handles it for you. If you want an A, you just type
char ch = 'A';

Unfortunately, the designers of C made a mistake. On their machine, bytes,
the smallest addressable unit of memory, happened to be 8 bits, which was
also perfect for the ASCII code. So they decided to use the same word for a
character and a byte: "char". This causes huge problems when we try to go to
non-Latin languages, but we have to live with it.

The result is that you will often see "unsigned char" or more occasionally
"signed char" used as a small integer. You are not guaranteed 8 bits, though
this is by far the most common value. The macro CHAR_BIT gives you the
number of bits in a char.
 
Keith Thompson

Malcolm said:
The result is that you will often see "unsigned char" or more occasionally
"signed char" used as a small integer. You are not guaranteed 8 bits, though
this is by far the most common value. The macro CHAR_BIT gives you the
number of bits in a char.

But you are guaranteed *at least* 8 bits.
 
