Size of char on a 64 bit machine

aruna

How is a character stored in a word aligned machine? Assuming on 64bit
machine, 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the rest 7 bytes are wasted,
or my assumption is wrong? If my assumption is right, what are the
performance issues in retrieving value of a character variable over
other data types like integer, double or float.
 
Eric Sosman

aruna said:
How is a character stored in a word aligned machine? Assuming on 64bit
machine, 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the rest 7 bytes are wasted,
or my assumption is wrong? If my assumption is right, what are the
performance issues in retrieving value of a character variable over
other data types like integer, double or float.

These questions cannot be answered by the C language
itself, but only by the particular implementations of it.
Different implementations will do things differently, and
their different choices will lead to different answers.
 
Thomas Stegen

aruna said:
How is a character stored in a word aligned machine?

Depends on the machine.
> Assuming on 64bit
> machine, 1 byte is reserved for a char, is it the case that only 1
> byte is used to store the character and the rest 7 bytes are wasted,
> or my assumption is wrong?

Your assumption is probably wrong, but it depends on the machine.
> If my assumption is right, what are the
> performance issues in retrieving value of a character variable over
> other data types like integer, double or float.

That depends on the machine.

Essentially, usually, but not necessarily: Characters in an array are
usually stored sequentially without any padding in between. There can
be no padding in unsigned char.

These are not questions about C, but about C implementations.
 
Stephen Sprunk

aruna said:
How is a character stored in a word aligned machine?

You mean a machine where all variables, regardless of size, must be stored
with 8-byte alignment? I'm not aware of such a machine, but if one exists
your assumption would be correct.

The usual case is that variables must be aligned to their own size, e.g. a
char requires only 1-byte alignment, but a 32-bit int requires 4-byte
alignment. Depending on how your variables end up laid out in memory, it's
possible several may end up in the same 64-bit word.
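
For anyone who wants to see what their own implementation does, C11's alignof reports the alignment requirement of a type. What follows is only a rough sketch: the helper names are invented for illustration, and only the result for char is fixed by the language. The int32_t figure is implementation-defined.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* char has size 1, so its alignment requirement is 1 on every implementation. */
static_assert(alignof(char) == 1, "char must have 1-byte alignment");

/* Illustrative helper: alignment the implementation requires for int32_t.
   Commonly 4, but that is implementation-defined. */
size_t int32_alignment(void)
{
    return alignof(int32_t);
}

/* Illustrative helper: how many objects with the given alignment
   could start inside one 8-byte word. */
size_t slots_per_8_bytes(size_t alignment)
{
    return 8 / alignment;
}
```

On a typical byte-addressed 64-bit system, slots_per_8_bytes(alignof(char)) is 8, which is the "several variables in the same 64-bit word" case described above.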
> Assuming on 64bit
> machine, 1 byte is reserved for a char, is it the case that only 1
> byte is used to store the character and the rest 7 bytes are wasted,
> or my assumption is wrong? If my assumption is right, what are the
> performance issues in retrieving value of a character variable over
> other data types like integer, double or float.

Modern processors have the same latency for all loads up to the size of a
single cache line. However, there is a large performance benefit in having
multiple variables in each cache line because it reduces cache misses.

S
 
Ben Pfaff

Stephen Sprunk said:
aruna said:
How is a character stored in a word aligned machine?

You mean a machine where all variables, regardless of size, must be stored
with 8-byte alignment? I'm not aware of such a machine [...]

This description seems consistent with descriptions I've seen of
the Cray's architecture, but I've never used a Cray and don't
know any of the details.
 
Keith Thompson

Thomas Stegen said:
Essentially, usually, but not necessarily: Characters in an array are
usually stored sequentially without any padding in between. There can
be no padding in unsigned char.

Actually, that's required by the standard. There can be padding
between members of a structure, but not between elements of an array.
Given an array object arr, the number of elements can be computed by
sizeof arr / sizeof arr[0]
Padding would break that.
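
A small sketch of that idiom, for illustration only. The macro name ARRAY_LEN and the two helper functions are made up here, and the array sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of an array object. Only valid for real arrays,
   not for pointers (such as array parameters). */
#define ARRAY_LEN(arr) (sizeof (arr) / sizeof (arr)[0])

/* No padding between elements, so the size is exactly count * element size. */
static_assert(sizeof(double[5]) == 5 * sizeof(double),
              "arrays have no inter-element padding");

/* Illustrative helper: counts the elements of a local char array. */
size_t count_chars(void)
{
    char text[13];
    return ARRAY_LEN(text);
}

/* Illustrative helper: counts the elements of a local double array. */
size_t count_doubles(void)
{
    double samples[5];
    return ARRAY_LEN(samples);
}
```

The same division works for any element type, which is why the standard cannot allow gaps between array elements.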
 
Keith Thompson

> How is a character stored in a word aligned machine? Assuming on 64bit
> machine, 1 byte is reserved for a char, is it the case that only 1
> byte is used to store the character and the rest 7 bytes are wasted,
> or my assumption is wrong? If my assumption is right, what are the
> performance issues in retrieving value of a character variable over
> other data types like integer, double or float.

The C standard doesn't say much about alignment or padding, beyond
allowing it to exist.

The required alignment for a given type can be no greater than the
size of the type. If an implementation supports 8-bit chars, it must
be able to represent an array of char by storing one char in each
byte. On the other hand, the implementation can add padding after a
standalone object or struct member.
 
Peter Nilsson

> How is a character stored in a word aligned machine?

Within a single byte.
> Assuming on 64bit machine, 1 byte is reserved for a char,

No need for the assumption. The C language defines what a byte is
(within the context of the language) and all character types are 1
byte in size on a conforming implementation. [CHAR_BIT (the number of
bits within a byte) can vary depending on implementation.]
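
These guarantees can be written down as compile-time checks. A minimal sketch, assuming a C11 compiler; the helper name bits_in_bytes is invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* All three character types are exactly one byte, by definition. */
static_assert(sizeof(char) == 1, "char is one byte");
static_assert(sizeof(signed char) == 1 && sizeof(unsigned char) == 1,
              "signed and unsigned char are one byte");

/* A C byte has at least 8 bits; it may have more. */
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* Illustrative helper: width in bits of an object that is nbytes bytes long. */
size_t bits_in_bytes(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}
```

So sizeof(char) never answers the "how many bits" question; CHAR_BIT does.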
> is it the case that only 1 byte is used to store the character and the rest
> 7 bytes are wasted,

From the programmer perspective, the size of an object is the 'sizeof'
an object. An array of N elements of objects of size T will be N*T
bytes. Structures can have padding bytes, so a character member
followed by an int may well 'waste' 7 padding bytes for the purposes
of alignment, but this is nothing new and padding is not limited to
64-bit machines.
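
To see the padding on a particular implementation, offsetof can be compared against the member sizes. This is only a sketch: the struct and function names are hypothetical, and long long stands in for a wide integer member. The amount of padding is implementation-defined, so the code computes it rather than assuming 7.

```c
#include <stddef.h>

/* Hypothetical struct: a char member followed by a wider integer member. */
struct tagged_value {
    char tag;          /* always at offset 0 */
    long long value;   /* usually pushed to an aligned offset */
};

/* Padding bytes inserted between tag and value. */
size_t padding_before_value(void)
{
    return offsetof(struct tagged_value, value) - sizeof(char);
}

/* All padding in the struct, including any after the last member. */
size_t total_padding(void)
{
    return sizeof(struct tagged_value) - sizeof(char) - sizeof(long long);
}
```

On many byte-addressed 64-bit systems padding_before_value() comes out as 7, but nothing in the language requires that value.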
> or my assumption is wrong? If my assumption is right, what are the
> performance issues in retrieving value of a character variable over
> other data types like integer, double or float.

Depends on the implementation. I believe the old Crays (at least) used
64-bit words and had no direct octet addressing. I also believe the
implementors of C compilers mimicked 8-bit bytes by storing the 0..7
octet offset of a word address in the high (unused) 3 bits of address
pointers.

Naturally, this would come at an efficiency cost for character
manipulation, but the alternative of trying to create a hosted
implementation (and subsequent programs) where UCHAR_MAX > INT_MAX
wasn't desirable. [Although I think it was due largely to memory
issues in days of yore. Today, there are certainly 32-bit
implementations where characters are 32-bit.]

All that said, the internal specifics were effectively hidden from
the programmers by the implementors.

If you want more detail on implementation specifics, this is not the
right forum as clc deals with the _virtual_ C machine.
 
Keith Thompson

Ben Pfaff said:
Stephen Sprunk said:
aruna said:
How is a character stored in a word aligned machine?

You mean a machine where all variables, regardless of size, must be stored
with 8-byte alignment? I'm not aware of such a machine [...]

This description seems consistent with descriptions I've seen of
the Cray's architecture, but I've never used a Cray and don't
know any of the details.

On a Cray vector machine, like the SV1, there are no machine-level
instructions to access quantities smaller than 64 bits, but the C
compiler uses CHAR_BIT==8 for compatibility with other systems. It
makes accessing character data less efficient, but that's not really
what the machine is for.

Based on the results of a couple of small test programs, standalone
variables of type char are stored on word boundaries, but struct
members and array elements of type char are packed into 8-bit bytes.
I'm not sure why standalone variables are word-aligned. As far as I
know, accessing an 8-bit quantity on a word boundary is no cheaper
than accessing an 8-bit quantity in the middle of a word. I was
thinking for some operations (such as when the value is promoted to
int) it can just grab the entire word, but it's a big-endian machine,
so that wouldn't work (storing 0xff in a char object and then
accessing the word containing it yields 0xff00000000000000).

In any case, the semantics of arrays are such that an implementation
cannot *require* an alignment boundary larger than the size of a type
(there can be no gaps between array elements), but it can use a larger
alignment if it's convenient (or even if the compiler writer was in an
odd mood that day).
 
Dik T. Winter

>
> You mean a machine where all variables, regardless of size, must be stored
> with 8-byte alignment? I'm not aware of such a machine, but if one exists
> your assumption would be correct.

Cray 1 to YMP.
>
> Modern processors have the same latency for all loads up to the size of a
> single cache line. However, there is a large performance benefit in having
> multiple variables in each cache line because it reduces cache misses.

As those Crays do not have caches, this is irrelevant for them.

As a standalone variable, a char is stored in a (64-bit) word; in an
array, 8 chars are packed into a word. There are some performance issues
for arrays, but not as many as you would think. For variables the load
time is independent of the type (except that loading/storing a 128-bit
double takes one cycle more, but all operations on those things are in
software).
 
Dik T. Winter

> Depends on the implementation. I believe the old Crays (at least) used
> 64-bit words and had no direct octet addressing. I also believe the
> implementors of C compilers mimicked 8-bit bytes by storing the 0..7
> octet offset of a word address in the high (unused) 3 bits of address
> pointers.

Actually in the upper 16 bits (there was a reason for that...).
> Naturally, this would come at an efficiency cost for character
> manipulation, but the alternative of trying to create a hosted
> implementation (and subsequent programs) where UCHAR_MAX > INT_MAX
> wasn't desirable. [Although I think it was due largely to memory
> issues in days of yore. Today, there are certainly 32-bit
> implementations where characters are 32-bit.]

I think not. Cray had quite some experience with handling characters
on those machines. The compilers (for instance) were *extremely*
fast. The last time I looked, I saw a Fortran routine of 1200 lines
(nearly no comments) compiled with full optimisation in a few
milliseconds.
> All that said, the internal specifics were effectively hidden from
> the programmers by the implementors.

As long as you followed the standard. Casting a char pointer to an
int pointer and back would in many cases change the pointer. Assuming
that the low order bit of a long* would be 0 would result in problems
(seen in the Bourne shell and derivatives). All kinds of behaviour
that is undefined according to the standard would indeed give different
behaviour on that machine.
 
Dan Pop

aruna said:
How is a character stored in a word aligned machine? Assuming on 64bit
machine, 1 byte is reserved for a char, is it the case that only 1
byte is used to store the character and the rest 7 bytes are wasted,
or my assumption is wrong?

It is wrong, due to the special properties of the type unsigned char: it
can be used to examine the representation of any other type. Therefore,
this type cannot, by definition, have "wasted" bits (they are called
padding bits in the C99 standard).

So, possible sizes of char on a 64-bit machine are: 8, 16, 32 and 64-bit.
If the size is less than 64-bit, sizeof word > 1 and multiple chars
can be stored in a word (the word can be aliased with an array of char).
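
A short sketch of that aliasing, assuming the implementation provides uint64_t. The function name is invented for illustration. It copies a 64-bit object one unsigned char at a time and gets the identical value back, which only works because every bit of the object is visible through unsigned char.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuild a 64-bit value from its object representation, byte by byte.
   Accessing any object through unsigned char is always permitted. */
uint64_t rebuild_from_bytes(uint64_t word)
{
    const unsigned char *in = (const unsigned char *)&word;
    uint64_t copy;
    unsigned char *out = (unsigned char *)&copy;

    for (size_t i = 0; i < sizeof word; i++) {
        out[i] = in[i];
    }
    return copy;
}
```

The byte order seen through the unsigned char pointer depends on the machine, but the round trip does not.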

There is only one known architecture with 64-bit word addressing (no
octet-based addressing) where C was implemented: the Cray vector
processor used in the old Cray supercomputers. char is an 8-bit type
on that particular platform.
> If my assumption is right, what are the
> performance issues in retrieving value of a character variable over
> other data types like integer, double or float.

Because the machine uses word addressing, char pointers need to store more
data than all other pointers (the address or position of the byte inside
the word). There are two ways of storing this additional information:
in the low bits, which optimises char pointer arithmetic, but requires
additional operations when the pointer is dereferenced, or in the upper
bits, which simplifies pointer dereferencing (the higher bits are
ignored, as the address space is only 48-bit) but complicates char
pointer arithmetic. I believe both ways have been used in different
implementations. Either way, after retrieving the word containing the
char, the char itself has to be extracted from the word, and this takes
some additional shifting and masking, so char access is slower. Not
much of a problem in practice, as these machines were not intended for
intensive character manipulations, but as number crunchers.
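
To make the shifting and masking concrete, here is a rough simulation in portable C. Everything in it is an assumption for illustration, not the actual Cray compiler scheme: the pointer is modelled as a plain uint64_t with the word address in the low 48 bits and the byte offset in bits 48..50, octet 0 is taken to be the most significant octet of the word (big-endian), and all function names are made up.

```c
#include <stdint.h>

/* Assumed layout: low 48 bits = word address, bits 48..50 = byte offset. */
#define WORD_ADDR_MASK ((UINT64_C(1) << 48) - 1)
#define OFFSET_SHIFT   48

/* Build a simulated char pointer from a word address and a 0..7 offset. */
uint64_t char_ptr_make(uint64_t word_addr, unsigned offset)
{
    return (word_addr & WORD_ADDR_MASK) | ((uint64_t)(offset & 7u) << OFFSET_SHIFT);
}

/* Dereferencing only needs the low bits: the upper bits are dropped. */
uint64_t char_ptr_word(uint64_t p)   { return p & WORD_ADDR_MASK; }
unsigned char_ptr_offset(uint64_t p) { return (unsigned)(p >> OFFSET_SHIFT) & 7u; }

/* Pointer arithmetic is the awkward part: the carry out of the offset
   field has to be moved by hand into the word address. */
uint64_t char_ptr_next(uint64_t p)
{
    unsigned off = char_ptr_offset(p) + 1u;
    return char_ptr_make(char_ptr_word(p) + (off >> 3), off & 7u);
}

/* Pull one octet out of a 64-bit word; offset 0 is the top octet. */
unsigned extract_octet(uint64_t word, unsigned offset)
{
    unsigned shift = 8u * (7u - (offset & 7u));
    return (unsigned)((word >> shift) & 0xFFu);
}

/* Load a char: fetch the whole word, then shift and mask. */
unsigned load_char(const uint64_t *memory, uint64_t p)
{
    return extract_octet(memory[char_ptr_word(p)], char_ptr_offset(p));
}
```

Every char load costs a full word fetch plus a shift and a mask, and stepping the pointer costs the extra carry handling, which is the overhead described above.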

The other, more common, 64-bit architectures use octet-based addressing
and things are no different from the more common 32-bit architectures.

Dan
 
