Abstraction layer between C and CPU

Luke Wu · Jan 21, 2005

Hello,

From spending some time in clc, I've come to realize that C's model of
the CPU can be totally different from the actual CPU.

Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour.

For example:

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?

-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?

- do the elements of an array usually end up being placed side by side
in most implementations (does the standard require this) ?
how about multidimensional arrays?

-- I've seen code where two or more arrays would be declared side by
side and then the first array with extended indexing would be used to
access the elements of the second/more array. Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?

int a[10], b[10], c[20];
int i;

for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */
}

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

Any help would be appreciated.
(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

Peter Nilsson · Jan 21, 2005

Luke said:
From spending some time in clc, I've come to realize that C's
model of the CPU can be totally different from the actual CPU.

C doesn't model a CPU, it models an abstract machine.

Is it safe to say that almost nothing can be gleaned about
physical CPU behaviour from C level behaviour.

True. The whole point of high level languages is to avoid
dealing with low level implementation details.

For example: ...

Your examples are really questions on how implementations might
work. Comp.lang.c is not the place for such questions since the
standard merely supplies semantics. How those semantics are
actually implemented is not specified.[*]

Personally, I think your questions, whilst naturally curious,
are nonetheless dangerous. I've seen countless examples of
newbie programmers who try to analyse C semantics from things
like disassemblies, only to develop false conclusions.

When code based on such false conclusions is ported to other
machines, it can often lead to bugs which are difficult to
diagnose and debug.

...
- I read in an article once (can't find it now) that a "byte"
in C doesn't necessarily have to be an octet of bits at the
hardware level.

Correct. Some architectures are incapable of addressing octets.

Any help would be appreciated.
(Are there any links on the net that point to details of C's
abstraction layer?

The standards _are_ the abstraction layer. Your questions are
about realisations of that abstraction.

I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking
about a single cohesive document)

You should perhaps look at compiler writing books.

[*] Of course, the standard authors are quite mindful of what
can be implemented efficiently on various existing and future
architectures.

Andrey Tarasevich · Jan 21, 2005

Luke said:
...
the CPU can be totally different from the actual CPU.

From a purely abstract, theoretical point of view: yes, of course it can be.

Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour.

That's correct.

For example:

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?

If by "real addresses" you mean machine addresses, then no, they don't
have to have any relationship.

-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?

I don't exactly understand what you mean by "different by 'x'". By 'x'
what? "Bytes" in C sense of the word? Machine bytes? Difference returned
by binary '-' operator?

- do the elements of an array usually end up being placed side by side
in most implementations (does the standard require this) ?

Yes, they do. This means that any padding present between the elements
of the array is part of the element itself, not something added
specifically by the array object. This follows from the fact that in C

sizeof(array) = sizeof(element) * number_of_elements

This is required by the standard.
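
As a rough illustration of that size relationship, here is a small sketch in C. The struct type, the ELEMS macro and the function names are made up for this example; only sizeof and the array rules come from the language itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type; it may well contain internal padding. */
struct sample {
    char tag;
    double value;
};

/* Element count derived purely from sizes (works on arrays, not pointers). */
#define ELEMS(arr) (sizeof (arr) / sizeof (arr)[0])

/* Returns 1 if an array of 5 elements occupies exactly 5 element sizes. */
int array_has_no_extra_padding(void)
{
    struct sample items[5];
    return sizeof items == 5 * sizeof items[0];
}

/* The same rule applied to an array of arrays: 4 rows of int[7]. */
size_t element_count_demo(void)
{
    int grid[4][7];
    assert(sizeof grid == 4 * sizeof grid[0]);
    assert(sizeof grid[0] == 7 * sizeof grid[0][0]);
    return ELEMS(grid) * ELEMS(grid[0]);
}
```

Whatever padding struct sample needs is counted inside sizeof(struct sample), so the array itself adds nothing on top.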

how about multidimensional arrays?

Multidimensional arrays in C are just arrays of arrays, which means that
the above applies to them as well. Arrays cannot "insert" extra padding
between elements.

-- I've seen code where two or more arrays would be declared side by
side and then the first array with extended indexing would be used to
access the elements of the second/more array.
Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?

No, there's no such guarantee. Such access is completely illegal in C.
The behavior is undefined.

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

That's true. "Byte" in C (C-byte) is essentially synonymous with 'char'
type. '[unsigned|signed] char' objects in C always consist of 1 C-byte
by definition. A C-byte might consist of any number of machine bytes,
which means that the number of bits in C-byte might be different from 8
(could be 16, for example).

(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

The C99 standard has a number of sections specifically dedicated to these
issues.

Mike Wahler · Jan 21, 2005

Luke Wu said:
Hello,

the CPU

C doesn't really model 'a CPU'; it defines an 'abstract machine',
and doesn't directly refer to a "CPU" component.

can be totally different from the actual CPU.

Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour.

Not almost nothing, but nothing. However, by examining an
assembly listing which many compilers can emit, one can
glean some platform-specific information. But then you're
outside the realm of C.

For example:

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?

No. This is especially true for platforms which feature
'virtual memory' and/or separate 'process spaces' as in
e.g. Microsoft Windows.

-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?

Not necessarily. Also note that the addresses of two separate
objects will not necessarily reflect their relationship in
source code. e.g.:

int i;
int j;

the address of 'j' need not be greater than address of 'i'
nor is their difference guaranteed to be sizeof(int).
(the only time this *is* guaranteed is when the
objects are adjacent elements (the subscript of one
is one more or less than the subscript of the other)
of the same array).

(but the addresses of two separate objects are always
guaranteed to be different)

- do the elements of an array usually

Not usually, but always.

end up being placed side by side

At contiguous addresses (as reported by the & operator),
whose difference is sizeof(array's element type).

int array[2];

&array[1] is guaranteed to be exactly
sizeof(int) larger than &array[0];

in most implementations

All conforming implementations.

(does the standard require this) ?
Yes.
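
A minimal sketch of those guarantees, with function names invented for the example. It measures the distance between adjacent elements both in elements and in C bytes, and compares the addresses of two unrelated objects for inequality only.

```c
#include <assert.h>
#include <stddef.h>

/* Distance between two adjacent elements, measured in elements. */
ptrdiff_t adjacent_distance_in_elements(void)
{
    int array[2] = {0, 0};
    return &array[1] - &array[0];
}

/* The same distance measured in C bytes (units of char). */
ptrdiff_t adjacent_distance_in_bytes(void)
{
    int array[2] = {0, 0};
    return (char *)&array[1] - (char *)&array[0];
}

/* Two distinct objects have distinct addresses; nothing more is promised. */
int separate_objects_differ(void)
{
    int i = 0;
    int j = 0;
    /* == and != are fine here; &i < &j would be undefined behaviour. */
    assert(&i != &j);
    return &i != &j;
}
```

Note that subtracting &i from &j, or comparing them with <, is not defined for objects that are not part of the same array.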

how about multidimensional arrays?

Yes. "multi-dimensional arrays" in C are really
"arrays of arrays"

for the array:

int arr2d[2][3] = {1, 2, 3, 4, 5, 6};
(sometimes written for clarity as:
int arr2d[2][3] = { {1,2,3}, {4,5,6} }; )

the values are stored (contiguously) in memory in the
order in which the initializer values appear above. That is:
arr2d[0][0] == 1
arr2d[0][1] == 2
arr2d[0][2] == 3
arr2d[1][0] == 4
arr2d[1][1] == 5
arr2d[1][2] == 6

That is, C arrays are stored in 'row major' order, unlike
some other languages.
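
One way to check the row-major layout without stepping outside any array is to compare byte offsets. This is only a sketch; the function names and the ROWS/COLS constants are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };

/* Byte offset of arr2d[row][col] from the start of the whole array. */
size_t offset_of_element(int row, int col)
{
    static int arr2d[ROWS][COLS] = { {1, 2, 3}, {4, 5, 6} };
    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    return (size_t)((char *)&arr2d[row][col] - (char *)arr2d);
}

/* Offset predicted by row-major order: whole rows first, then columns. */
size_t row_major_offset(int row, int col)
{
    return ((size_t)row * COLS + (size_t)col) * sizeof(int);
}
```

For every valid row and col the two functions should agree, e.g. arr2d[1][0] sits 3 * sizeof(int) bytes from the start.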

-- I've seen code where two or more arrays would be declared side by
side

C has rather 'free' formatting rules, e.g. more than one
declaration or statement can appear on a single line.
int array1[] = {1,2,3}; int array2[] = {4,5,6};

However I recommend against this practice.

and then the first array with extended indexing

What do you mean by 'extended indexing'? C does not define
such a term.

would be used to
access the elements of the second/more array.

Any integral expression whose value when added to the address
of an array's first element is within the bounds of that array
can be used to index into it. The fact that these values might
themselves be stored in an array is of no consequence. As a
matter of fact, some 'convoluted' code could be written in which
array element values are used to index that same array. But imo
this is a rather dangerous practice.

Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?

No. This is only guaranteed for elements of the same array.

int a[10], b[10], c[20];

This is a valid way to define several objects, but I recommend
one object per line. Easier to read and maintain.

int i;

for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */

NO, NO, NO!

You must process each array individually. Their positions
in memory relative to one another are not specified. Also
note that what you wrote above is *not* initialization,
but assignment, which is not the same thing. An object is initialized
when it is defined:

int a[10] = {1,2,3}; /* first three elements are initialized with
1, 2, and 3, respectively, all others to zero */

FWIW, you can initialize all the elements of an array to zero like this:

int a[10] = {0};

(If this definition appears at file scope, or is qualified
with 'static' at block scope, all elements are initialized to
zero implicitly -- but I like to include the initializer(s)
anyway, for clarity, but that is a 'style' issue).

}
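
The initialization rules just described can be sketched like this. The helper and function names are hypothetical; the three array definitions are the forms discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Static storage duration: every element starts out as zero. */
static int file_scope_zeros[10];

/* Counts the elements that are not zero. */
static int count_nonzero(const int *values, size_t count)
{
    size_t i;
    int found = 0;
    assert(values != NULL);
    for (i = 0; i < count; i++)
        if (values[i] != 0)
            found++;
    return found;
}

/* Three initializers given; the remaining seven elements become zero. */
int nonzero_in_partial(void)
{
    int a[10] = {1, 2, 3};
    return count_nonzero(a, sizeof a / sizeof a[0]);
}

/* A single 0 initializer zeroes the whole array. */
int nonzero_in_zero_init(void)
{
    int a[10] = {0};
    return count_nonzero(a, sizeof a / sizeof a[0]);
}

/* No initializer at all, but static storage. */
int nonzero_in_static(void)
{
    return count_nonzero(file_scope_zeros,
                         sizeof file_scope_zeros / sizeof file_scope_zeros[0]);
}
```

An automatic array with no initializer at all has indeterminate contents, which is why it is left out of the sketch.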

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

Correct. It's simply the 'smallest addressable unit of storage',
which is required to have a minimum size of eight bits, but can
be larger (and often is on certain architectures). From a C
perspective, 'byte' and 'character' are synonymous.

This 'abstraction' is there to make the language as platform
neutral as possible, allowing for implementation on the widest
possible variety of existing architectures as well as those that
have yet to be conceived.

Any help would be appreciated.
(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

This single cohesive document *is* the ISO standard, but I'll be
the first to admit it's not easy to read. What you need are
some books. See www.accu.org for peer reviews.

-Mike

Jens.Toerring · Jan 21, 2005

Luke Wu said:
the CPU can be totally different from the actual CPU.

Is it safe to say that almost nothing can be gleaned about physical CPU
behaviour from C level behaviour.

That's why there is a standard, i.e. in order to be able to write
programs that _don't_ depend on the specific CPU you are using but
that can be ported easily from one to the next system. Otherwise
you wouldn't have much more than a (high-level) assembler.

For example:

- do the addresses returned by & have to have (by Standard) any direct
relationship to real addresses?

No. With many modern operating systems the concept of "real addresses"
(in the sense of physical addresses) doesn't even make much sense, since
there's what's called "virtual memory", and the mapping between physical
addresses and what a program sees is completely at the discretion
of the operating system. What the program sees as a fixed address can
be mapped to varying physical addresses (or even get written out to swap
space).

-- if the address of 2 objects is different by 'x' in C (when using &
operator), are they so in the hardware?

No - one of the objects could even be in swap space on the disk while
the other is in memory.

- do the elements of an array usually end up being placed side by side
in most implementations (does the standard require this) ?
how about multidimensional arrays?

As long as what the program sees as the addresses are contiguous in
(virtual) memory everything is fine. But in the sense of physical
memory the elements could be far apart.

-- I've seen code where two or more arrays would be declared side by
side and then the first array with extended indexing would be used to
access the elements of the second/more array. Does this suggest that C
guarantees that declared variables of the same storage class are placed
in ascending order in memory?

int a[10], b[10], c[20];
int i;

for(i = 0; i < 40; i++)
{
a[i] = 0; /* zero initializes all three arrays */
}

No, you can't rely on that, even if you only care about the "virtual"
addresses. Accessing an array element outside of its defined range
of indices is forbidden and leads to undefined behaviour. That code
may work on a certain platform when compiled with a certain compiler
but there's no guarantee that it works with any other compiler or on
a different platform.
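
For comparison, a portable way to do what the quoted loop attempts is to treat each array on its own. The following is just a sketch; the helper names are invented, and the arrays are given static storage so the functions can share them.

```c
#include <assert.h>
#include <stddef.h>

#define COUNT_OF(arr) (sizeof (arr) / sizeof (arr)[0])

static int a[10], b[10], c[20];

/* Stores value in each of the count elements, never past the end. */
static void set_ints(int *dest, size_t count, int value)
{
    size_t i;
    assert(dest != NULL);
    for (i = 0; i < count; i++)
        dest[i] = value;
}

/* Each array is handled separately, using its own element count. */
void set_all_three(int value)
{
    set_ints(a, COUNT_OF(a), value);
    set_ints(b, COUNT_OF(b), value);
    set_ints(c, COUNT_OF(c), value);
}

/* Sum over all 40 elements, again one array at a time. */
long sum_all_three(void)
{
    size_t i;
    long sum = 0;
    for (i = 0; i < COUNT_OF(a); i++) sum += a[i];
    for (i = 0; i < COUNT_OF(b); i++) sum += b[i];
    for (i = 0; i < COUNT_OF(c); i++) sum += c[i];
    return sum;
}
```

Calling set_all_three(0) zeroes all 40 elements without making any assumption about where a, b and c sit relative to each other.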

- I read in an article once (can't find it now) that a "byte" in C
doesn't necessarily have to be an octet of bits at the hardware level

There's no "byte" in C. What you have is a char (as the smallest
type), and how many bits a char has on the system you're working on
can be found out from the CHAR_BIT macro from <limits.h>. The only
guarantee you have is that CHAR_BIT is at least 8, i.e. a char has
at least 8 bits - but it can be more.
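
A short sketch of how a program can ask about this instead of assuming it. CHAR_BIT and <limits.h> are standard; the function names are made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in one char, i.e. in one C byte. */
int bits_per_byte(void)
{
    assert(CHAR_BIT >= 8);   /* guaranteed by the standard */
    return CHAR_BIT;
}

/* Storage size of an int in bits (this includes any padding bits). */
size_t storage_bits_of_int(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* How many 8-bit octets are needed to hold the bits of n C bytes. */
size_t octets_needed(size_t c_bytes)
{
    return (c_bytes * CHAR_BIT + 7) / 8;
}
```

On a machine with 8-bit chars octets_needed(n) is simply n; on a machine with 16-bit or 32-bit chars it is larger, which is the case that code assuming "byte == octet" gets wrong.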

(Are there any links on the net that point to details of C's
abstraction layer? I can't seem to find any. I guess these details
are woven through the standards documents, but I'm talking about a
single cohesive document)

Most of the things you're asking about you won't find in the standard
because they aren't relevant from a C language point of view. How C
code gets compiled to have the resulting executable work as expected
(i.e. as required by the standard) is due to the people writing the
compiler. The standard does not make any requirements how they use the
CPU they are dealing with to manage this. The C standard is basically
a recipe along the lines of "Given this code as input the resulting
program must behave in this way", but how this is achieved (and
with what kind of hardware) isn't relevant.

Regards, Jens

Luke Wu · Jan 21, 2005

Thank you for the responses.

I am now getting the 'feel' for C's abstraction away from hardware
details from reading clc posts. I think I'm almost done erasing all
the assumptions that I got into my head from reading books like The C
Companion, by Allen I. Holub.

Andrey Tarasevich · Jan 21, 2005

Andrey said:
...

Yes, they do. This means that any padding present between the elements
of the array is part of the element itself, not something added
specifically by the array object. This follows from the fact that in C

sizeof(array) = sizeof(element) * number_of_elements

This is required by the standard.

Multidimensional arrays in C are just arrays of arrays, which means that
the above applies to them as well. Arrays cannot "insert" extra padding
between elements.
...

Although it is worth noting that the above requirements are still
formulated at language level. Which means that if some compiler by means
of "compiler magic" can satisfy these requirements and at the same time
place array elements out of order/apart from each other in machine
memory, there wouldn't be anything wrong with it.

Mike Wahler · Jan 21, 2005

There's no "byte" in C.

Au contraire.

ISO/IEC 9899:1999 (E)

3.6

1 byte
addressable unit of data storage large enough to hold
any member of the basic character set of the execution
environment

-Mike

E. Robert Tisdale · Jan 21, 2005

Mike said:
Au contraire.

ISO/IEC 9899:1999 (E)

3.6

1 byte
addressable unit of data storage large enough to hold
any member of the basic character set of the execution
environment

Note that a byte is not a data type
but the *size* of a unit of storage.

In practice, a byte is 8 binary digits (bits) almost everywhere
including machines where four characters are normally "packed"
into 32 bit "words".

Mike Wahler · Jan 21, 2005

E. Robert Tisdale said:
Note that a byte is not a data type

Note that I never claimed that it is.

but the *size* of a unit of storage.

In practice, a byte is 8 binary digits (bits) almost everywhere

Then imo your 'everywhere' is rather limited.

including machines where four characters are normally "packed"
into 32 bit "words".

Note that on some machines a byte is 32 bits.

-Mike

E. Robert Tisdale · Jan 21, 2005

Mike said:
Note that I never claimed that it is.

I never claimed that you claimed that it is.

Then imo your 'everywhere' is rather limited.

Note that on some machines a byte is 32 bits.

Name ten.

Perspective is important.

I know that you don't mean to imply
that this is a real problem for C programmers.

Most C programmers will never write a single line of code
that will be ported to a processor with 32 bit bytes.

Thomas Stegen · Jan 21, 2005

Mike said:
No. This is especially true for platforms which feature
'virtual memory' and/or separate 'process spaces' as in
e.g. Microsoft Windows.

Well, there must somewhere be a mapping between pointer
values and actual addresses (even in the abstract machine).
Though there can be several layers of mappings. So there
must be a relationship, but depending on what you mean by
direct, it might not be direct.

But even though this mapping must exist even in the abstract
machine, one cannot portably use this for anything,
as a) there is no specified mechanism for doing so and b) it
will be very different between platforms.
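
About the only thing C itself lets you do with such a mapping is sketched below, assuming the implementation provides the optional uintptr_t type from <stdint.h> (C99). The function name is made up. The integer value is implementation-defined; only the round trip back to an equal pointer is guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a pointer to an integer and back, then reads through it. */
int read_through_round_trip(int value)
{
    int object = value;
    void *original = &object;
    uintptr_t as_integer = (uintptr_t)original;  /* meaning is up to the implementation */
    void *restored = (void *)as_integer;

    /* A void pointer converted to uintptr_t and back compares equal to the original. */
    assert(restored == original);
    return *(int *)restored;
}
```

Doing arithmetic on as_integer and expecting the result to correspond to some other object's address is exactly the kind of non-portable assumption discussed above.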

Thomas Stegen · Jan 21, 2005

Mike said:
[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Jonathan Burd · Jan 21, 2005

Thomas said:
Mike said:

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps using the term "octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

To reply to the original context, C does not have
a "byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.

I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

CBFalconer · Jan 21, 2005

Mike said:
.... snip ...

FWIW, you can initialize all the elements of an array to zero
like this:

int a[10] = {0};

(If this definition appears at file scope, or is qualified
with 'static' at block scope, all elements are initialized to
zero implicitly -- but I like to include the initializer(s)
anyway, for clarity, but that is a 'style' issue).

But be aware that, on some systems, this may result in heavy
bloating of the final executable file with long strings of zero
bytes. This has nothing whatsoever to do with the language, but
you should be aware of the possibility.

Jonathan Burd · Jan 21, 2005

Jonathan said:
Thomas said:

Mike said:

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps using the term "octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

To reply to the original context, C does not have
a "byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.

Correction: A char must at least be a byte.

Jonathan Burd · Jan 21, 2005

Jonathan said:
Thomas said:

Mike said:

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Here in comp.lang.c the context should be clear to everyone.

Perhaps using the term "octet" for a group of 8 bits would be
much better. A byte may be an octet and is the most basic
addressable unit in an execution environment. Therefore, a byte,
according to this definition, may also be 4 bits.

To reply to the original context, C does not have
a "byte" data type. In C, a char contains, at least, enough
bits to represent any element of the basic character set.
A char may at least be a byte or higher.

I don't see how you can safely assume a char to contain at least
8 bits. The standard doesn't say so explicitly.

Alright, CHAR_BIT is at least 8 bits. My bad.

Regards,
Jonathan.

pete · Jan 21, 2005

(sizeof(char) == 1) /* always just exactly one. */

Chris Croughton · Jan 21, 2005

Thomas said:
Mike said:

[snip usual char byte 8 bit not 8 bit discussion]

Semi OT perhaps but...

Outside a C perspective, didn't IBM first coin the term byte to
refer to 8 bit entities? As far as I know machines such as the
pdp-11 (I think) had 9 bit entities, but never used the term byte.

This was discussed here recently.

Yes, Werner Buchholz at IBM invented the term in 1956, originally just
as a 1 to 6 bit field used for I/O but by the end of the year it had
come to refer to 8 bit quantities. The DEC PDP-11 was a 16 bit machine,
and DEC did use the term byte to refer to half-words of 8 bits (as far
as I know no PDP-11 actually used words like 'byte' at all, they couldn't

It is also clear though that in a C context byte does not mean
this. It is also clear that one should establish a context,
implicitly or explicitly, when discussing bytes with anyone.
Are we in the C locale, or in the mere mortals locale?

Use "characters" or "chars" to refer to the C entities and "octets" to
refer to the 8 bit quantities, and shun the overloaded term "bytes".

Here in comp.lang.c the context should be clear to everyone.

It isn't, even those of us who have worked on machines with odd byte
lengths often now use it only about 8 bit quantities, because that
represents the vast majority of machines these days (most of the DSP
programmers I know refer to the basic -- and only -- memory units as
"words").

Chris C

Mike Wahler · Jan 21, 2005

E. Robert Tisdale said:
I never claimed that you claimed that it is.

Name ten.

No need.

Perspective is important.

More important is abstraction and portability.

I know that you don't mean to imply
that this is a real problem for C programmers.

It can be for some.

Most C programmers will never write a single line of code
that will be ported to a processor with 32 bit bytes.

There you go again, with your 'most [insert whatever]'.
You can't have any idea what 'most' C programmers do
or don't do. You can't know who they are, or how many
of them there are.

-Mike
