run-time vs compile-time

N

newbiecpp

I have hard time to understand run-time environment. Let assume that I have
a program that has a simple variable alpha. When this variable is
statically allocated, the compiler can use the absolute address of alpha to
access to it. What confuses me is that when the variable is dynamically
allocated, how does the compiler implement it? We know the address of the
variable until run-time. During the compilation, how can we access to the
alpha variable since we don't know its address yet?

Thanks in advance
 
H

Howard

newbiecpp said:
I have hard time to understand run-time environment. Let assume that I have
a program that has a simple variable alpha. When this variable is
statically allocated, the compiler can use the absolute address of alpha to
access to it. What confuses me is that when the variable is dynamically
allocated, how does the compiler implement it? We know the address of the
variable until run-time. During the compilation, how can we access to the
alpha variable since we don't know its address yet?

Thanks in advance

You assumption about the "statically allocated" variable is incorrect. The
compiler has no idea where in memory the entire program will be loaded, so
no "absolute address" can be given.

How a variable is addressed depends greatly on where it is declared, not
just on whether new was used to allocate space for it.

For example, a local variable is likely to be addressed by using an offset
from the "stack" pointer. But since there is no actual requirement for the
existence of a "stack" in the C++ language specification, compiler writers
are technically free to implement such details as they see fit.

To answer your question about dynamically allocated variables, what you're
describing is the use of a pointer variable and the function new (or malloc
or alloc, but let's stick with new, ok?). The pointer variable is just like
any other variable. If it's a local pointer, then it may be addressed as I
described above (using an offset from the stack pointer). But it's the
*contents* of that variable that get changed dynamically when allocating the
object it points to. An object gets space allocated for it when new is
called, the object's constructor gets called, and the address of the
allocated space gets put into the pointer variable, as its *value*, not its
location!

The compiler doesn't actually arrange any of the memory, outside of these
local Stack offsets I mentioned and perhaps a few other cases. The linker
comes into play, as does the operating system's loading mechanism. You
might want to get a book on compiler theory if you're really interested.
It's at least a whole semester's worth of study in college.

-Howard
 
J

JKop

newbiecpp posted:
I have hard time to understand run-time environment. Let assume that I
have a program that has a simple variable alpha.

unsigned int alpha;
When this variable is
statically allocated, the compiler can use the absolute address of
alpha to access to it.

Not really. "alpha" may have a different address each time the program is
run.
What confuses me is that when the variable is
dynamically allocated, how does the compiler implement it?

unsigned int* p_alpha = new unsigned int;

It stores its address. Or then again, it can mysteriously achieve it in
whatever way it sees fit in the following:

unsigned int& alpha = *new unsigned int;
We DON'T know the
address of the variable until run-time. During the compilation, how
can we access to the alpha variable since we don't know its address
yet?

The program starts up. Somehow it asks the system where it can store certain
data in memory. The system reponds by specifying addresses. From here on the
program accesses these specific addresses.
Thanks in advance


Hope that helps.


-JKop
 
J

Jonathan Mcdougall

I have hard time to understand run-time environment.

It is possible I messed up some things in my explanations, I am no
compiler implementor nor an operating system developper. Wait for some
corrections to appear before believing all that.


Let's review the common steps :

1) Writing source code
You specify, in a given language, the commands to be sent to the
computer. In C++, that means creating classes and functions and
using them. An object O, for example, is only a name, something
for the programmer to make his life easier.

Always think of a programming language in term of assembly
language : the machine does not understand classes, objects,
inheritance or whatever. All this high-level code is translated
into machine code, so all your variable names, functions, cool
class hierarchies will be replaced by memory locations, additions
and substractions. In fact, high level languages like C++ could
be viewed only as "sytactic sugar", as far as everything you do
could be done in assembly language (that's what your code will
become ultimately anyways).

2) Compiling
Nothing really interesting here, syntax checking and traduction
into intermediary code. The only thing here is that all names
are checked, so for example

// main.cpp
int main()
{
a = 2;
}

will fail to compile since 'a' does not exist anywhere. The
compiler makes sure every name is _potentially_ defined, so the
linker will only have to try to find them. So when you write

// main.cpp
extern int a;

int main()
{
a = 2;
}

the compiler tells the linker : "'a' refers to some int defined
somewhere else, you'll have to find it. good luck".

3) Linking
That's getting interesting. The linker makes sure everything is
in place : each use of a name is resolved to its definition. If
that definition does not exist or if it exists more than once,
an error is generated (in most cases). The linker uses the
information generated by the compiler to associate names. By
using the example above, it searches other files (actually
"translation units") for a definition of 'a'. If it finds it,
for example with

// another file.cpp
int a=0;

, it associates 'a' in main.cpp with the 'a' in file.cpp. If 'a'
is not found anywhere, it stops. If two or more 'a's are found,
it stops since there is ambiguity. The the basis of the ODR
(One Definition Rule).

The linker then translates all data access by an address, but
don't get it wrong : that address is not the real address in
memory. Actually, you could see that address as an offset in the
real memory. When the operating system runs the program, it
loads it somewhere and reserves some space in memory for that
program. Each variable/address/offset is resolved by the OS to
that memory space.

How? An easy answer could be "automagically". The real long
answer perhaps could be provided by someone in a newgroup
supporting your operating system. The thing is, that kind of
detail is left to the implementation : the C++ standard only
mandates _behavior_, not implementation. So as long as it
behaves as specified, an implementation can do anything it wants.

You must understand that the only time memory is allocated, and
therefore real addresses are defined, is when the program is
executed, not during compilation or linking.

Please, note that this is a bit of oversimplification.

Let assume that I have
a program that has a simple variable alpha. When this variable is
statically allocated, the compiler can use the absolute address of alpha to
access to it.

What's important to understand is that nothing is allocated when you
compile/link a program. The program merely becomes some machine code.
The operating system is then in charge of running it. Yes, the linker
assigns some addresses to your variables, but these do not refer to a
specific place in memory. As I said, you could consider these
"addresses" to be offsets in memory, the starting address being defined
by the operating system.

For what you said specifically, know that static data is handled
differently by most implementations. Actually, three types of data are
typically recognized : the stack, the heap and the static data.

After re-reading, what do you mean by "statically allocated"? On the
stack such as

int main()
{
int i = 0;
}

or really statically allocated such as

int main()
{
static int i = 0;
}

Be sure to make the difference between the two.
What confuses me is that when the variable is dynamically
allocated, how does the compiler implement it?

That's something else, though not entirely different. What I described
about the linker only applies to the stack. For the heap, it works a
bit differently.

The operating system manages a pool of memory typically called "heap" or
"free store". That memory is different from the stack because, first of
all, of its longevity. For example, by using the stack :

void f()
{
int i=0; // i is on stack
}

you have no way of extending the life of 'i'. Once the function
terminates, 'i' is destroyed. This is mandated by the C++ standard and
it is the usual practice in all languages, including assembly, so it's
no big deal.

The heap however is a pool of memory. You can reserve and release
memory when you want. Typically, in pratice, the operating system has
some functions for allocating and deleting memory at low-level,
typically in big chunks (several kilobytes). These functions are then
used by the malloc() system, which usually maintains another pool of
memory itself. Finally, operator new is usually implemented in terms of
malloc().

The low level OS functions return the address of the allocated memory on
the heap. That address can be absolute (which is pretty rare today) or
relative, allowing in particular some sort of protection. Relative
addressing (also called virtual addressing) behaves like the relative
stack address I described earlier.

So actually, what the OS returns is only an address, so neither the
compiler or the linker has nothing to do with that. That address is
determined by the OS at run-time, depending on the loaded programs and
the content of the heap. These functions can fail if the heap is full.
We know the address of the
variable until run-time.

That should read : "We don't know the real address of the variable until
run-time", since the program is not running. For memory to be
allocated, the program must run! Compiling it only translates it into
machine code and assigns "dummy" addresses which have to be resolved by
the operating system.
During the compilation, how can we access to the
alpha variable since we don't know its address yet?

As I said earlier, the linker uses some kind of relative address which
is resolved by the operating system. If you are asking how the compiler
/linker do with

void f(int &i)
{
i = 2;
}

to know what 'i' refers to, well it is only a matter of keeping a list
of the variables in a given scope with their names. When you do

int main()
{
int a = 10;
f(a);
}

The compiler enters 'a' in its list for main() and associates 'i' in f()
with it. That's a simple assocation map. Once every name has been
resolved, the linker only has to take that list of variable and give
them addresses. The operating system is then in charge, when running
the program, of allocating memory and resolving the addresses made by
the linker to the real addresses in memory.

It is important for you to understand that all the things I just
explained are not specified by the C++ standard, and therefore you
cannot rely on it, altough it is commonly implemented that way.
Remember : C++ describes behavior, not implementation.


Jonathan
 
H

Howard

Jonathan Mcdougall said:
2) Compiling
Nothing really interesting here, syntax checking and traduction

?What's "traduction"? (It's not in my dictionary.)
 
K

Kai-Uwe Bux

Howard said:
?What's "traduction"? (It's not in my dictionary.)

Try the Oxford English Dictionary. It lists as the second main (though
obsolete) meaning:

{dag}2. Translation into another language; concr. a translation. Obs. or
arch.

Dates: 1549 a1533 1663 1716 1822 1823

a1533 LD. BERNERS Gold. Bk. M. Aurel. (1546) Bv, I confesse to deserue no
merytes for my traduction. 1549 Compl. Scot. To Rdr. 10 He that hes the
gyft of traductione, compiling or teching, his faculte is..honest. 1663
COWLEY Pind. Odes Pref., The verbal Traduction of him into Latin Prose.
1716 M. DAVIES Athen. Brit. III. 5 The Jesuit Rapin's Critical Parallels
(whereof the English Traduction was so greedily bought up). 1822 SCOTT
Nigel xxxii, Whilk we do not perceive even in the Latin version of the
Septuagint, much less in the English traduction. 1823 BYRON Juan XI. xix.
note, If there be any gem'man so ignorant as to require a traduction.




Best

Kai-Uwe Bux
 
H

Howard

Kai-Uwe Bux said:
Try the Oxford English Dictionary. It lists as the second main (though
obsolete) meaning:

{dag}2. Translation into another language; concr. a translation. Obs. or
arch.

Dates: 1549 a1533 1663 1716 1822 1823

a1533 LD. BERNERS Gold. Bk. M. Aurel. (1546) Bv, I confesse to deserue no
merytes for my traduction. 1549 Compl. Scot. To Rdr. 10 He that hes the
gyft of traductione, compiling or teching, his faculte is..honest. 1663
COWLEY Pind. Odes Pref., The verbal Traduction of him into Latin Prose.
1716 M. DAVIES Athen. Brit. III. 5 The Jesuit Rapin's Critical Parallels
(whereof the English Traduction was so greedily bought up). 1822 SCOTT
Nigel xxxii, Whilk we do not perceive even in the Latin version of the
Septuagint, much less in the English traduction. 1823 BYRON Juan XI. xix.
note, If there be any gem'man so ignorant as to require a traduction.

You sure that's not the Olde English dictionary? :)
 
J

Jonathan Mcdougall

Howard said:
?What's "traduction"? (It's not in my dictionary.)

Transforming C++ code into machine code, though I think it was clear in
the context.

Jonathan
 
N

newbiecpp

Jonathan Mcdougall said:
It is possible I messed up some things in my explanations, I am no
compiler implementor nor an operating system developper. Wait for some
corrections to appear before believing all that.


Let's review the common steps :

1) Writing source code
You specify, in a given language, the commands to be sent to the
computer. In C++, that means creating classes and functions and
using them. An object O, for example, is only a name, something
for the programmer to make his life easier.

Always think of a programming language in term of assembly
language : the machine does not understand classes, objects,
inheritance or whatever. All this high-level code is translated
into machine code, so all your variable names, functions, cool
class hierarchies will be replaced by memory locations, additions
and substractions. In fact, high level languages like C++ could
be viewed only as "sytactic sugar", as far as everything you do
could be done in assembly language (that's what your code will
become ultimately anyways).

2) Compiling
Nothing really interesting here, syntax checking and traduction
into intermediary code. The only thing here is that all names
are checked, so for example

// main.cpp
int main()
{
a = 2;
}

will fail to compile since 'a' does not exist anywhere. The
compiler makes sure every name is _potentially_ defined, so the
linker will only have to try to find them. So when you write

// main.cpp
extern int a;

int main()
{
a = 2;
}

the compiler tells the linker : "'a' refers to some int defined
somewhere else, you'll have to find it. good luck".

3) Linking
That's getting interesting. The linker makes sure everything is
in place : each use of a name is resolved to its definition. If
that definition does not exist or if it exists more than once,
an error is generated (in most cases). The linker uses the
information generated by the compiler to associate names. By
using the example above, it searches other files (actually
"translation units") for a definition of 'a'. If it finds it,
for example with

// another file.cpp
int a=0;

, it associates 'a' in main.cpp with the 'a' in file.cpp. If 'a'
is not found anywhere, it stops. If two or more 'a's are found,
it stops since there is ambiguity. The the basis of the ODR
(One Definition Rule).

The linker then translates all data access by an address, but
don't get it wrong : that address is not the real address in
memory. Actually, you could see that address as an offset in the
real memory. When the operating system runs the program, it
loads it somewhere and reserves some space in memory for that
program. Each variable/address/offset is resolved by the OS to
that memory space.

How? An easy answer could be "automagically". The real long
answer perhaps could be provided by someone in a newgroup
supporting your operating system. The thing is, that kind of
detail is left to the implementation : the C++ standard only
mandates _behavior_, not implementation. So as long as it
behaves as specified, an implementation can do anything it wants.

You must understand that the only time memory is allocated, and
therefore real addresses are defined, is when the program is
executed, not during compilation or linking.

Please, note that this is a bit of oversimplification.



What's important to understand is that nothing is allocated when you
compile/link a program. The program merely becomes some machine code.
The operating system is then in charge of running it. Yes, the linker
assigns some addresses to your variables, but these do not refer to a
specific place in memory. As I said, you could consider these
"addresses" to be offsets in memory, the starting address being defined
by the operating system.

For what you said specifically, know that static data is handled
differently by most implementations. Actually, three types of data are
typically recognized : the stack, the heap and the static data.

After re-reading, what do you mean by "statically allocated"? On the
stack such as

int main()
{
int i = 0;
}

or really statically allocated such as

int main()
{
static int i = 0;
}

Be sure to make the difference between the two.


That's something else, though not entirely different. What I described
about the linker only applies to the stack. For the heap, it works a
bit differently.

The operating system manages a pool of memory typically called "heap" or
"free store". That memory is different from the stack because, first of
all, of its longevity. For example, by using the stack :

void f()
{
int i=0; // i is on stack
}

you have no way of extending the life of 'i'. Once the function
terminates, 'i' is destroyed. This is mandated by the C++ standard and
it is the usual practice in all languages, including assembly, so it's
no big deal.

The heap however is a pool of memory. You can reserve and release
memory when you want. Typically, in pratice, the operating system has
some functions for allocating and deleting memory at low-level,
typically in big chunks (several kilobytes). These functions are then
used by the malloc() system, which usually maintains another pool of
memory itself. Finally, operator new is usually implemented in terms of
malloc().

The low level OS functions return the address of the allocated memory on
the heap. That address can be absolute (which is pretty rare today) or
relative, allowing in particular some sort of protection. Relative
addressing (also called virtual addressing) behaves like the relative
stack address I described earlier.

So actually, what the OS returns is only an address, so neither the
compiler or the linker has nothing to do with that. That address is
determined by the OS at run-time, depending on the loaded programs and
the content of the heap. These functions can fail if the heap is full.


That should read : "We don't know the real address of the variable until
run-time", since the program is not running. For memory to be
allocated, the program must run! Compiling it only translates it into
machine code and assigns "dummy" addresses which have to be resolved by
the operating system.


As I said earlier, the linker uses some kind of relative address which
is resolved by the operating system. If you are asking how the compiler
/linker do with

void f(int &i)
{
i = 2;
}

to know what 'i' refers to, well it is only a matter of keeping a list
of the variables in a given scope with their names. When you do

int main()
{
int a = 10;
f(a);
}

The compiler enters 'a' in its list for main() and associates 'i' in f()
with it. That's a simple assocation map. Once every name has been
resolved, the linker only has to take that list of variable and give
them addresses. The operating system is then in charge, when running
the program, of allocating memory and resolving the addresses made by
the linker to the real addresses in memory.

It is important for you to understand that all the things I just
explained are not specified by the C++ standard, and therefore you
cannot rely on it, altough it is commonly implemented that way.
Remember : C++ describes behavior, not implementation.


Jonathan

Thank you very much. I really appreciate your time and help.

My confusion is from here: C++ says that static allocation, such as

static int i;

is binding at compile-time, while dynamic allocation, such as

int* pi = new int;

is binding at run-time. I understand now that for i, compiler can put an
offset related to some location (like stack base) somewhere. But I still
have some confusion about dynamic binding. To me, the compiler may put an
offset from heap to pi. But why we call it run-time binding? To me, they
all decide at compile-time, that is, compiler record an offset from stack or
heap. During run-time, the OS will decide the addresses of stack and heap
so that we can have real addresses of i and pi. But why we call one as
compile-time binding and the other as run-time binding? Or my understanding
was wrong?

I appreciate your insight and your time.
 
H

Howard

Jonathan Mcdougall said:
Transforming C++ code into machine code, though I think it was clear in
the context.

Well, I agree it was fairly clear what he was tring to say, but he could
have said "splurbification" and accomplished that. :) It doesn't make the
word itself legitimate. I checked several on-line dictionaries and found no
such word. I only found it on-line by doing a French-to-English translation
on AltaVista (BabelFish). As the other responder noted, it's listed as
"obscure" or "archaic" even in the Oxford dictionary. So you can't blame me
for asking, can you? (Besides, it's how I learn!)

-Howard
 
R

Richard Herring

In message said:
Well, I agree it was fairly clear what he was tring to say, but he could
have said "splurbification" and accomplished that. :) It doesn't make the
word itself legitimate. I checked several on-line dictionaries and found no
such word. I only found it on-line by doing a French-to-English translation
on AltaVista (BabelFish). As the other responder noted, it's listed as
"obscure" or "archaic" even in the Oxford dictionary. So you can't blame me
for asking, can you? (Besides, it's how I learn!)

It does have a current meaning, but you have to look under "traduce".

Personally I wouldn't want a compiler that did that to my code, though
the Italians have a saying "traduttore -- tradittore" for it.
 
N

Niklas Borson

newbiecpp said:
My confusion is from here: C++ says that static allocation, such as

static int i;

is binding at compile-time, while dynamic allocation, such as

int* pi = new int;

is binding at run-time.

Actually there are three types of storage in C and C++: static, dynamic,
and automatic. Following are examples of all three:

// The integer g has static storage duration, like all global variables.
int g;
int main()
{
// The integer s has static duration because of the 'static' keyword.
static int s = 0;

// The integer a has automatic duration, which is the default for
// local variables.
int a = 0;

// The pointer p has automatic duration, but the integer it points
// to has dynamic duration.
int* p = new int(0);

// The following statement frees the integer that p points to, which
// means the integer no longer exists and the memory it occupied can
// be reclaimed for other purposes. The pointer p still exists, but
// it is now an invalid pointer, which means you're not allowed to
// do anything with it except assign to it.
delete p;

return 0;
}

The amount of static storage, and the number of static variables, in a
program is known "in advance" (specifically, at link time) and does not
change over the life of the program -- hence the name "static". In other
words, the lifetime of a static variable is the lifetime of the program.

Automatic variables are created and destroyed "automatically" based on
the flow of control. An automatic variable is created when the control
reaches the point where it is declared, and is destroyed when the flow
of control leaves the block where it is declared. Thus, unlike static
variables, automatic variables may be bound to different addresses at
different times.

Dynamic objects are allocated on the free store (informally known as the
heap) and are allocated and freed explicitly by the programmer -- though
perhaps indirectly, e.g., via objects with constructors and destructors.
As with automatic objects, the number of dynamic objects changes over
time. A reference variable may be bound to different dynamic objects at
different times, and a pointer may point to different dynamic objects at
different times.
I understand now that for i, compiler can put an
offset related to some location (like stack base) somewhere.

In your example, i is static so the stack is not involved. The actual
address of a static variable may not be known until the program is
loaded, but from that point forward it does not change. The way this
works on some common platforms (e.g., Windows) is that the linker
includes "relocation information" in an executable file, which the
operating system them uses to fix up all references to static variables
when it loads the program. Function calls work in a similar manner.

Automatic variables work quite differently. The C++ standard does not
require it, but most implementations for common hardware platforms
allocate automatic variables on the stack. This takes advantage of
the fact that automatic variables are always destroyed in the reverse
order they are created (i.e., LIFO, last-in, first-out). Stacks are
well suited to this. A function can reserve space for its local
variables by simply moving the stack pointer (usually a register)
when it is called, and restoring the previous stack pointer when it
returns.

Following is a more detailed description of how the stack might be
managed in a typical x86 implementation.

Two special registers are used to manage the stack: the stack pointer
(ESP), which points to the top of the stack; and the base pointer (EBP),
which points to the current "stack frame", i.e., the part of the stack
reserved for a particular function invocation. A typical function call
sequence might work like this:

1. The caller pushes parameters onto the stack.
2. The caller pushes the return address (i.e., the address of
the next instruction) into the stack, and transfers control
to the entry point of the function.
3. The function (i.e., the callee) saves the old value of the
base pointer by pushing it into the stack, and stores the
current stack pointer in the base pointer; it then adjusts
the stack pointer to make room for its local variables.
4. Local variables, parameters, and the old base pointer all
now have known offsets from the current base pointer.
5. Before the function returns it restores the old stack and
base pointers (reversing step 3). It then "returns", i.e.,
pops the return address off the stack and transfers control
there (reversing step 2).
6. The caller then fixes the stack pointer (reversing step 1).

There are many variations on this idea. For example, parameters may
be passed in registers instead of on the stack; step 6 may be carried
out by the callee instead of the caller; etc. The specific protocol
used is known as a calling convention, but the basic idea is usually
fairly close to what I described above.

Note that each function invocation has its own stack frame. This
explains how recursion is possible. Consider the following example:

#include <iostream>
int factorial(int n)
{
return (n > 2) ?
n * factorial(n - 1) :
n;
}
int main()
{
std::cout << factorial(3) << std::endl;
return 0;
}

In the above program, main calls factorial with n=3. This first
invocation of factorial calls factorial again with n=2. The second
invocation then returns control to the first invocation, which in
turn returns control to main. Assuming the compiler translates the
program quite literally and uses the calling convention I described
above, here's what the stack looks like during the second
invocation of the factorial function:

(Note: ASCII art best viewed in a monospaced font; the
diagram assumes the stack growns downward.)

+------------------------------------------+
| params to main | stack
| | frame for
| return address (to startup code?) | main
| |
? <--- old base pointer |
| |
+--> | local vars for main |
| | |
| +------------------------------------------+
| | n=3 |
| | | stack
| | return address (somewhere in main) | frame for
| | | factorial
+----- old base pointer | (1st call)
| |
+--> | (local vars if there were any) |
| +------------------------------------------+
| | n=2 |
| | | stack
| | return address (somewhere in factorial) | frame for
| | | factorial
+----- old base pointer | (2nd call)
| |
EBP -> | (local vars if there were any) |
+------------------------------------------+
ESP -> (top of the stack)

Mind you, things are less clear cut in practice. For example,
the recursive call in the factorial function is what's know as
a tail call, and the compiler may just turn it into a jump
instead of an actual function call, in effect replacing recursion
with a loop.
But I still
have some confusion about dynamic binding. To me, the compiler may put an
offset from heap to pi.

It is important to distinguish between the pointer itself (pi) and the
object it points to (*pi). If pi is a local variable it is allocated in
the manner of all automatic variables (e.g., on the stack as described
above).

The memory for the pointee (i.e., *pi) is allocated by a function
called operator new. How that works is quite complex and implementation
dependant, but basically memory management functions like operator new,
operator delete, malloc, and free must maintain some sort of data
structures to keep track of which chunks of memory are in use and which
are free. Something more complex than a stack is required because
(unlike automatic objects) objects on the free store may be freed in
any order. Some cooperation with the operating system is usually often
involved.

In any case, pi can't just point to some static offset from the heap
because pi may point to different objects at different times. Consider
the following example:

struct Node {
Node* next;
};

int main()
{
Node* first = 0;

for (int i = 0; i < 1000; ++i)
{
Node* p = new Node;
p->next = first;
first = p;
}

while (first != 0)
{
p = first->next;
delete first;
first = p;
}

return 0;
}

In the above program, the local variable first points, at different times,
to 1000 different objects. Obviously, then, its value (i.e., the address
of the node it points to) cannot be determined at compile time -- even as
an offset from the beginning of the heap.
But why we call it run-time binding? To me, they
all decide at compile-time, that is, compiler record an offset from stack or
heap.

A static variable is always bound to a single object at a fixed memory
address. The address may be determined at program load time (or in some
other way) rather than compile time; however, it does not change during
the execution of the program.

In contrast, the location of an automatic variable is relative to a
particular stack frame (assuming a stack is even used, which is not
required). Parameters work like automatic variables so consider the
parameter n in the example above. At one point n is bound to a memory
address in one stack frame (containing the value 2). Later it is bound
to another memory address in a different stack frame (containing the
value 3). In fact, both stack frames exist at the same time, although
only the top-most stack frame is active.

In the case of the free store, see the example above. The same pointer
many point to many different objects at different times during a
program's execution.

Well, I think I've gone on long enough. I hope this helps. :)

--Nick
 
J

John Harrison

newbiecpp said:
My confusion is from here: C++ says that static allocation, such as

static int i;

is binding at compile-time, while dynamic allocation, such as

int* pi = new int;

is binding at run-time.

I've never heard the term binding used in this way.
I understand now that for i, compiler can put an
offset related to some location (like stack base) somewhere. But I still
have some confusion about dynamic binding. To me, the compiler may put an
offset from heap to pi. But why we call it run-time binding?

You seem to have a confusion between the variable pi and the memory it
points to.

int a;
int* b;
int* c = &a;
int *d = new int;

a, b, c and d are all local variables. The compiler treats them all the
same, allocating space for them at compile time (as an offset from the stack
pointer). The fact that at run time d is initialised to point to dynamically
allocated memory is completely irrelevent. The memory occupied by d itself
and the memory d points to are different things.
To me, they
all decide at compile-time, that is, compiler record an offset from stack
or
heap. During run-time, the OS will decide the addresses of stack and heap
so that we can have real addresses of i and pi. But why we call one as
compile-time binding and the other as run-time binding? Or my
understanding
was wrong?

As I said I've never heard the term binding used like that. But in any case
the address of i and pi is worked out at compile time. The address pi points
to is worked out at run time.

john
 
J

Jonathan Mcdougall

My confusion is from here: C++ says that static allocation, such as
static int i;

is binding at compile-time, while dynamic allocation, such as

int* pi = new int;

is binding at run-time.

The C++ standard is completly silent on the way the compiler and system
implement allocation or "binding". C++ mandates behavior, not
implementation. An implementation would be free to implement the stack
and the heap the same way, as long as everything behaves as the standard
stated. That is important.

However...
I understand now that for i, compiler can put an
offset related to some location (like stack base) somewhere. But I still
have some confusion about dynamic binding. To me, the compiler may put an
offset from heap to pi.

It is possible to use "offsets" when using the stack, since the system
uses a stack pointer to know exactly what starting point you are using.
The offsets are always the same, but the starting point will change.
The thing is, you only have one starting point in the heap, which is the
start of the heap. Consider it to be address 1.

When you allocate an int on the heap, it is allocated at address 1. The
next free chunk will be at address 5 (if an int is four bytes). If you
allocate 10 ints on the heap, the the next free chunk will be at address
41. If you then free int number 2 and 5, you get some empty spaces
scattered in the heap. If you then allocate another int, it will have
address 5, which is the address of the first int you freed.

The heap is a _disorganized_ system. You never know what is free and
what is allocated, that is why the compiler has no way, nor the system,
to know in advance the address of a heap allocated object. It depends
on what is in the heap right now.

If you run two instances of your program at the same time, both will
have the same "offsets" on stack, that is, their structure on the stack
will be exactly the same (if they get the same input of course). But
the will obviously get different addresses from the heap.

Let's try something else.


You need to understand a little more about how a computer works. Take
for example

void f(int a, int b)
{
}

f() resides somewhere in memory as machine code. When it is executed,
that is, when the instruction pointer gets to its address, the values
passed to f() are pushed on the stack and you can access 'a' by doing
something like *(stack_pointer) and 'b' by doing *(stack_pointer +
sizeof(int)). That is what I meant by 'offset'. The compiler/linker
have no idea about where the stack_pointer will point at the time of
executing f(), but it knows that if it goes to the address
(stack_pointer + sizeof(int)), it will find 'b'.

The stack has a very rigid structure, since it must be constructed from
the bottom to the top, by pushing and popping values. This structure
allows the system to keep a pointer to the current position in the stack.

The heap is not that organized. In modern operating systems, an
application has a given amount of memory it can use as it wishes. In
that space, it can allocate and free memory independently of what is
going on on the stack.

Since a stack is last-in-first-out, it is impossible to force some
memory to be kept. Imagine a pile of plates. When you call a function,
you push, for example, 5 plates on the top of the current stack. When
that function returns, you remove these plates to get back to where you
were. If you had some important informations you wanted to keep in one
of these 5 plates, it is now lost.

The solution is to put that important information/plate aside, not in
the stack of plates. The system then tries to find a free space on the
side, in what is called the heap. If it finds the space, it returns a
pointer to that memory space.

But since the heap is "disorganized", that is, your application could
free and allocate memory in a random order, it is impossible for the
compiler or even the system to know in advance what memory will be used.

To make the explanation simpler, imagine the heap as a linear chunk of
memory, say of 10 mb. When an application request some memory on the
heap, to bypass the rigid stack memory management, the system does a
linear search in that chunk to find some free memory and returns a
pointer to that place. It then marks that place as "allocated".

If 10 applications are running at the same time, requesting and freeing
memory randomly (from the system's point of view), the memory will
become fragmented. Allocated and free memory will be randomly scattered
in the 10mb chunk.

At that point, when you decide to run your application, you have
absolutely no way to know anything about the current memory state. You
only ask the system for a free space and the system answers with a valid
space, or fails.

That is why stack-allocated objects are taken care of by the system,
since it knows exactly what is allocated. By having the stack of plates
before you, you know how many plates you have to take off to get back to
where you were and to free the memory used by objects. In the heap,
however, it is impossible to know when the memory must be freed. And it
is your task, as the programmer, to keep track of every chunk of memory
you allocated in order to be able to free them.
But why we call it run-time binding? To me, they
all decide at compile-time, that is, compiler record an offset from stack or
heap.

The heap is a disorganized pool of memory. Asking for free memory today
will not return the same thing as tomorrow. It depends on what is in
the heap right now. The stack, however, will _always_ have the same
structure and the same behavior. Arguments will always be sizeof(int)
bytes after the current stack pointer, wherever the stack pointer points.

I hope it is a little clearer now. If not, continue asking and wait for
somebody else to answer, since I don't know if I will be able to be
clearer than that.

Jonathan
 
S

Stephan Br?nnimann

newbiecpp said:
I have hard time to understand run-time environment. Let assume that I have
a program that has a simple variable alpha. When this variable is
statically allocated, the compiler can use the absolute address of alpha to
access to it. What confuses me is that when the variable is dynamically
allocated, how does the compiler implement it? We know the address of the
variable until run-time. During the compilation, how can we access to the
alpha variable since we don't know its address yet?

Thanks in advance

A lot has been written already, just a small hint if you are
working under UNIX (for Windows a similar tool might exist):

UNIX> man nm
linux> nm -C program

It has helped me a lot in understanding such issues.

Stephan Brönnimann
(e-mail address removed)
Open source rating and billing engine for communication networks.
 
N

newbiecpp

Jonathan Mcdougall said:
I hope it is a little clearer now. If not, continue asking and wait for
somebody else to answer, since I don't know if I will be able to be
clearer than that.

Jonathan

Thanks again. I am clearer than before now. By the way, can you recommend
some good books about assembly and compiler theory? I appreciate. Have a
wonderful weekend.
 
J

Jonathan Mcdougall

I hope it is a little clearer now. If not, continue asking and wait for
Thanks again. I am clearer than before now. By the way, can you recommend
some good books about assembly and compiler theory?

No, nor am I a good source of information about it. You would really be
better off in a newsgroup servicing your platform.


Jonathan
 
C

Chris \( Val \)

|
| | > I hope it is a little clearer now. If not, continue asking and wait for
| > somebody else to answer, since I don't know if I will be able to be
| > clearer than that.
| >
| > Jonathan
|
| Thanks again. I am clearer than before now. By the way, can you recommend
| some good books about assembly and compiler theory? I appreciate. Have a
| wonderful weekend.

You can go to the following link:
http://webster.cs.ucr.edu/

....and download "The Art of Assembly" book, written
by "Randall Hyde". It is also now available in hard
copy now, and it is one of the best books on the
topic available to beginners.

Additionally, you can download many other tools from
Randall' site above.

Cheers.
Chris Val
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,009
Latest member
GidgetGamb

Latest Threads

Top