Do buffers always start with the lowest memory address being the first element?

kiru.sengal

[This post is with regard to computers/OSes with stacks that grow down.
i386/Unix is one possibility]

I have embedded my questions/assumptions in the following sample
code:


#include <stdio.h>

long num;
/* allocated a zero-init fixed memory location (.BSS) */

char *s = "Hello world";
/* s allocated a fixed memory location and initialized
to address of 'H' (DATA). Meanwhile, the literal string
(12-byte buffer) is stored in a fixed WRITE ONLY area (TEXT) */

short buffer[100];
/* buffer allocated in contiguous fixed memory locations (in DATA)
where buffer[0] is lowest memory address and buffer[99] is highest
memory address */

int main()
{
    int count = 4; /* stored in main()'s stack frame */
    float fcount = 4.0; /* stored in main()'s stack frame */
    static int confusion = 8; /* stored in same region as buffer (DATA) */

    long lbuffer[50];
    /* stored in main()'s stack frame, but since stack grows down, is
    the field for lbuffer[0] still placed in the lowest memory address?
    */

    printf("\n%s\n",s);

    int i; /* stored in main()'s stack frame, but when is memory
    allocated? */

    for(i=0; i<100; i++)
    {
        buffer[i] = i;
    }

    return 0;

}

Additional questions:

- Since local variables don't have to be declared at the beginning of a
function, during run-time, is space for all local variables to be used
in any function allocated when the function is entered, or only when
they are used?

- Since main() is always the starting point for programs, do compilers
really put its local variables in a main()-stackframe or simply place
them in fixed memory locations? How are static locals in main()
treated?


Thanking everyone in advance.
 
Malcolm

The short answer to the question is "yes".
Technically you could write a perverse implementation that uses some weird
and wonderful mapping between pointers and physical memory, but no-one does
this.

#include <stdio.h>

long num;
/* allocated a zero-init fixed memory location (.BSS) */

char *s = "Hello world";
/* s allocated a fixed memory location and initialized
to address of 'H' (DATA). Meanwhile, the literal string
(12-byte buffer) is stored in a fixed WRITE ONLY area (TEXT) */

short buffer[100];
/* buffer allocated in contiguous fixed memory locations (in DATA)
where buffer[0] is lowest memory address and buffer[99] is highest
memory address */

BSS, TEXT, and DATA are purely concepts provided by your OS. Compilers
usually adhere to the conventions of their host platform, but not absolutely
always. On a different platform, there may not be this distinction between
read-only and temporary memory.

- Since local variables don't have to be declared at the beginning of a
function, during run-time, is space for all local variables to be used
in any function allocated when the function is entered, or only when
they are used?

The same goes for the stack frame. Usually the stack pointer will be
advanced to allow space for all locals on function entry, and reset on exit.
However you cannot assume that this will always be the case for every
compiler.

- Since main() is always the starting point for programs, do compilers
really put its local variables in a main()-stackframe or simply place
them in fixed memory locations? How are static locals in main()
treated?

Normally on a hosted system it is not possible for an application to write
to absolute memory addresses. So globals and static locals have got to go
somewhere defined at runtime. This might well be in the space immediately
before the stack where main's locals are held. However you cannot guarantee
this, and normally it shouldn't concern you as a C programmer.
 
Eric Sosman

[This post is with regard to computers/OSes with stacks that grow down.
i386/Unix is one possibility]

I have embedded my questions/assumptions in the following sample
code:


#include <stdio.h>

long num;
/* allocated a zero-init fixed memory location (.BSS) */

Zero-initialized, yes. At a fixed location, yes.
"BSS" is an implementation detail, not necessarily shared
by all implementations.
char *s = "Hello world";
/* s allocated a fixed memory location and initialized
to address of 'H' (DATA). Meanwhile, the literal string
(12-byte buffer) is stored in a fixed WRITE ONLY area (TEXT) */

Fixed location for `s', yes. Initialized to point to
the initial 'H', yes. "Hello world" at a fixed location,
yes. Definitely not in a write-only area, possibly in a
read-only area or a read-write area at the implementation's
discretion. "TEXT" is an implementation detail.
short buffer[100];
/* buffer allocated in contiguous fixed memory locations (in DATA)
where buffer[0] is lowest memory address and buffer[99] is highest
memory address */

Contiguous fixed locations, yes. buffer[0] and [99] at
the low and high positions, yes. "DATA" is an implementation
detail (and probably not correct on implementations that happen
to use "BSS").
int main()
{
int count = 4; /* stored in main()'s stack frame */
float fcount= 4.0; /* stored in main()'s stack frame */

"Stack frame" is an implementation detail. Most
implementations use a stack, and the C language imposes a
LIFO ordering on the required lifetimes of `auto' variables,
but the language does not actually require an explicit stack.
static confusion = 8; /* stored in same region as buffer (DATA) */

May be stored anywhere at all, in the same region as `buffer'
or somewhere else, so long as it exists when main() is first called
and continues to exist until the program exits. "DATA" is an
implementation detail.
long lbuffer[50];
/* stored in main()'s stack frame, but since stack grows down, is
the field for buffer[0] still placed in the lowest memory address?
*/

"Stack frame" is an implementation detail. lbuffer[0]
and [49] are at the low and high ends, respectively, of the
memory occupied by `lbuffer', no matter where it is stored.
printf("\n%s\n",s);

int i; /* stored in main()'s stack frame, but when is memory
allocated? */

"Stack frame" is an implementation detail. Memory is
allocated (and deallocated) whenever the implementation
chooses, so long as `i' becomes allocated before it is used
and remains allocated until it is used no longer.
for(i=0; i<100; i++)
{
buffer[i] = i;
}

return 0;

}

Additional questions:

- Since local variables don't have to be declared at the beginning of a
function, during run-time, is space for all local variables to be used
in any function allocated when the function is entered, or only when
they are used?


Different implementations behave differently. A conforming
C program cannot tell.

- Since main() is always the starting point for programs, do compilers
really put its local variables in a main()-stackframe or simply place
them in fixed memory locations? How are static locals in main()
treated?

Compilers can do whatever they like, so long as the
variables exist when they are supposed to. All that I have
encountered use the same mechanisms for `auto' and `static'
variables in main() as they do for any other function.

Even though main() is the first function called when a
program starts, nothing prevents it from being called again,
recursively. Here's a stupid program to print its command-
line arguments in reverse order:

#include <stdio.h>
int main(int argc, char **argv) {
    if (argc > 0) {
        main(argc - 1, argv + 1);
        puts (*argv);
    }
    return 0;
}

Thanking everyone in advance.

You're welcome. A piece of advice: It is usually better
to concentrate on *what* the implementation does with your
program than on *how* it does it. If you write carefully and
portably the former is constant, while the latter changes
from one implementation to the next.
 
Chris Torek

[This post is with regard to computers/OSes with stacks that grow down.
i386/Unix is one possibility]

The C standard does not assume a downward-growing stack, nor even
an upward-growing stack. It merely requires that automatic (local)
variables behave in a "stack-like manner", if a function is called
recursively.

(Google's broken news-posting interface destroyed your indentation.
I have tried to restore it here.)
I have embedded my questions/assumptions in the following sample
code:

#include <stdio.h>

long num;
/* allocated a zero-init fixed memory location (.BSS) */

"BSS" is a system-specific (albeit common) method of implementing
this; C requires only that the variable exist as long as the program
runs, and be initialized to zero. Some implementations do the same
with this as with:

long num = 0;

because, e.g., they lack anything similar to the "bss region" found
on your example system.
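
A minimal sketch of what the language promises here, whatever section the objects end up in. The names `num` and `buffer` come from the sample code; the helper `statics_start_zeroed` is made up for illustration.

```c
#include <stddef.h>

long num;            /* static storage duration, no initializer */
short buffer[100];   /* likewise: every element starts out as zero */

/* Returns 1 if num and every element of buffer still hold zero. */
int statics_start_zeroed(void)
{
    size_t i;

    if (num != 0)
        return 0;
    for (i = 0; i < sizeof buffer / sizeof buffer[0]; i++)
        if (buffer[i] != 0)
            return 0;
    return 1;
}
```

Called before anything has written to either object, this should return 1 on any conforming implementation, with or without a ".bss".
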
char *s = "Hello world";
/* s allocated a fixed memory location and initialized
to address of 'H' (DATA). Meanwhile, the literal string
(12-byte buffer) is stored in a fixed WRITE ONLY area (TEXT) */

Surely you mean "read only" :) In typical Unix-like systems today,
the string literals are in a "read only data" section (".rodata"
and the like). C allows but does not require that the array produced
by the string literal be *physically* read-only, and some
implementations leave it write-able. The effect of writing on
elements of the array is undefined: it may trap, it may succeed
(changing the array), or it may silently fail (no trap but the
array remains unchanged), in the three "most typical" implementations,
but as far as Standard C is concerned, *anything* is allowed.
You -- the programmer -- are simply required not to do this,
in order for you to make any predictions about the operation of
your program.
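
If a modifiable copy is wanted, the usual approach is to copy the literal into storage you own and change that instead. A small sketch, assuming the caller supplies the destination; `patched_copy` is a hypothetical helper, not anything from the original post.

```c
#include <stddef.h>
#include <string.h>

/* Copies "Hello world" into dst and changes the first character there.
   Returns dst, or NULL if dst is too small. The literal is never written. */
char *patched_copy(char *dst, size_t dstsize)
{
    const char *lit = "Hello world";   /* const: a reminder not to write through it */

    if (dstsize < strlen(lit) + 1)
        return NULL;
    strcpy(dst, lit);
    dst[0] = 'J';                      /* fine: dst is ordinary writable storage */
    return dst;
}
```

Declaring the pointer as `const char *` lets the compiler complain about accidental writes instead of leaving them to run-time luck.
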
short buffer[100];
/* buffer allocated in contiguous fixed memory locations (in DATA)
where buffer[0] is lowest memory address and buffer[99] is highest
memory address */

As with "num" above, C requires only that the variable exist as long
as the program runs, and be initialized to zero. On the implementation
you use, you will find that "buf" is also in the ".bss" section.

The question of "lower" and "higher" memory addresses suggests to
me that you are asking about machine-level interpretations, but C
does not give you direct access to machine-level interpretations.
Instead, Standard C pastes a (usually quite thin) layer of
abstraction atop the machine-level. You may compute pointers
pointing into the array named "buffer", e.g.:

short *p1 = &buffer[20];
short *p2 = &buffer[75];

and, given two pointers of compatible type pointing into this array,
you may compare them using any of the four relational operators:

int result1 = p1 < p2;
int result2 = p1 <= p2;
int result3 = p1 > p2;
int result4 = p1 >= p2;

Results 1 and 2 here are guaranteed to be nonzero if p1 is "less
than" p2, and "less than" means "points to an element with a lower
subscript in the array". In that sense, the elements of the array
are indeed addressed from lowest to highest.

(You can also, of course, use the equality operators: p1 == p2,
p1 != p2. I mention them separately because you can use them in
places you may *not* use the relational operators.)
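
The guarantee can be written down as a check: for any two valid subscripts, the pointer comparison gives the same answer as the subscript comparison. This is only a sketch; `order_matches_subscripts` is an invented name, and both arguments are assumed to be less than 100.

```c
#include <stddef.h>

short buffer[100];

/* Returns 1 if comparing pointers to buffer[i] and buffer[j] agrees with
   comparing i and j, for all four relational and the equality operator. */
int order_matches_subscripts(size_t i, size_t j)
{
    short *p1 = &buffer[i];
    short *p2 = &buffer[j];

    return (p1 <  p2) == (i <  j)
        && (p1 <= p2) == (i <= j)
        && (p1 >  p2) == (i >  j)
        && (p1 >= p2) == (i >= j)
        && (p1 == p2) == (i == j);
}
```

With the subscripts 20 and 75 used above, in either order, this should return 1 regardless of how the machine lays the array out.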

There is nothing stopping an actual C compiler on real hardware
from putting buffer[0] at the hardware's highest physical memory
address, and working down towards lower addresses. In this case,
a C source expression like:

p1 < p2

might compile into a machine instruction that tests instead whether
p1 is greater than p2. (Practically speaking, this would be stupid
on today's hardware, and no one will do it, because it would also
require negating the index in expressions like buffer[i]. But one
might do this on a machine in which ordinary array indexing works
by subtraction instead of addition -- and such machines have existed
in the past.)

At the C code level, then, it *is* the case that buffer[0] through
buffer[99] are in contiguous memory locations starting "at the
bottom" and "moving up", but there is no requirement that they be
physically contiguous (consider virtual-memory systems with small
page sizes), nor that a C-code level test like "p1 < p2" compile
to a machine instruction testing whether p1 is less than p2. In
C's abstract model, they are contiguous and ascending; the extent
to which C's abstract model matches what really happens on the
machine depends on both the C compiler and the machine.
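
"Contiguous and ascending" in the abstract model also shows up in pointer subtraction, both in elements and, after conversion to "char *", in bytes. A short sketch with two made-up helpers, `element_distance` and `byte_distance`; subscripts are assumed to be in range.

```c
#include <stddef.h>

short buffer[100];

/* Distance from buffer[i] to buffer[j], counted in elements. */
ptrdiff_t element_distance(size_t i, size_t j)
{
    return &buffer[j] - &buffer[i];
}

/* The same distance counted in bytes, via pointers to char. */
ptrdiff_t byte_distance(size_t i, size_t j)
{
    return (char *)&buffer[j] - (char *)&buffer[i];
}
```

For 20 and 75 the element distance is 55 and the byte distance is 55 times sizeof(short): there is no padding between array elements, whatever the hardware does underneath.
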
int main()
{
int count = 4; /* stored in main()'s stack frame */
float fcount= 4.0; /* stored in main()'s stack frame */
static confusion = 8; /* stored in same region as buffer (DATA) */

Again, the C standard requires only that count and fcount work "as
if" they were on some kind of stack, and that "confusion" work "as
if" it were in that kind of data-segment: initially 8, and valid
throughout the lifetime of the program.
long lbuffer[50];
/* stored in main()'s stack frame, but since stack grows down, is the
field for buffer[0] still placed in the lowest memory address? */

Again, everything is "as if". Like count and fcount, lbuffer need
only exist as long as main() continues to execute, and if some
other function in your program calls main(), you must get "new
copies" of the variables, preserving the old copies that are still
around because the earlier call to main() is also still around (but
suspended until this copy of main() returns).
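
The "new copies" requirement can be observed from inside a program without assuming anything about stacks. The sketch below uses an ordinary recursive function rather than main(); the names `distinct_locals`, `seen` and `MAX_DEPTH` are invented for the example.

```c
#define MAX_DEPTH 8

static const int *seen[MAX_DEPTH];  /* address of `local` in each call still active */

/* Call as distinct_locals(0, depth) with 1 <= depth <= MAX_DEPTH.
   Returns 1 if every active call had its own `local` object and each
   one kept its value while the deeper calls ran. */
int distinct_locals(int level, int depth)
{
    int local = level;
    int i, ok = 1;

    for (i = 0; i < level; i++)
        if (seen[i] == &local)
            return 0;               /* two live calls would be sharing storage */
    seen[level] = &local;

    if (level + 1 < depth)
        ok = distinct_locals(level + 1, depth);

    return ok && local == level;
}
```

Only equality comparisons between the different objects are used, which is allowed; nothing here depends on which direction anything grows.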

The direction of stack growth, if there is even a single stack[%]
that has a single growth direction, is irrelevant to C's abstract
model. C promises only that &lbuffer[0] < &lbuffer[1] and so on.
If you somehow manage to compare &lbuffer[23] relationally to
&buffer[72], no particular result is required. (The types of the
two buffers' elements do not match, so this requires a cast, which
potentially changes the value, which muddies the issue even more,
but never mind all that.) On the other hand, equality comparisons,
after conversion to a suitable type such as "char *" or "void *",
*are* required to produce "not equal":

if ((char *)&lbuffer[23] == (char *)&buffer[72])
    abort(); /* never happens */

printf("\n%s\n",s);

int i; /* stored in main()'s stack frame, but when is memory allocated? */

Declarations after code are a C99 feature. Quite a few compilers
do not support this; in C89 you could use a new block:

printf("%s\n", s);
{
    int i;
    ...
}

C requires only that the program behave "as if" i's lifetime begins
at its declaration and continues until execution reaches the "}" that
terminates its scope. In your C99-specific code, that is the final
close-brace for main(); in the C89 variant, it is the close-brace
inserted to match the open-brace I added here.
for(i=0; i<100; i++)
{
buffer[i] = i;
}
return 0;
}

Additional questions:

- Since local variables don't have to be declared at the beginning of a
function, during run-time, is space for all local variables to be used
in any function allocated when the function is entered, or only when
they are used?


A C compiler can achieve the required behavior by allocating all
local variables at a function's entry, or by allocating them upon
reaching their enclosing block or (in C99) their initial definition.
There are merits and drawbacks to either method; you will find that
different C compilers choose different approaches.

- Since main() is always the starting point for programs, do compilers
really put its local variables in a main()-stackframe or simply place
them in fixed memory locations? How are static locals in main()
treated?

In C (but not in C++ -- the languages are really quite different,
despite some syntactic similarities), you -- the programmer -- are
allowed to call main() recursively. If you do, it must behave just
like any other function. This makes it difficult for C compilers
to weasel out of creating a stack frame in the usual manner on
typical machines. (The compiler would have to determine that you
do not in fact call main() recursively; if so, it could rewrite
all the automatic variables in main() to have static-duration.
This determination is not all that hard, but such rewriting is also
not all that profitable -- the program is unlikely to be any faster
or smaller. So why bother?)

In both C and C++, it is possible -- via the atexit() function for
instance -- to do dumb things with variables whose lifetime terminates
when main() returns. Consider the following broken C code:

#include <stdio.h>
#include <stdlib.h>

static int *p;

void oops(void) {
    printf("*p = %d\n", *p);
}

int main(void) {
    int v = 42;
    atexit(oops);
    p = &v;
    return 0;
}

Here, in the C abstract machine, atexit() registers the function
oops() to run when the program exits. Then we set p to point to
v, which is local to main(), and then we return from main(),
destroying the variable v. Now all atexit()-registered functions
are run, so oops() runs, and attempts to access *p -- but p points
to a variable whose lifetime has terminated. The effect is undefined:
the program is allowed to crash, or print 42, or print any other
number, or indeed do anything, such as post lies about your boss
to USENET. :)

We can fix the program by changing "int v" to "static int v". This
changes the storage duration from automatic to static, so that only
one copy of "v" exists no matter how many times we call main()
recursively (none, in this case), and "v" exists for the lifetime
of the program. Since the "lifetime of the program" continues
*past* the return of main(), while atexit()-registered functions
like oops() run, this now makes a difference.

We can also fix the program by removing the call to atexit(), though
of course this stops the program from printing 42.
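
The first fix, sketched out. To keep the piece callable from anywhere, the body of main() is moved into a helper with the made-up name `register_and_return`; the only change that matters is the `static` on v. The check on atexit()'s return value is an addition, not part of the code above.

```c
#include <stdio.h>
#include <stdlib.h>

static int *p;

static void oops(void)
{
    printf("*p = %d\n", *p);   /* fine now: the object p points to still exists */
}

/* Stands in for main() in the broken example. Returns 0 on success. */
int register_and_return(void)
{
    static int v = 42;         /* static storage duration: lives until the program ends */

    if (atexit(oops) != 0)
        return -1;             /* registration failed */
    p = &v;
    return 0;
}
```

When the program later exits normally, oops() runs and reads an object whose lifetime has not ended, so the behavior is defined.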

[% There is one quite substantial merit to having at least two
stacks, one for "control" -- return addresses and the like -- and
one for "data" such as local variables. In particular, a system
with two stacks offers the opportunity to debug programs that
overwrite local arrays. Because the data are in the "Dstack",
pointed to by the DSP or data-stack-pointer register, while the
control values are in the "Cstack" pointed to by the CSP or
control-stack-pointer register, and the two stacks are "far apart"
in memory, writing past your own DSP area clobbers only other DSP
memory. Breakpoints set via the CSP still cause the program to
stop where you want, and you can then observe the DSP corruption.

This design also interferes with typical Microsoft-bug-exploits:
buffer overruns no longer allow you to overwrite the return address.
The distance between CSP and DSP can be randomized on each run of
the program, as well.]
 
Keith Thompson

- Since local variables don't have to be declared at the beginning of a
function, during run-time, is space for all local variables to be used
in any function allocated when the function is entered, or only when
they are used?

They can be allocated whenever the compiler chooses to allocate them,
as long as they exist when they're used.

- Since main() is always the starting point for programs, do compilers
really put its local variables in a main()-stackframe or simply place
them in fixed memory locations? How are static locals in main()
treated?

main() is generally treated like any other function. Local variables
within main() can't be allocated statically (at least not without a
lot of extra trickery) because main() can be called recursively.

A compiler could detect that main is never called recursively in a
given program and do something different, but I doubt that any
compilers actually do this. First, since main() can be called from a
separate translation unit, it can't be determined until link time.
Second, storing main()'s locals statically isn't likely to help
significantly anyway, so the optimization isn't even worth doing.

As for variables declared "static" within main(), again, these are
almost certainly treated the same way as static variables within any
other function.
 
imanpreet

[Google seems to have some problem, apologies if this posts multiple
times]

Chris said:
As with "num" above, C requires only that the variable exist as long
as the program runs, and be initialized to zero. On the implementation
you use, you will find that "buf" is also in the ".bss" section.


Nitpick: buffer, not buf

The direction of stack growth, if there is even a single stack[%]
that has a single growth direction, is irrelevant to C's abstract
model. C promises only that &lbuffer[0] < &lbuffer[1] and so on.
If you somehow manage to compare &lbuffer[23] relationally to
&buffer[72], no particular result is required. (The types of the
two buffers' elements do not match, so this requires a cast, which
potentially changes the value, which muddies the issue even more,
but never mind all that.) On the other hand, equality comparisons,
after conversion to a suitable type such as "char *" or "void *",
*are* required to produce "not equal":

if ((char *)&lbuffer[23] == (char *)&buffer[72])
    abort(); /* never happens */


I am not sure I agree with you completely on this one. The above is
supposed to be true _only_ as long as the indexes are within the
bounds of their arrays. I wrote this test snippet for my Borland 5.0:

#include <stdio.h>

int main(void)
{
    int ibuff[1];
    char cbuff[1];

    if ( (char*)&ibuff[0] == (char*)&cbuff[0+0x42])
        printf("OOps\n");
    printf("%p %p\n", (void*) &ibuff[0], (void*)&cbuff[0] );
    return 0;
}

This design also interferes with typical Microsoft-bug-exploits:
buffer overruns no longer allow you to overwrite the return address.
The distance between CSP and DSP can be randomized on each run of
the program, as well.]


Point well taken, but I believe it's not strictly a Microsoft thingy.
AFAIK Linux and for that matter x86 based systems save the return
address within the current stack itself only.



--
Imanpreet Singh Arora

If I am given 6 hours to chop a tree, I would spend the
first 4 to sharpen my axe.
Abraham Lincoln
 
Chris Torek

Nitpick: buffer, not buf

Oops, quite right.
... On the other hand, equality comparisons,
after conversion to a suitable type such as "char *" or "void *",
*are* required to produce "not equal":

if ((char *)&lbuffer[23] == (char *)&buffer[72])
    abort(); /* never happens */

I am not sure if I agree with you completely on this one. The above is
supposed to be true _only_ as long as the indexes are within the limit
assigned to them.

Yes (but I made sure that this was the case in the example above).
Given an array "a" of size N, a+0 (&a[0]) through a+N (&a[N]) are
all valid, computable addresses[%] that are all different, but only
&a[0] through &a[N-1] are guaranteed to be distinct from other
objects' addresses, and it is not at all unusual for a+N to have
the same address (when converted to "char *") as some other object.
-----
% There is some brokenness in the C89 wording that makes "a + N"
OK but "&a[N]" not OK. This is fixed in C99, and possibly even
via some intermediate update to C89. It is probably better to
write it as a+N anyway, just in case, so I did here.
-----
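
A common, well-defined use of the one-past-the-end address is as a loop bound: it may be computed and compared, but never dereferenced. A minimal sketch; `sum_longs` is a name invented for the example.

```c
#include <stddef.h>

/* Adds up the n elements starting at a. */
long sum_longs(const long *a, size_t n)
{
    const long *end = a + n;    /* one past the last element: valid to compute */
    const long *q;
    long total = 0;

    for (q = a; q != end; q++)  /* compared against end, which is never dereferenced */
        total += *q;
    return total;
}
```

Note that it is written as a + n rather than &a[n], for the reason given in the footnote.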

[on separating control and data stacks]
This design also interferes with typical Microsoft-bug-exploits:
buffer overruns no longer allow you to overwrite the return address.
The distance between CSP and DSP can be randomized on each run of
the program, as well.]
Point well taken, but I believe it's not strictly a Microsoft thingy.
AFAIK Linux and for that matter x86 based systems save the return
address within the current stack itself only.

Yes, no doubt because the x86 instruction architecture "strongly
encourages" this (by making it easy to use one stack, and quite
difficult to use two separate ones). Not all other architectures
are so hobbled -- although one still finds a single combined stack
even where there is no "hardware encouragement", e.g., on the MIPS.
 
