Why are variables stored on the stack?

R

Richard Heathfield

Keith Thompson said:

The unit of allocation is a collection of automatic objects declared
in a particular scope. That collection is conceptually created in
what can be thought of as a single "push" operation, and destroyed in
what can be thought of as a single "pop" operation. In that sense,
the language rules do imply a stack (an abstract last-in first-out
data structure, however it's implemented).

Well, again that isn't quite true. Consider, for example, the following
code:

{
int a = foo();
int b = bar();
if(a + b < c)
{
int d = c - (a + b); /* X */
baz(d * quux() / (d - foo()));
quuy(&d);
quuz(d * quua());
}
}

The line marked X is the last line in which a and b are used. The Standard
therefore allows implementations licence to destroy a and b /before/ it
destroys d - for example, a and b can be history by the time the sequence
point in line X occurs, even though d still exists, because the program
still behaves "as if" a and b exist, and no strictly conforming program
can tell the difference (because if it could, obviously the compiler
wouldn't do the early destruction in the first place). By virtue of its
"as if" rule, the C language *does not mandate* LIFO auto allocation.
 
R

Richard Heathfield

Rod Pemberton said:
Keith,

How so? How is the OP's usage of "insist" any different from Ritchie's
usage of "suggest strongly"? Ritchie said the exactly same thing that
the OP asked. So, how can you claim the OP is wrong without proclaiming
Ritchie was wrong also?

Dennis Ritchie wrote the cited text in 1978, so not only is it
non-normative, but it also pre-dates the ISO C language specification by
over a decade. The fact remains that the internationally agreed language
spec doesn't require implementations to use a stack. They probably will,
but they don't have to. If an implementor finds a clever way to speed
things up that requires him /not/ to use a stack, C allows him to take
advantage of that cleverness. I can't understand why several people in
this discussion seem to be trying to deprive implementors of this licence.
 
R

Richard Heathfield

Rod Pemberton said:

Richard,

How so? How is the OP's usage of "insist" any different from Ritchie's
usage of "suggest strongly"?

This looks familiar. I think you posted exactly the same question
elsethread, yes? I answered it there.
 
K

Keith Thompson

Ian Collins said:
Come on then, put up or shut up, where did Keith say that?

I'm fairly sure I've never said that. In fact, I've criticized
CBFalconer for saying that. It's arguably a true statement (for one
particular meaning of the word "stack"), and I think we both
understand what he means by it, but it's also incomplete and
potentially misleading.

If you had jumped on somebody responding to the original post in this
thread for saying "C has no stack", you might have a point. Instead,
you jumped on people for saying that C does not *insist* on storing
local variables on the stack. Those responses were both correct (the
C language doesn't insist on any such thing) and very relevant to the
original question.
 
K

Keith Thompson

Rod Pemberton said:
Keith,

How so? How is the OP's usage of "insist" any different from Ritchie's
usage of "suggest strongly"? Ritchie said the exactly same thing that the
OP asked. So, how can you claim the OP is wrong without proclaiming Ritchie
was wrong also?

There's a big difference between "insist" and "suggest strongly". The
former implies a requirement; for example, C insists that a byte is at
least 8 bits.
"Any function in C may be recursive (without special declaration) and most
possess several 'automatic' variables local to each invocation. These
characteristics suggest strongly that a stack must be used to store the
automatic variables, caller's return point, and saved registers local to
each function; in turn, the attractiveness of an implementation will depend
heavily on the ease with which a stack can be maintained."

"Portability of C Programs and the UNIX System" SC Johnson and DM Ritchie
http://cm.bell-labs.com/cm/cs/who/dmr/portpap.html

Which is why, as has been acknowledged repeatedly in this thread, most
implementations do use a contiguous hardware stack. But there is no
requirement that they do so. Also, (a) the paper was written in 1978;
things have changed since then; (b) the paper is about C and UNIX,
which were more closely tied together than they are now; and (c) I'm
not sure that Johnson and Ritchie were necessarily referring to a
contiguous hardware stack. I suspect they would have acknowledged
that anything scheme that works in a last-in first-out fashion would
work. Note that they refer to "a stack", not to "the stack" (though
perhaps I'm reading too much into that distinction).

And it's not impossible that Ritchie could have been wrong (and even
more likely that I might disagree with him). I don't recall that he's
ever claimed to be infallible; note that the errata list for K&R is
not empty.
 
K

Keith Thompson

santosh said:
Keith Thompson wrote:


Shadowing of an object in an outer scope by another of the same name in
an inner scope seems to imply a LIFO.
[...]

Perhaps I'm missing something, but I don't see your point. The
stack-like nature of local variable (strictly speaking, automatic
object) allocation is about lifetime, not visibility. I suppose
there's probably a stack-like data structure that's part of the
compile-time symbol table, but that's not what we've been discussing.
 
K

Keith Thompson

Richard Heathfield said:
Keith Thompson said:


Well, again that isn't quite true. Consider, for example, the following
code:

{
int a = foo();
int b = bar();
if(a + b < c)
{
int d = c - (a + b); /* X */
baz(d * quux() / (d - foo()));
quuy(&d);
quuz(d * quua());
}
}

The line marked X is the last line in which a and b are used. The Standard
therefore allows implementations licence to destroy a and b /before/ it
destroys d - for example, a and b can be history by the time the sequence
point in line X occurs, even though d still exists, because the program
still behaves "as if" a and b exist, and no strictly conforming program
can tell the difference (because if it could, obviously the compiler
wouldn't do the early destruction in the first place). By virtue of its
"as if" rule, the C language *does not mandate* LIFO auto allocation.

Fair enough. And by the "as if rule", if the implementation can prove
that no function is ever called recursively, it can use static storage
for everything. But in the abstract machine, something LIFOish is
necessary. To put it another way, a program must behave *as if*
automatic allocation is done in a LIFO manner.
 
K

Keith Thompson

Ben Bacarisse said:
Note that I say "obvious". I think it is possible to construct a
contrived but conforming implementation in which there is almost no
identifiable LIFO structure (it will still be there in ghostly
form due to the wording of the standard). I'll outline it for
variables in functions, but it needs to be extended to nested blocks
as well.

Every function starts with code that collects the time from a counter
(maybe click ticks). This value is concatenated with the variable
name to make a hash key. In g() above, an assignment 'a = 42;' will
lookup a@234 if the call to g started at cycle 234. Functions save
their return address in a special variable so, when g() calls f() at
cycle 336 a variable RETURN-FROM@336 holds the location in g() where
f() was called from, and all of f()'s auto variables will have the
form <name>@336.

A special variable (hash key), CURRENT-CALL-CYCLE, stores the cycle
the current call stated on. This means that to find a variable we
construct sprintf("%s@%d", var_name, lookup("CURRENT-CALL-CYCLE"))
and look that up.

So far, no LIFO in sight -- not even in the return addresses. But all
good things must come to an end. When the call to f() at cycle 336
returns, control is passed to the location in RETURN-FROM@336. There
the compiler had planted code to set CURRENT-CALL-CYCLE back to 234.
(This requires a writable code segment but, hey, we are in fantasy
land here so does it matter if we do that too? An extra variable can
avoid any writable code, but it make the LIFO nature of the storage
scheme more obvious so I've not used it.)

Your scheme reminiscent of the old PDP-8 calling convention, which
stored the return address at the beginning of the subroutine's code.
It couldn't handle recursion, and I don't see how your method would
handle recursion either. If you have 10 simultaneous invocations of a
function, you need to store 10 distinct return addresses somehow.
It is very hard to see the LIFO structure here. It *is* there, of
course, in the odd variable names and ever-changing numbers in the
code stream, but it is there only in a ghostly way. I doubt, though,
that anyone could call this a stack.

It quacks like a stack, so yes, I'd call it (a rather unconvention
implementation of) a stack.
 
W

Willem

Keith wrote:
) It quacks like a stack, so yes, I'd call it (a rather unconvention
) implementation of) a stack.

But not _the_ stack. See the subject header.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
B

Ben Bacarisse

Keith Thompson said:
Your scheme reminiscent of the old PDP-8 calling convention, which
stored the return address at the beginning of the subroutine's code.
It couldn't handle recursion, and I don't see how your method would
handle recursion either. If you have 10 simultaneous invocations of a
function, you need to store 10 distinct return addresses somehow.

Yes, you are right, but it is not the return addresses that are the
problem (there will be at least 10 of then called RETURN-FROM@xxxx).
I went too far in trying to obscure things. The parenthetical remark
at the end needs to be made essential. In other words each function
call needs to store the old value of CURRENT_CALL_CYCLE so it can be
restored from a variable associated with the function activation
rather than from code written at the time of the call.

Having finally got some sleep, I don't think there is any value in
showing obscure implementation options. I think there is enough scope
for miss-communication on this topic for it to run forever. I can't
help wondering whether, if the main protagonists were in a room
together, there would be an argument at all (about this topic).
 
E

Eric Sosman

jacob said:
Keith said:
[...]
You insisted that this correct response was wrong. And that is what
led to this entire wasteful argument.

Yes, discussing with you and falconer is a waste of time
You just say

"C has no stack"

Can you provide a reference to the posting where
Keith made this statement?
 
G

Gordon Burditt

The C standard does not require a C implementation to have bugs.
The C standard doesn't even define the term "bug".

Does this mean that C has no bugs? No.

Does this mean that a given C implementation has no bugs? No, although
not requiring an implementation to have bugs doesn't mean it's required
to have no bugs.

Does this mean that someone is wrong for saying "Most C implementations
have bugs"? No.
 
C

CBFalconer

Richard said:
Rod Pemberton said:



This looks familiar. I think you posted exactly the same question
elsethread, yes? I answered it there.

Again, time for a thread plonk.
 
C

Chris Torek

Which is precisely what makes the stack vulnerable to exploitation.
Most stack-oriented implementations store the return address of the
caller on the stack. Overflow the stack and you can make the program
execute arbitrary code.

This happens primarily because these implementations mix "control
stack" with "data stack". If one keeps the two separate -- uses
a control stack for return addresses, and a data stack for local
variables -- it becomes significantly more difficult to overwrite
control-stack information this way. (Placing guard pages between
the two stack is a good idea as well.)
... On Modified Harvard Architecture machines it would not be a
problem since the code space cannot be modified and smashing a
stack on them would most likely simply crash the system.
While this is a DoS on such a system, it can't be used to execute
arbitrary code.

We have actually implemented this (by "we" I mean myself and others
I have worked with in the past, although modern Linux has features
that, with appropriate hardware, also provide non-executable stack
segments). It is a valuable defense technique, whether done on
its own or combined with a separate data-and-control stacks. But
it is not perfect: if a program already contains code that can be
"tricked" into peforming operations it should not, one need only
overwrite the control stack such that control passes to this code,
tricking it into doing the wrong thing.

Eliminating all stacks entirely does not solve this problem either.
I used a FORTRAN (F77) implementation on an IBM mainframe in which
careful use of out-of-bounds array subscripts could modify an
array reference:

TEMP = ARR(1)

into a function call. Loading the array with the desired machine
code caused the implementation to execute that code. The F77
compiler used the equivalent of C's "static duration" for all its
variables, so all data was on what one would call "the heap" in a
more modern system.
... Most processors have a stack pointer and associated PUSH and
POP instructions ...

I would say "many" rather than "most", as there are good number of
processor architectures that have neither a dedicated stack-pointer
register nor special "push" and "pop" instructions (although the
ABIs for systems on these generally designate one of the machine
registers as a stack pointer anyway). Some have architecturally-
specified stack pointers but no push-and-pop (e.g., SPARC).
Other computer languages can and do use the stack

Or several stacks (Forth and Postscript, for instance, both
require at least two stacks).
if it outperforms other methods of storing temporary variables or
return addresses.

It usually does, which is why stacks are so common.
 
R

Richard Bos

jacob navia said:
This is wrong. Most C implementations use the hardware stack

Well, here's the thing: you're both right. Willem is right that C
doesn't require the use of the (single, hardware) stack. You are right
that C implementors tend to use that stack anyway. So the OP really
shouldn't blame C itself for all those buffer overflow attacks, but the
authors of the C implementations which do use that stack.

Richard
 
D

Dik T. Winter

> Richard wrote:
> )> Exactly which concept in C is explained easier by introducing a stack?
> )
> ) Just about everything. I am amazed you have to ask this. You have been
> ) in CLC too long.
>
> You're dodging the question.
>
> PS: It may be true that a lot of 'how does it do it' questions are most
> easily answered by 'it uses a stack', but for C programming concepts as
> such, I see no need.

Moreover, I think that when somebody needs C to be explained and you wish
to use the stack in explaining, you have first to explain the stack.
 
S

santosh

Richard said:
So the OP really
shouldn't blame C itself for all those buffer overflow attacks, but
the authors of the C implementations which do use that stack.

Authors of C programs which overflow.
 
D

Dik T. Winter

> You, like jacob, are ignoring the fact the word "stack" has more than
> one meaning.
>
> C does require a last-in first-out data structure of some kind to
> implement local variables; this can reasonably be called "a stack".
> It does not require (but most implementations use) a contiguous
> hardware stack, sometimes referred to as "the stack". Can we all
> agree on that?

I agree with that, but the OP probably meant the second sense, because
he wants to distinguish between allocation on the stack and allocation
on the heap. When using stack in the first sense, such a distinction
is irrelevant.
 
K

Kenneth Brody

CJ said:
Hello:

We know that C programs are often vulnerable to buffer overflows which
overwrite the stack.

But my question is: Why does C insist on storing local variables on the
stack in the first place?

I can see two definite disadvantages with this:
1) deeply nested recursive calls to a function (especially if it defines
large local arrays) can easily overflow the stack
2) the problems described above of security vulnerabilities.

My solution would be for C instead to store its local variables on the
heap - effectively separating data from executable code.

What do people think?

Well, ignoring the "C insist on" part (C doesn't "insist" on any
such things -- it's just a very common way of doing things on many
systems), your "solution" would:

Have a great impact on performance, as malloc/free must be used
inside every function that has local storage.

Would require a lot of overhead to handle setjmp/longjmp around
any such calls, in order to free their local storage.

Would simply move the buffer overrun problem from the "stack" to
the "heap".

Of course, you are free to make a C implementation that does exactly
what you describe. Remember, C doesn't "insist" on any of what you
say about local variables and a "stack".

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 
K

Kenneth Brody

jacob said:
Please stop confusing people by using word games.
I have yet to see a SINGLE example of an implementation that
doesn't use a stack for the local variables. Yes, a single
one.

Until now, there wasn't any that the regulars could put forward.

While I haven't looked into this in great detail, here is a likely
candidate:

http://www.cc65.org/

As I recall, the 6502 has only 256 bytes for its stack.
(Obviously in machines running now, and having a certain
minimum size. Coffee machines with less than 1K of
RAM and similars do not count)

Just curious why you eliminate such things?

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,011
Latest member
AjaUqq1950

Latest Threads

Top