Why are variables stored on the stack?

cr88192

CJ said:
Hello:

We know that C programs are often vulnerable to buffer overflows which
overwrite the stack.

But my question is: Why does C insist on storing local variables on the
stack in the first place?

as per the standard, it does not.

as per implementations:
it matches established practice and calling conventions (aka: in 32-bit land
code from one compiler very often links acceptably with that from another);
in the general case, this allows the greatest performance (the stack pointer
is usually a dedicated register pointing into a sliding region of memory, and
can thus be adjusted very quickly).

I can see two definite disadvantages with this:
1) deeply nested recursive calls to a function (especially if it defines
large local arrays) can easily overflow the stack
2) the problems described above of security vulnerabilities.

1. whatever is done, sufficiently deep recursion will break something (be it
a stack overflow or running out of heap).
2. security vulnerabilities will still exist, though they will be mildly
reduced.

My solution would be for C instead to store its local variables on the
heap - effectively separating data from executable code.

What do people think?

simply for partly addressing security issues, a compiler could conceivably
treat arrays specially, namely by moving them off the main stack, and
possibly implementing bounds checking (for common cases). if done well, this
could potentially be done with only a minor performance impact. note that
ordinary local variables would likely remain on the stack.
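
A minimal sketch of the array idea above, written by hand in C rather than generated by a compiler. The checked_array type and its functions are hypothetical names, not part of any real compiler or library; the only point is that the array storage lives on the heap, away from return addresses, and every access goes through a bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor a compiler might keep for a local array
   that it has moved off the main stack. */
typedef struct {
    int   *data;  /* storage lives on the heap */
    size_t len;   /* element count, kept for bounds checks */
} checked_array;

/* Allocates len zeroed elements. Returns 0 on success, -1 on failure. */
int checked_array_init(checked_array *a, size_t len)
{
    assert(a != NULL);
    a->data = calloc(len, sizeof *a->data);
    a->len  = a->data ? len : 0;
    return a->data ? 0 : -1;
}

/* Bounds-checked store. Returns 0 on success, -1 if idx is out of range. */
int checked_array_set(checked_array *a, size_t idx, int value)
{
    assert(a != NULL);
    if (idx >= a->len)
        return -1;
    a->data[idx] = value;
    return 0;
}

/* Bounds-checked load. Returns 0 on success, -1 if idx is out of range. */
int checked_array_get(const checked_array *a, size_t idx, int *out)
{
    assert(a != NULL && out != NULL);
    if (idx >= a->len)
        return -1;
    *out = a->data[idx];
    return 0;
}

/* Releases the heap storage; the function epilogue would call this. */
void checked_array_free(checked_array *a)
{
    assert(a != NULL);
    free(a->data);
    a->data = NULL;
    a->len  = 0;
}
```

The extra cost per access is one comparison and a branch, plus one allocation and one free per function call that declares such an array.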

if one does move the locals off the stack (actually, I had considered partly
doing this eventually for the sake of implementing lexical closures), then
they could go "all the way", essentially ending nearly all use of the main
stack (apart from possibly temporary values or similar), which would allow
implementation of many features, such as closures, call/cc, more effective
use of tail-elimination, ...

the big cost would be, for a language like C, this would incur a notable
performance cost (and, very likely, tightly couple the compiled code and the
runtime, making compilation of stand-alone code very problematic).

however, for such a compiler, one "could" possibly make use of a hybrid
approach, using good old stack-frames wherever it can be "proven" that it is
safe to do so (functions are leaf and don't use any advanced features, or
can be verified not to use any such features and only call functions with this
same property).

basically, we allow both performance and call/cc, by proving that call/cc,
closures, or anything like them, occur nowhere within the possible call
graph from this point downward (could be very difficult in practice, given
traceability issues, possible use of function pointers, ...).


so, in short, this would be a very expensive feature (but still something I
may pursue at some point, noting that my compiler is primarily JIT-based so
this is acceptable, but likely not so in a more traditional stand-alone
compiler...).


or such...
 
Jack Klein

Hello:

We know that C programs are often vulnerable to buffer overflows which
overwrite the stack.

Not on most C compilers for the 8051 architecture.

But my question is: Why does C insist on storing local variables on the
stack in the first place?

As has been said to death, C does not. Quite a few C compilers
specifically do not. I know of at least one architecture where it is
quite impossible, as the stack is completely inaccessible to
instructions other than call and return.
I can see two definite disadvantages with this:
1) deeply nested recursive calls to a function (especially if it defines
large local arrays) can easily overflow the stack
2) the problems described above of security vulnerabilities.

My solution would be for C instead to store its local variables on the
heap - effectively separating data from executable code.

What do people think?

I think I can see two definite disadvantages with people pontificating
about subjects in which they have insufficient knowledge. Deducing what they
are is left as an exercise to the reader.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 
Kenny McCormack

Richard Tobin said:
The C standard does not insist on a stack. Almost all implementations
do. The OP is unlikely to know that some people here will insist on
interpreting "C" as "the C standard". You could have perfectly well
made it clear with accusing Jacob of lying, which he is obviously not.

Oh oh. Clique membership in jeopardy.
 
Richard Heathfield

Richard Tobin said:
The C standard does not insist on a stack.
Right.

Almost all implementations do.

But that isn't what was asked.
The OP is unlikely to know that some people here will insist on
interpreting "C" as "the C standard".

Then inform him. Or, if you prefer, inform him of the whole sorry mess:
K&R, C89, C90, C95, C99, C99+TCs, C0x... - and then tell him that "C"
means "the programming language specified by one or more of these
standards".
You could have perfectly well
made it clear with accusing Jacob of lying,

Presumably s/with/without/
which he is obviously not.

He knows perfectly well that C does not insist on a stack, yet he claims
otherwise. What would *you* call that? Being parsimonious with the truth?
Creative ineptitude? Truth redefinition?

If he means that the vast majority of C implementations use a stack, he
should say that. There is a huge difference between "the rules say you
must" and "you choose to because it seems to be a good idea".
 
Harald van Dijk

You don't have to drag your endless dispute with Jacob into *every*
thread.

Excuse me? You might want to re-read the thread. I didn't drag anything
in here.
The C standard does not insist on a stack. Almost all implementations
do. The OP is unlikely to know that some people here will insist on
interpreting "C" as "the C standard".

The OP ("CJ", as should have been mentioned in the attributions) is no
stranger here, and has asked questions why things are or aren't in the
standard in the past here in c.l.c. I was pretty sure that's what he's
asking now.
You could have perfectly well
made it clear with accusing Jacob of lying,

Yes, and I should have, but...
which he is obviously not.

....I'm not convinced one way or the other. However, jacob, I do apologise.
 
Rod Pemberton

CJ said:
But my question is: Why does C insist on storing local variables on the
stack in the first place?

I think the following is more informative than all the other responses
you've gotten so far:

"Any function in C may be recursive (without special declaration) and most
possess several 'automatic' variables local to each invocation. These
characteristics suggest strongly that a stack must be used to store the
automatic variables, caller's return point, and saved registers local to
each function; in turn, the attractiveness of an implementation will depend
heavily on the ease with which a stack can be maintained."

"Portability of C Programs and the UNIX System" SC Johnson and DM Ritchie
http://cm.bell-labs.com/cm/cs/who/dmr/portpap.html


Rod Pemberton
 
Keith Thompson

jacob navia said:
Only if you can execute code in the stack

Not necessarily. Assuming a typical implementation with a hardware
stack, a buffer overrun can clobber a function's return address; if
this is done maliciously, it can cause the program to resume execution
at any desired location in memory.
The principal reason is efficiency. Stack allocation is very fast,
in most cases just a single machine instruction. Deallocation is equally
fast, with a single instruction.

Yes, that's why C *implementations* typically use a hardware stack.

It's important to be aware of the difference between the *language*
(an abstraction defined by the standard) and an *implementation* (a
concrete entity that meets the requirements of the standard). The
latter very often uses a hardware stack; the former doesn't even
mention such a thing.

[...]

Incidentally, I think there are two distinct kinds of "overflow" that
might be causing some confusion. A stack overflow occurs when (again,
assuming a stack-based implementation) a function call or other
operation requires more stack space than is available. If the stack
cannot be extended, this typically results in the immediate
termination of the program, not in data corruption. A buffer overrun,
on the other hand, is an attempt to access memory beyond what's been
allocated; this can cause nearly arbitrarily bad things to happen.
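
To make the buffer overrun case concrete, here is a small sketch of a copy routine that refuses to write past the end of its destination. The name copy_checked is an assumption made for this illustration; it is not a standard library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating '\0'.
   dst_size is the capacity of dst in bytes.
   Returns 0 on success, -1 if the copy would overrun dst. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t need;

    assert(dst != NULL && src != NULL);
    need = strlen(src) + 1;
    if (need > dst_size)
        return -1;           /* refusing here is what prevents the overrun */
    memcpy(dst, src, need);
    return 0;
}
```

With a local `char buf[8]`, `copy_checked(buf, sizeof buf, input)` leaves buf and everything stored near it untouched when input is too long, whereas an unchecked strcpy() would keep writing into whatever follows buf in memory.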
 
Keith Thompson

jacob navia said:
I have yet to see a SINGLE example of an implementation that
doesn't use a stack for the local variables. Yes, a single
one.

Until now, there wasn't any that the regulars could put forward.

(Obviously in machines running now, and having a certain
minimum size. Coffee machines with less than 1K of
RAM and similars do not count)

For the umpteenth time, it depends on what you mean by "stack". If
you mean an abstract last-in first-out data structure, then the
semantics of C function calls require a stack (but it's clear that
that's not what the original poster in this thread was referring to).
If you mean a typical contiguous hardware stack managed via a stack
pointer, at least one example of an implementation that *doesn't* use
such a thing has been mentioned here many times, namely an IBM
mainframe system that allocates function activation records on the
heap (or something similar).
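
The distinction can be sketched in portable C. The code below is only an illustration of the general idea of heap-allocated activation records, not a description of how any IBM compiler works; the frame type and sum_to function are hypothetical. Each "call" gets its own malloc()ed record linked to its caller, so the records behave as an abstract last-in first-out structure without being a contiguous region managed by a stack pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* One activation record for the hypothetical function sum_to(n). */
typedef struct frame {
    struct frame *caller;  /* link back to the calling activation */
    unsigned      n;       /* the automatic variable of this invocation */
} frame;

/* Computes 0 + 1 + ... + n by simulating the recursive calls
   sum_to(n) -> sum_to(n-1) -> ... -> sum_to(0) with heap frames.
   Returns 0 on success, -1 if an allocation fails. */
int sum_to(unsigned n, unsigned long *result)
{
    frame *top = NULL;
    unsigned long acc = 0;

    assert(result != NULL);

    /* "Call" phase: one new frame per invocation. */
    for (unsigned i = n; ; i--) {
        frame *f = malloc(sizeof *f);
        if (f == NULL) {
            while (top != NULL) {      /* unwind what was allocated */
                frame *c = top->caller;
                free(top);
                top = c;
            }
            return -1;
        }
        f->caller = top;
        f->n = i;
        top = f;
        if (i == 0)
            break;
    }

    /* "Return" phase: frames are released newest first. */
    while (top != NULL) {
        frame *c = top->caller;
        acc += top->n;
        free(top);
        top = c;
    }

    *result = acc;
    return 0;
}
```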
 
Harald van Dijk

Richard Tobin said:

Presumably s/with/without/


He knows perfectly well that C does not insist on a stack, yet he claims
otherwise. What would *you* call that? Being parsimonious with the
truth? Creative ineptitude? Truth redefinition?

A misunderstanding, hopefully. I agree with Richard Tobin that I
shouldn't have accused jacob navia of lying.
 
Malcolm McLean

Rod Pemberton said:
I think the following is more informative than all the other responses
you've gotten so far:

"Any function in C may be recursive (without special declaration) and most
possess several 'automatic' variables local to each invocation. These
characteristics suggest strongly that a stack must be used to store the
automatic variables, caller's return point, and saved registers local to
each function; in turn, the attractiveness of an implementation will depend
heavily on the ease with which a stack can be maintained."

And if a function is recursive there's often no easy way of protecting the
top of the stack from overflow, because the depth of recursion tends to be
controlled by the input.
However a stack overflow is less likely than a buffer overrun into the stack
to be exploitable. When a user can overwrite a return address and put
user-defined bytes at the place the new return address points to, then you've
got either a security hole or the mother of all user-configurable programs.
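
One common defence against input-controlled recursion depth is an explicit limit. The sketch below assumes a simple binary tree and a hypothetical MAX_DEPTH of 64; both the node type and the limit are chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64  /* hypothetical limit, chosen for this example */

typedef struct node {
    struct node *left;
    struct node *right;
} node;

/* Returns the height of the tree rooted at t, or -1 if the tree
   has a node deeper than MAX_DEPTH levels. Call with depth 0 for the root. */
int tree_height(const node *t, int depth)
{
    int l, r;

    assert(depth >= 0);
    if (t == NULL)
        return 0;
    if (depth >= MAX_DEPTH)
        return -1;            /* give up instead of recursing further */
    l = tree_height(t->left, depth + 1);
    if (l < 0)
        return -1;
    r = tree_height(t->right, depth + 1);
    if (r < 0)
        return -1;
    return 1 + (l > r ? l : r);
}
```

The caller gets an error value for hostile or degenerate input instead of the program running out of stack at an unpredictable point.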
 
Willem

jacob wrote:
) Willem wrote:
)> CJ wrote:
)> ) But my question is: Why does C insist on storing local variables on the
)> ) stack in the first place?
)>
)> It doesn't. Your question is moot.
)>
)>
)> SaSW, Willem
)
) This is wrong. Most C implementations use the hardware stack

It is perfectly correct. You should update your reading skills.
C does not ***INSIST*** on storing local variables on the stack.

'Insist' is something that can be said of requirements and/or standards.
It is *not* something you say of an *implementation*.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
santosh

Willem said:
jacob wrote:
) Willem wrote:
)> CJ wrote:
)> ) But my question is: Why does C insist on storing local variables on the
)> ) stack in the first place?
)>
)> It doesn't. Your question is moot.
)>
)>
)> SaSW, Willem
)
) This is wrong. Most C implementations use the hardware stack

It is perfectly correct. You should update your reading skills.
C does not ***INSIST*** on storing local variables on the stack.

C does insist that automatic variables be treated in LIFO manner with
regard to their lifetimes. However this LIFO characteristic needn't be
implemented with a LIFO data structure, I think.
'Insist' is something that can be said of requirements and/or
standards. It is *not* something you say of an *implementation*.

A non-conforming implementation can insist on doing things its own way.
 
Willem

santosh wrote:
) C does insist that automatic variables be treated in LIFO manner with
) regard to their lifetimes. However this LIFO characteristic needn't be
) implemented with a LIFO data structure, I think.

You can store all automatic variables in a malloc()ed block, and store
the pointer to that on the stack, for example.
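
Written out by hand, the malloc()ed-block approach might look roughly like the sketch below. The function count_vowels and its locals struct are hypothetical; a compiler doing this would generate the equivalent automatically. Only the pointer to the block (and the return value) remain ordinary automatic variables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The automatic variables of the hypothetical function count_vowels(),
   gathered into one struct so they can live in a malloc()ed block. */
struct count_vowels_locals {
    size_t i;
    size_t count;
    char   buf[64];
};

/* Counts the vowels in the first 63 characters of s.
   Returns the count, or -1 if the block cannot be allocated. */
long count_vowels(const char *s)
{
    struct count_vowels_locals *loc;
    long result;

    assert(s != NULL);
    loc = malloc(sizeof *loc);         /* "function prologue" */
    if (loc == NULL)
        return -1;

    strncpy(loc->buf, s, sizeof loc->buf - 1);
    loc->buf[sizeof loc->buf - 1] = '\0';
    loc->count = 0;
    for (loc->i = 0; loc->buf[loc->i] != '\0'; loc->i++) {
        if (strchr("aeiouAEIOU", loc->buf[loc->i]) != NULL)
            loc->count++;
    }

    result = (long)loc->count;
    free(loc);                         /* "function epilogue" */
    return result;
}
```

An overrun of buf in this arrangement cannot reach a return address directly, but it can still damage the other locals in the block or the allocator's own bookkeeping.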

You could also have one stack for call/return, and one for automatic
storage, but the OP's wording 'using _the_ stack' rules this out.

That's probably what started this flamefest. 'using _the_ stack',
together with the word 'insist'.

)> 'Insist' is something that can be said of requirements and/or
)> standards. It is *not* something you say of an *implementation*.
)
) A non-conforming implementation can insist on doing things its own way.

Well yes, you could say it that way. You could also say that it's the
implementation writers, or the requirements document.

In any case, the wording 'C insists on X' specifically does not mean
'most implementations of C do X'.


SaSW, Willem
 
santosh

Willem said:
santosh wrote:
) C does insist that automatic variables be treated in LIFO manner with
) regard to their lifetimes. However this LIFO characteristic needn't be
) implemented with a LIFO data structure, I think.

You can store all automatic variables in a malloc()ed block, and store
the pointer to that on the stack, for example.

You could also have one stack for call/return, and one for automatic
storage, but the OP's wording 'using _the_ stack' rules this out.

Actually I was wondering if a conforming C implementation could be
written, and a conforming C program compiled and run, without *any* use
of a LIFO data type.

<agree with the rest>
 
jacob navia

Keith said:
For the umpteenth time, it depends on what you mean by "stack". If
you mean an abstract last-in first-out data structure, then the
semantics of C function calls require a stack (but it's clear that
that's not what the original poster in this thread was referring to).
If you mean a typical contiguous hardware stack managed via a stack
pointer, at least one example of an implementation that *doesn't* use
such a thing has been mentioned here many times, namely an IBM
mainframe system that allocates function activation records on the
heap (or something similar).


This is wrong. All mainframe implementations use a contiguous
memory area that is accessed by a dedicated register. This
register is increased in the function's prologue and decreased
in the function's epilogue. The machine has no universally
dedicated stack registers but within C it HAS A STACK as I proved with
links to the mainframe's compiler documentation A NUMBER
OF TIMES ALREADY

Please look it up and stop telling stories.

Thanks

Here is my message from the last discussion we had about this:
-------------------------------------------------------------------
[snip]
There is nothing more "big iron" than the IBM mainframes... at least
within my limited experience since I quit using that environment
in 1984.

The C compiler for the IBM mainframes is "C for VM/ESA", a C89
compiler.

We find in the users guide
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/cbcvpg00/CCONTENTS

3.1.4.3 Accessing Automatic Memory
Use the EDCDSAD macro to access automatic memory. Automatic memory is
reserved using the USRDSAL, or the DSALEN operand of the EDCPRLG macro.
The length of the allocated area is derived from the ulen and/or dlen
values specified on the EDCPRLG macro. EDCDSAD generates a DSECT, which
reserves space for the *stack frame* needed for the C environment.

<end quote>

I repeat: "... the stack frame needed for the C environment".

The documentation goes on to specify that register 13 (R13) is used to
address this memory.
 
Flash Gordon

santosh wrote, On 15/03/08 09:53:
Actually I was wondering if a conforming C implementation could be
written, and a conforming C program compiled and run, without *any* use
of a LIFO data type.

How about this...
Rather than using a malloc()ed block use a GC_malloc()ed block. The
return address being passed in a register, and if the called function
needs to it saves it in its local block. Then on return it calls a
GC_partial_collect() function which does a partial scan freeing some
blocks, but not necessarily all and not necessarily the Last In block.
 
CJ

Thanks for all the replies, this is an interesting discussion.

Here are a couple of points that occur to me:

1) Buffer overflows are a more serious security problem on the stack
than on the heap, because the program counter is stored on the stack and
not the heap, so that a malicious stack overflow can execute arbitrary
code. The heap is used for data exclusively, which is what I meant by
"separate data from executable code".

Even if a buffer on the heap overflows, the worst that can happen is
some (probably insignificant) data corruption. Since malloc() generally
allocates space in powers of 2, often an off-by-one error or similar
won't overwrite anything anyway, but will just land in the gap between
the end of the buffer and the next power of 2.

2) I believe the argument about it being more efficient to use the stack
than the heap is spurious - if I recall, both are O(N) data structures.
 
santosh

CJ said:
Thanks for all the replies, this is an interesting discussion.

Here are a couple of points that occur to me:

1) Buffer overflows are a more serious security problem on the stack
than on the heap, because the program counter is stored on the stack
and not the heap, so that a malicious stack overflow can execute
arbitrary code. The heap is used for data exclusively, which is what I
meant by "separate data from executable code".

Even if a buffer on the heap overflows, the worst that can happen is
some (probably insignificant) data corruption. Since malloc()
generally allocates space in powers of 2, often an off-by-one error or
similar won't overwrite anything anyway, but will just land in the gap
between the end of the buffer and the next power of 2.

2) I believe the argument about it being more efficient to use the
stack than the heap is spurious - if I recall, both are O(N) data
structures.

So? If the "heap" were to be used for all C's auto variables in the
suggested LIFO manner, then it *becomes* the stack. In fact that's how
modern systems operate. The "stack" is simply a defined region of
memory, demarcated by two rather special registers. This still doesn't
prevent malicious code execution.

And overwriting the return address is just one possible way to execute
malicious code.
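
A rough sketch of that point: take a block from the heap, hand it out strictly in LIFO order, and what results is a stack in all but name. The region type and its functions below are hypothetical, and alignment is ignored for brevity (the sketch assumes callers request sizes that keep the returned pointers suitably aligned).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A heap block used in strictly last-in first-out fashion. */
typedef struct {
    unsigned char *base;  /* start of the malloc()ed block */
    size_t         cap;   /* total size in bytes */
    size_t         top;   /* bytes currently in use */
} region;

/* Returns 0 on success, -1 if the block cannot be allocated. */
int region_init(region *r, size_t cap)
{
    assert(r != NULL);
    r->base = malloc(cap);
    r->cap  = r->base ? cap : 0;
    r->top  = 0;
    return r->base ? 0 : -1;
}

/* "Push a frame": one comparison and one addition.
   Returns NULL if the region is exhausted (the analogue of stack overflow). */
void *region_push(region *r, size_t n)
{
    void *p;

    assert(r != NULL);
    if (n > r->cap - r->top)
        return NULL;
    p = r->base + r->top;
    r->top += n;
    return p;
}

/* "Pop a frame": one subtraction. Frames must be released newest first. */
void region_pop(region *r, size_t n)
{
    assert(r != NULL);
    if (n <= r->top)
        r->top -= n;
}

void region_destroy(region *r)
{
    assert(r != NULL);
    free(r->base);
    r->base = NULL;
    r->cap = r->top = 0;
}
```

Allocation and release here cost a handful of instructions each, which is the efficiency argument for stacks; a general-purpose malloc() has to search for and track free blocks of arbitrary size and lifetime, which is more work per call.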
 
Richard Heathfield

CJ said:
Thanks for all the replies, this is an interesting discussion.

Here are a couple of points that occur to me:

1) Buffer overflows are a more serious security problem on the stack
than on the heap, because the program counter is stored on the stack and
not the heap, so that a malicious stack overflow can execute arbitrary
code. The heap is used for data exclusively,

Such as return addresses and function pointers, you mean? Sounds like a
great way for an attacker to execute arbitrary code.

Keep death off the roads - drive on the pavement.

(Or "sidewalk", if you're in the USA.)

Changing the place where an attack happens won't stop the attack happening.
It'll just change the scenery.
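
As an illustration of why the heap is not "data exclusively", consider a heap-allocated object that holds a buffer next to a function pointer. The request type below is made up for this example; the function only reports how many bytes separate the start of the buffer from the pointer and performs no overrun itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap-allocated object: a text buffer followed by a callback. */
typedef struct {
    char name[16];
    void (*on_done)(void);   /* code address stored alongside the data */
} request;

/* Number of bytes from the start of name to the start of on_done.
   Writing more than this many bytes through name reaches the pointer. */
size_t bytes_before_handler(void)
{
    /* Members are laid out in declaration order, so this always holds. */
    assert(offsetof(request, on_done) >= sizeof ((request *)0)->name);
    return offsetof(request, on_done) - offsetof(request, name);
}
```

An unchecked copy into name that runs past that distance replaces on_done, and the next call through it goes wherever the attacker chose, with no return address involved.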
 
