Why are variables stored on the stack?

A

Antoninus Twink

And this is important to C programmers who come here for help who
program 99.9999% of the worlds Computers how?

What is this "help" you speak of? The purpose of CLC is for the regulars
to grandstand and posture and show off their superior knowledge of the
dustier corners of the Standard - helping people is not on the agenda
here.
 
B

Bartc

CJ said:
Thanks for all the replies, this is an interesting discussion.

Here are a couple of points that occur to me:

1) Buffer overflows are a more serious security problem on the stack
than on the heap, because the program counter is stored on the stack and
not the heap, so that a malicious stack overflow can execute arbitrary
code. The heap is used for data exclusively, which is what I meant by
"separate data from executable code".

Even if a buffer on the heap overflows, the worst that can happen is
some (probably insignificant) data corruption. Since malloc() generally
allocates space in powers of 2, often an off-by-one error or similar
won't overwrite anything anyway, but will just land in the gap between
the end of the buffer and the next power of 2.

2) I believe the argument about it being more efficient to use the stack
than the heap is spurious - if I recall, both are O(N) data structures.

The heap is a fragmented resource and requires a function call to allocate
and deallocate.

The stack, by it's nature, is unfragmented and allocation and deallocation
is very fast, especially when hardware assisted.

To address your other concerns, then yes it might be a good idea to have a
compiler option to put auto-data on the heap, to avoid accidental or
malicious overwrite of critical data. But your applications might run 10-100
x slower if they do lots of function calls.

But in practice, if there was such an option, the compiler would simply
create a separate, linear data-stack with a software stack pointer. And
performance would be little affected. So your idea is a good one.
 
M

Malcolm McLean

CJ said:
2) I believe the argument about it being more efficient to use the stack
than the heap is spurious - if I recall, both are O(N) data structures.
A stack is O(1) to allocate a chunk, and chunk sizes can be variable. O(N)
to fill the chunk with useful data, where N is chunk size.
A heap is O( f(N) ) to allocate a chunk, where f(N) is some complex and
platform dependent function of the number of chunks allocated. Typically it
will be about logarithmic, certainly under N. You can look up broken stick
distributions and the like if you are interested in problems of
fragmentation and how they impact on block search.

A heap is O(N) to fill with useful data, where N is the chunk size.

So you are not really correct, unless chunk size is very large in
comparision to the number of blocks allocated, in which case that term will
dominate.
 
S

santosh

Bartc said:
The heap is a fragmented resource and requires a function call to
allocate and deallocate.

The stack, by it's nature, is unfragmented and allocation and
deallocation is very fast, especially when hardware assisted.

To address your other concerns, then yes it might be a good idea to
have a compiler option to put auto-data on the heap, to avoid
accidental or malicious overwrite of critical data. But your
applications might run 10-100 x slower if they do lots of function
calls.

The compiler could have special support, translating auto declarations
into some sequence of compiler magic. A function call needn't be
involved, I think.
But in practice, if there was such an option, the compiler would
simply create a separate, linear data-stack with a software stack
pointer. And performance would be little affected. So your idea is a
good one.

It is an idea which has been implemented by a lot of languages, like for
example Java. But it's not really suitable for C, since C is mainly
used for low level programming where this will either not be possible
or will impact performance unacceptably.
 
S

santosh

CJ said:
Thanks for all the replies, this is an interesting discussion.

Here are a couple of points that occur to me:

1) Buffer overflows are a more serious security problem on the stack
than on the heap, because the program counter is stored on the stack
and not the heap, so that a malicious stack overflow can execute
arbitrary code. The heap is used for data exclusively, which is what I
meant by "separate data from executable code".

Even if a buffer on the heap overflows, the worst that can happen is
some (probably insignificant) data corruption. Since malloc()
generally allocates space in powers of 2, often an off-by-one error or
similar won't overwrite anything anyway, but will just land in the gap
between the end of the buffer and the next power of 2.

Actually the DieHard framework designed by Emery Berger implements a
similar idea. It randomises the location of heap allocations and tries
to "space them out", so that a small buffer overrun will simply write
to unused memory and not critical data.

See his site for more details:

<http://www.cs.umass.edu/~emery/>
 
K

Kenny McCormack

What is this "help" you speak of? The purpose of CLC is for the regulars
to grandstand and posture and show off their superior knowledge of the
dustier corners of the Standard - helping people is not on the agenda
here.

So true. So true.

Note that this is true by definition, given how they've defined the
topicality rules.
 
S

Stephen Sprunk

Richard said:
And this is important to C programmers who come here for help who
program 99.9999% of the worlds Computers how?

Um, you do realize that embedded systems comprise the _vast_ majority of
computers in existence today and that "traditional" computers like you're
used to are a small minority, right?

Just because _you_ don't program 8051s, DSPs, etc. doesn't mean that they're
not important.

S
 
S

Stephen Sprunk

CJ said:
Thanks for all the replies, this is an interesting discussion.

Here are a couple of points that occur to me:

1) Buffer overflows are a more serious security problem on the stack
than on the heap, because the program counter is stored on the stack and
not the heap, so that a malicious stack overflow can execute arbitrary
code. The heap is used for data exclusively, which is what I meant by
"separate data from executable code".

There are two common answers to that problem:

1. Make the stack non-executable
2. Use separate stacks for data and control information

Both have been used in real-world systems.
Even if a buffer on the heap overflows, the worst that can happen is
some (probably insignificant) data corruption. Since malloc() generally
allocates space in powers of 2, often an off-by-one error or similar
won't overwrite anything anyway, but will just land in the gap between
the end of the buffer and the next power of 2.

Off-by-one errors very commonly result in heap corruption; it's one of the
most common causes of program crashes, probably right after dereferencing
pointers without verifying they're not NULL. Sure, most malloc()
implementations allocate in powers of two, but most data is _also_ sized in
powers of two (due to object sizes and programmer habit), so there is
usually little extra space.
2) I believe the argument about it being more efficient to use the stack
than the heap is spurious - if I recall, both are O(N) data structures.

There is no "O(N) data structure". There are a variety of operations one
can perform on a data structure, and each operation will have its own
complexity. Operating on the top of a stack is O(1).

Also, big-O notation isn't always helpful. The constants can often
overwhelm the comparison when N is small; bubble sorts can outperform
quicksorts under some common conditions.

Lastly, a typical heap is orders of magnitude more costly to manipulate than
a typical stack. On most common processors, it's a single instruction to
(de)allocate on the top of the stack, vs. hundreds to thousands of
instructions for a heap. In some of the latest processors, there is even a
unit dedicated to manipulating the stack which effectively means stack
manipulation instructions have zero cost, and variables spilled to the stack
can even wind up in renamed registers instead of RAM. Heaps are way, way
too complex for that.

S
 
S

Stephen Sprunk

CJ said:
We know that C programs are often vulnerable to buffer overflows which
overwrite the stack.

Only on poorly-designed processors/OSes.
But my question is: Why does C insist on storing local variables on the
stack in the first place?

As many have pointed out, it doesn't _insist_ on a stack. C insists on LIFO
(de)allocation of data and function calls, which _implies_ a stack.

There are many things _implied_ by the C standard, because C evolved from
the way that certain machines worked most efficiently, but there is room for
implementations on machines that work differently to do things another way.

More correctly, most platform ABIs require a stack, and those ABIs are
generally derived from how a given processor works most efficiently.
However, there's a different ABI for each processor family works, and some
of the more esoteric ones have either no stack or multiple stacks.

Also, not all auto variables end up on the stack, even in machines that have
one. Compilers frequently put as many auto variables as possible into
registers, only assigning them memory locations if the code takes the
address of the variable or they need to be "live" across function calls.
Also, the most recent return address is stored in a "link" register on some
processors, not the stack. Likewise, many ABIs specify (non-variadic)
function arguments to be passed in registers, not on the stack. Don't
assume that every processor, ABI, or compiler works exactly the same as the
ones you're used to; there's a _lot_ of variation out there in such details.

S
 
W

Willem

Richard wrote:
)
)> Richard wrote:
)> )> Exactly which concept in C is explained easier by introducing a stack?
)> )
)> ) Just about everything. I am amazed you have to ask this. You have been
)> ) in CLC too long.
)>
)> You're dodging the question.
)>
)> PS: It may be true that a lot of 'how does it do it' questions are most
)> easily answered by 'it uses a stack', but for C programming concepts as
)> such, I see no need.
)
) Sigh. You just answered it yourself.

Were it not that 'how does it do it' is only a small part of the things
you need to explain about C. So 'Just about everything' is quite wrong.

When you are explaining programming, and the student asks 'but how does
the compiler do X', you can also answer 'it doesn't matter'.

Put together, 'it uses a stack' is an answer to precisely those questions
that don't matter.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
T

Thad Smith

jacob said:
Please stop confusing people by using word games.
I have yet to see a SINGLE example of an implementation that
doesn't use a stack for the local variables. Yes, a single
one.

Until now, there wasn't any that the regulars could put forward.

(Obviously in machines running now, and having a certain
minimum size. Coffee machines with less than 1K of
RAM and similars do not count)

I suppose I don't post enough to be considered a "regular".

In Message ID:<[email protected]>
(Macro that allocates storage and "returns" value) I mentioned that some
embedded processors use static allocation rather than a stack. Besides 1k
coffee machines, it also also used on PICs with 128k code space and a few k
of RAM.

Could a stack be used? Yes, especially for the larger memory processors,
but the instruction set does not provide an efficient means of accessing
stack-based variables, so performance and code size would suffer.
 
J

jacob navia

Stephen said:
Um, you do realize that embedded systems comprise the _vast_ majority of
computers in existence today and that "traditional" computers like
you're used to are a small minority, right?

Just because _you_ don't program 8051s, DSPs, etc. doesn't mean that
they're not important.

S

They are not important for C since there are no conforming
implementations in those machines, they are just too small.

Either recursion is absent, or floating point, or similar
restrictions (no 64 bit long long, for instance)

Yes, in the embedded world is important to care about
those small processors. But let's not generalize OK?

Behind all this arguments is the idea that C is for
small embedded processors. Real development should be done
in C++.
 
R

Richard Heathfield

M

Malcolm McLean

jacob navia said:
Yes, that is why stack allocation of large arrays is not a very
good idea.
To be precise, using the same stack for large arrays and frequently-accessed
small data items is a bad idea. Unless the arrays are huge, having a
separate stack and a function - I call it salloc() - to allocate from it can
speed up the program quite a bit. Remember most data items can be written in
a LIFO fashion.
 
F

Flash Gordon

jacob navia wrote, On 15/03/08 19:00:
They are not important for C since there are no conforming
implementations in those machines, they are just too small.

Rubbish. Back in 95 I was using a conforming implementation on a DSP.
Either recursion is absent, or floating point, or similar
restrictions (no 64 bit long long, for instance)

Well, MS doesn't provide long long because it only conforms to C95. I
don't know about the current state of the C compiler I was using on a
DSP, but in 95 it conformed to at least C90. They were even nice enough
to provide a copy of K&R2 as part of the documentation.
Yes, in the embedded world is important to care about
those small processors. But let's not generalize OK?

Ah, so you want to stick to the minority of C implementations.
Behind all this arguments is the idea that C is for
small embedded processors. Real development should be done
in C++.

You keep suggesting that this is the motivation of others despite the
many times people have told you that you are wrong. What makes you think
you know my motivations better than me, or Eric's better than him etc?
 
I

Ian Collins

jacob said:
And now you say:
Please look up the word "stack" in the standard and tell us where it
occurs.

You *ignore* the whole argumentation and just throw a sentence
again, without any context.
No, I've asked you several times for the reference and you keep telling
stories about C insisting on a stack.
The word "stack" doesn't appear in the standard. It is implied.
Thank you, at last a clear admission that C does not insist on a stack.

The original argument was not whether most implementations use a stack,
which no one here denies, but whether they are required to.
 
K

Keith Thompson

jacob navia said:
This is wrong. All mainframe implementations use a contiguous
memory area that is accessed by a dedicated register. This
register is increased in the function's prologue and decreased
in the functions epilogue. The machine has no universally
dedicated stack registers but within C it HAS A STACK as I proved with
links to the mainframe's compiler documentation A NUMBER
OF TIMES ALREADY

*All* mainframe implementations do this? Can you prove that by citing
the documentation for *all* such implementations, not just one?

There have been a number of different IBM mainframe architectures, and
a number of different compilers and operating systems for them. I'm
not at all surprised that *some* of them use something very similar to
a typical contiguous hardware stack.

I have no direct experience with the IBM mainframe implementation in
question; I've logged into an IBM mainframe once or twice many years
ago, but I've never programmed one. So my statement was not based on
my own personal experience. But *a* mainframe implementation (IBM, I
think) has been mentioned in this newsgroup that allocates function
activation records on the heap, or on something similar to a heap, so
that consecutively allocated activation records aren't necessarily
contiguous or even ordered in memory. I cannot provide a more
specific citation than that; perhaps someone else can.

But even if that's not the case, it's certainly *possible* that such
an implementation could exist; assuming it's done correctly, it would
not violate the C standard in any way.
Please look it up and stop telling stories.

Thanks

Here is my message from the last discussion we had about this:
-------------------------------------------------------------------
[snip]
There is nothing more "big iron" that the IBM mainframes... at least
within my limited experience since I quit using that environment
in 1984.

The C compiler for the IBM mainframes is "C for VM/ESA", a C89
compiler.

That's *one* C compiler for *one* operating system for *one* class of
IBM mainframes.
We find in the users guide
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/cbcvpg00/CCONTENTS

3.1.4.3 Accessing Automatic Memory [snip]
I repeat: "... the stack frame needed for the C environment".

The documentation goes on to specify that register 13 (R13) is used to
address this memory.

I could use the same method to prove that int is 32 bits on Unix.

I agree that the vast majority of C implementations use a contiguous
hardware stack. It's even possible that all existing C
implementations do so (examples have been cited here of
implementations that don't, but that might have been erroneous, and I
lack the direct knowledge to judge one way or the other).

But that's not the question.

Here's a very simple question that I'd like you to answer.

Does C insist on storing local variables on the stack?

By "C", I mean the language, not any particular implementation or any
particular set of implementations. By "the stack", I mean a
contiguously allocated hardware stack (which is clearly what the OP
was referring to), not any abstract LIFO data structure.

Yes or no.

If your answer is yes, then you're saying that an implementation that
allocates activation records on the heap must be non-conforming. In
that case, please explain how it fails to conform.
 
K

Keith Thompson

Thad Smith said:
I suppose I don't post enough to be considered a "regular".

In Message ID:<[email protected]>
(Macro that allocates storage and "returns" value) I mentioned that
some embedded processors use static allocation rather than a
stack. Besides 1k coffee machines, it also also used on PICs with 128k
code space and a few k of RAM.

Could a stack be used? Yes, especially for the larger memory
processors, but the instruction set does not provide an efficient
means of accessing stack-based variables, so performance and code size
would suffer.

If an implementation uses static allocation for all local objects,
then it can't support recursion, and therefore can't be a conforming C
implementation. (It could be a very useful implementation that
doesn't fully conform to the C standard.)

It could still conform to the C standard if it uses static allocation
only for functions that it knows are never called recursively.

C does insist on a "stack" in the sense of last-in first-out
allocation for local objects across function calls. It absolutely
does not insist that this "stack" be implemented as a contiguous
hardware stack (even though most implementations do so). I don't
believe the embedded systems you refer to are a counterexample to any
claim that C requires a contiguous hardware stack.
 
K

Keith Thompson

jacob navia said:
Here is my message from the last discussion we had about this:
-------------------------------------------------------------------
[snip]
There is nothing more "big iron" that the IBM mainframes... at least
within my limited experience since I quit using that environment
in 1984.

The C compiler for the IBM mainframes is "C for VM/ESA", a C89
compiler.

We find in the users guide
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/cbcvpg00/CCONTENTS

3.1.4.3 Accessing Automatic Memory
Use the EDCDSAD macro to access automatic memory. Automatic memory is
reserved using the USRDSAL, or the DSALEN operand of the EDCPRLG macro.
The length of the allocated area is derived from the ulen and/or dlen
values specified on the EDCPRLG macro. EDCDSAD generates a DSECT, which
reserves space for the *stack frame* needed for the C environment.

<end quote>

I repeat: "... the stack frame needed for the C environment".

The documentation goes on to specify that register 13 (R13) is used to
address this memory.

I don't see anything, at least in the excerpt that you quoted, that
implies that stack frames are allocated contiguously. As far as I can
tell from what you quoted, the memroy "reserved using the USRDSAL, or
the DSALEN operand of the EDCPRLG macro" could have been allocated
from a contiguous hardware stack, or from a heap, or by printing a
purchase request for an additional memory chip (the latter is
admittedly unlikely). The use of a dedicated register to address this
memory doesn't imply that it's allocated contiguously.
 
K

Keith Thompson

jacob navia said:
Obviously you can implement a stack in many ways, but MOST serious
implementations of C use the hardware stack.

"MOST" implies "NOT ALL", doesn't it?
This is SO OBVIOUS that it needs the twisted "regulars" mentality to
deny it.

No, I agree with you on this point, and I think everyone else does as
well.

Most implementations of C use a hardware stack.

Not all implementations of C use a hardware stack.

C (the language, not any particular implementation) does not insist on
the use of a contiguous hardware stack, though that's the most common
implementation.

Are we in agreement?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,049
Latest member
Allen00Reed

Latest Threads

Top