Question about alloca() and alternative to using stack in an implementation

Sushil · Jul 15, 2004

Hi gurus

I was reading FAQ

"alloca cannot be written portably, and is difficult to implement on
machines without a conventional stack."

I understand that the standard does not mandate "heap" or "stack"

I'm curious to know the implemenations which dont have stack or heap.
How do they manage local and global memory? Although this may be a
little off-topic but this will help me understand the reasons behind
the above rationale of the committee.

I hope to get interesting answers since in my limited exposure I've
always seen machines with stack n heap and someone from this worldwide
forum would get me some answers about the machines who did not have
stack and/or heap and yet ran standard C.

Thanks in advance
-Sushil Ramachandran

jota · Jul 15, 2004

One example would be PIC controlers!
The PIC controler does have an internal hardware stack with only a few
(8-32) words depth. This hardware stack is commited only to call and returns
And there is no push/pop to the stack.
So my PIC C compilator has to put all my local 'C' variables in the global
space!
Ofcousre there is also the lack of an OS handling the memory resourses!
'C' leaves all that to the 'C' implementor!
The 'C' code looks more or less the same in a small PIC controller as it
looks in
a PC for example.
void ex()
{
int var;//In stack on a pc.//Global ram in a PIC
...
}
//jota

Mark A. Odell · Jul 15, 2004

One example would be PIC controlers!
The PIC controler does have an internal hardware stack with only a few
(8-32) words depth. This hardware stack is commited only to call and
returns And there is no push/pop to the stack.
So my PIC C compilator has to put all my local 'C' variables in the
global space!

This is a bit off-topic since it's not to do with the C langauge but
implementations of C compilers. This would be far better discussed in
comp.arch.embedded.

<OT>
That's not the reason. There's no hardware stack on many 32-bit RISC CPUs
either. The just use one of the registers, by convention, to be the stack
pointer. Then, instead of pushes and pops you use load and stores with an
add or sub on the "stack pointer" register.

Ofcousre there is also the lack of an OS handling the memory resourses!
'C' leaves all that to the 'C' implementor!
The 'C' code looks more or less the same in a small PIC controller as it
looks in
a PC for example.
void ex()
{
int var;//In stack on a pc.//Global ram in a PIC

It's not global, it's local to the function. It may or maynot be overlayed
with other local variables from other functions. This is also how the Keil
compiler works for the 8051. However, you loose reentrancy so in this mode
the compiler is not ISO C compliant. </OT>

Malcolm · Jul 15, 2004

Sushil said:
I'm curious to know the implemenations which dont have stack or
heap.
How do they manage local and global memory? Although this may be a
little off-topic but this will help me understand the reasons behind
the above rationale of the committee.

One way is to allocate separate memory space for each function. Effectively
you are declaring every varaible "static" without actually using the
keyword. This has the effect of banning recursive functions, and is also
wasteful of memory, but it may simplify the compiler.

Dan Pop · Jul 15, 2004

One way is to allocate separate memory space for each function. Effectively
you are declaring every varaible "static" without actually using the
keyword. This has the effect of banning recursive functions, and is also
wasteful of memory, but it may simplify the compiler.

You're missing the point. ANY implementation can have a stack, which is
nothing more than a data structure, if it wants to. No hardware support
is needed for that. If an implementation chooses another data structure
for handling automatic object allocation it is for some reason of its
own, but this alternate data structure must still provide all the
functionality required by the standard, including recursion.

The stack has the advantage that it can be trivially extended, to
satisfy an alloca request or allocate a VLA. This advantage may
not be shared by other data structures used for this porpose (e.g. a
linked list of dynamically allocated memory blocks).

Dan

Chris Torek · Jul 15, 2004

I understand that the standard does not mandate "heap" or "stack"

I'm curious to know the implemenations which dont have stack or heap.
How do they manage local and global memory?

It is ... unusual, at least, if not unheard-of, to lack both a
hardware stack *and* a malloc()-type arena. (I am assuming the
latter is what you mean by "heap" here.) A system that has neither
cannot implement Standard C. (But other than limited code or data
space, there is no reason an implementor cannot simply write his
own malloc-like subsystem.)

IBM mainframe systems have, however, used "no hardware stack" along
with "does have malloc()-like arena" to implement local per-function
variables and recursive functions. Consider the following source
code level transformation:

/* original */
int treewalk(struct tree *p, void (*fn)(struct tree *)) {
int a, b;
if (p == NULL)
return 0;
a = treewalk(p->left);
b = treewalk(p->right);
if (a > b)
heavyleft(p);
else
heavyright(p);
return a + b;
}

/* new, "pseudo C" to avoid writing pure assembly */
void treewalk(struct tree *r3, void (*r4)(struct tree *)) {
struct {
frame_pointer_type saved_frameptr;
return_addr_type saved_retattr;
int a, b;
} *r2;

r2 = runtime$framealloc(sizeof *r2);
r2->saved_frameptr = frameptr;
r2->saved_retattr = retaddr;
frameptr = r2;

if (r3 == NULL) {
r0 = 0; /* "return 0" -- r0 is a "global" register */
goto out;
}
treewalk(r3->left), frameptr->a = r0;
treewalk(r3->right), frameptr->b = r0;
if (frameptr->a > frameptr->b)
heavyleft(r3);
else
heavyright(r3);
r0 = frameptr->a + frameptr->b;
r2 = frameptr;
frameptr = r2->saved_frameptr;
retaddr = r2->saved_retaddr;
out:
runtime$framefree(r2, sizeof *r2);
return; /* return value in r0 and r1 as needed */
}

Here r0 through r4 are machine-level registers, as are "frameptr"
and "retaddr" (probably r6 and r7 respectively, on an 8-register
machine).

The runtime$framealloc() call is Super Extra Special in that it
does not use the regular machine-register calling sequence. (Hence
the name with the dollar sign.) It has to avoid overwriting the
hardware's "return address" register (r7), using instead the
reserved-by-compiler-at-function-entry register (r5).

Of course, runtime$framefree is the counterpart to runtime$framealloc,
and releases the "stack frame" allocated here.

Dave Vandervies · Jul 15, 2004

[implementing the activation-record stack without a hardware-assisted
stack mechanism]

/* new, "pseudo C" to avoid writing pure assembly */
void treewalk(struct tree *r3, void (*r4)(struct tree *)) {
struct {
frame_pointer_type saved_frameptr;
return_addr_type saved_retattr;
int a, b;

The arguments also belong somewhere in the allocated stack frame, but
it's not obvious exactly where or how to get them there. Any inherited
wisdom on the subject? Or is that part Left As An Exercise?

} *r2;

r2 = runtime$framealloc(sizeof *r2);
r2->saved_frameptr = frameptr;
r2->saved_retattr = retaddr;
frameptr = r2;

if (r3 == NULL) {
r0 = 0; /* "return 0" -- r0 is a "global" register */
goto out;
}
treewalk(r3->left), frameptr->a = r0;
treewalk(r3->right), frameptr->b = r0;
if (frameptr->a > frameptr->b)
heavyleft(r3);
else
heavyright(r3);
r0 = frameptr->a + frameptr->b;
r2 = frameptr;
frameptr = r2->saved_frameptr;
retaddr = r2->saved_retaddr;
out:

Wouldn't you want to point frameptr back at your saved frame pointer
*after* the target of the early-exit jump? Or am I missing something?
--------
out:
frameptr = r2->saved_frameptr;
retaddr = r2->saved_retaddr;
--------
Or if you want to microoptimize the early-exit case (which you probably
do here, since this would be done by a compiler and not a human), since
you're not mangling the return address until after the check that jumps
to it:
--------
retaddr = r2->saved_retaddr;
out:
frameptr = r2->saved_frameptr;
--------
(Unless you can get away with putting the early-exit check before you
create your new stack frame and point frameptr at it, which would probably
be a better optimization for a sufficiently clever compiler.)

runtime$framefree(r2, sizeof *r2);
return; /* return value in r0 and r1 as needed */
}

Here r0 through r4 are machine-level registers, as are "frameptr"
and "retaddr" (probably r6 and r7 respectively, on an 8-register
machine).

The runtime$framealloc() call is Super Extra Special in that it
does not use the regular machine-register calling sequence. (Hence
the name with the dollar sign.) It has to avoid overwriting the
hardware's "return address" register (r7), using instead the
reserved-by-compiler-at-function-entry register (r5).

Of course, runtime$framefree is the counterpart to runtime$framealloc,
and releases the "stack frame" allocated here.

I've noted that doing something like this is a valid way of handling the
"stack" a few times in discussions of ifwhen it's pedant-compatible to
talk about allocating variables on the stack, but never described it in
this much detail. Now all I need to do is put a link to Google's copy
of this post somewhere where I'll remember it the next time the subject
comes up...

dave

Chris Torek · Jul 15, 2004

[implementing the activation-record stack without a hardware-assisted
stack mechanism]

The arguments also belong somewhere in the allocated stack frame, but
it's not obvious exactly where or how to get them there. Any inherited
wisdom on the subject? Or is that part Left As An Exercise?

In this case, I decided to leave them in the "available registers"
to save complexity. There are few "good" answers to handling
function parameters, because they are provided by the caller, but
the callee often needs to save them. This leads to one typical
solution, which is to allocate the callee's activation record
("stack frame") in the *caller*, before making the call. One can
do this for fixed-size frames by recording the frame-size along
with the function, and replacing a simple:

CALL foo

with

LEA foo - 4, reg // required frame size is 4 bytes before entry point
CALL runtime$setup_entry // allocate frame
// now stash arguments at 0(newframe) .. N(newframe)
// then finally call the function
CALL foo

This does not work as well for a language that provides VLAs or
alloca(), though (which is of course the original problem).

Wouldn't you want to point frameptr back at your saved frame pointer
*after* the target of the early-exit jump?

Er, yes. Just a mistake -- I was writing the previous article between
other things, and almost forgot even the "r3 == NULL" test.

E. Robert Tisdale · Jul 15, 2004

Sushil said:
I was reading FAQ

"alloca cannot be written portably, and is difficult to implement on
machines without a conventional stack."

I understand that the standard does not mandate "heap" or "stack"

I'm curious to know the implemenations which dont have stack or heap.
How do they manage local and global memory? Although this may be a
little off-topic but this will help me understand the reasons behind
the above rationale of the committee.

I hope to get interesting answers since in my limited exposure I've
always seen machines with stack n heap and someone from this worldwide
forum would get me some answers about the machines who did not have
stack and/or heap and yet ran standard C.

I don't see any reason why it would be any more difficult to implement
alloca than it would be to implement variable length arrays
which are specified by the new ANSI/ISO C 99 standard.
Of course, specifying variable length arrays obviates alloca.

Dik T. Winter · Jul 16, 2004

> I was reading FAQ
> "alloca cannot be written portably, and is difficult to implement on
> machines without a conventional stack."
>
> I understand that the standard does not mandate "heap" or "stack"

Note that he FAQ talks about *conventional* stack.

> I'm curious to know the implemenations which dont have stack or heap.
> How do they manage local and global memory?

I know of a machine that did not have conventional stack or heap, but
nevertheless could implement Algol 68, which requires both. The compiler
did dedicate two registers to current stack pointer and bottom of heap
pointer. At each moment the compiler did know how much space above the
stack pointer was required in each block. Varying space (e.g. arrays of
running time dependent size) in a block was always allocated on the heap.
So at each moment when a routine call did occur, the compiler had knowledge
to what place the stack was filled and could take measures appropriately.
With such a system C89 can be implemented very well, but "alloca" can not be
implemented. C99 would be more of a problem because of the variable
stack-size in a block. (No problem in Algol 68, because garbage collection
is a near requirement.)

Roman Ziak · Jul 16, 2004

jota said:
So my PIC C compilator has to put all my local 'C' variables in the global [...]
void ex()
{
int var;//In stack on a pc.//Global ram in a PIC

The concept of scope is abstraction used by HLLs. Processors
operate in single (code+data) or multiple (code,data, hw stack)
memory spaces.

Hardware stack is very convenient, but nothing is stopping the
C implementation to provide software stack as do some compilers
for mentioned PICs, where hardware stack is severely limited.
I believe the mutually non-recursive functions are fine with static
variables. Only recursive functions need to implement local storage
area through pointer.

I think the problem lies in availability or lack of pointer registers.
PIC16 has only one and PIC18 has 3 such registers (That's why
Microchip calls it "C friendly", but that is different story).
ESP on IA32 is nothing but "special" pointer register.

Now back to "alloca". Traditionally the stack frame on PC looks like:

push ebp
mov epb,esp
sub esp, LOCAL_STORAGE_SIZE

where LOCAL_STORAGE_SIZE is a constant and traditionally stack
pointer stays the same throughout the function. But does not have
to, because we have already local frame pointer (shown as EBP) for
accessing local variables.

so each call ptr=alloca(size) may result in this code (no error
condition check shown):

stack_pointer -= size;
ptr = stack_pointer;

The conclusion: IMHO the alloca() implementation does require
_at least_ one pointer register. Such register is, however, required
also by heap and stack implementation, so any platform with C89
can also implement aloca().

Roman

jota · Jul 16, 2004

One example would be PIC controlers!

This is a bit off-topic since it's not to do with the C langauge but
implementations of C compilers. This would be far better discussed in
comp.arch.embedded.

No! Its fully off-topic!

<OT>
That's not the reason. There's no hardware stack on many 32-bit RISC CPUs
either. The just use one of the registers, by convention, to be the stack
pointer. Then, instead of pushes and pops you use load and stores with an
add or sub on the "stack pointer" register.

Read carefully again! (Starting from "one example")
What if there is NO reacheble stack pointer, and only users of stack are
call/return?

It's not global, it's local to the function. It may or maynot be overlayed
with other local variables from other functions. This is also how the Keil
compiler works for the 8051. However, you loose reentrancy so in this mode
the compiler is not ISO C compliant. </OT>

I see that u get the picture ;-)
I have to point out that I did not say that 'var' it is considered global in
'C'!
'var' is considered local 'C'. However since there is no _useble_ stack in a
PIC (at least the old ones)
The compiler targetes your local vars to the global memory space directly,
NO stack pointer!
OK to be real anal. The stack is mostly used as a pointer to the global
space and can
be considered a as an allocator for for temporary data. Even if the
stackpointer
points a _part_ of the global ram or not u need to push/pop the stack to use
it to store local data!
Now Im sure there are some other nmemonics out there for what I call
push/pop, but U get the picture.
I use SDCC for 8051!

compiler works for the 8051. However, you loose reentrancy so in this mode
the compiler is not ISO C compliant. </OT>

Thats right!
Function recurison is unsupported atleast om my PIC compiler.
And function mapping is by default only allowed for one call graph at a
time.
For example ex() is only to be used by the main graph or the isr graph. Not
by both!
And the reason _is_ the lack of a fully functional stack!
In the example 'var' is stored at an absolute memory position, not using the
stack!
Or actually an virtual internal compile time stack, with decide where 'var'
goes!

OK Im off-topic I know!
//jota

kal · Jul 16, 2004

E. Robert Tisdale said:
I don't see any reason why it would be any more difficult to implement
alloca than it would be to implement variable length arrays
which are specified by the new ANSI/ISO C 99 standard.
Of course, specifying variable length arrays obviates alloca.

Well, let me try arguing the counterpoint.

The reasons will become apparent when one tries to implement it.

<OT>
Try figuring out how to manage the Top Of Stack pointer.
</OT>

Despite arguments to the contrary, alloca need not allocate
space in the memory area used for automatic variables.
It just has to ensure that memory is automatically freed on
function exit.

One way of doing this, in implementations where "stack" is
used, is to allocate above the Top Of the Stack or at the
end of the memory area "reserved" for the "stack". Both
are fraught with difficulties.

VLA is quite different with regard to housekeeping (prolog/
epilog.) For one, within a given scope, instances of VLA
are fixed in number. Also, VLA comes with some restrictions.

e.g. given below is part of C99 6.7.5.2 (3)

... The size of each instance of a variable length array
type does not change during its lifetime.

Dan Pop · Jul 16, 2004

In said:
I don't see any reason why it would be any more difficult to implement
alloca than it would be to implement variable length arrays
which are specified by the new ANSI/ISO C 99 standard.

You're perfectly right. The difference is that people expect alloca()
to be portable, but have no such expectations about VLAs. And all this
is a quality of implementation issue: a conforming implementation can
use malloc() for both and never bother to call free().

Of course, specifying variable length arrays obviates alloca.

Of course, doing this reduces the code portability to virtually nil.

Dan

kal · Jul 16, 2004

This is highly off topic so people are probably going
to jump on me.

Most machines do not have stack or heap. They have some
hardware facilities that enable simulation (or would that
be emulation?) of a stack. Usually this consists of a
register (aka Stack Pointer or SP) and a couple of machine
instructions (push and pop.)

For instance, the "push A" instruction will store the
value in register "A" at the memory location pointed to
by the register "SP" and decrement/increment the value
in "SP" by 1. The "pop A" instruction will store the value
at the memory location pointed to by "SP" into register
"A" and increment/decrement the value in "SP" by 1.

This can be done even if there was no special register
so long as some register is availble for such use. The
compiler will just have to generate two instructions
instead of one.

The OS and the compilers are so written as to use a
predefined part of the memory for these purposes.
That is the SP spans this part of the memory. One gets
what is called "stack overflow" if, during execution,
SP points to memory outside this predefined region.

There are, however, some machines that have hardware
stacks. One such machine is peddled by the Unisys corp.,
the erstwhile "Burroughs" large systems. It is a very
different architecture than anything you have probably
come across. It would be too long to explain it here.
But there are only 32 or so stack registers and hence
functions can be called only to a depth of 32 levels.
Of that the first two are used by the OS. All data
accessed by the program (or descriptors) must be on the
stack. The executable file as such does not contain any
reference to memory locations. It is the only true
stack machine that I know of.

In that machine C programs are implemented using, you
guessed it, SIMULATED stacks. Whoever implemented the
C compiler in that machine must have had a whole lot
of fun.

I don't see any reason why it would be any more difficult
to implement alloca than it would be to implement variable
length arrays which are specified by the new ANSI/ISO C 99
standard. Of course, specifying variable length arrays
obviates alloca.

That may be so but let me try to argue the counterpoint.

The functionality provided by Variable Length Arrays (VLAs)
are quite different from that provided by alloca. alloca
implementation is likely much more complex. I would not
know how different these two are till I actually try to
implement them.

At block level the number of instances of VLAs are fixed.
They also come with restrictions like the one quoted below.

6.7.5.2 Array declarators

3. ... The size of each instance of a variable length
array type does not change during its lifetime.

OTOH, within a block, alloca may or may not be called and
if called it may be called a variable number of times.

Arthur J. O'Dwyer · Jul 16, 2004

...

Well, let me try arguing the counterpoint.
The reasons will become apparent when one tries to implement it.

Indeed, but I don't think you "get" the /essential/ difference
between VLAs and alloca() yet, either. (Or else you're just more
familiar with a "stack" model radically different from the one I
know, and your argument applies to it, whatever it is.)

<OT>
Try figuring out how to manage the Top Of Stack pointer.
</OT>

Despite arguments to the contrary, alloca need not allocate
space in the memory area used for automatic variables.
It just has to ensure that memory is automatically freed on
function exit.

In particular, the memory allocated in function Foo is freed
on exit from Foo, and the memory in function Bar is freed on
exit from Bar, and never vice versa. I mention this because I
can think of a naive implementation of 'alloca' which would
break when confronted with

void foo() { int *p = alloca(100); bar(); *p = 42; }
void bar() { alloca(100); return; }

One way of doing this, in implementations where "stack" is
used, is to allocate above the Top Of the Stack

For hysterical reasons which are OT in this newsgroup, the
phrase "above the Top Of the Stack" is ambiguous. I have no
idea where you mean (but I have at least three ideas).

or at the
end of the memory area "reserved" for the "stack". Both
are fraught with difficulties.

VLA is quite different with regard to housekeeping (prolog/
epilog.) For one, within a given scope, instances of VLA
are fixed in number. Also, VLA comes with some restrictions.

e.g. given below is part of C99 6.7.5.2 (3)

... The size of each instance of a variable length array
type does not change during its lifetime.

This is true of objects allocated with 'alloca', too, unless
some library wants to provide a 'realloca' --- now /that/ would
be messy! So it's not a difference.

The big difference is that VLAs can only be defined in certain
contexts --- namely, as "statements" of their own, not as part of
any bigger expression. Thus the implementation doesn't have to
struggle with the equivalent of

foo(alloca(n));

on (common) architectures which use a single stack for both
automatic storage /and/ the function call stack
<completely OT>which seems like a dumber and dumber idea the
more I learn about other architectures</completely OT>.

With VLAs, you always have

char myVLA[n];
foo(myVLA);

that is, the allocation and possible initialization of the object
on the "automatic storage" stack are completely separated from the
mucking around with the "call" stack. So you're never mixed up,
even if they're the same stack.

<OT> I would expect GCC has a similar problem with its ({})
construct, which IIRC lets the programmer write something like

foo( ({int i; &i;}) );

although that particular example is almost certainly undefined
even in GNU C. </OT>

-Arthur

jota · Jul 16, 2004

Most machines do not have stack or heap. They have some

hardware facilities that enable simulation (or would that
be emulation?) of a stack. Usually this consists of a
register (aka Stack Pointer or SP) and a couple of machine
instructions (push and pop.)

Yes they do!
An ordanary machine has both stack and heap.
Every thread has its own stack!
And when one thread stops the execution the OS simply exchanges
stackpointer, registers and flags.
There is alot writen about context switching and realmode/protected mode and
memory handlers.
Read up!

The OS and the compilers are so written as to use a
predefined part of the memory for these purposes.
That is the SP spans this part of the memory. One gets
what is called "stack overflow" if, during execution,
SP points to memory outside this predefined region.

Not quite true!
A modern OS will detect the stackoverflow and simply allocate more stack.
From the users perspective it seems like nothing happend!
Thanks to the OS memory handler!
//jota

Keith Thompson · Jul 17, 2004

In <[email protected]> "E. Robert Tisdale"

You're perfectly right. The difference is that people expect alloca()
to be portable, but have no such expectations about VLAs. And all this
is a quality of implementation issue: a conforming implementation can
use malloc() for both and never bother to call free().

Of course, doing this reduces the code portability to virtually nil.

For anyone just tuning in, Dan's implicit assumption is that code that
depends on features that are new in the C99 standard is blatantly
non-portable because of the shortage of C99 compliant compilers.

I'm not disputing his assumption, just pointing it out. (I expect
he'll find some way to take offense at my remarks.)

Dan Pop · Jul 19, 2004

In said:
For anyone just tuning in, Dan's implicit assumption is that code that
depends on features that are new in the C99 standard is blatantly
non-portable because of the shortage of C99 compliant compilers.

Is this an assumption or a fact?

Dan

Keith Thompson · Jul 19, 2004

Is this an assumption or a fact?

It is an assumption. That doesn't prevent it from being a correct
assumption, and therefore also a fact.

It is an implicit premise for your statement about the non-portability
of VLAs, which would have been clearer if you had stated it
explicitly.

Short question about automatic and dynamic variables and C standard	3	Sep 12, 2009
stack and heap	9	Jun 26, 2007
[Question]activities of stack pointer and frame pointer when function called	3	Jan 12, 2005
Stack vs. Heap	14	Oct 9, 2003
Question about Ruby implementation	6	Dec 29, 2003
Simple Question: How to see an Error stack trace in Tomcat... (Struts)	3	Jun 27, 2006
using "Implementation-Version" in MANIFEST.MF to version war, jar and ear files	3	Feb 2, 2007
Elise Mooney reports on Channel 9 about Maths Worldwide and the fraudthat it is	1	Apr 17, 2010

Question about alloca() and alternative to using stack in an implementation

Sushil

jota

Mark A. Odell

Malcolm

Dan Pop

Chris Torek

Dave Vandervies

Chris Torek

E. Robert Tisdale

Dik T. Winter

Roman Ziak

jota

kal

Dan Pop

kal

Arthur J. O'Dwyer

jota

Keith Thompson

Dan Pop

Keith Thompson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads