(This post is not really on-topic at all; it deals with assembly code
generated by gcc 3.3.3 for a situation described by OP.)
Perhaps, but it may be instructive anyway. Here is a little more
on the assembly, given the source. This appears to be targeted to
the DEC Alpha (which has some similarities to MIPS, including many
of the assembler directives and the use of register "$30" as a stack
pointer).
#include <stdio.h>
struct foo {
int bar;
int baz;
};
int qux( struct foo *f ) {
return (f+1)-f;
}
int main(void)
{
struct foo g;
return qux(&g)-1;
}
, generates this with optimization disabled:
.set noat
.set noreorder
These turn off the assembler's use of register $28 (AKA "$at" or
"assembler temporary") to synthesize various constants, and disable
the assembler's re-ordering of instructions to fill delay slots.
(GNU "gas" does not do any re-ordering anyway, apparently, but it
is always safe to tell it not to.)
.text
.align 2
.globl qux
.ent qux
$qux..ng:
qux:
These -- except the labels, which should be obvious -- are all
"pseudo-instructions", i.e., assembler directives that may or may
not generate any actual code. Most of these should be reasonably
familiar to most people who know assembly languages. The oddball,
".ent", inserts a debug information record marking the entry point
for the function.
This pseudo-instruction inserts another debug-information record
-- which may also be used by longjmp() and/or C++ exception handlers;
I am not sure about this -- indicating that there is an "actual
frame pointer" in register $15, which is often, but not always,
used as a frame pointer. (The frame size, 32 bytes, is the 2nd
argument. Register $26 normally holds the return address. I am
not sure why it appears in this directive. The last 0 is ignored,
at least in the GNU system.)
This pseudo-instruction inserts another debugger record, this time
indicating which registers are saved, and at what offset from the
initial $sp. The two bits set in the mask above correspond to
registers $26 and $15 ($30, which holds the stack pointer, is
calculated instead of saved).
This subtracts 32 from the stack pointer, creating room on the stack
to store stuff.
stq $26,0($30)
stq $15,8($30)
This stores the previous return address ($26) and the previous frame
pointer ($15) in the stack frame just created. "stq" stores a "quad",
an 8-byte quantity; registers are 8 bytes.
Oddly enough, this is actually a "move" instruction. "bis" is the
bitwise OR instruction, but register $31 is hardwired to zero, so
"$31 or $30" ORs register $30 with all-zero-bits, which obviously
is just the value in register $30. The result is stored in register
$15 -- i.e., "bis $31,$30,$15" copies the value in $30 to $15.
This inserts the last of the stack-frame debug data, indicating
that the stack frame is now complete (and, in this case, that
register $27 is not being used by this code). If you run the code
under a debugger and it stops at various points, these debug records
tell it how to find and interpret the stack contents for a stack-trace.
(In other words, the stack frame format is more flexible than on,
e.g., the 80x86 or SPARC.)
$16 is the first scalar, non-floating-point (i.e., integral or
pointer) argument to the function -- in this case, "f". So this
stores the actual argument from the parameter register into the
stack-frame just built (at offset 16 from register $15).
Again, $31 is hardwired to zero, so computing offset 1 from it
computes the value 1. This is stored in register $0, which is
the register holding the return value.
This copies the value in register $15 back to register $30 -- i.e.,
moves the frame pointer value back into the stack pointer. Since
the two registers are (still) equal (from the last such move), this
is entirely unnecessary; but this *is* unoptimized code. In a
function that made use of internal stack allocation (using C99's
VLAs or the non-standard alloca()), this might be required in
some cases (those where one could not use $15 for everything to
follow).
This reloads register $26 (the return address) from where it was
saved, even though it is has not been changed.
This reloads register $15 (the frame pointer) from where it was saved.
This reloads register $30 (the previous stack pointer) from where it
was saved. The stack frame that was built at the function entry is
now destroyed; the function must now return.
This returns to the caller -- the address in $26 -- and presumably
puts the address of the "ret" instruction itself (or that of the
next instruction) in $31. Since $31 is hardwired to zero, that
causes it to be discarded. I assume "ret" is just an alias for
"jmp" (but I am not an Alpha expert). (The final 1 is a prediction
as to whether the branch is taken. Since this is an unconditional
branch, obviously it is taken; it seems pointless to bother writing
this part, and apparently it *is* optional in the assembler.)
.end qux
.align 2
.globl main
.ent main
main:
These end the qux() function and start the main() function, in the
same way. (I omit the rest of the code since it should now be
obvious.)
Not so good, but as I believe someone elsethread (but in ng) mentioned
recently, gcc is notorious for generating brain-dead code unless you
ask it to optimize.
More specifically, it never looks at anything other than one expression
at a time, and even then it often does not look closely. However, in
the code above, the "working" part of the code is just one instruction,
setting register $0 (the return value) to 1.
When asked to do so (given -O3), gcc generates
.set noat
.set noreorder
.text
.align 2
.align 4
.globl main
.ent main
$main..ng:
main:
These are largely as before, except now we are generating code for
main() first. (Also, it is curious that there are two .align
directives. The first one could be removed with no effect.) The
main() function has had qux() expanded in-line:
.frame $30,16,$26,0
lda $30,-16($30)
.prologue 0
This time, there is no separate frame pointer -- the frame is
given by register $30, the stack pointer register. It has only
16 bytes in it -- and aside from zero bytes, this is the minimal
stack frame size, as stack frames must be multiples of 16 bytes.
No additional registers are saved, so no ".mask" directive is
required.
This sets register $0 to 0, i.e., sets the return value for main()
to zero. GCC has seen that qux() returns the constant 1, and that
main() returns qux() - 1, or 0.
lda $30,16($30)
ret $31,($26),1
.end main
This restores the stack pointer and returns. The only "wasted motion"
was saving and restoring the stack pointer -- main() could have been
expressed as:
.frame $30,0,$26,0
bis $31,$31,0
ret $31,($26)
The qux() function is in fact compiled this way:
.align 2
.align 4
.globl qux
.ent qux
$qux..ng:
qux:
.frame $30,0,$26,0
.prologue 0
lda $0,1($31)
ret $31,($26),1
.end qux
This time, there is no useless adjusting of the stack pointer: the
qux() function simply sets the return-value register to 1 and then
returns. This is just two instructions long, which is the minimum.
.ident "GCC: (GNU) 3.3.3 (NetBSD nb3 20040520)"
I'm far from an assembler guru (I'm quite happy to let those of you
who are continue to make gobs of cash so that I never have to code in
it), but gcc seems to be pretty capable when you ask it to be. It
isn't what ICC will do for you (presumably) ...
Well, ICC does not generate Alpha assembly.
Aside from the
odd extra two instruction in main() (needlessly adjusting the stack
pointer in $30), this is pretty much minimal -- the qux() function
still has to exist, even though main() does not call it, in case
the object file is loaded into some other program (written in some
language other than C, presumably).
... I'd be curious how gcc 4 performs.
Perhaps it elides the pointless stack adjustment in main().