Stack organisation locals/args

Discussion in 'C Programming' started by datenpunk, Jul 11, 2013.

  1. datenpunk

    datenpunk Guest

    hi,

    here is some rather simple code with a complex question ...

    the code - called from main:

    int long call_me(int long a, int short b) {
        int long c;
        c = 0;
        c = c + a;
        c = c - b;
        return c;
    }

    when debugging this I see that the stack has this order (low to high)

    1. a (EBP + 8)
    2. Return address (EBP +4)
    3. EBP
    4. c (EBP -4)
    5. b (EBP - 20) (some debugger stuff between b and c)

    The question is: could it be that C mixes locals with parameters? I would expect a and b to sit together, followed by c, but they are mixed up, and b ends up on the other side of EBP.

    Thanks in advance

    Daniel Khan
     
    datenpunk, Jul 11, 2013
    #1

  2. You're passing b as a short. What I suspect is that it's being passed as
    16 bits, then transferred to a 32 bit variable for efficiency reasons.

    But the way to be sure is to compile to assembly. Debuggers don't necessarily
    tell the unvarnished truth, because what they're doing is treating machine
    code as something that can be viewed at source level.
     
    Malcolm McLean, Jul 11, 2013
    #2

  3. datenpunk

    Eric Sosman Guest

    Yes.

    As a quite common case, all three of a,b,c might reside
    in registers and have no memory addresses at all.

    But the main point is this: The code mentions three
    variables a,b,c and performs assorted operations on them.
    The definition of the C language specifies what results the
    operations must produce (barring things like overflow), and
    *any* stratagem the implementation adopts is okay so long as
    it produces those results.
    ... things you shouldn't.
     
    Eric Sosman, Jul 11, 2013
    #3
  4. datenpunk

    James Kuyper Guest

    The standard only specifies how C code must behave; it doesn't specify
    the details about how the compiler arranges for that behavior to occur.
    Different compilers for different platforms arrange it in different
    ways. In particular, the standard specifies nothing about how memory for
    function parameters and local variables is allocated, other than the
    fact that the lifetime of both the function parameters and variables
    local to the outermost block of the function ends when the function
    returns. That makes it reasonable (but not necessary) for all of those
    variables to be stored in the same general location, but the standard
    says nothing to suggest what order they might be in.

    Even if your expectations about the locations of those variables had
    been correct, they would only have been correct for a particular
    platform and a particular compiler - you couldn't count on them being
    correct in any other context.
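
    One way to check this on a given platform is to print the addresses the implementation picked. The following is a minimal sketch, assuming a hosted implementation; the name report_layout is made up for illustration. The output will differ between compilers, targets and optimisation levels, and taking the addresses forces the variables into memory, so this shows one possible layout rather than the layout of the original call_me.

```c
#include <stdio.h>

/* Hypothetical variant of call_me that reports where a, b and c live.
   Taking an address forces the object into memory, so the result is
   one possible layout, not a guarantee about the original function. */
long report_layout(long a, short b)
{
    long c = 0;

    printf("a at %p\n", (void *)&a);
    printf("b at %p\n", (void *)&b);
    printf("c at %p\n", (void *)&c);

    c = c + a;
    c = c - b;
    return c;
}
```

    Calling report_layout(10, 3) from main and comparing the three addresses at -O0 and -O2 is usually enough to see that the order is the compiler's choice.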
     
    James Kuyper, Jul 11, 2013
    #4
  5. datenpunk

    Joe Pfeiffer Guest

    Others have commented on this in the context of C (and pointed out,
    rightly, that it would be legal); as I look at it from the context of
    ia32 it looks really weird. Do you know how high an optimization level
    is being used? Without enough optimization turned on to start passing
    parameters in registers or something, it pretty much has to push b, then
    push a, then perform a call pushing the return address, and then reserve
    space for the old EBP and c.
     
    Joe Pfeiffer, Jul 11, 2013
    #5
  6. Compilers are doing a lot of optimization these days, including
    inlining and tail call optimization. (Well, not the latter for this.)
    If you inline it, though, and/or pass parameters in registers
    it could easily do something like that.

    -- glen
     
    glen herrmannsfeldt, Jul 11, 2013
    #6
  7. datenpunk

    datenpunk Guest

    Thank you all for the explanation. This really helped a lot already.

    For me to understand it completely:

    If b is stored inside a register - where does the system store this information (how does the system know where to find b)?

    Meanwhile I am starting to think that Eclipse has a bug here.
    I went through the whole stack frame, and b is in fact where I expected it: right at the beginning of the stack frame, at EBP + 0xC. Only the location Eclipse shows for the variable is "wrong", and the value at that address, 0x08040001, looks like garbage.

    If someone is interested, this is what it looks like: https://www.evernote.com/shard/s16/...2761053e9252/56a1d31c6c418c19f07df4ac9678f7d2

    Again thanks a lot.

    Daniel Khan
     
    datenpunk, Jul 11, 2013
    #7
  8. Inside a function, the compiler just has to remember which variable is
    currently stored in which register.
    For passing function arguments, both the caller and the callee have to
    follow the same convention. Such a convention can be that the parameters
    are pushed onto the stack from right to left. But equally valid
    conventions are that the parameters are pushed onto the stack from left
    to right or that the first 5 parameters are passed in registers r4 to r9
    and that the return address is stored in r15.
    Depending on the convention for passing function arguments, the compiler
    knows where to look for the second argument to the function.
    Bart v Ingen Schenau
     
    Bart van Ingen Schenau, Jul 11, 2013
    #8
  9. datenpunk

    datenpunk Guest

    On Thursday, 11 July 2013 12:54:44 UTC+2, Bart van Ingen Schenau wrote:
    Thanks a lot - that makes sense.

    Daniel Khan
     
    datenpunk, Jul 11, 2013
    #9
  10. datenpunk

    Noob Guest

    Please note that this entire discussion is off-topic here.
    comp.compilers and comp.lang.asm.x86 might be good places
    to ask these questions.

    See also

    https://en.wikipedia.org/wiki/X86_calling_conventions
    https://en.wikipedia.org/wiki/Application_binary_interface

    Regards.
     
    Noob, Jul 11, 2013
    #10
  11. datenpunk

    datenpunk Guest

    On Thursday, 11 July 2013 13:35:51 UTC+2, Noob wrote:
    Thank you.
     
    datenpunk, Jul 11, 2013
    #11
  12. datenpunk

    James Kuyper Guest

    That information is stored in your program itself, in the form of
    instructions to retrieve the value of b from the appropriate register.
    The compiler needed to keep track of the location of b when generating
    those instructions, but the instructions themselves are all that is left
    of that information, once the code has been generated.
    I'll let someone who knows something about eclipse respond to that issue.
     
    James Kuyper, Jul 11, 2013
    #12
  13. Some of it, but certainly not all of it. It's illuminating some
    important points about how C is defined, particularly that (unlike
    assembly language) a C program defines behavior, not machine code.

    A question is not a bad one just because the answer is no.
     
    Keith Thompson, Jul 11, 2013
    #13
  14. (snip)
    It is a little more complicated if you allow for varargs.

    As I understand it, ANSI C allows for a different convention for
    varargs and non-varargs, but the systems I know (not all that
    many) use the same convention. Pushing an unknown number of
    arguments right to left means that the caller can find the left
    most argument(s) without knowing how many there are.
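
    From the C side, the same point shows up in how a variadic function is written: the callee can only locate the fixed, leftmost parameter on its own, and has to learn from it how many arguments follow. A minimal sketch using <stdarg.h>; the function name sum_ints and the "count first" convention are assumptions chosen for illustration.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
   The callee learns how many arguments follow only from the first,
   fixed parameter; nothing in the call mechanism tells it. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);            /* start after the last fixed parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);

    return total;
}
```

    A call such as sum_ints(3, 1, 2, 3) works however the implementation passes the arguments, as long as the caller and the callee agree on the convention.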

    The early 808x compilers (for Pascal and Fortran) used a convention
    where the arguments are pushed left to right, and the called routine
    pops them off the stack with a special form of RET. As that is not
    compatible with varargs, different conventions were used when C
    compilers appeared, where the calling routine pops the arguments.
    Many systems that pass in registers still save stack space for them.
    No, the return address is always in R14!

    BALR R14,R15
    -- glen
     
    glen herrmannsfeldt, Jul 11, 2013
    #14
  15. datenpunk

    BartC Guest

    *I* would expect, with a simple-minded non-optimising compiler, that the
    order on the stack might be, low to high:

    c, return address, old frame pointer, a, b (with b widened to the machine
    word size).

    However, this is not really predictable, so it's unwise to rely on any
    particular ordering.
     
    BartC, Jul 11, 2013
    #15
  16. datenpunk

    Lew Pitcher Guest

    Which loaded the address of the next sequential instruction into R14, and
    then branched to the address held in R15. Alternatively, with a named
    entrypoint, you would code
    BAL 14,ENTRYPOINT
    (where ENTRYPOINT was the symbolic name or external name of the instruction
    that started the subroutine).

    To get back to the (previous) next sequential instruction, you would execute
    a
    BR 14
    which would effectively branch to the address held in R14

    Of course, for proper linkage, the subroutine would save the caller's
    registers on entry (that is, as the first part of the logic pointed to by
    R15), and establish its own base register and SAVEAREA. On exit, the
    subroutine would restore all the saved registers immediately prior to the
    BR 14. The address of the save/restore area (aka the SAVEAREA) was always
    pointed to by R13.

    So, the caller would do something like...
             LA    13,SA1
             ...
             BAL   14,DOIT            CALL DOIT SUBROUTINE
             ...
    SA1      DS    18F

    and the callee would do something like
    DOIT     STM   14,12,12(13)       SAVE CALLER REGS EXCEPT R13 IN HIS SA
             BALR  12,0               LOAD NSI ADDRESS INTO R12
             USING *,12               R12 NOW OUR BASE REG
             ST    13,SA2+4           SAVE CALLERS R13 IN OUR SAVEAREA
             LA    13,SA2             R13 IS NOW OUR SAVEAREA
             ...
             L     13,SA2+4           R13 POINTS TO CALLER SAVEAREA
             LM    14,12,12(13)       RESTORE CALLERS REGISTERS
             BR    14                 RETURN TO CALLER
    SA2      DS    18F


    FWIW, many s360/370/390/etc apps stored the caller's arguments in space
    following the BAL or BALR that invoked the callee. Thus, the caller might
    look like
             LA    13,SA1
             ...
             BAL   14,DOIT            CALL DOIT SUBROUTINE
    ARG1     DS    F
    ARG2     DS    F
    ARG3     DS    H
             DS    H
    ARG4     DS    CL3
             ...
    SA1      DS    18F

    and the callee would access these parameters as offsets from the caller's
    R14. Of course, this meant that the callee had to adjust the caller's R14
    to account for the arguments /prior/ to performing the BR 14.
     
    Lew Pitcher, Jul 11, 2013
    #16
  17. (snip, someone wrote)
    (then I wrote)
    For external routines, you load R15 from an address constant.
    For internal ones, you could do that.
    Not only that, but there is an IBM utility program named IEFBR14.

    In its original implementation it contained just one instruction,
    but later an SR 15,15 was added such that the return code would
    be zero.
    Leaf routines don't need to provide a save area, but do need to
    save registers.
    For those used to a stack, this should be a double linked list.
    To do that, you instead load the new save area address into a
    register other than 13, save that in the previous save area,
    then copy to R13.
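
    For readers more at home in C, the chaining described here can be modelled as a doubly linked list. This is a simplified sketch, not the real 18-word layout: the struct and the functions enter and leave are hypothetical names, and real code does this with ST/LA/L on R13 as in the listings earlier in the thread.

```c
#include <stddef.h>

/* Simplified model of a save area: back and forward chain pointers
   plus room for the registers stored by STM 14,12,12(13). */
struct save_area {
    struct save_area *prev;   /* caller's save area */
    struct save_area *next;   /* save area of the routine being called */
    unsigned long regs[15];   /* R14, R15, R0..R12 */
};

/* Hypothetical entry sequence: link the new area into the chain.
   The return value plays the role of the new R13. */
struct save_area *enter(struct save_area *caller, struct save_area *mine)
{
    mine->prev = caller;          /* back pointer, as in ST 13,SA2+4 */
    mine->next = NULL;
    if (caller != NULL)
        caller->next = mine;      /* forward pointer in the previous area */
    return mine;                  /* the new area is now the current one */
}

/* Hypothetical exit sequence: the caller's area becomes current again. */
struct save_area *leave(struct save_area *mine)
{
    return mine->prev;
}
```

    Walking the prev pointers from the current area gives the equivalent of a stack backtrace.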
    Some might have done that, but again it won't work for varargs.
    Ones I remember, which I believe includes many system macros,
    use BAL 1,AROUND to load the address into R1 while branching
    around the arguments. R1 is the usual argument list register.

    But for internal subroutines you could do that.

    And, in case anyone noticed, to allow for recursion you must
    dynamically allocate the new save area. Most IBM utilities and
    compiled Fortran code used static allocation and static save areas.
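
    The same hazard can be reproduced in C by keeping per-call state in static storage. A minimal sketch, with made-up function names: fact_static keeps its working value in a single static object, much like a routine with a static save area, while fact_auto keeps it in an automatic variable that exists once per activation.

```c
/* Working value lives in static storage: every activation shares it,
   so the recursive call overwrites the value the outer call needs. */
unsigned long fact_static(unsigned n)
{
    static unsigned saved;    /* one copy for all activations */

    if (n <= 1)
        return 1;
    saved = n;
    unsigned long rest = fact_static(n - 1);   /* clobbers saved */
    return saved * rest;
}

/* Working value lives in automatic storage: each activation gets
   its own copy, so recursion behaves as expected. */
unsigned long fact_auto(unsigned n)
{
    unsigned saved = n;       /* one copy per activation */

    if (n <= 1)
        return 1;
    unsigned long rest = fact_auto(n - 1);
    return saved * rest;
}
```

    With these definitions fact_auto(4) yields 24, while fact_static(4) yields 8, because by the time the inner calls return, saved holds 2 in every activation.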

    -- glen
     
    glen herrmannsfeldt, Jul 11, 2013
    #17
