Read/copy/call a function's machine code?

Discussion in 'C Programming' started by MisterE, Feb 18, 2008.

  1. MisterE

    MisterE Guest

    Is it possible to create a pointer to a function, and then get its size (the
    actual size the function takes in machine code), such that you can copy the
    function to another memory location? You could then modify it (I know it
    would be modifying the machine code) and then call the modified function via
    a function pointer?
    MisterE, Feb 18, 2008
    #1

  2. MisterE

    Chris Dollin Guest

    MisterE wrote:

    > Is it possible to create a pointer to a function, and then get its size (the
    > actual size the function takes in machine code), such that you can copy the
    > function to another memory location.


    Not in remotely portable C, no.

    Of course, if you're hacking machine code anyway, and your implementation
    non-portably allows you to read the bytes of a function via a char*, you
    can "just" do a code disassembly and trace to find out the function's code.
    Just remember the result will be about as portable as a duck-sized lump of
    neutronium.

    --
    "Creation began." - James Blish, /A Clash of Cymbals/

    Hewlett-Packard Limited, registered no: 690597 England
    registered office: Cain Road, Bracknell, Berks RG12 1HN
    Chris Dollin, Feb 18, 2008
    #2

  3. There's also the fact that on some modern processors, you'd need (?) to copy
    the code into a data segment, which is non-executable. And the code segment
    is non-writable, or should be, so you can't copy it back.
    --
    Guillaume Dargaud
    http://www.gdargaud.net/
    Guillaume Dargaud, Feb 18, 2008
    #3
  4. MisterE

    Guest

    On Feb 18, 12:12 pm, "MisterE" <> wrote:
    > Is it possible to create a pointer to a function

    Sure, T (*ptr)(T); declares ptr as a pointer to a function that takes T
    and returns T.

    > and then get its size (the actual size the function takes in machine code),

    Nope, that cannot be done. There is not even machine code in a
    function pointer, and a function pointer does not have to point to
    actual memory in the implementation.

    > such that you can copy the function to another memory location. You could then modify it (I know it

    You can do that

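    int (*ptr)(int) = putchar;
    T (*tmp)(T) = (T (*)(T))ptr; /* whatever type T is, this conversion is
    guaranteed to work; the cast is required */
    ptr = (int (*)(int))getchar; /* storing a differently typed function is
    fine; calling it through ptr would be undefined */
    ptr = (int (*)(int))tmp; /* converting back is guaranteed to yield the
    original pointer; the cast is required here too */
    ptr('\n');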

    What my code demonstrates here is that there is no need for a 'void *'
    for function pointers, because you can convert any function pointer to
    any other function pointer type and back.
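
    The round-trip guarantee described above can be sketched as follows. This is
    only an illustration, not code from the thread: generic_fn, twice, and
    call_after_round_trip are hypothetical names, and void (*)(void) is just one
    common choice for a "generic" function pointer type.

```c
#include <assert.h>

/* Hypothetical "generic" function pointer type; any function pointer type would do. */
typedef void (*generic_fn)(void);

/* Hypothetical example function. */
static int twice(int x)
{
    return 2 * x;
}

/* Store a function pointer as generic_fn, convert it back, and call it. */
int call_after_round_trip(int x)
{
    generic_fn stored = (generic_fn)twice;        /* explicit cast required */
    int (*restored)(int) = (int (*)(int))stored;  /* back to the original type */

    /* ISO C guarantees the round-tripped pointer compares equal to the original. */
    assert(restored == twice);

    /* Calling through 'stored' directly would be undefined; 'restored' is fine. */
    return restored(x);
}
```

    Note that this only moves the pointer value around; it says nothing about
    the bytes the pointer refers to.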

    > would be modifying the machine code) and then call the modified function via
    > a function pointer?


    ISO C does not define 'machine code'.
    Why do you ask here? Try it!
    It doesn't seem to me that you care about ISO C or portability so much as
    getting that 'hack' to work.
    , Feb 18, 2008
    #4
  5. "MisterE" <> wrote in message news:
    > Is it possible to create a pointer to a function, and then get its size
    > (the actual size the function takes in machine code), such that you can
    > copy the function to another memory location. You could then modify it (I
    > know it would be modifying the machine code) and then call the modified
    > function via a function pointer?
    >

    Yes and no.
    If you cast the function pointer to an unsigned char *, then most compilers
    will allow you to read the instructions until you hit upon a return
    instruction, which will be the end of the function.
    However it is not guaranteed, and most modern OSes frown on allowing code to
    be modified on the fly. There are ways around this, of course, or the OS
    itself wouldn't be able to load programs into memory.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 18, 2008
    #5
  6. MisterE

    Chris Dollin Guest

    Malcolm McLean wrote:

    > "MisterE" <> wrote in message news:
    >> Is it possible to create a pointer to a function, and then get its size
    >> (the actual size the function takes in machine code), such that you can
    >> copy the function to another memory location. You could then modify it (I
    >> know it would be modifying the machine code) and then call the modified
    >> function via a function pointer?
    >>

    > Yes and no.
    > If you cast the function pointer to an unsigned char *, then most compilers
    > will allow you to read the instructions until you hit upon a return
    > instruction, which will be the end of the function.


    Not necessarily.

    (a) A function may have multiple return points -- there's no need for there
    to be a single exit point in the code.

    (b) The compiler is at liberty to put returns in front of the function
    body, if that leads to more efficient code.

    (c) A tail-optimised function may have no returns at all, just jumps to
    other functions.

    A Poul Anderson quote occurs to me, but I can't remember where it comes
    from.

    --
    "Well begun is half done." - Proverb

    Hewlett-Packard Limited, registered no: 690597 England
    registered office: Cain Road, Bracknell, Berks RG12 1HN
    Chris Dollin, Feb 18, 2008
    #6
  7. In article <>,
    Malcolm McLean <> wrote:

    >If you cast the function pointer to an unsigned char *, then most compilers
    >will allow you to read the instructions until you hit upon a return
    >instruction, which will be the end of the function.


    Qui?? Many a function would have multiple return instructions.

    On all of the machines that I have had experience with that allowed
    the code to be examined under program control, there was no limit
    such as "until a return instruction": reading was possible until
    you ran off the end of the readable memory in that address block
    (the exact end of which was not necessarily predictable and might
    not have anything to do with the location of return instructions).

    But then I've used processors that didn't -have- return
    instructions, just branch instructions that took the destination
    location from memory or a register.

    Some systems might put a "guard page" (or guard segment) after the
    end of a routine to catch overruns, but that's more common for
    data segments than for instruction segments.
    --
    "The slogans of an inadequate criticism peddle ideas to fashion"
    -- Walter Benjamin
    Walter Roberson, Feb 18, 2008
    #7
  8. MisterE

    Bartc Guest

    "Malcolm McLean" <> wrote in message
    news:...
    > "MisterE" <> wrote in message news:
    >> Is it possible to create a pointer to a function, and then get its size
    >> (the actual size the function takes in machine code), such that you can
    >> copy the function to another memory location. You could then modify it (I
    >> know it would be modifying the machine code) and then call the modified
    >> function via a function pointer?
    >>

    > Yes and no.
    > If you cast the function pointer to an unsigned char *, then most
    > compilers will allow you to read the instructions until you hit upon a
    > return instruction, which will be the end of the function.


    I take it you've never actually tried this? :)

    --
    Bart
    Bartc, Feb 18, 2008
    #8
  9. In article <fpbnag$pt2$2p3.fr>,
    Guillaume Dargaud <> wrote:

    >There's also the fact that on some modern processors, you'd need (?) to copy
    >the code in a data segment, and this is non-executable.


    Whether it's executable is typically controlled by the operating
    system, and on systems that make data non-executable by default there
    is bound to be some method for changing that.

    For example, on unix the mmap() system call allows you to specify the
    desired permissions for an allocated area of memory, and mprotect()
    allows you to change it.

    (In practice, on many systems even the stack is executable, which is
    the commonest way to exploit buffer overflows.)
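
    The mmap()/mprotect() approach mentioned above might look roughly like this.
    It is a sketch under assumptions: make_executable_copy is a hypothetical
    helper name, the system is assumed to be POSIX-like with MAP_ANONYMOUS, and
    the cache flush relies on a GCC/Clang builtin. Hardened systems may refuse
    the request for executable memory, in which case the helper returns NULL.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict ISO mode */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copy len bytes of machine code into fresh memory that is readable and
   executable (but no longer writable). Returns NULL on failure. */
void *make_executable_copy(const unsigned char *code, size_t len)
{
    void *mem;

    assert(code != NULL);
    if (len == 0)
        return NULL;

    /* Ask for a fresh region that is writable but not yet executable. */
    mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Drop write permission and add execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }

#if defined(__GNUC__)
    /* GCC/Clang builtin: synchronises the instruction cache with the new
       bytes, which matters on ARM among others. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);
#endif
    return mem;
}
```

    The returned pointer could then be converted to a function pointer type such
    as int (*)(void) and called. POSIX allows that conversion; ISO C does not
    define it. Whether the copied bytes actually run correctly at their new
    address is a separate question.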

    -- Richard
    --
    :wq
    Richard Tobin, Feb 18, 2008
    #9
  10. "Bartc" <> wrote in message
    news:7Deuj.9218$...
    >
    > "Malcolm McLean" <> wrote in message
    > news:...
    >> "MisterE" <> wrote in message news:
    >>> Is it possible to create a pointer to a function, and then get its size
    >>> (the actual size the function takes in machine code), such that you can
    >>> copy the function to another memory location. You could then modify it
    >>> (I know it would be modifying the machine code) and then call the
    >>> modified function via a function pointer?
    >>>

    >> Yes and no.
    >> If you cast the function pointer to an unsigned char *, then most
    >> compilers will allow you to read the instructions until you hit upon a
    >> return instruction, which will be the end of the function.

    >
    > I take it you've never actually tried this? :)
    >

    Here we go

    #include <stdio.h>
    #include <string.h>

    int compstr(const void *e1, const void *e2)
    {
        const char * const *str1 = e1;
        const char * const *str2 = e2;

        return (int) strcmp(*str1, *str2);
    }

    int main(void)
    {
        int i;
        unsigned char *fptr;

        /* Non-portable: ISO C does not define this conversion. */
        fptr = (unsigned char *) compstr;
        for(i=0;i<10;i++)
            printf("%d, ", fptr[i]); /* print byte i, not the pointer itself */
        printf("\n");
        return 0;
    }

    I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16

    It seems to me that it's reading the machine code of the function OK. What I
    haven't done is tried to disassemble.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 18, 2008
    #10
  11. In article <>,
    Malcolm McLean <> wrote:

    >>> If you cast the function pointer to an unsigned char *, then most
    >>> compilers will allow you to read the instructions until you hit upon a
    >>> return instruction, which will be the end of the function.


    >> I take it you've never actually tried this? :)


    >[...]


    >I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16


    >It seems to me that it's reading the machine code of the function OK. What I
    >haven't done is tried to disassemble.


    Well, it's reading *something*.

    I think the objection was to the idea that you can tell the end of the
    function by looking for a return instruction. There may be multiple
    return instructions; the code might not end with one (it might end
    with a backwards jump); and there may be embedded data that looks like
    a return instruction.

    It's even possible (though unlikely) that the code for one function is
    mixed up with the code for another.

    -- Richard
    --
    :wq
    Richard Tobin, Feb 18, 2008
    #11
  12. "Richard Tobin" <> wrote in message
    news:fpcdk2$1v7a$...
    > In article <>,
    > Malcolm McLean <> wrote:
    >
    >>>> If you cast the function pointer to an unsigned char *, then most
    >>>> compilers will allow you to read the instructions until you hit upon a
    >>>> return instruction, which will be the end of the function.

    >
    >>> I take it you've never actually tried this? :)

    >
    >>[...]

    >
    >>I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16

    >
    >>It seems to me that it's reading the machine code of the function OK. What
    >>I
    >>haven't done is tried to disassemble.

    >
    > Well, it's reading *something*.
    >
    > I think the objection was to the idea that you can tell the end of the
    > function by looking for a return instruction. There may be multiple
    > return instructions; the code might not end with one (it might end
    > with a backwards jump); and there may be embedded data that looks like
    > a return instruction.
    >
    > It's even possible (though unlikely) that the code for one function is
    > mixed up with the code for another.
    >

    Oh, OK.
    If you take the function's address then on any normal compiler it will be
    possible to execute it as a function. So that fixes the inlining problem. If
    the compiler is super-intelligent you will have to fool it with something
    like
    if( sqrt(x) < 0.0)
    (*fptr)();

    Embedded or multiple returns are a bit more complex. Often machine code
    obeys the one in one out rule of a single point of entry and a single point
    of exit, and you can pick up the real return by looking for the stack
    manipulation that immediately precedes it.
    However it might be non-trivial to disentangle this, I'd agree.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 18, 2008
    #12
  13. MisterE

    Bartc Guest

    "Malcolm McLean" <> wrote in message
    news:...
    >
    > "Bartc" <> wrote in message
    > news:7Deuj.9218$...
    >>
    >> "Malcolm McLean" <> wrote in message
    >> news:...
    >>> "MisterE" <> wrote in message news:
    >>>> Is it possible to create a pointer to a function, and then get its size


    >>> If you cast the function pointer to an unsigned char *, then most
    >>> compilers will allow you to read the instructions until you hit upon a
    >>> return instruction, which will be the end of the function.

    >>
    >> I take it you've never actually tried this? :)
    >>


    > int compstr(const void *e1, const void *e2)
    > {
    > const char * const *str1 = e1;
    > const char * const *str2 = e2;
    >
    > return (int) strcmp(*str1, *str2);
    > }
    >
    > int main(void)
    > {
    > int i;
    > unsigned char *fptr;
    >
    > fptr = (unsigned char *) compstr;
    > for(i=0;i<10;i++)
    > printf("%d, ", fptr[i]);
    > printf("\n");
    > return 0;
    > }
    >
    > I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16
    >
    > It seems to me that it's reading the machine code of the function OK.


    Sure: the first 8 bytes correspond to the following x86 instructions:

    8B442408 mov eax,[esp+8]
    8B4C2404 mov ecx,[esp+4]

    So it's likely to be running on an x86 processor.
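
    To see how the decimal output lines up with the hex listing above, the
    bytes can be reprinted in hexadecimal. A small sketch; bytes_to_hex is a
    hypothetical helper name, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes as uppercase hex into out.
   out must have room for 2 * len + 1 characters. */
void bytes_to_hex(const unsigned char *bytes, size_t len, char *out)
{
    size_t i;

    assert(bytes != NULL && out != NULL);
    for (i = 0; i < len; i++)
        sprintf(out + 2 * i, "%02X", (unsigned)bytes[i]);
    out[2 * len] = '\0';
}
```

    Feeding it 139, 68, 36, 8 gives 8B442408, and 139, 76, 36, 4 gives 8B4C2404,
    matching the two instructions listed.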

    > What I haven't done is tried to disassemble.


    This is the problem: x86 uses instructions of varying width. You're looking
    for C3h or C2h RET opcodes, but these values may occur as part of the
    address mode byte or memory address operand or immediate data operand. Or
    there may be a whole block of data bytes (depending on how the compiler does
    things).

    So if you find C3 for example, it might be data not a RET opcode, and even
    if it is, it's not necessarily the physically last one in the function.

    Since the OP must be familiar with the machine code on his system, he can
    see whether functions are generated sequentially (function A followed by
    function B in the source has the same order in machine code). Then maybe B-A
    (where B is an actual or dummy following function) will give a good estimate
    for the size of the machine code.
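
    The B-A estimate could be sketched like this. Everything here is an
    assumption to be checked against the actual toolchain: first_fn, second_fn,
    and code_distance are hypothetical names, converting a function pointer to
    an integer is not defined by ISO C, and nothing guarantees that the compiler
    or linker keeps the two functions adjacent or in source order.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*generic_fn)(void);

/* Two hypothetical functions defined one after the other in the source. */
int first_fn(int x)  { return x + 1; }
int second_fn(int x) { return x * 3; }

/* Distance in bytes from a to b, as the implementation happened to lay them out. */
intptr_t code_distance(generic_fn a, generic_fn b)
{
    assert(a && b);
    /* Function pointer to integer: accepted by common compilers, not by ISO C. */
    return (intptr_t)((uintptr_t)b - (uintptr_t)a);
}
```

    A call such as code_distance((generic_fn)first_fn, (generic_fn)second_fn)
    gives at best an upper bound on the size of first_fn, since alignment
    padding may sit between the two; a negative or implausibly large value means
    the layout assumption does not hold.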

    --
    Bart
    Bartc, Feb 18, 2008
    #13
  14. In article <>,
    Malcolm McLean <> wrote:

    >Embedded or multiple returns are a bit more complex. Often machine code
    >obeys the one in one out rule of a single point of entry and a single point
    >of exit, and you can pick up the real return by looking for the stack
    >manipulation that immediately precedes it.


    Just as a data point, if I give gcc the -fomit-frame-pointer option on
    my x86 Mac, there's no stack manipulation before a return, and
    (presumably as a consequence) it's happy to emit multiple return
    instructions.

    -- Richard
    --
    :wq
    Richard Tobin, Feb 18, 2008
    #14
  15. "Bartc" <> wrote in message
    news:mZiuj.9396$...
    >
    > "Malcolm McLean" <> wrote in message
    > news:...
    >>

    >
    >> What I haven't done is tried to disassemble.

    >
    > This is the problem: x86 uses instructions of varying width. You're
    > looking for C3h or C2h RET opcodes, but these values may occur as part of
    > the address mode byte or memory address operand or immediate data operand.
    > Or there may be a whole block of data bytes (depending on how the
    > compiler does things).
    >

    So it seems you do need a full disassembly and there is no easy shortcut.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Feb 18, 2008
    #15
  16. MisterE

    Bartc Guest

    "Malcolm McLean" <> wrote in message
    news:...
    >
    > "Bartc" <> wrote in message
    > news:mZiuj.9396$...
    >>
    >> "Malcolm McLean" <> wrote in message
    >> news:...
    >>>

    >>
    >>> What I haven't done is tried to disassemble.

    >>
    >> This is the problem: x86 uses instructions of varying width. You're


    > So it seems you do need a full disassembly and there is no easy shortcut.


    Actually, that seems the most sensible solution, to look at a disassembly
    and just see the size occupied. Unless the OP is attempting something even
    hairier than s/he seems to be.

    --
    Bart
    Bartc, Feb 18, 2008
    #16
  17. MisterE

    Flash Gordon Guest

    Richard Tobin wrote, On 18/02/08 17:24:
    > In article <>,
    > Malcolm McLean <> wrote:
    >
    >> Embedded or multiple returns are a bit more complex. Often machine code
    >> obeys the one in one out rule of a single point of entry and a single point
    >> of exit,


    How many optimisers have you analysed? With real code where there
    are real chances for optimisation?

    >> and you can pick up the real return by looking for the stack
    >> manipulation that immediately precedes it.

    >
    > Just as a data point, if I give gcc the -fomit-frame-pointer option on
    > my x86 Mac, there's no stack manipulation before a return, and
    > (presumably as a consequence) it's happy to emit multiple return
    > instructions.


    On another processor I've used you can adjust an address register as
    part of the return statement so even with the stack manipulation there
    is no penalty to having multiple return instructions. Then there are the
    processors with a delayed return instruction, so you might need to copy
    some of the instructions after the return instruction (delayed branch
    can be real fun when jumping over one instruction and tracing the
    execution using a logic analyser).

    Oh, and it is only on a strange processor with these types of things
    that I ever came across a reason to copy code; however, there were
    things you could do with the implementation to find the size of a
    function at *build* time and thus avoid the need for any hunting for
    return instructions.
    --
    Flash Gordon
    Flash Gordon, Feb 18, 2008
    #17
  18. >Is it possible to create a pointer to a function, and then get its size (the
    >actual size the function takes in machine code), such that you can copy the
    >function to another memory location. You could then modify it (I know it
    >would be modifying the machine code) and then call the modified function via
    >a function pointer?


    funccat() is not implemented on any known system. It takes a pointer
    to a function f(a) and another pointer to function g(b), where "a"
    and the return type of g have the same type, and returns a pointer
    to a function h(b) such that h(b) = f(g(b)).
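
    Since nothing can portably manufacture a new function at run time, the
    usual substitute is to represent h as data and call it through a helper.
    A sketch, with struct composed, composed_call, add_one, and twice all being
    hypothetical names and int chosen arbitrarily as the argument type:

```c
#include <assert.h>

typedef int (*int_fn)(int);

/* Hypothetical stand-in for the result of "funccat": data, not new code. */
struct composed {
    int_fn outer;   /* f */
    int_fn inner;   /* g */
};

/* Evaluate h(b) = f(g(b)). */
int composed_call(const struct composed *h, int b)
{
    assert(h && h->outer && h->inner);
    return h->outer(h->inner(b));
}

/* Hypothetical example functions. */
int add_one(int x) { return x + 1; }
int twice(int x)   { return 2 * x; }
```

    With f = add_one and g = twice, composed_call yields add_one(twice(b)). The
    price is that h is a struct rather than a plain function pointer, so it
    cannot be passed where an ordinary int (*)(int) is expected.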

    How much do you know about assembly language? If you have some
    experience with it, you should be familiar with the idea of
    "relocation": the bytes in a function (especially those representing
    address fields within instructions) are different depending on where
    it is loaded. Some of this can be avoided by the use of pc-relative
    addressing on machines which have it. Some (for example, reference
    to static data areas or other functions) cannot. Relocation
    information is likely present in shared libraries, and may not be
    in executables. Also, you may have the problem that the copied
    function needs a static data area *different* from the static data
    area of the original function.

    There is no guarantee that the code for a function is contiguous
    and non-overlapping with the code for a different function. Optimizers
    can cause code sharing between functions. One compiler, where
    the linkage conventions required a bit of code, generally had a
    common function return sequence in the whole program, and most of
    the functions used it. This could be an unpleasant surprise if you
    set a breakpoint there with a debugger, expecting to see returns
    from one function.
    Gordon Burditt, Feb 18, 2008
    #18
  19. MisterE

    MisterE Guest


    > Actually, that seems the most sensible solution, to look at a disassembly
    > and just see the size occupied. Unless the OP is attempting something even
    > hairier than s/he seems to be.


    What I am trying to do is embed encrypted machine code into the source file,
    then during run-time the program decrypts its own functions and runs them.
    Currently it's all done in assembly but I would like to move to C as much as
    possible. It seems I can indeed simply cast to unsigned char * and read the
    machine code, so the answer is yes. I will try copying that data to memory
    and executing it later.
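
    The decrypt step of such a scheme might be sketched as below. XOR with a
    single-byte key is purely an assumed placeholder, since the thread does not
    say which cipher is used, and xor_transform is a hypothetical name. Making
    the decoded buffer executable and position-independent is a separate,
    platform-specific matter.

```c
#include <assert.h>
#include <stddef.h>

/* XOR every byte of src with key and store the result in dst.
   Applying it twice with the same key restores the original bytes.
   XOR is a placeholder cipher here, not the OP's actual scheme. */
void xor_transform(unsigned char *dst, const unsigned char *src,
                   size_t len, unsigned char key)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < len; i++)
        dst[i] = (unsigned char)(src[i] ^ key);
}
```

    The encoded bytes would be stored as an unsigned char array in the source
    file and decoded into a buffer at run time.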
    MisterE, Feb 18, 2008
    #19
  20. MisterE

    MisterE Guest


    > There is no guarantee that the code for a function is contiguous
    > and non-overlapping with the code for a different function. Optimizers
    > can cause code sharing between functions. One compiler, where
    > the linkage conventions required a bit of code, generally had a
    > common function return sequence in the whole program, and most of
    > the functions used it. This could be an unpleasant surprise if you
    > set a breakpoint there with a debugger, expecting to see returns
    > from one function.


    Currently I do it all in assembly (this is on an ARM processor), but I was
    just wondering what could be done with C. I can assemble the functions so
    that they operate independently of their position and only use pushes/pops on
    the stack, and they run from any location we put them. I was just trying to
    write a C interface to this instead of assembly.
    MisterE, Feb 18, 2008
    #20