Read/copy/call a function's machine code?

MisterE

Is it possible to create a pointer to a function and then get its size (the
actual size the function takes in machine code), so that you can copy the
function to another memory location? You could then modify it (I know that
would be modifying the machine code) and call the modified function via
a function pointer.
 
Chris Dollin

MisterE said:
Is it possible to create a pointer to a function, and then get its size (the
actual size the function takes in machine code), such that you can copy the
function to another memory location.

Not in remotely portable C, no.

Of course, if you're hacking machine code anyway, and your implementation
(non-portably) allows you to read the bytes of a function via a char*, you
can "just" disassemble the code and trace it to find the extent of the
function. Just remember the result will be about as portable as a duck-sized
lump of neutronium.
 
Guillaume Dargaud

There's also the fact that on some modern processors, you'd need (?) to copy
the code into a data segment, which is non-executable. And the code segment
is non-writable, or should be, so you can't copy it back.
 
vippstar

Is it possible to create a pointer to a function
Sure, T (*ptr)(T); declares ptr as a function pointer that takes T and
returns T.
and then get its size (the actual size the function takes in machine code),
Nope, that cannot be done. There is not even machine code in a
function pointer, and a function pointer does not have to point to
actual memory in the implementation.
such that you can copy the function to another memory location. You could then modify it (I know it
You can do that

int (*ptr)(int) = putchar;
T (*tmp)(T) = (T (*)(T))ptr; /* any type T is, this is guaranteed to
work, cast is not needed */
ptr = getchar;
ptr();
ptr = (int (*)(int))tmp; /* this is guaranteed to work too, cast not
needed */
ptr('\n');

What my code demonstrates here is that there is no 'void *' for
function pointers because you can store any function pointer to any
other function pointer and back.
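
A minimal sketch of that round trip, written with explicit casts, which ISO C requires when the two function pointer types differ. The standard guarantees that converting a function pointer to another function pointer type and back gives a pointer equal to the original, as long as the call is made through the original type. The names generic_fn, add_one and round_trip_call are made up for this illustration.

```c
/* Hypothetical "generic" function pointer type, used only for storage. */
typedef void (*generic_fn)(void);

/* Example function; the name is made up for this sketch. */
int add_one(int x)
{
    return x + 1;
}

/* Store a function pointer in a different function pointer type,
   convert it back, and call it through its original type. */
int round_trip_call(int arg)
{
    int (*original)(int) = add_one;
    generic_fn stored = (generic_fn)original;       /* cast required */
    int (*restored)(int) = (int (*)(int))stored;    /* cast required */

    /* Calling through 'stored' itself would be undefined behaviour,
       because its type does not match the function's real type. */
    return restored(arg);
}
```

Calling round_trip_call(41) returns 42. Nothing here exposes the function's size or its bytes; it only shows that the pointer value survives the conversion.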
would be modifying the machine code) and then call the modified function via
a function pointer?

ISO C does not define 'machine code'.
Why do you ask here? Try it!
It seems to me you care less about ISO C or portability than about
getting that 'hack' to work.
 
Malcolm McLean

MisterE said:
Is it possible to create a pointer to a function, and then get its size
(the actual size the function takes in machine code), such that you can
copy the function to another memory location. You could then modify it (I
know it would be modifying the machine code) and then call the modified
function via a function pointer?
Yes and no.
If you cast the function pointer to an unsigned char *, then most compilers
will allow you to read the instructions until you hit upon a return
instruction, which will be the end of the function.
However, it is not guaranteed, and most modern OSes frown on allowing code to
be modified on the fly. There are ways around this, of course, or the OS
itself wouldn't be able to load programs into memory.
 
Chris Dollin

Malcolm said:
Yes and no.
If you cast the function pointer to an unsigned char *, then most compilers
will allow you to read the instructions until you hit upon a return
instruction, which will be the end of the function.

Not necessarily.

(a) A function may have multiple return points -- there's no need for there
to be a single exit point in the code.

(b) The compiler is at liberty to put returns in front of the function
body, if that leads to more efficient code.

(c) A tail-optimised function may have no returns at all, just jumps to
other functions.

A Poul Anderson quote occurs to me, but I can't remember where it comes
from.
 
Walter Roberson

Malcolm McLean said:
If you cast the function pointer to an unsigned char *, then most compilers
will allow you to read the instructions until you hit upon a return
instruction, which will be the end of the function.

Qui?? Many a function would have multiple return instructions.

On all of the machines that I have had experience with that allowed
the code to be examined under program control, there was no limit
such as "until a return instruction": reading was possible until
you ran off the end of the readable memory in that address block
(the exact end of which was not necessarily predictable and might
not have anything to do with the location of return instructions).

But then I've used processors that didn't -have- return
instructions, just branch instructions that took the destination
location from memory or a register.

Some systems might put a "guard page" (or guard segment) after the
end of a routine to catch overruns, but that's more common for
data segments than for instruction segments.
 
Bartc

Malcolm McLean said:
Yes and no.
If you cast the function pointer to an unsigned char *, then most
compilers will allow you to read the instructions until you hit upon a
return instruction, which will be the end of the function.

I take it you've never actually tried this? :)
 
Richard Tobin

Guillaume Dargaud said:
There's also the fact that on some modern processors, you'd need (?) to copy
the code in a data segment, and this is non-executable.

Whether it's executable is typically controlled by the operating
system, and on systems that make data non-executable by default there
is bound to be some method for changing that.

For example, on unix the mmap() system call allows you to specify the
desired permissions for an allocated area of memory, and mprotect()
allows you to change it.
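
A rough sketch of how those two calls might be combined, assuming a POSIX-like system that provides MAP_ANONYMOUS (or MAP_ANON). The helper name make_executable_copy is made up for this illustration, and the sketch only prepares the memory; whether the copied bytes can actually run from their new address is a separate question.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON     /* older BSD-style spelling */
#endif

/* Hypothetical helper: copy 'len' bytes into a fresh private mapping and
   then make that mapping readable and executable. Returns NULL on failure. */
void *make_executable_copy(const void *code, size_t len)
{
    void *region;

    if (len == 0)
        return NULL;

    /* Ask for writable memory first so the bytes can be copied in. */
    region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return NULL;

    memcpy(region, code, len);

    /* Drop write permission and add execute permission. */
    if (mprotect(region, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(region, len);
        return NULL;
    }
    return region;
}
```

Some systems refuse to turn written memory into executable memory, in which case the helper returns NULL. On processors with separate instruction and data caches, the instruction cache may also need to be synchronised before the copy is called.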

(In practice, on many systems even the stack is executable, which is
the commonest way to exploit buffer overflows.)

-- Richard
 
Malcolm McLean

Bartc said:
I take it you've never actually tried this? :)
Here we go

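#include <stdio.h>
#include <string.h>

/* qsort-style comparison of two strings held as char * elements */
int compstr(const void *e1, const void *e2)
{
    const char * const *str1 = e1;
    const char * const *str2 = e2;

    return (int) strcmp(*str1, *str2);
}

int main(void)
{
    int i;
    unsigned char *fptr;

    /* non-portable: ISO C does not define this conversion */
    fptr = (unsigned char *) compstr;
    for (i = 0; i < 10; i++)
        printf("%d, ", fptr[i]);   /* print the i-th byte, not the pointer */
    printf("\n");
    return 0;
}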

I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16

It seems to me that it's reading the machine code of the function OK. What I
haven't done is tried to disassemble.
 
Richard Tobin

I take it you've never actually tried this? :)

I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16
It seems to me that it's reading the machine code of the function OK. What I
haven't done is tried to disassemble.

Well, it's reading *something*.

I think the objection was to the idea that you can tell the end of the
function by looking for a return instruction. There may be multiple
return instructions; the code might not end with one (it might end
with a backwards jump); and there may be embedded data that looks like
a return instruction.

It's even possible (though unlikely) that the code for one function is
mixed up with the code for another.

-- Richard
 
Malcolm McLean

Richard Tobin said:
If you cast the function pointer to an unsigned char *, then most
compilers will allow you to read the instructions until you hit upon a
return instruction, which will be the end of the function.
I take it you've never actually tried this? :)

I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16
It seems to me that it's reading the machine code of the function OK. What I
haven't done is tried to disassemble.

Well, it's reading *something*.

I think the objection was to the idea that you can tell the end of the
function by looking for a return instruction. There may be multiple
return instructions; the code might not end with one (it might end
with a backwards jump); and there may be embedded data that looks like
a return instruction.

It's even possible (though unlikely) that the code for one function is
mixed up with the code for another.
Oh, OK.
If you take the function's address then on any normal compiler it will be
possible to execute it as a function. So that fixes the inlining problem. If
the compiler is super-intelligent you will have to fool it with something
like
if( sqrt(x) < 0.0)
(*fptr)();

Embedded or multiple returns are a bit more complex. Often machine code
obeys the one in one out rule of a single point of entry and a single point
of exit, and you can pick up the real return by looking for the stack
manipulation that immediately precedes it.
However, it might be non-trivial to disentangle this, I'd agree.
 
Bartc

int compstr(const void *e1, const void *e2)
{
const char * const *str1 = e1;
const char * const *str2 = e2;

return (int) strcmp(*str1, *str2);
}

int main(void)
{
int i;
unsigned char *fptr;

fptr = (unsigned char *) compstr;
for(i=0;i<10;i++)
printf("%d, ", fptr[i]);
printf("\n");
return 0;
}

I get the output 139, 68, 36, 8, 139, 76, 36, 4, 139, 16

It seems to me that it's reading the machine code of the function OK.


Sure: the first 8 bytes correspond to the following x86 instructions:

8B442408 mov eax,[esp+8]
8B4C2404 mov ecx,[esp+4]

So it's likely to be running on an x86 processor.
What I haven't done is tried to disassemble.

This is the problem: x86 uses instructions of varying width. You're looking
for C3h or C2h RET opcodes, but these values may occur as part of the
address mode byte or memory address operand or immediate data operand. Or
there may be a whole block of data bytes (depending on how the compiler does
things).

So if you find C3 for example, it might be data not a RET opcode, and even
if it is, it's not necessarily the physically last one in the function.
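
A small illustration of that false positive, assuming x86 encoding. The byte sequence below encodes "mov eax, 0xC3" followed by "ret", so the value C3 appears once as immediate data and once as the real RET opcode. The names find_first_c3 and sample_code are made up for this sketch.

```c
#include <stddef.h>

/* Naive scan: returns the index of the first byte equal to 0xC3, or -1 if
   there is none. It does not decode instructions, so it cannot tell an
   opcode from an operand. */
long find_first_c3(const unsigned char *code, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (code[i] == 0xC3)
            return (long)i;
    return -1;
}

/* x86 encoding of:  mov eax, 0xC3 ; ret
   B8 is "mov eax, imm32"; the next four bytes are the immediate value,
   stored least significant byte first. The final C3 is the RET. */
const unsigned char sample_code[] = { 0xB8, 0xC3, 0x00, 0x00, 0x00, 0xC3 };
```

The scan stops at offset 1, inside the immediate operand, while the actual RET sits at offset 5. Only a decoder that tracks instruction boundaries can tell the two apart.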

Since the OP must be familiar with the machine code on his system, he can
see whether functions are generated sequentially (function A followed by
function B in the source has the same order in machine code). Then maybe B-A
(where B is an actual or dummy following function) will give a good estimate
for the size of the machine code.
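
A sketch of that B-A estimate, under loud assumptions: converting a function pointer to an integer is not defined by ISO C (it is a common extension), and nothing obliges the compiler or linker to place the functions in source order or next to each other. The names first_function, second_function and entry_point_distance are made up for this illustration.

```c
#include <stdint.h>

/* The function whose size we would like to estimate. */
int first_function(int x)
{
    return x + 1;
}

/* Dummy "following" function, present only to mark a second address. */
int second_function(int x)
{
    return x * 2;
}

/* Distance in bytes between the two entry points. The result is only an
   estimate of the size of first_function, and only if the toolchain laid
   the two functions out one after the other. It can be negative, include
   padding, or be unrelated to the size altogether. */
intptr_t entry_point_distance(void)
{
    uintptr_t a = (uintptr_t)first_function;
    uintptr_t b = (uintptr_t)second_function;

    return (intptr_t)(b - a);
}
```

Comparing the result against a disassembly listing is the only way to know whether it means anything on a given build.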
 
Richard Tobin

Malcolm McLean said:
Embedded or multiple returns are a bit more complex. Often machine code
obeys the one in one out rule of a single point of entry and a single point
of exit, and you can pick up the real return by looking for the stack
manipulation that immediately precedes it.

Just as a data point, if I give gcc the -fomit-frame-pointer option on
my x86 Mac, there's no stack manipulation before a return, and
(presumably as a consequence) it's happy to emit multiple return
instructions.

-- Richard
 
Malcolm McLean

Bartc said:
This is the problem: x86 uses instructions of varying width. You're
looking for C3h or C2h RET opcodes, but these values may occur as part of
the address mode byte or memory address operand or immediate data operand.
Or there may be a whole block of data bytes (depending on how the
compiler does things).
So it seems you do need a full disassembly and there is no easy shortcut.
 
Bartc

So it seems you do need a full disassembly and there is no easy shortcut.

Actually, that seems the most sensible solution, to look at a disassembly
and just see the size occupied. Unless the OP is attempting something even
hairier than s/he seems to be.
 
Flash Gordon

Richard Tobin wrote, On 18/02/08 17:24:
How many optimisers have you analysed? With real code, where there
are real chances for optimisation?
Just as a data point, if I give gcc the -fomit-frame-pointer option on
my x86 Mac, there's no stack manipulation before a return, and
(presumably as a consequence) it's happy to emit multiple return
instructions.

On another processor I've used you can adjust an address register as
part of the return statement so even with the stack manipulation there
is no penalty to having multiple return instructions. Then there are the
processors with a delayed return instruction, so you might need to copy
some of the instructions after the return instruction (delayed branch
can be real fun when jumping over one instruction and tracing the
execution using a logic analyser).

Oh, and it is only on a strange processor with these types of things
that I ever came across a reason to copy code; however, there were
things you could do with the implementation to find the size of a
function at *build* time and thus avoid the need for any hunting for
return instructions.
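
One toolchain-specific way to get a size at build time, not necessarily what was used in the case described above. It assumes GCC or Clang targeting ELF with GNU ld or a compatible linker, which defines __start_NAME and __stop_NAME symbols for a section whose name is a valid C identifier. The section name copyme and the function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Place the function alone in its own section so that the section's
   extent matches the function's code (plus any padding). */
__attribute__((section("copyme"), noinline, used))
int payload(int x)
{
    return x + 1;
}

/* Defined by the linker, not by this program (assumption: GNU ld style
   start/stop symbols for sections named like C identifiers). */
extern const unsigned char __start_copyme[];
extern const unsigned char __stop_copyme[];

/* Size in bytes of everything the linker put into the copyme section. */
size_t copyme_size(void)
{
    return (size_t)((uintptr_t)__stop_copyme - (uintptr_t)__start_copyme);
}
```

The size is fixed when the program is linked, so no scanning for return instructions is needed. It says nothing about whether the code is position independent.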
 
Gordon Burditt

Is it possible to create a pointer to a function, and then get its size (the
actual size the function takes in machine code), such that you can copy the
function to another memory location. You could then modify it (I know it
would be modifying the machine code) and then call the modified function via
a function pointer?

funccat() is not implemented on any known system. It takes a pointer
to a function f(a) and another pointer to function g(b), where "a"
and the return type of g have the same type, and returns a pointer
to a function h(b) such that h(b) = f(g(b)).
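
For contrast, a sketch of what portable C can do instead: it cannot manufacture a new function h at run time, but it can represent the composition as data and call it through a helper. This is a substitute technique, not an implementation of funccat(); the names composition, funccat_like, call_composition, twice and increment are all made up for this illustration.

```c
/* A composed "function" represented as data: two function pointers. */
struct composition {
    int (*outer)(int);   /* plays the role of f */
    int (*inner)(int);   /* plays the role of g */
};

/* Build the pair (f, g); no code is generated or copied. */
struct composition funccat_like(int (*f)(int), int (*g)(int))
{
    struct composition h;

    h.outer = f;
    h.inner = g;
    return h;
}

/* Evaluate h(b) = f(g(b)). */
int call_composition(const struct composition *h, int b)
{
    return h->outer(h->inner(b));
}

/* Two example functions to compose. */
int twice(int x)     { return 2 * x; }
int increment(int x) { return x + 1; }
```

With h = funccat_like(twice, increment), call_composition(&h, 3) evaluates twice(increment(3)) and returns 8. The caller has to pass the struct around, which is exactly what a true function pointer returned by funccat() would avoid.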

How much do you know about assembly language? If you have some
experience with it, you should be familiar with the idea of
"relocation": the bytes in a function (especially those representing
address fields within instructions) are different depending on where
it is loaded. Some of this can be avoided by the use of pc-relative
addressing on machines which have it. Some (for example, reference
to static data areas or other functions) cannot. Relocation
information is likely present in shared libraries, and may not be
in executables. Also, you may have the problem that the copied
function needs a static data area *different* from the static data
area of the original function.

There is no guarantee that the code for a function is contiguous
and non-overlapping with the code for a different function. Optimizers
can cause code sharing between functions. One compiler, where
the linkage conventions required a bit of code, generally had a
common function return sequence in the whole program, and most of
the functions used it. This could be an unpleasant surprise if you
set a breakpoint there with a debugger, expecting to see returns
from one function.
 
MisterE

Actually, that seems the most sensible solution, to look at a disassembly
and just see the size occupied. Unless the OP is attempting something even
hairier than s/he seems to be.

What I am trying to do is embed encrypted machine code into the source file;
then at run time the program decrypts its own functions and runs them.
Currently it's all done in assembly, but I would like to move to C as much as
possible. It seems I can indeed simply cast to unsigned char * and read the
machine code, so the answer is yes. I will try copying that data to memory
and executing it later.
 
MisterE

There is no guarantee that the code for a function is contiguous
and non-overlapping with the code for a different function. Optimizers
can cause code sharing between functions. In one compiler, where
the linkage conventions required a bit of code, generally had a
common function return sequence in the whole program and most of
the functions used it. This could be an unpleasant surprise if you
set a breakpoint there with a debugger, expecting to see returns
from one function.

Currently I do it all in assembly (this is on an ARM processor), but I was
just wondering what could be done with C. I can assemble the functions so
that they operate independently of their position and only use pushes and
pops on the stack, and they run from any location we put them. I was just
trying to write a C interface to this instead of assembly.
 
