Getting the size of a C function

john

Hi,

I need to know the size of a function or module because I need to
temporarily relocate the function or module from flash into sram to
do firmware updates.

How can I determine that at runtime? The
sizeof( myfunction)
generates an error: "size of function unknown".

Thanks.
 
David Empson

john said:
I need to know the size of a function or module because I need to
temporarily relocate the function or module from flash into sram to
do firmware updates.

In general, C does not provide a mechanism to find the size of a
function. Some compilers might implement sizeof(function) but it is not
standard C.

If your compiler always outputs functions to the object code in the same
order as they appear in the source code, you could take the address of
the next function and the address of the function in question, convert
them to (char *) and get the difference between them. This assumes you
never rearrange your source code - comment well!

If your compiler outputs functions in a somewhat unpredictable order
then this won't work.
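
A minimal sketch of the address-subtraction idea, assuming the toolchain places the two functions next to each other. The names `flash_routine`, `flash_routine_end`, `code_span`, and `flash_routine_size_estimate` are hypothetical, and converting a function pointer to an integer is implementation-defined, so the result is only an estimate that should be checked against the map file or object listing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function whose size we want, with placeholder work. */
int flash_routine(int x) { return x * 2 + 1; }

/* Hypothetical dummy function, hoped to be emitted right next to it. */
void flash_routine_end(void) {}

/* Distance in bytes between two code addresses, in either order. */
size_t code_span(uintptr_t a, uintptr_t b)
{
    return (size_t)(a > b ? a - b : b - a);
}

/* Estimate only: correct solely if nothing else lies between the two. */
size_t flash_routine_size_estimate(void)
{
    uintptr_t start = (uintptr_t)flash_routine;
    uintptr_t end = (uintptr_t)flash_routine_end;
    assert(start != end); /* two distinct functions expected */
    return code_span(start, end);
}
```

Taking the absolute difference avoids depending on which of the two functions is emitted first, but it says nothing about other code the linker may place between them.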

The technique I used for a similar problem was to examine the object
code to determine the size of the function manually, added a safety
margin to allow for potential code growth, and embedded that as a
constant in the source code. It then needs to be re-checked after source
changes (or a revised compiler) to confirm that the size hasn't grown
too much.
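
The embedded-constant approach could look roughly like this. All numbers and names here (`FLASH_WRITE_MEASURED_SIZE`, `FLASH_WRITE_SAFETY_MARGIN`, `ram_code_buffer`, `copy_code_to_ram`) are made up for illustration; the real size has to come from inspecting the object code, and the sketch assumes a C11 compiler for `_Static_assert`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical numbers: the measured size is read by hand from the
 * object code; the margin allows for future code growth. */
#define FLASH_WRITE_MEASURED_SIZE 212u
#define FLASH_WRITE_SAFETY_MARGIN 44u
#define FLASH_WRITE_COPY_SIZE \
    (FLASH_WRITE_MEASURED_SIZE + FLASH_WRITE_SAFETY_MARGIN)

/* RAM buffer that receives the relocated code (size is an assumption). */
unsigned char ram_code_buffer[256];

/* Fail the build if the padded size no longer fits the buffer. */
_Static_assert(FLASH_WRITE_COPY_SIZE <= sizeof ram_code_buffer,
               "relocated code no longer fits in ram_code_buffer");

/* Copies FLASH_WRITE_COPY_SIZE bytes starting at code_start into RAM.
 * The caller must guarantee that many bytes are readable there. */
size_t copy_code_to_ram(const unsigned char *code_start)
{
    assert(code_start != NULL);
    memcpy(ram_code_buffer, code_start, FLASH_WRITE_COPY_SIZE);
    return FLASH_WRITE_COPY_SIZE;
}
```

The static assertion only catches the buffer being too small; whether the measured size is still right after a source or compiler change remains a manual check.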
 
jacob navia

john wrote:
Hi,

I need to know the size of a function or module because I need to
temporarily relocate the function or module from flash into sram to
do firmware updates.

How can I determine that at runtime? The
sizeof( myfunction)
generates an error: "size of function unknown".

Thanks.

(1)
There is the method already mentioned that subtracts two function
addresses. If your compiler is "well behaved" that could work
except for the last function in the module...

(2)
Another method is to generate an assembly listing and insert at the end
of each function a "marker" by just using (the equivalent) of
.byte 0,1,2,3,4,5,6,7,8,9,8,7,6,5,4,3,2,1
Then, at runtime you load the code and search for the terminator marker
Obviously the terminator should contain at least one illegal instruction
to be sure that it doesn't appear in the code itself

(3)
Yet another method is to generate a linker map table and read the
size of each function from the table, which amounts to method (1) but
at compile time.

(4) Another method is to locate all function prologues and
function epilogues of the functions in the code you generate.
Locating the prologue means searching for the sequence of
instructions that the compiler generates for each function start,
probably the saving of some registers and the allocating of stack
space for the local variables.
Caveat: It could be that for certain functions the compiler
doesn't generate any prologue... especially if the function
doesn't call any other functions and receives no arguments...

Locating the epilogue means searching for the return instruction
Caveat: It could be that the compiler generates several...
You should find the last one, before you find a prologue.

Of all those possibilities, option (3) looks like the most
promising one to me. Method (1) isn't very precise and
there is the problem of the last function in a compilation unit.

Method 2 is a PITA since you have to generate the assembly,
insert the markers, re-assemble...

Method (4) needs a disassembler, and a LOT of parsing work,
and it is very sensitive to compilation options.
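
For method (2), the runtime search for the terminator could be sketched as below. The function name `find_marker` and the `NOT_FOUND` value are hypothetical; the sketch assumes the code has already been loaded into a byte buffer and that the marker really cannot occur inside valid code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

/* Returns the offset of the first occurrence of marker inside
 * code[0..max_len), i.e. the size of the code in front of it,
 * or NOT_FOUND if the marker is not present. */
size_t find_marker(const unsigned char *code, size_t max_len,
                   const unsigned char *marker, size_t marker_len)
{
    assert(code != NULL && marker != NULL && marker_len > 0);
    if (max_len < marker_len)
        return NOT_FOUND;
    for (size_t i = 0; i + marker_len <= max_len; i++) {
        if (memcmp(code + i, marker, marker_len) == 0)
            return i;
    }
    return NOT_FOUND;
}
```

The `max_len` bound matters: without it, a missing marker would send the search off the end of the code region.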
 
BGB / cr88192

john said:
Hi,

I need to know the size of a function or module because I need to
temporarily relocate the function or module from flash into sram to
do firmware updates.

How can I determine that at runtime? The
sizeof( myfunction)
generates an error: "size of function unknown".

my recommendation:
in this case, it might actually be better advised to generate the function
as a chunk of arch-specific ASM or machine code (ASM is preferable IMO, but
requires an assembler...), which could then be located wherever (such as the
heap).

the reason for suggesting this is that, for many archs, relocating compiled
code (for example, via memcpy) may very well cause it to break. at least
with custom ASM, one can be more certain that the code will survive the
relocation.

another possibility would be to compile some code itself as a relocatable
module (such as an ELF or COFF object or image or whatever is common on the
arch), which can then be stored as a glob of binary data (this can be done
fairly easily by writing a tool to convert the module into an array of
bytes in C syntax which can be linked into the image). when needed, this
module is itself relocated to the target address, and jumped to.

this would allow more complex modules to be used (and is less-effort in the
non-trivial case than would be writing it in ASM or raw machine code).
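
a rough sketch of the "convert the module into an array of bytes in C syntax" step, assuming the module has already been read into memory. the function name `format_c_array` and the output layout are made up for illustration, not taken from any existing tool:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats data[0..len) as a C array definition called name into out.
 * Returns 0 on success, -1 if the text does not fit in cap bytes. */
int format_c_array(char *out, size_t cap, const char *name,
                   const unsigned char *data, size_t len)
{
    assert(out != NULL && data != NULL && len > 0);
    assert(name != NULL && strlen(name) > 0);

    int n = snprintf(out, cap, "const unsigned char %s[%zu] = {", name, len);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        /* Start a new indented line every 12 bytes. */
        n = snprintf(out + used, cap - used, "%s0x%02x,",
                     (i % 12 == 0) ? "\n    " : " ", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used, "\n};\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return 0;
}
```

the generated text can be written to a .c file and linked into the image; relocating the embedded module and jumping to it is a separate, format-specific step that this does not cover.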


keep in mind that there are no really "good" or "general purpose" ways to do
these sorts of tasks.
 
Grant Edwards

in this case, it might actually be better advised to generate the function
as a chunk of arch-specific ASM or machine code (ASM is preferable IMO, but
requires an assembler...), which could then be located wherever (such as the
heap).

IMO, the "right" thing to do is to tell the compiler to put the
function into a separate section and then have it linked so
that it's "located" to run in RAM at the proper address but
stored in ROM.

That way you know the code will work correctly when it's run
from RAM. Defining appropriate symbols in the linker command
file will allow the program to refer to the start and end of
the section's address in ROM.

The OP needs to spend some time studying the manuals for his
compiler and linker.
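
Once the linker provides start and end symbols for the section's copy in ROM, the startup code only needs a bounded copy. A minimal sketch follows; the function name `copy_code_section` and the symbol names in the comment are hypothetical, since the real ones depend entirely on the linker command file:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* On a real target the three addresses would come from symbols
 * defined in the linker command file, for example (hypothetical names):
 *   extern const unsigned char __ramfunc_load_start[];  start in ROM
 *   extern const unsigned char __ramfunc_load_end[];    end in ROM
 *   extern unsigned char       __ramfunc_run_start[];   run address in RAM
 */
size_t copy_code_section(unsigned char *run_addr,
                         const unsigned char *load_start,
                         const unsigned char *load_end)
{
    assert(run_addr != NULL && load_start != NULL);
    assert(load_end >= load_start);
    size_t len = (size_t)(load_end - load_start);
    memcpy(run_addr, load_start, len); /* ROM image -> RAM run address */
    return len;
}
```

Because the section was linked to run at the RAM address, no size guessing or address fix-up is needed after the copy; the linker has already done both.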
 
Keith Thompson

WangoTango said:
Good question, and I would like to know if there is an easy way to do it
during runtime, and a portable way would be nice too. I would probably
look at the map file and use the size I calculated from there, but
that's surely not runtime.

You can get the starting address of the function pretty easy, but how
about the end? Hmmm, gotta' think about that.

You can't even portably assume that &func is the memory address of the
beginning of the function. I think there are systems (AS/400) where
function pointers are not just machine addresses.

Given whatever it is you're doing, you're probably not too concerned
with portability, so that's likely not to be an issue. But there's no
portable way in C to determine the size of a function, so you're more
likely to get help somewhere other than comp.lang.c.
 
David Empson

Tim Wescott said:
I've seen it done like this:

whatever my_eeprom_burning_code()
{
// insert deathless prose here
}

void end_my_eeprom_burning_code(void)
{
}

As long as the second function doesn't get optimized away or moved,
you're home free.

Except if the compiler outputs the functions in reverse order, as one
I've used does (which means you need a "begin_my_eeprom_burning_code"
dummy function instead). You need to know the pattern generated by your
particular compiler, which might depend on factors other than the order
the functions appear in the source code.
 
Nobody

I need to know the size of a function or module because I need to
temporarily relocate the function or module from flash into sram to
do firmware updates.

Do you need to be able to run it from RAM? If so, simply memcpy()ing it
may not work. And you would also need to copy anything which the function
calls (just because there aren't any explicit function calls in the source
code, that doesn't mean that there aren't any in the resulting object code).
 
Mark Borgerson

Do you need to be able to run it from RAM? If so, simply memcpy()ing it
may not work. And you would also need to copy anything which the function
calls (just because there aren't any explicit function calls in the source
code, that doesn't mean that there aren't any in the resulting object code).

At the expense of a few words of code and a parameter, you could do


int MoveMe(...., bool findend){
    if(!findend){

        // do all the stuff the function is supposed to do

    } else Markend();

}


Where Markend is a function that pulls the return
address off the stack and stashes it somewhere
convenient. Markend may have to have some
assembly code. External code can then
subtract the function address from the address
stashed by Markend(), add a safety margin, and
know how many bytes to move to RAM.
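
With GCC or Clang, the assembly part of Markend can be replaced by a compiler builtin. The sketch below assumes those compilers (`__builtin_return_address` and `__attribute__((noinline))` are extensions, not standard C), assumes the call to Markend really is emitted near the end of MoveMe, and uses a hypothetical helper `MoveMe_length_estimate` and a placeholder function body:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address just after the call to Markend(), recorded at run time. */
volatile uintptr_t near_end;

/* noinline keeps this a real call, so a return address exists to read. */
__attribute__((noinline)) void Markend(void)
{
    near_end = (uintptr_t)__builtin_return_address(0);
}

/* Stand-in for the function to be relocated; doubling is placeholder work. */
__attribute__((noinline)) int MoveMe(int value, bool findend)
{
    int result = 0;
    if (!findend) {
        result = value * 2;
    } else {
        Markend();
    }
    return result;
}

/* Estimate only: returns 0 if the recorded address is not after the
 * function's start, which would mean the layout assumption failed. */
size_t MoveMe_length_estimate(void)
{
    near_end = 0;
    (void)MoveMe(0, true);
    assert(near_end != 0);
    uintptr_t start = (uintptr_t)MoveMe;
    uintptr_t end = near_end;
    return end > start ? (size_t)(end - start) : 0;
}
```

The value still needs a safety margin added and should be compared against a disassembly, since the optimizer is free to place the call somewhere other than the end of the function.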


Mark Borgerson
 
Paul Keinanen

I need to know the size of a function or module because I need to
temporarily relocate the function or module from flash into sram to
do firmware updates.

Do you actually want to execute that function while in RAM or just
store some bytes into safety during the update ?

If you intend to execute, the code must be position independent, e.g.
using PC-relative branches. However, when accessing fixed addresses, such
as memory-mapped peripherals, an absolute addressing mode must be
used; PC-relative addressing modes cannot be used.
 
jacob navia

Mark Borgerson wrote:
At the expense of a few words of code and a parameter, you could do


int MoveMe(...., bool findend){
if(!findend){

// do all the stuff the function is supposed to do

} else Markend();

}


Where Markend is a function that pulls the return
address off the stack and stashes it somewhere
convenient. Markend may have to have some
assembly code. External code can then
subtract the function address from the address
stashed by Markend(), add a safety margin, and
know how many bytes to move to RAM.


Mark Borgerson

Sorry Mark but this is totally WRONG!

The return address contains the address where the CPU RETURNS TO
when the current function is finished, not the end of the
current function!!!

The return address will be in the middle of another function, that CALLED
this one.
 
Stefan Reuther

john said:
I need to know the size of a function or module because I need to
temporarily relocate the function or module from flash into sram to
do firmware updates.

How can I determine that at runtime?

You can't in standard C, because functions are not contiguous objects.

Most environments have some way of placing a function in a special
section (using pragmas or things like __attribute__), and a possibility
to acquire position and size of that section (using linker magic).

In general, you cannot assume a function generates just a single blob of
assembly code in the ".text" sections. For example, functions containing
string or floating-point literals, or large switches, often generate
some data in ".rodata", static variables end up in ".data" or ".bss",
and if you're doing C++, you'll get some exception handling tables as well.


Stefan
 
Flash Gordon

jacob said:
john wrote:

(1)

Method (4) needs a disassembler, and a LOT of parsing work,
and it is very sensitive to compilation options.

You forgot to mention the method which, in my experience, is by far the
best, most reliable, and easiest method.

Read the manual!

This is NOT a glib suggestion. On the one occasion where I needed to do
something similar, but for different reasons, I read the manual and lo
and behold, the implementation documented a nice and relatively easy way
to achieve what I wanted. In fact, using any other method was almost
guaranteed to produce a function that did not work correctly. After all,
there could be references to absolute addresses which will be wrong
after the code is moved!

In my case, it was compile the function and get it in to a specific
section (I can't remember how now) and then tell the linker to locate
the section at one location for programming in to the ROM but set up all
the addresses as if it would be placed in another section. Then use some
link-time constants (can't remember the details) for moving it. As I
say, it was all fully documented in the manuals!

There is every chance that someone on comp.arch.embedded might know how
to do it on the target platform, if the OP specifies the target platform.
 
BGB / cr88192

Grant Edwards said:
IMO, the "right" thing to do is to tell the compiler to put the
function into a separate section and then have it linked so
that it's "located" to run in RAM at the proper address but
stored in ROM.

That way you know the code will work correctly when it's run
from RAM. Defining appropriate symbols in the linker command
file will allow the program to refer to the start and end of
the section's address in ROM.

this is a little closer to the second option, of having a secondary image
file embedded as data...

The OP needs to spend some time studying the manuals for his
compiler and linker.

this is, assuming the linker or image format actually supports the "separate
section" idea...

dunno about ELF, but PE/COFF would not support this, since it would require
breaking some of the internal assumptions of the file format (for example,
that the image is continuous from ImageBase to ImageBase+ImageSize, ...).

ELF may have similar restrictions (actually, I think most ELF images are
position independent anyways, so one could relocate and adjust the GOT for
an image easily enough).

(note that embedding an additional PE/COFF or ELF image would not likely be
"that difficult", and the formats are not particularly difficult to work
with).
a fixed-address PE/COFF image is likely an easy case, since one can copy the
contents of the sections and then call into it.

for fixed-address, producing a raw binary image (supported by GNU ld, ...)
is also probably a good option, since in this case the resulting image can
be copied as a raw chunk of data (no need to relocate or worry about
file-format), and jumped into.


can't say so much about other file formats though...
 
James Harris

....
....

Sorry Mark but this is totally WRONG!

The return address contains the address where the CPU RETURNS TO
when the current function is finished, not the end of the
current function!!!

So in Mark's example what will it be in Markend()?
The return address will be in the middle of another function, that CALLED
this one.

i.e. Moveme()?

James
 
Mark Borgerson

Mark Borgerson wrote:

Sorry Mark but this is totally WRONG!

The return address contains the address where the CPU RETURNS TO
when the current function is finished, not the end of the
current function!!!

The return address will be in the middle of another function, that CALLED
this one.

I think you missed a few points:

Inside Markend, The return address on the stack will be the address
after the call to Markend----which was purposely located at the end of
MoveMe. The next few instructions after the call to
Markend will be the return from MoveMe (an RTS or equivalent with stack
cleanup).


Inside Markend, the return address on the stack will be
an address near the end of MoveMe. It is that address that
you need to save and make available for the computation
of the function length.

In assembly, the code in Moveme might look like this:

0900 MoveMe:  sub.l  #8, SP     // make room for 8 bytes of locals
0904          test.l R14        // check the findend parameter in R14
0908          bne    lbl1       // if true, just find end of function
              .....
              .....             // all the work of Moveme goes here
              .....             // and gets executed when findend is zero
              .....
1000          bra    lbl2       // skip the markend call
1004 lbl1:    bsr    Markend
1008 lbl2:    add.l  #8, SP     // clean up 8 bytes of local variables
1012          rts               // return from MoveMe

When Markend is called at 1004, the address 1008 gets pushed on the
stack.

Inside Markend, you could do:

2040 Markend: Move SP, NearEnd // NearEnd is a global variable
2044 RTS

Someplace else, could do

MMLength = NearEnd - (unsigned long)&MoveMe + 4;


When I was teaching introductory M68K assembly language, I used
to give exam problems with nested subroutine calls like this---some
with pushed local variables, and ask the students to show
the contents of the stack at some point in the function.
Those questions really separated the As from the Bs and
Cs!

NOTE: You have to make sure that your compiler doesn't convert
the Markend function to an inline sequence of instructions.


Mark Borgerson
 
Mark Borgerson

You can't in standard C, because functions are not contiguous objects.

Most environments have some way of placing a function in a special
section (using pragmas or things like __attribute__), and a possibility
to acquire position and size of that section (using linker magic).

In general, you cannot assume a function generates just a single blob of
assembly code in the ".text" sections. For example, functions containing
string or floating-point literals, or large switches, often generate
some data in ".rodata", static variables end up in ".data" or ".bss",
and if you're doing C++, you'll get some exception handling tables as well.

That's a real good point. If the OP's goal was just to move the
function code--and not necessarily execute it after movement, he
may not care whether the bytes in the .rodata, .data, or .bss
segments get moved.

If the function has to be moved and executed, then it had better
be able to access the data in the .rodata, .data and .bss
segments, or else not use data in any of those segments that are
in flash memory.

If you're moving the function to RAM because you can't execute
from Flash while updating flash, the function being moved
could be written to use only variables and data in RAM. This
might be the case if the function being moved is the Flash
write routine.

Now that I think about it, I may use this approach in writing
a firmware update routine for the MSP430---which has the
restrictions mentioned above.


Mark Borgerson
 
Ben Pfaff

Mark Borgerson said:
At the expense of a few words of code and a parameter, you could do


int MoveMe(...., bool findend){
if(!findend){

// do all the stuff the function is supposed to do

} else Markend();

}


Where Markend is a function that pulls the return
address off the stack and stashes it somewhere
convenient. Markend may have to have some
assembly code. External code can then
subtract the function address from the address
stashed by Markend(), add a safety margin, and
know how many bytes to move to RAM.

You seem to be assuming that the compiler emits machine code that
is in the same order as the corresponding C code, i.e. that the
call to Markend() will occur at the end of MoveMe(). This is not
a good assumption.
 
Mark Borgerson

I think you missed a few points:

Inside Markend, The return address on the stack will be the address
after the call to Markend----which was purposely located at the end of
MoveMe. The next few instructions after the call to
Markend will be the return from MoveMe (an RTS or equivalent with stack
cleanup).


Inside Markend, the return address on the stack will be
an address near the end of MoveMe. It is that address that
you need to save and make available for the computation
of the function length.

In assembly, the code in Moveme might look like this:

0900 MoveMe:  sub.l  #8, SP     // make room for 8 bytes of locals
0904          test.l R14        // check the findend parameter in R14
0908          bne    lbl1       // if true, just find end of function
              ....
              ....              // all the work of Moveme goes here
              ....              // and gets executed when findend is zero
              ....
1000          bra    lbl2       // skip the markend call
1004 lbl1:    bsr    Markend
1008 lbl2:    add.l  #8, SP     // clean up 8 bytes of local variables
1012          rts               // return from MoveMe

When Markend is called at 1004, the address 1008 gets pushed on the
stack.

Inside Markend, you could do:

2040 Markend: Move SP, NearEnd // NearEnd is a global variable
2044 RTS

Yikes! I'll have to mark myself down 5 points!!!

That should be
2040 Markend: Move @SP, NearEnd // NearEnd is a global variable

I need to save the data pointed to by the stack pointer, not the
contents of the stack pointer itself.

Someplace else, could do

MMLength = NearEnd - (unsigned long)&MoveMe + 4;


When I was teaching introductory M68K assembly language, I used
to give exam problems with nested subroutine calls like this---some
with pushed local variables, and ask the students to show
the contents of the stack at some point in the function.
Those questions really separated the As from the Bs and
Cs!

NOTE: You have to make sure that your compiler doesn't convert
the Markend function to an inline sequence of instructions.

I also realized that, on the MSP430, I don't even need the
function call. At the end of the function whose
length I want to determine, I simply add the assembly
language:

mov PC, NearEnd


Both these methods do require some assembly language and
are processor dependent. The compiler that I'm using on
the MSP430 (Imagecraft), allows inline assembly, so
the instruction above would be

asm("mov PC, %NearEnd\n"); // the % is used to reference a C
//variable


I'm reasonably confident that I can use this technique to move
a flash-write routine, but I will have to be very careful
about using global variables, since the compiler produces
PC relative references to global and static variables. Those
references will be hosed when the code is moved.

Mark Borgerson
 
Mark Borgerson

You seem to be assuming that the compiler emits machine code that
is in the same order as the corresponding C code, i.e. that the
call to Markend() will occur at the end of MoveMe(). This is not
a good assumption.

I'll paraphrase the old Reagan maxim: "assume, but verify". I
did a test run with an MSP-430 compiler and the call was at
the end. For that particular processor, as I later discovered
and noted in another post, you don't even need the
function call. You can save the contents of the PC at the
end of the function with a line of assembly.


This would certainly be a dangerous technique on a processor
with multi-threading and possible out-of-order execution.
I think it will work OK on the MSP430, which is the CPU for which
I am writing a flash-burning routine.


Mark Borgerson
 
