representing functions with arguments in an abstract syntax tree

M

Melkor Ainur

Hello,

I'm attempting to build an interpreter for a pascal-like language.
Currently, I don't generate any assembly. Instead, I just build an
abstract syntax tree representing what I've parsed from a user's
program and use a C backend to evaluate the tree. I'm using Lex, Yacc
and C. Now, there are some functions that are built into this language
and others which are not. The problem I'm having is that I haven't
found a way to represent functions and handle generating their
arguments.

For example, an example code block in this language looks like:

if (cpu_temp > 100) then
begin
TurnOffWireless(interface1, interface2);
end

where both cpu_temp and TurnOffWireless are builtin functions.
currently, once I've parsed above, I build a datastructure looking
like below:

if_then_block_node->expression = expression_node
if_then_block_node->statements = function_node
expression_node->body.left = function_node
expression_node->body.right = number_node

where function_node looks like
node {
fptr a;
int numargs;
node *arglist;
} function_node;

and arglist looks like
node {
node *next;
enum argtype_t argtype;
union {
int num;
char *str;
void* ptr;
} body;
}


currently, my function_node just stores a function pointer (declared
as
typedef (void *)(*fptr)(void*) ) which i fill in with a C function for
the builtin. for example, cpu_temp(), calls a C function that uses a
system call to get the cpu temperature. when it comes to
TurnOffWireless(...), then I'm a bit stuck because of the need to
generate arguments for the builtin function. To be specific, how do
(or is it even possible) I write a generic function pointer that can
represent all my different functions. some that have multiple
promotable-arguments (chars, ints) and pointers. and then, how do I
Pass these functions their arguments? I currently suspect I can't do
that within C and that I might need to generate architecture specific
assembly to store the arguments on the stack and then call the builtin
function. Perhaps there is a GCC or compiler specific convention for
doing this?

That said, I'd suspect that other people have done or explored this
type of work. Googling for this didn't get me any real answers so I'm
hoping someone on these groups might point me towards some examples or
further reading that I could look at. I would think lanaguages like
perl/python/etc might be using similar concepts but trawling around
blindly in the source code was difficult. Anyone know how these
languages handle functions?

Last question, are there open-source blocks of code that I could reuse
for the purpose of this type of projects. IE: A reasonable parser that
generates a syntax tree/datastructure which I could then evaluate
using my choice of backend.

Any and all help/suggestions/recommendations/education are welcome.

Thanks,
Melkor
 
A

Alexander Bartolich

begin followup to Melkor Ainur:
[...] To be specific, how do (or is it even possible) I write a
generic function pointer that can represent all my different
functions. some that have multiple promotable-arguments (chars,
ints) and pointers. and then, how do I Pass these functions their
arguments? I currently suspect I can't do that within C

Relax. Some people call C a glorified portable assembler.
and that I might need to generate architecture specific assembly
to store the arguments on the stack and then call the builtin
function.

A rather primitive variation is to use the stuff in stdarg.h to
built a function like printf. The problem here is that you have
to pass all arguments in one single list of arguments, i.e.
you can't do a loop to push arguments like you would in assembly.

The natural workaround is to do your own stack handling, i.e.
allocate a chunk of memory and use memcpy to 'push' elements on
to the buffer.

Well, no. That would be the lame way to do it.
Real C programmers use pointer arithmetic.

char stack[0x100];
char* stack_ptr;

/* push an integer */
*(int*)stack_ptr = the_int;
stack_ptr += sizeof(int);

/* push a double */
*(double*)stack_ptr = the_double;
stack_ptr += sizeof(double);

Well, this works on i386, but with a performance penalty.
RISC CPUs outright refuse to access a double (8 bytes in size)
on an address that is not a multiple of 8.

So need to increment the stack pointer by the maximum size
of any arguments. Real C programmers use a union for that.

typedef union {
char c;
int i;
double d;
} stack_arg;

The stack then becomes a mere array of stack_arg.

stack_arg stack[32];
stack_arg* stack_ptr;

/* push an integer */
stack_ptr->i = the_int;
stack_ptr++;
Perhaps there is a GCC or compiler specific convention
for doing this?

Actually I don't understand your problem.
Could it be that you have very little programming experience?
 
C

Chris Torek

[NB: this article is cross-posted!]

... To be specific, how do (or is it even possible) I write a
generic function pointer that can represent all my different
functions. some that have multiple promotable-arguments
(chars, ints) and pointers. and then, how do I Pass these
functions their arguments? I currently suspect I can't do
that within C ...

Not in a generic fashion, no. This is actually a comp.lang.c FAQ
(15.13).

C *does* allow you to store any "pointer to function" value in
any object of type "pointer to function" regardless of the function's
return-value type and argument types. These "extra" types (return
value and arguments) *are* part of the function's "type signature"
(a phrase found more often in C++ than C, due to the usual C++
"name mangling" practice for function overloading), and *do* have
to be present at the actual call, but C guarantees that you can
cast a function-pointer value to some other function-pointer type
and store all the "useful" parts for recovery. Any "extra" bits
that might depend on the function's type signature will be added
back by a second cast, back to the original type. For instance:

extern double modf(double, double *);
void (*p)(void);
double x, y, z;
...
p = (void (*)(void))modf; /* legal and well-defined */
... code that does not modify p ...
/* assuming y has been set: */
x = ((double (*)(double, double *))p)(y, &z); /* calls modf() */

and that I might need to generate architecture specific
assembly to store the arguments on the stack and then call the builtin
function.

This is one way to do it. Note that some systems (even using C)
do not pass most arguments on a stack at all; on those machines
you will need architecture-specific code to store the arguments in
the appropriate argument registers (integer and/or floating-point,
e.g., SPARC and PowerPC).

There is another way to handle this, assuming that the set of
functions is fixed at interpreter-build-time (which is probably
true now but may not be part of your ultimate goal). Suppose
you have "interpreter level" functions f, g, and h and the kind
of data structures you described earlier. Then instead of calling
the C functions cf(), cg(), and ch() that implement f, g, and h
(but are, say, void cf(int), double cg(double *), and so on, i.e.,
have different type signatures from each other), have the interpreter's
core loop call functions do_f(), do_g(), and do_h(), which
read something like:

struct retinfo *do_f(struct arginfo *arginfo) {
static struct retinfo r = { TY_VOID };

/* f needs one int */
if (arginfo->numargs != 1 || arginfo->arglist->type != TY_INT)
panic("invalid call to do_f()");
f(arginfo->arglist->argunion.un_int); /* actually call f */
return &r;
}

In other words, instead of coming up with a *generic* shim
to fit between "interpreter engine" and "external C functions",
you can resort to a specific set of translation-layer shim
functions. The total number is bounded by the number of
interpreter built-ins. If many of the translation layer
functions all call C functions with a single common type,
you might even want to use one translation-layer shim for
all such functions, passing it an additional parameter giving
the target function, e.g.:

void *do_voidofint(struct funcall_info *info) {
static struct retinfo r = { TY_VOID };
struct arginfo *arginfo = info->arginfo;
struct arglist *al;
int (*fp)(int);

/* whatever function we call, it requires one int */
if (arginfo->numargs != 1 || (al = arginfo->arglist)->type != TY_INT)
panic("invalid call to do_voidofint()");

/* convert info->target_function to the right type */
fp = (int (*)(int))info->target_function;

/* and call the function, whatever it is */
fp(al->argunion.un_int);

return &r;
}

In this case, if you have 100 interpreter built-ins that call 100
different C functions that (collectively) have 9 different type
signatures, you only need 9 of these shims.
That said, I'd suspect that other people have done or explored this
type of work.

Steve Summit (the author of the comp.lang.c FAQ) has, for pretty
similar reasons (writing an interpreter).
 
M

Malcolm

Melkor Ainur said:
I write a generic function pointer that can
represent all my different functions. some that have multiple
promotable-arguments (chars, ints) and pointers. and then, how do I
Pass these functions their arguments? I currently suspect I can't do
that within C and that I might need to generate architecture specific
assembly to store the arguments on the stack and then call the builtin
function.

Your suspicion is right. You want to write a C function like this

void call_function(char *argtypes, void (*fptr)(), ...)
{
/* use information in argtypes to call fptr with arguments */
}

Unfortunately it can't be done in ANSI C. It's not too difficult to do
in assembly language however, as long as you know the calling
convention.
 
C

Chris F Clark

Melor Ainur asked:
how do (or is it even possible) I write a generic function pointer
that can represent all my different functions. some that have
multiple promotable-arguments (chars, ints) and pointers. and then,
how do I Pass these functions their arguments?

I suspect others may give a better and more specific answer to your
question, but here is some advice. I believe the solution pattern you
are looking for is the "trampoline" or "bridge".

Instead of putting the C built-in function in your table, you put a
"trampoline" or "bridge" function in your table. That function
matches whatever calling spec you have defined, quote: "currently, my
function_node just stores a function pointer (declared as typedef
(void *)(*fptr)(void*) )", extracts the arugments from the argument
list, quote: "and arglist looks like node {node *next; enum argtype_t
argtype; union {int num; char *str; void* ptr;} body;}" for each of
the arguments the built-in function expects, does any run-time
conversions and error checking necessary, and then calls the desired
built-in function. Upon the return of the built-in function, the code
may do the same steps in reverse to approrpiately deal with the return
value, pass by call-return arguments, etc.

Let us look at a bridge function your specific example, quote:
"TurnOffWireless(interface1, interface2);"

void *TurnOffWireless_Bridge(void *arglist_as_voidptr)
{
node *arglist = (node *)arglist_as_voidptr; // turn into useful type

if ( ! arglist ) {
error("TurnOffWireless: two arguments expected--none found");
}

// say interface1 should be an int (for example)
if ( *arglist->argtype != int_argtype_t_enum ) {
error("TurnOffWireless: 1st argument should be int and was not");
}
int interface1 = arglist->num;

arglist = arglist->next; // now deal with 2nd argument

// say interface2 should be a ptr (for example)
if ( *arglist->argtype != ptr_argtype_t_enum ) {
error("TurnOffWireless: 2nd argument should be ptr and was not");
}
void *interface2 = arglist->ptr;

// call actual function

TurnOffWireless(interface1, interface2);

//do any clean up work need on return side

}


Having written this, you put a pointer to TurnOffWireless_Bridge in
your table of function pointers not a pointer to TurnOffWireless.

------------------------------------------------------------------------

A second look at your question suggests, that you might be asking a
simpler question, quote: "how do (or is it even possible) I write a
generic function pointer that can represent all my different
functions?"

In that, case the answer is the lowly, but all powerful, C cast.
Simply cast your functions to the *type of the table* on putting them
into the table and cast them back to their appropriate function type,
upon extracting them from the table and use.

fptr function_table[] = {
(fptr)&cpu_temp, // cast to insert into table
(fptr)&TurnOffWireless,
// more functions
};

// code here which uses the TurnOffWireless function, slot 2 in table

void (*this_function)(int, void *) = // appropriately typed function call var
(void (*)(int, void *))function_table[2]; // cast to extract

int interface1 = arglist->num; // extract args from arglist as above
void *interface2 = arglist->next->ptr;

(*this_function)(interface1, interface2); // call function

------------------------------------------------------------------------

Note, in both cases, you know all the argument types for your built-in
functions. If you have a function like printf that is "variadic" (or
is it varadic?), you need some way of passing an appropriate argument
list. There are standard "macros" for va_list that handle this when
you call a variadic C function from C code, but to my knowledge there
aren't macros that allow you to build a va_list on-the-fly from "pure
cloth" (i.e. taking your arglist and converting it into a va_list).
However, for many machines the format of a va_list, is simply an array
of pointers to the actual arguments. Thus, you can build such a
structure yourself and pass it in with a reasonable expectation of
success, until you start porting your code.

A common thing to do in the bridge case is to make the bridge know
about calls with some fixed number of arguments (say 10), and then see
how many arguments are in the actual argument list and select with a
case statement the function call that passes those arguments in.


void *Variadic_Bridge(void *arglist_as_voidptr)
{
void *rslt;
node *arglist = (node *)arglist_as_voidptr; // turn into useful type

if ( ! arglist ) {
rslt = Variadic(); // no arguments
return rslt;
}

void *arg1 = arglist->ptr; // assume all args are pointers

arglist = arglist->next;

if ( ! arglist ) {
rslt = Variadic(arg1); // 1 argument
return rslt;
}

void *arg2 = arglist->ptr; // assume all args are pointers

arglist = arglist->next;

if ( ! arglist ) {
rslt = Variadic(arg1, arg2); // 2 arguments
return rslt;
}

// more code follows for 3 .. n arg cases

}

Finally, if you have a truly variadic function, you might again look
at printf for insight. Many compilers implement this via a doprint
function that processes 1 arugment at a time. So, that instead of
building up a call that has the correct number of arguments and
passing that as a whole to printf. The bridge function loops through
the argument list calling doprint on each one.

void *Variadic_Bridge(void *arglist_as_voidptr)
{
void *rslt;

Variadic_start(); // pre processing

node *arglist = (node *)arglist_as_voidptr; // turn into useful type

while ( arglist ) {
Variadic_arg(arglist->ptr); // handle each argument
arglist = arglist->next;
}

rslt = Variadic_end(); // post processing

return rslt;
}

------------------------------------------------------------------------

Note, these are just some simple sketches for what I believe you are
trying to do. In general, whatever you want to do, can probably be
done "efficiently" in C without resorting to assembly language (unless
you need access to some specific "hardware instruction").

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : (e-mail address removed)
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
19 Bronte Way #33M voice : (508) 435-5016
Marlboro, MA 01752 USA fax : (508) 251-2347 (24 hours)
 
J

jacob navia

Just do this:

typedef union _BuiltinUnion {
int (*FunctionNoArgs)(void);
Node *(Function1argReturnsNode)(Node *p);
...
etc
...
} BUILTIN_UNION; // Uppercase looks ugly but, why not.

Then, in your code you write:

u.FunctionNoArgs()
or
Node *a = u.Function1argReturnsNode(pNode);

Since the compiler has seen the prototypes, it will generate correctly
the
call.

Jacob Navia
http://www.cs.virginia.edu/~lcc-win32
 
U

Uli Kusterer

then I'm a bit stuck because of the need to
generate arguments for the builtin function. To be specific, how do
(or is it even possible) I write a generic function pointer that can
represent all my different functions. some that have multiple
promotable-arguments (chars, ints) and pointers. and then, how do I
Pass these functions their arguments?

There are some libraries that let you call functions with arbitrary
signatures. I'm successfully using libFFI (part of GCC, IIRC, but it's
not under the GPL), and I think there's also a libffcall or something
like that.

The "FF" part in both stands for "foreign functions".

Cheers,
-- M. Uli Kusterer
http://www.zathras.de
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top